ArcKit-Informed AgentArmy Backlog Synthesis

Integrating Enterprise Architecture Patterns into the Template Platform

Executive Summary

ArcKit's approach to enterprise architecture governance contributes three patterns that strengthen AgentArmy's template platform strategy:

1. Hook-Driven Automation: Every user prompt triggers context injection, validation, and provenance tracking. This is the infrastructure for Play 4 (Learning Loop).
2. Project Context Graph: A bidirectional dependency index injected on every session. It enables impact analysis, stale detection, orphan finding, and intelligent routing.
3. Artifact Lifecycle Management: A machine-readable manifest.json plus multi-rendering (OWM primary, Mermaid secondary) gives a single source of truth with multiple audience formats.

These patterns apply directly to AgentArmy's routing, choreography, cost transparency, and especially the learning loop. Together they move the four release trains from "document delivery" toward an "intelligent, self-healing platform."

1. Hook System Integration

ArcKit's hook architecture maps cleanly onto AgentArmy's routing & orchestration needs.

Release Train 1: Foundation & Routing (Jun–Jul)

New Feature: Claude Code Hook System for Template Platform

Hook	Trigger	AgentArmy Use Case	ArcKit Reference
SessionStart	Session begins	Initialize routing graph, load board state	`arckit-session`
UserPromptSubmit	Every prompt	(1) Inject project context + routing edges, (2) validate routing intent, (3) detect if prompt mentions cost/tracing	`arckit-context` + `secret-detection`
PreToolUse	Before Write/Edit	Validate issue naming convention (must be linked to RT/Feature), check if change impacts multiple release trains	`validate-arc-filename` + `file-protection`
PostToolUse	After Write/Edit	Emit board-sync event; update `manifest.json`; trigger impact analysis	`update-manifest` + `PostToolUse`
Stop	Session ends	Log routing outcomes, agent performance SLIs, learning loop feedback	`session-learner`

Size: L (hook infrastructure + config.toml wiring)
Routing: devops-engineer (hooks/automation) + tooling-engineer (skill integration)
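
The PreToolUse row in the table above could be sketched roughly as follows. This is a minimal illustration, not ArcKit's or Claude Code's actual implementation: the function name `check_pre_tool_use`, the payload keys (`tool_name`, `tool_input`, `content`), and the ID pattern are all assumptions. The pattern is modelled on the `RT1-FEAT-001` style IDs used in the Epic example in section 5, and a real Write/Edit payload may carry its text under a different key.

```python
import re

# Assumed ID shape, modelled on the Epic example in section 5 (e.g. RT1-FEAT-003).
ARTIFACT_ID = re.compile(r"\bRT(?P<train>\d+)-(?P<kind>EPIC|FEAT)-(?P<num>\d{3})\b")


def check_pre_tool_use(payload: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a hypothetical PreToolUse payload."""
    if payload.get("tool_name") not in {"Write", "Edit"}:
        return True, "not a Write/Edit call; nothing to validate"

    # "content" is an assumed key for the text being written.
    content = payload.get("tool_input", {}).get("content", "")
    trains = {m.group("train") for m in ARTIFACT_ID.finditer(content)}
    if not trains:
        return False, "no RT/Feature ID found; link the change to a release train"
    if len(trains) > 1:
        # Allowed, but flagged so the author can confirm the cross-train impact.
        return True, "touches release trains: " + ", ".join(sorted(trains))
    return True, "linked to release train " + next(iter(trains))
```

Keeping the check as a pure function over a dictionary makes it testable without a live session; a thin wrapper would read the payload and translate the result into whatever blocking convention the hook runner expects.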

Release Train 2: Operations & Quality (Jul–Aug)

New Feature: Project Context Graph Injection

Extends Play 4 (Learning Loop) with infrastructure that ArcKit uses for impact analysis:

// Injected on every UserPromptSubmit:
interface ProjectContextGraph {
  currentScope: string;       // e.g. "Release Train N, Feature M, Story X"
  relatedIssues: string[];    // issues with bidirectional edges
  staleArtifacts: string[];   // items not updated in 30+ days
  orphanedIssues: string[];   // not linked to a Feature or RT
  impactRadius: {
    direct: string[];         // immediately affected items
    transitive: string[];     // indirectly affected via dependencies
    estimate: string;         // cost to rebase / re-review
  };
}

Size: M (graph building + edge traversal)
Routing: architect-reviewer (graph design) + observability-engineer (impact metrics)
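
As an illustration of the "graph building + edge traversal" work, the sketch below derives the stale list, the orphan list, and the impact radius from a plain dictionary of issues. The input shape (`depends_on`, `parent`, `updated`) and both function names are hypothetical; the output keys mirror the ProjectContextGraph fields above.

```python
from collections import deque
from datetime import date, timedelta


def build_context_graph(issues: dict, today: date, stale_days: int = 30) -> dict:
    """issues maps id -> {"depends_on": [...], "parent": str | None, "updated": date}."""
    # Reverse edges make the index bidirectional: for each item, who depends on it?
    dependents = {issue_id: [] for issue_id in issues}
    for issue_id, issue in issues.items():
        for dep in issue["depends_on"]:
            dependents.setdefault(dep, []).append(issue_id)

    cutoff = today - timedelta(days=stale_days)
    return {
        "dependents": dependents,
        "staleArtifacts": sorted(i for i, v in issues.items() if v["updated"] <= cutoff),
        "orphanedIssues": sorted(i for i, v in issues.items() if v["parent"] is None),
    }


def impact_radius(dependents: dict, changed: str) -> dict:
    """Breadth-first walk over reverse edges, starting from the changed item."""
    direct = sorted(dependents.get(changed, []))
    seen, queue = {changed, *direct}, deque(direct)
    transitive = []
    while queue:
        for nxt in dependents.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                transitive.append(nxt)
                queue.append(nxt)
    return {"direct": direct, "transitive": sorted(transitive)}
```

The `estimate` field is left out on purpose: the document does not say how rebase or re-review cost would be computed.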

Release Train 3: Spoke Readiness (Aug–Oct)

New Feature: Artifact Lifecycle & Manifest.json

Replaces ad-hoc artifact tracking with a machine-readable manifest:

{
  "projects": {
    "rt-1": {
      "status": "released",
      "features": [
        {
          "id": "routing-policy-engine",
          "type": "feature",
          "createdDate": "2026-06-01",
          "lastModified": "2026-07-15",
          "status": "done",
          "size": "M",
          "health": "active",
          "depends_on": ["agent-specs-template"],
          "mentions_cost": false,
          "mentions_tracing": true
        }
      ]
    }
  }
}

Size: M (manifest schema + PostToolUse hook)
Routing: documentation-engineer (schema design) + devops-engineer (automation)
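
The PostToolUse side of this feature might look something like the sketch below, which inserts or updates one feature entry in the manifest structure shown above and stamps `lastModified`. The function names, the default `"in-progress"` project status, and the temp-file write are assumptions made for illustration.

```python
import json
from datetime import date
from pathlib import Path


def upsert_feature(manifest: dict, train: str, feature: dict, today: date) -> dict:
    """Insert or update one feature entry, stamping lastModified."""
    project = manifest.setdefault("projects", {}).setdefault(
        train, {"status": "in-progress", "features": []}  # assumed default status
    )
    stamp = today.isoformat()
    for existing in project["features"]:
        if existing["id"] == feature["id"]:
            existing.update(feature, lastModified=stamp)
            return manifest
    # New entry: createdDate is set once and preserved on later updates.
    project["features"].append({"createdDate": stamp, **feature, "lastModified": stamp})
    return manifest


def write_manifest(path: Path, manifest: dict) -> None:
    """Write via a temp file so an interrupted hook does not leave a partial manifest."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    tmp.replace(path)
```

Writing to a temporary file and renaming it is one way to reduce the manifest drift risk discussed in section 11, since readers never see a half-written file.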

2. Multi-Rendering Strategy (Dual Output)

ArcKit's primary/secondary rendering pattern (OWM → create.wardleymaps.ai | Mermaid wardley-beta) applies to all major artifacts, not just Wardley maps.

Pattern: Every "Strategic" Artifact Gets Dual Format

Artifact	Primary Format	Secondary Format	When to Use
Wardley Map	OWM (create.wardleymaps.ai)	Mermaid `wardley-beta` + sourcing markers	Strategic landscape, evolution positioning
Release Train Roadmap	Markdown narrative + structured YAML	Mermaid Gantt + dependency graph	Timeline, dependencies, play sequencing
Feature Decomposition	GitHub issue hierarchy + acceptance criteria	Mermaid flowchart + capability tree	Scope definition, choreography design
Decision Record	MADR v4.0 narrative	Mermaid decision flow diagram	Architecture decisions, trade-offs
Routing Policy	YAML machine-readable rules	Mermaid decision tree diagram	Routing logic, ambiguity resolution

Implementation in Release Train 1/2:

- Template for each artifact type (Markdown primary + script to generate secondary)
- Hook that auto-generates the Mermaid secondary from the structured primary
- Both formats committed (primary is the source of truth; secondary is an audience-specific view)

Size: M (templates + generation scripts)
Routing: documentation-engineer (template design) + tooling-engineer (generation)
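
To make "generate the secondary from the structured primary" concrete, here is a small sketch that turns a list of features with `depends_on` edges (the same field used in the manifest example) into Mermaid flowchart text. The function name is hypothetical, and the choice to sanitize node IDs and keep the original name as a quoted label is an assumption made to stay within conservative Mermaid syntax.

```python
import re


def to_mermaid_flowchart(features: list[dict]) -> str:
    """Render depends_on edges as a Mermaid flowchart (dependency --> dependent)."""

    def node_id(name: str) -> str:
        # Replace anything that is not a letter, digit, or underscore.
        return re.sub(r"\W", "_", name)

    lines = ["flowchart LR"]
    for feature in features:
        lines.append(f'    {node_id(feature["id"])}["{feature["id"]}"]')
    for feature in features:
        for dep in feature.get("depends_on", []):
            lines.append(f"    {node_id(dep)} --> {node_id(feature['id'])}")
    return "\n".join(lines)
```

Because the output is derived only from the primary data, regenerating it in a hook keeps the committed secondary view from drifting away from the source of truth.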

3. Skill System Enhancement

ArcKit has 128+ skills with standardized SKILL.md frontmatter. AgentArmy's 11 agent categories should become modular skills that compose into Skill Recipes (like ArcKit's command-chaining).

New Structure: Skills + Recipes

Skill Metadata (frontmatter):

---
name: agent-routing-policy-engine
description: "Build executable routing policy from CLAUDE.md table"
category: "02-language-specialists / meta-orchestration"
prerequisites: ["wardley-map", "agent-specs"]
estimated_duration: "3-5 days"
model_recommendation: "Sonnet (routing needs reasoning)"
token_budget: "50k-100k"
success_metrics:
  - "Routing is deterministic and testable"
  - "No routing ambiguity > 2% of tasks"
  - "Audit trail shows reasoning"
---

Skill Recipes (ArcKit pattern):

recipe: "template-platform-foundation"
skills:
  - agent-distinctiveness-advocate  # audit existing agents
  - wardley-strategist              # map the landscape
  - architect-reviewer              # design policy engine
  - tooling-engineer                # build executor
  - observability-engineer          # instrument telemetry
sequence: "linear (each depends on prior)"
estimated_total: "6 weeks"
rollback_strategy: "each skill is independently revertible"

Size: M (skill taxonomy + recipe system)
Routing: architect-reviewer (skill design) + tooling-engineer (composition)
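
A recipe's `sequence` could be derived from skill `prerequisites` rather than written by hand. The sketch below assumes a hypothetical mapping from each skill to the skills it depends on and uses the standard library's `graphlib` to produce an execution order; `plan_recipe` is an illustrative name, not part of ArcKit.

```python
from graphlib import CycleError, TopologicalSorter


def plan_recipe(prerequisites: dict[str, list[str]]) -> list[str]:
    """Return skills in an order where every prerequisite runs first."""
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"recipe has a circular prerequisite: {exc.args[1]}") from exc
```

For the linear recipe above, each skill would list the previous one as its only prerequisite, and the planner would return the five skills in the order shown. A cycle check at planning time catches recipes that can never start.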

4. Learning Loop Infrastructure (Play 4 Enhancement)

ArcKit's session-learner hook (Stop event) inspires AgentArmy's learning loop:

Enhanced Learning Loop Architecture

On every session end:

1. Emit a learning event via the Stop hook with:
   - What was delegated (task, estimated cost, agent(s) used)
   - What succeeded / failed
   - Rework rate (% of tasks that came back for revision)
   - Cost vs. estimate (token budget vs. actual)
   - Routing decision rationale (why this agent?)

2. Accumulate in Knowledge Base:

{
  "incidents": [
    {
      "date": "2026-07-15",
      "delegated_task": "create-routing-policy-engine",
      "assigned_to": ["architect-reviewer", "tooling-engineer"],
      "estimated_cost": "100k tokens",
      "actual_cost": "87k tokens",
      "rework_rate": 0.15,
      "root_causes": [
        "missing context on agent definitions",
        "scope creep on acceptance criteria"
      ],
      "remedy": "updated agent-specs template to include routing-policy signature",
      "impact": "next 3 similar tasks reduced rework by 40%"
    }
  ],
  "anti_patterns": [
    "delegating to single-agent when task spans 2+ agent categories",
    "not binding story acceptance criteria to routing policy"
  ],
  "competency_trends": {
    "architect-reviewer": {
      "tasks_completed": 12,
      "success_rate": 0.92,
      "avg_rework_cycles": 1.3,
      "trend": "improving (was 1.8 on first 3 tasks)"
    }
  }
}

3. Close the feedback loop:
   - Every 2 weeks: pattern analysis (knowledge-synthesizer agent)
   - Update prompts/routing rules based on lessons
   - Publish "lessons from this sprint" to the team

Size: L (infrastructure + telemetry schema + analysis)
Routing: knowledge-synthesizer + observability-engineer + prompt-engineer
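
The `competency_trends` block in the knowledge base could be computed from per-task outcome records along these lines. The record fields (`agent`, `succeeded`, `rework_cycles`), the function name, and the choice to compare the overall average against the first three tasks are assumptions for illustration; the comparison follows the "was 1.8 on first 3 tasks" wording in the example above.

```python
from collections import defaultdict
from statistics import mean


def competency_trends(outcomes: list[dict], baseline_n: int = 3) -> dict:
    """outcomes: chronological records with agent, succeeded, rework_cycles."""
    by_agent = defaultdict(list)
    for record in outcomes:
        by_agent[record["agent"]].append(record)

    trends = {}
    for agent, records in by_agent.items():
        rework = [r["rework_cycles"] for r in records]
        overall = mean(rework)
        baseline = mean(rework[:baseline_n])  # average over the earliest tasks
        successes = sum(1 for r in records if r["succeeded"])
        trends[agent] = {
            "tasks_completed": len(records),
            "success_rate": round(successes / len(records), 2),
            "avg_rework_cycles": round(overall, 2),
            "trend": "improving" if overall < baseline else "flat or worsening",
        }
    return trends
```

With only a handful of tasks per agent these numbers are noisy, so a biweekly review would probably want a minimum sample size before acting on a trend.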

5. Enhanced Backlog Structure (All Release Trains)

New Backlog Conventions (Borrowed from ArcKit)

Every issue gets:

1. Artifact ID: RT1-FEAT-001 format (Release Train, Feature #)
2. Health Status: active | draft | stale | orphaned (auto-tagged via hook)
3. Cost Metadata: estimated tokens, model recommendation, external dependencies
4. Traceability: parent Feature/Epic + related decisions (ADRs), research, vendor evaluation
5. Manifest Entry: auto-added to manifest.json on creation (PostToolUse hook)

Epic Example:

# RT1-EPIC-001: Foundation & Routing (Release Train 1)

## Metadata
- Status: in-progress
- PI: PI-1
- Type: Epic
- Size: L (sum of children)
- Health: active
- Cost: ~300k tokens (estimated)
- Model: Opus/Sonnet mix
- Depends On: [none]
- Enables: RT2-FEAT-*, Play 2, Spoke Init

## Features (Children)
- RT1-FEAT-001: Agent Spec Template + Capability Matrix
- RT1-FEAT-002: Executable Routing Decision Tree (YAML)
- RT1-FEAT-003: Claude Code Hook System
- RT1-FEAT-004: Telemetry Instrumentation (OTel spans)
- RT1-FEAT-005: Few-Shot Prompt Library Phase 1

## Acceptance Criteria
- [ ] All features merged
- [ ] Routing policy testable (>90% deterministic)
- [ ] Telemetry shows >500 calls instrumented
- [ ] No routing ambiguity > 5% of tasks
- [ ] Learning loop infrastructure ready for RT2

## Related Artifacts
- ADR-001: Why policy-engine over static table
- Wardley Map: Platform evolution landscape
- RFC-001: Hook system design

6. Enhanced Release Train Descriptions

Release Train 1: Foundation & Routing (Jun 1 – Jul 12)

Theme: Make platform intelligence explicit; instrument everything.

ArcKit Integration:

- ✅ Hook system for UserPromptSubmit (context injection)
- ✅ PreToolUse validation of artifact naming (ARC-pattern)
- ✅ Project context graph injected on every prompt
- ✅ Manifest.json machinery ready
- ✅ Telemetry spans for every delegation decision

New Features (Total: 6)

1. Agent Spec Template + Capability Matrix
2. Executable Routing Decision Tree (YAML + validator)
3. [NEW] Claude Code Hook System (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop)
4. [NEW] Project Context Graph Injection
5. Telemetry Instrumentation (OTel spans, cost tracking)
6. Few-Shot Prompt Library (Phase 1)

Outcome: Routing is testable, observable, auditable.

Release Train 2: Operations & Quality (Jul 12 – Aug 23)

Theme: Formalize handoffs; close the learning loop.

ArcKit Integration:

- ✅ Artifact lifecycle (PreToolUse validation + PostToolUse stamping)
- ✅ Manifest.json auto-maintained
- ✅ Multi-rendering templates (primary + secondary formats)
- ✅ Impact analysis powered by project context graph

New Features (Total: 5 + carry-forward)

1. Multi-Agent Choreography (Saga patterns, state machines)
2. Agent Evaluation Gates (DoD rubrics, SLI/SLO)
3. Skill Scaffolding & Composition
4. [NEW] Learning Loop Runtime (error-coordinator + knowledge-synthesizer)
5. [NEW] Artifact Manifest & Multi-Rendering Templates
6. Cost Visibility & Provider Abstraction (Helicone integration)

Outcome: Failures are teachable; artifacts are discoverable.

Release Train 3: Spoke Readiness (Aug 23 – Oct 4)

Theme: Spoke teams self-serve; context travels with them.

ArcKit Integration:

- ✅ Manifest.json drives spoke onboarding (what artifacts are needed)
- ✅ Project context graph embedded in spoke template
- ✅ Routing policy exported as JSON + Mermaid diagram for spoke adaptation

New Features (Total: 4 + carry-forward)

1. Hub→Spoke Onboarding Playbook (checklists, hooks, pre-commit validation)
2. Cost & Capacity Model (unit economics, showback)
3. [NEW] Artifact Manifest Export (spoke gets manifest.json + all templates)
4. Observability Dashboard (MTTR, success rate, cost per delegation)
5. Spoke-Specific Prompt Adaptation (greenfield vs. legacy)

Outcome: Spoke can initialize with full context.

Release Train 4: Learning & Advanced Observability (Oct 4 – Nov 15)

Theme: Accumulate and share lessons; iterate on routing.

ArcKit Integration:

- ✅ Stop hook feeds lesson-learned KB
- ✅ Competency trends auto-calculated from telemetry
- ✅ Anti-patterns library emerges from incident analysis

New Features (Total: 4 + carry-forward)

1. Agent Lesson-Learned KB (incident log, anti-patterns library)
2. [NEW] Request Tracing & Decision Audit Log (end-to-end visibility)
3. [NEW] Feedback Integration (PR reviews → routing/prompt refinement)
4. Competency Evolution Tracking (agent performance trends)

Outcome: Platform learns from its own use.

7. ArcKit Patterns Mapping to AgentArmy

ArcKit Pattern	AgentArmy Analog	Release Train	Benefit
Hook system (SessionStart/Stop)	Agent lifecycle (initialize/finalize)	RT1	Automate context setup, learning capture
UserPromptSubmit context injection	Routing graph injection	RT1	Every prompt has full routing context
PreToolUse validation	Issue naming + traceability validation	RT1	Prevent orphaned work
PostToolUse manifest update	Board sync + artifact index	RT1-3	Single source of truth
Project context graph	Release train dependency graph	RT2	Impact analysis, stale detection
Multi-rendering (OWM + Mermaid)	Primary artifact + secondary views	RT2	Different audiences, same source
Artifact lifecycle metadata	Issue health + cost tracking	RT1-3	Governance + financial visibility
Session-learner Stop hook	Learning loop telemetry	RT4	Closed feedback loop
Skill + Recipe system	Agent categories + composition	RT2-3	Reusable patterns, skill discovery

8. Implementation Roadmap (By Week)

RT1: Foundation & Routing (Weeks 1–6)

Week	Feature	Dependencies	Deliverable
1–2	Agent Spec Template + Capability Matrix	None	Template doc + validation schema
2–3	Routing policy engine (YAML + validator)	Agent specs	`routing-policy.yaml` + test suite
3–4	Hook system wiring (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop)	Routing policy	`hooks.json` + hook implementations
4–5	Project context graph injection	Routing policy + hooks	Graph injected on every UserPromptSubmit
5–6	Telemetry instrumentation (OTel spans)	Hooks + context graph	Spans emitted for every delegation
1–6 (concurrent)	Prompt library Phase 1	Agent specs	Few-shot examples per category

Validation Gate: All 6 features merged; no new issues filed without an RTn-FEAT link.

RT2: Operations & Quality (Weeks 7–12)

Week	Feature	Dependencies	Deliverable
7–8	Multi-Agent Choreography	Routing policy	Saga patterns + state machines (YAML)
8–9	Agent Evaluation Gates + rubrics	Choreography	DoD matrix + SLI/SLO framework
9–10	Skill scaffolding & recipes	Agent registry	Skill SKILL.md template + recipe schema
10–11	Learning Loop runtime	Evaluation gates	error-coordinator + knowledge-synthesizer ops
11–12	Artifact manifest + multi-rendering	Learning loop	`manifest.json` schema + template generation
11–12 (concurrent)	Cost visibility integration	None	Helicone adapter + estimate surface

Validation Gate: Learning loop processes ≥10 incident records; manifest.json auto-generated on every artifact Write.

RT3: Spoke Readiness (Weeks 13–18)

Week	Feature	Dependencies	Deliverable
13–14	Onboarding playbook + hooks	Manifest schema	Checklist + pre-commit hook suite
14–16	Cost model + capacity planning	Cost visibility	Unit economics calculator
16–17	Manifest export for spokes	Artifact manifest	Spoke template includes manifest.json
17–18	Observability dashboard	Telemetry + cost	MTTR + cost-per-delegation visualization
13–18 (concurrent)	Spoke prompt adaptation	Prompt library Phase 1	Greenfield vs. legacy prompt variants

Validation Gate: ≥1 spoke instantiated with full manifest + hooks running.

RT4: Learning & Intelligence (Weeks 19–24)

Week	Feature	Dependencies	Deliverable
19–20	Lesson-Learned KB structure	Learning loop	Incident schema + anti-patterns taxonomy
20–21	Request tracing (end-to-end)	Telemetry + hooks	Tracing spans from issue → route → PR → merge
21–22	Feedback integration	Tracing + KB	PR comments → routing/prompt rules update
22–24	Competency evolution tracking	All telemetry	Agent performance trends + self-healing

Validation Gate: Platform has ≥50 incident records; competency trends show agent X improved Y% on task Z.

9. Cost & Capacity Estimate (With ArcKit Patterns)

Release Train	Duration	Team	Agent Routing	Notes
RT1	6 weeks	3.0 FTE	architect-reviewer, observability-engineer, tooling-engineer, prompt-engineer	Hook system is new complexity but enables everything downstream
RT2	6 weeks	3.5 FTE	workflow-orchestrator, observability-engineer, knowledge-synthesizer, tooling-engineer, prompt-engineer	Learning loop runtime requires careful design
RT3	6 weeks	2.5 FTE	platform-engineer, finops-engineer, prompt-engineer, observability-engineer	Spoke validation gates capacity
RT4	6 weeks	2.0 FTE	knowledge-synthesizer, observability-engineer, prompt-engineer	Continuous improvement cycle
Total	24 weeks (6 months)	~2.75 FTE avg	15–20 agents across 4 categories	ArcKit patterns add ~15% overhead but enable 40%+ faster learning cycle
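
The totals row can be reproduced from the per-train figures. The short calculation below is only a cross-check of the table's arithmetic; the person-weeks figure is derived here and does not appear in the original table.

```python
# (duration in weeks, team size in FTE) per release train, copied from the table.
trains = {"RT1": (6, 3.0), "RT2": (6, 3.5), "RT3": (6, 2.5), "RT4": (6, 2.0)}

total_weeks = sum(weeks for weeks, _ in trains.values())
person_weeks = sum(weeks * fte for weeks, fte in trains.values())
average_fte = person_weeks / total_weeks  # duration-weighted average

print(total_weeks, person_weeks, average_fte)  # 24 66.0 2.75
```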

10. Success Criteria (Enhanced with ArcKit Metrics)

RT1

✅ Routing policy is deterministic (>90% of decisions match policy)
✅ Telemetry spans emitted for ≥500 delegations
✅ Zero orphaned issues (all linked to RT/Feature)
✅ Manifest.json auto-generated on artifact creation
✅ [NEW] Project context graph injected on every prompt (>95% injection success)
✅ [NEW] Hook system stable (zero timeouts, <5ms overhead)

RT2

✅ Learning loop closes: failures → KB entry → lesson captured
✅ ≥10 incident records in KB with root causes + remedies
✅ Agent evaluation rubrics reduce rework by ≥20%
✅ Skill recipes are discoverable + composable
✅ [NEW] Multi-rendering templates adopted for all strategic artifacts
✅ [NEW] Manifest.json maintained for 100% of artifacts

RT3

✅ Spoke instantiation takes <2 hours (was manual, now scripted)
✅ Spoke teams inherit full routing policy + cost model
✅ ≥1 real spoke deployed with hooks running
✅ [NEW] Manifest.json export includes all necessary artifacts
✅ [NEW] Observability dashboard shows spoke-specific metrics

RT4

✅ Lesson-Learned KB has ≥50 incidents with trending
✅ Anti-patterns library has ≥15 documented patterns
✅ Agent competency trends show improvement on ≥3 task types
✅ [NEW] Request tracing shows full path (issue → route → PR → merge)
✅ [NEW] Feedback loop closes: PR comments → routing rule update → next task uses updated rule

11. Risk Mitigation (ArcKit Lessons)

Risk	ArcKit Mitigation	AgentArmy Application
Hooks become bottleneck	ArcKit limits hook timeout to 10s; async where possible	Implement timeout + fallback
Manifest.json drift	PostToolUse hook always updates; version-controlled	Similar pattern: hook is source of truth
Graph injection overhead	Graph is lazy-loaded; only for relevant prompts	Cache graph, invalidate on board change
Skill proliferation	ArcKit has 128 skills but strict taxonomy + MECE governance	Apply `agent-distinctiveness-advocate` audit quarterly
Learning loop noise	ArcKit filters incidents by severity + root cause; requires human review	Implement incident severity scoring + auto-triage
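
The "timeout + fallback" mitigation in the first row might be sketched as below. This is an assumption-laden illustration: `run_hook` is a hypothetical wrapper, hooks are modelled as plain Python callables rather than external commands, and the 10-second default simply echoes the ArcKit limit quoted in the table.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as HookTimeout


def run_hook(hook, payload: dict, timeout_s: float = 10.0, fallback=None):
    """Run hook(payload); return fallback if it raises or exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(hook, payload)
    try:
        return future.result(timeout=timeout_s)
    except HookTimeout:
        return fallback  # the session continues without this hook's output
    except Exception:
        return fallback  # a failing hook should not block the prompt
    finally:
        # Do not wait for a stuck worker; note the thread itself is not killed.
        pool.shutdown(wait=False)
```

A thread-based timeout only stops waiting; it does not stop the work. If hooks run as separate processes, enforcing the limit at the process level would allow a stuck hook to be terminated as well.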

Next Steps

Review this synthesis — Does the ArcKit integration feel right? Any concerns?
Create GitHub issues from sections 5–8 (Release Train 1-4 breakdown)
Add to project board with Type, PI, Size, Estimate, routing guidance
Kick off RT1 — Week of Jun 1

Estimated time to full synthesis: 2–3 hours (review + issue creation + board population)

Document prepared: 2026-05-23
Based on: ArcKit-codex analysis + AgentArmy Wardley analysis + Strategic Plays framework
Next review: After RT1 completion (mid-July)