Agent Army MECE Audit Scorecard

Date: 2026-05-22
Agents Analyzed: 169
Categories: 11
Overall MECE Score: 72/100 ⚠️

Executive Summary

Your agent army achieves reasonable MECE (mutually exclusive, collectively exhaustive) separation by category (architecture, language, infrastructure) but suffers from diagonal overlaps within categories and insufficient boundary rules across them. Primary gaps:

- Routing ambiguity: ~15% of realistic tasks could legitimately go to 2+ agents (target: <5%)
- Decision rule explicitness: 60% of overlapping pairs lack documented boundaries
- Description clarity: 30% of agents don't signal their primary deliverable unambiguously

Good news: The 11-category structure is sound. Fixes require refining descriptions + adding boundary rules, not reorganizing.

Category-by-Category Scorecard

01 · Core Development

MECE Score: 65/100

Agent	Deliverable	Clarity	Overlaps With	Severity	Status / Action
`api-designer`	API specifications	✅ Clear	`backend-developer` (APIs)	⚠️ Medium	Define when backend owns API vs designer owns spec
`backend-developer`	Server code + architecture	⚠️ Vague	`api-designer`, `node-specialist`, `fastapi-developer`	🔴 High	Needs boundary: "owns architecture"; specialists own implementation
`design-bridge`	UI implementation from design specs	✅ Clear	none	✅ Distinct	MECE-ready
`electron-pro`	Desktop app (Electron)	✅ Clear	none	✅ Distinct	MECE-ready
`frontend-developer`	Frontend application + architecture	⚠️ Vague	`react-specialist`, `mobile-web-specialist`, `ui-designer`	🔴 High	Unclear: "builds complete apps" but so does react-specialist
`fullstack-developer`	End-to-end features	⚠️ Vague	`backend-developer`, `frontend-developer`	🔴 High	Is this always needed, or rare case? Consider deprecating
`graphql-architect`	GraphQL schema + federation	✅ Clear	`api-designer` (GraphQL is API)	⚠️ Medium	Rule: architect owns schema design; designer owns spec format
`microservices-architect`	Service architecture	✅ Clear	`backend-developer` (builds services)	⚠️ Medium	Boundary: architect designs; backend implements
`mobile-developer`	Cross-platform mobile (React Native, Flutter)	✅ Clear	`mobile-web-specialist`	⚠️ Medium	Rule: mobile-developer=native/native-bridge; web-specialist=responsive web
`mobile-web-specialist`	Responsive web for phones	✅ Clear	`frontend-developer`, `mobile-developer`	⚠️ Medium	Rule: web-specialist=responsive CSS/viewport quirks only
`ui-designer`	Visual design + systems	✅ Clear	`frontend-developer`, `design-bridge`	⚠️ Medium	Boundary: designer owns aesthetics; frontend owns implementation
`websocket-engineer`	Real-time bidirectional communication	✅ Clear	none	✅ Distinct	MECE-ready

Findings:

- frontend-developer bloat: The description says "multi-framework" but also "full-stack integration", which leaves the scope unclear. Consider restricting it to non-language-specific orchestration.
- fullstack-developer redundancy: Its scope (database + API + frontend) overlaps fully with frontend-developer + backend-developer. Recommend deprecating it or redefining it as a "feature-level orchestrator" (non-code work).
- API design split: api-designer (specs) vs backend-developer (implementation) needs an explicit rule.

Recommendation: Add 3 boundary rules to AGENTS.md; consider consolidating fullstack into backend/frontend.

02 · Language Specialists

MECE Score: 78/100

Language	Agents	Overlap?	Notes
Python	`python-pro`, `fastapi-developer`	⚠️ Yes	Rule: fastapi-developer for FastAPI-based async APIs; python-pro for general Python, non-FastAPI async code, and scripts
JavaScript/TypeScript	`javascript-pro`, `typescript-pro`, `nextjs-developer`, `node-specialist`	🔴 Yes	4 agents; needs hierarchy: language → framework → specialty
React	`react-specialist` (listed in this category for React optimization)	⚠️ Yes	Diagonal: `frontend-developer` (category 01) also owns React
.NET	`csharp-developer`, `dotnet-core-expert`, `dotnet-framework-4.8-expert`	⚠️ Yes	Clear: 4.8 legacy, Core cloud-native, C# general
PowerShell	`powershell-5.1-expert`, `powershell-7-expert`, `powershell-module-architect`, `powershell-security-hardening`, `powershell-ui-architect`	🔴 Yes	5 agents; good specialization (version + domain) but high proliferation
PHP	`php-pro`, `laravel-specialist`, `symfony-specialist`	⚠️ Yes	Rule: php-pro=language-level; framework specialists=framework idioms
Go	`golang-pro`	✅ One	MECE-ready
Rust	`rust-engineer`	✅ One	MECE-ready
Java	`java-architect`, `spring-boot-engineer`	⚠️ Yes	Clear: architect=design; spring-boot=Spring-specific implementation

Findings:

- Best practice: Go, Rust, Elixir each have one agent per language, which eliminates overlap.
- Worst practice: PowerShell (5 agents) and the JavaScript ecosystem (4 agents). High specialization can fragment coverage.
- Root cause: Framework-level agents in a language-first category create diagonal overlap. E.g., nextjs-developer is language + framework + concern.

Recommendation:

- Keep fastapi-developer (async is specialized enough)
- Merge typescript-pro into javascript-pro (TypeScript is a superset of JavaScript)
- Consolidate nextjs-developer into node-specialist (Next.js is a React framework that runs on Node.js)
- Keep PowerShell agents (Windows-specific, justified granularity)
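The "language → framework → specialty" hierarchy mentioned in the table could be applied as a most-specific-match rule. The following is a minimal sketch under stated assumptions: the agent names come from the table, but the tag sets and the `route` function are hypothetical and not part of any existing router.

```python
# Hypothetical tag sets; the agent names come from the table above,
# but the tags themselves are illustrative assumptions.
AGENT_TAGS = {
    "javascript-pro": {"javascript"},
    "typescript-pro": {"javascript", "typescript"},
    "node-specialist": {"javascript", "node"},
    "nextjs-developer": {"javascript", "node", "nextjs"},
}


def route(task_tags: set[str]) -> str:
    """Return the most specific agent whose tags are all present in the task."""
    candidates = [
        (len(tags), name)
        for name, tags in AGENT_TAGS.items()
        if tags <= task_tags
    ]
    if not candidates:
        raise LookupError(f"no agent covers {sorted(task_tags)}")
    best = max(size for size, _ in candidates)
    winners = sorted(name for size, name in candidates if size == best)
    if len(winners) > 1:
        # A tie means the hierarchy still leaves a routing ambiguity.
        raise LookupError(f"ambiguous: {winners}")
    return winners[0]
```

With these assumed tags, a task tagged `{"javascript", "node", "nextjs"}` resolves to `nextjs-developer`, while `{"javascript", "typescript", "node"}` raises an ambiguity error because `typescript-pro` and `node-specialist` are equally specific. That tie is the kind of diagonal overlap the findings describe.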

03 · Infrastructure

MECE Score: 81/100

Agent	Scope	Overlaps	Severity	Status / Action
`cloud-architect`	Multi-cloud architecture	`azure-infra-engineer` (Cloud provider specific)	⚠️ Medium	Rule: architect=strategy; azure=Azure-specific implementation
`devops-engineer`	CI/CD + containerization	`deployment-engineer` (CI/CD too)	🔴 High	CRITICAL: Both own CI/CD. Boundary unclear.
`deployment-engineer`	CI/CD + deployment automation	`devops-engineer` (same)	🔴 High	CRITICAL: See above.
`docker-expert`	Docker containers	`devops-engineer` (containerization)	⚠️ Medium	Rule: docker-expert=image/compose/registry; devops=orchestration
`kubernetes-specialist`	K8s	`platform-engineer` (self-service infra)	⚠️ Medium	Rule: k8s=deployment; platform=developer experience
`platform-engineer`	IDP + golden paths	`sre-engineer` (reliability)	⚠️ Medium	Both optimize developer/system experience; clarify focus
`sre-engineer`	SLI/SLOs + reliability	`devops-engineer` (reliability too)	⚠️ Medium	Rule: sre=metrics + culture; devops=automation tools
`security-engineer`	Security automation	`security-architect` (enterprise security)	⚠️ Medium	Boundary: architect=design; engineer=implementation
`incident-responder`	Security breaches	`devops-incident-responder` (ops incidents)	⚠️ Medium	Clear: security vs operational incidents
`terraform-engineer` + `terragrunt-expert`	IaC	`devops-engineer` (infrastructure automation)	⚠️ Medium	Rule: IaC specialists own code; devops owns process

Critical Finding: devops-engineer vs deployment-engineer is broken.

- DevOps description: "CI/CD pipelines, containerization strategies, deployment workflows"
- Deployment description: "designing, building, optimizing CI/CD pipelines"
- Both own CI/CD. No boundary rule exists.

Recommendation:

1. Immediate: Add a decision rule to both descriptions. Proposed:
   - deployment-engineer: "owns release orchestration, rollback, deployment strategies"
   - devops-engineer: "owns CI/CD architecture, infrastructure automation, build optimization"
2. Clarify the sre-engineer boundary: "owns SLOs, error budgets, toil reduction" (not automation)
3. Add a rule for platform-engineer vs kubernetes-specialist: "platform owns IDP end-to-end; k8s specialist owns k8s ops only"

04 · Quality & Security

MECE Score: 79/100

Agent	Scope	Overlaps	Status
`code-reviewer`	Code review (quality)	`security-auditor` (includes code)	⚠️ Medium
`security-auditor`	Comprehensive security audits	`penetration-tester` (testing)	⚠️ Medium
`penetration-tester`	Offensive security testing	`security-auditor` (audits include testing)	⚠️ Medium
`debugger`	Root cause analysis	`error-detective` (also diagnoses errors)	🔴 High
`error-detective`	Error diagnosis + pattern analysis	`debugger` (same)	🔴 High
`performance-engineer`	Bottleneck elimination	All others (performance touches everything)	🔴 High
`chaos-engineer`	Resilience testing	`sre-engineer` (also tests reliability)	⚠️ Medium
`qa-expert`	QA strategy	`test-automator` (test automation)	⚠️ Medium

Critical Finding: debugger and error-detective are nearly identical.

- Debugger: "diagnose and fix bugs, identify root causes"
- Error-detective: "diagnose errors, correlate across services, identify root causes"
- Both own root-cause analysis. The difference is unclear.

Recommendation:

1. Merge debugger and error-detective into one agent, or
2. Split them clearly: debugger = single-service/local diagnosis; error-detective = distributed systems + observability
3. Add a boundary rule for performance-engineer: "diagnoses bottlenecks (any layer); delegates layer-specific fixes to specialist"
4. Clarify security-auditor vs penetration-tester: "auditor=assessment+reporting; tester=exploitation+validation"

05 · Data & AI

MECE Score: 74/100

Agent	Scope	Overlaps	Status
`data-engineer`	ETL/ELT pipelines	`dlt-engineer` (ELT pipelines)	🔴 High
`dlt-engineer`	dlt-specific ELT	`data-engineer` (same)	🔴 High
`data-analyst`	Analysis + dashboards	`data-scientist` (analysis too)	⚠️ Medium
`data-scientist`	ML models + analysis	`data-analyst` (analysis)	⚠️ Medium
`ml-engineer`	Production ML systems	`machine-learning-engineer` (same)	🔴 High
`machine-learning-engineer`	ML model serving	`ml-engineer` (same)	🔴 High
`mlops-engineer`	ML infrastructure	`ml-engineer` (infrastructure)	⚠️ Medium
`database-optimizer`	Query tuning	`postgres-pro` (PostgreSQL tuning)	⚠️ Medium
`postgres-pro`	PostgreSQL specialist	`database-optimizer` (query optimization)	⚠️ Medium
`prompt-engineer`	Prompt design	`llm-architect` (LLM systems)	⚠️ Medium

Critical Findings: Two high-severity overlaps, plus one medium-severity boundary gap:

1. data-engineer vs dlt-engineer: Both build ELT pipelines. dlt is a tool; data-engineer is a role. This is a vertical overlap (generalist vs. tool specialist), not a horizontal one.
2. ml-engineer vs machine-learning-engineer: RESOLVED. Merged into machine-learning-engineer (broader training/retraining scope folded in); ml-engineer removed.
3. data-analyst vs data-scientist (medium severity): Unclear boundary (both do analysis). Rule needed.

Recommendation:

1. Consolidate (DONE): Merged ml-engineer and machine-learning-engineer into one agent. (Kept machine-learning-engineer; removed ml-engineer.)
2. Clarify: dlt-engineer is a specialist (dlt framework), not a replacement for data-engineer. Update descriptions:
   - data-engineer: "Design & build ETL/ELT pipelines using any tool (SQL, Spark, Airflow, dlt)"
   - dlt-engineer: "Build & optimize dlt-specific pipelines for complex source-to-destination workflows"
3. Split: data-analyst (business intelligence, dashboards) vs data-scientist (statistical modeling, predictions)
4. Scope: database-optimizer owns any DB; postgres-pro owns PostgreSQL. Boundary: "optimizer=general; postgres-pro=PostgreSQL-specific tuning"
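The generalist-vs-specialist boundaries recommended above (data-engineer vs dlt-engineer, database-optimizer vs postgres-pro) follow one pattern: route to the specialist only when the task names its tool, and fall back to the generalist otherwise. A minimal sketch of that pattern follows; the task kinds (`"pipeline"`, `"query-tuning"`), the lookup tables, and `route_data_task` are hypothetical names introduced for illustration.

```python
from typing import Optional

# (task kind, tool) -> specialist. Agent names are from this section;
# the task kinds and the table shape are illustrative assumptions.
SPECIALISTS = {
    ("pipeline", "dlt"): "dlt-engineer",
    ("query-tuning", "postgresql"): "postgres-pro",
}

# task kind -> generalist fallback
GENERALISTS = {
    "pipeline": "data-engineer",
    "query-tuning": "database-optimizer",
}


def route_data_task(kind: str, tool: Optional[str] = None) -> str:
    """Prefer the tool specialist when the task names its tool; else the generalist."""
    if tool is not None:
        specialist = SPECIALISTS.get((kind, tool.lower()))
        if specialist is not None:
            return specialist
    return GENERALISTS[kind]  # raises KeyError for an unknown task kind
```

Under this sketch, a pipeline request that names no tool goes to `data-engineer`, and only an explicit `tool="dlt"` selects `dlt-engineer`.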

06 · Developer Experience

MECE Score: 76/100

Agent	Scope	Overlaps	Status
`documentation-engineer`	Docs systems	`technical-writer` (docs)	⚠️ Medium
`technical-writer`	Docs + guides	`documentation-engineer` (same)	⚠️ Medium
`legacy-modernizer`	Incremental modernization	`refactoring-specialist` (code cleanup)	⚠️ Medium
`refactoring-specialist`	Code refactoring	`legacy-modernizer` (same)	⚠️ Medium
`dependency-manager`	Dependency audits	`security-engineer` (security audits)	⚠️ Medium
`powershell-*` (5 agents)	PowerShell specialization	Internal to category (clear hierarchy)	✅ OK

Findings:

- Diagonal overlap between documentation-engineer (systems) and technical-writer (content) is minor; the boundary is roughly "architect vs. writer".
- legacy-modernizer vs refactoring-specialist: Unclear separation. Modernizer is broader (tech debt), specialist is narrower (code structure)?
- PowerShell agents are well-scoped (version + domain); no issues.

Recommendation:

1. Add boundary rules for documentation-engineer vs technical-writer: "engineer designs systems/architecture; writer creates content"
2. Clarify legacy-modernizer vs refactoring-specialist: "modernizer=strategy + sequencing; specialist=tactical code cleanup"

07 · Specialized Domains

MECE Score: 85/100

Finding: This category is well-scoped by domain (blockchain, game, fintech, healthcare, etc.). Minimal diagonal overlap. Strong MECE.

Minor issues:

- mobile-app-developer (iOS/Android strategy) vs mobile-developer (cross-platform): located in different categories, but the distinction is clear.
- payment-integration vs fintech-engineer: fintech is broader; payment-integration is narrow. Clear hierarchy.

Status: MECE-ready with one minor clarification.

08 · Business & Product

MECE Score: 82/100

Agent	Scope	Overlaps	Status
`project-manager`	Project planning + execution	`scrum-master` (agile ceremonies)	⚠️ Medium
`scrum-master`	Scrum ceremonies + impediments	`project-manager` (planning)	⚠️ Medium
`business-analyst`	Requirements gathering	`product-manager` (product decisions)	⚠️ Medium
`product-manager`	Roadmap + feature prioritization	`business-analyst` (requirements)	⚠️ Medium
`technical-writer`	Docs	(also in category 06)	—
`legal-advisor`	Legal risk	`license-engineer` (licensing)	⚠️ Medium
`license-engineer`	OSS compliance	`legal-advisor` (legal)	⚠️ Medium

Findings: Business/Product agents have moderate overlaps but clear intent differences. Boundaries exist but aren't explicit.

Recommendation: Add decision rules:

- project-manager (planning, timeline, budget) vs scrum-master (agile facilitation, ceremonies)
- business-analyst (elicitation, requirements) vs product-manager (strategy, roadmap)
- legal-advisor (legal risk, contracts) vs license-engineer (OSS compliance)

09 · Meta & Orchestration

MECE Score: 87/100

Agent	Scope	Overlaps	Status
`agent-organizer`	Multi-agent team assembly	`multi-agent-coordinator` (agent orchestration)	⚠️ Medium
`multi-agent-coordinator`	Coordinating concurrent agents	`agent-organizer` (assembling teams)	⚠️ Medium
`task-distributor`	Task routing + load balancing	`multi-agent-coordinator` (orchestration)	⚠️ Medium
`workflow-orchestrator`	Business process workflows	`task-distributor` (task routing)	⚠️ Medium

Findings: Orchestration layer is mostly coherent. Overlaps are fine-grained (assembly vs. coordination vs. execution) but boundaries are fuzzy.

Recommendation: Add explicit rules:

- agent-organizer: "designs agent teams for complex projects; one-time setup"
- multi-agent-coordinator: "runs concurrent agents; synchronization + state sharing"
- task-distributor: "routes individual tasks to agents; queue management"
- workflow-orchestrator: "manages stateful business processes with multiple states"

10 · Research & Analysis

MECE Score: 88/100

Agent	Scope	Overlaps	Status
`research-analyst`	Multi-source synthesis	`data-researcher` (data collection)	⚠️ Medium
`data-researcher`	Data collection	`research-analyst` (synthesis)	⚠️ Medium
`search-specialist`	Information retrieval	`research-analyst` (research)	⚠️ Medium
`market-researcher`	Market sizing	`competitive-analyst` (competitive intel)	⚠️ Medium
`competitive-analyst`	Competitor analysis	`market-researcher` (market analysis)	⚠️ Medium
`trend-analyst`	Emerging patterns	`market-researcher` (trends)	⚠️ Medium

Findings: This category is coherent. Overlaps are minimal and follow a pipeline (collection → analysis → synthesis → strategy).

Recommendation: Add pipeline rule for clarity:

data-researcher (collect) → research-analyst (synthesize) → business-analyst (act)
search-specialist (find) → market-researcher (size) → competitive-analyst (strategy) → trend-analyst (foresight)
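
The two pipelines above can be written down as ordered hand-offs so that "who receives the work next" is a lookup rather than a judgment call. This is a minimal sketch: the agent names and stage verbs are taken from the pipeline rule, while the pipeline keys (`"research"`, `"market"`) and the `next_agent` helper are hypothetical.

```python
from typing import Optional

# The two pipelines proposed above, as ordered (agent, stage) pairs.
# The dictionary keys are hypothetical labels, not names from the source.
PIPELINES = {
    "research": [
        ("data-researcher", "collect"),
        ("research-analyst", "synthesize"),
        ("business-analyst", "act"),
    ],
    "market": [
        ("search-specialist", "find"),
        ("market-researcher", "size"),
        ("competitive-analyst", "strategy"),
        ("trend-analyst", "foresight"),
    ],
}


def next_agent(pipeline: str, current: str) -> Optional[str]:
    """Return the agent that receives the hand-off from `current`, or None at the end."""
    agents = [agent for agent, _stage in PIPELINES[pipeline]]
    idx = agents.index(current)  # raises ValueError for an unknown agent
    return agents[idx + 1] if idx + 1 < len(agents) else None
```

For example, `next_agent("research", "data-researcher")` returns `"research-analyst"`, and the last agent in either pipeline returns `None`.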

11 · Enterprise Architecture

MECE Score: 91/100

Agent	Scope	TOGAF Phase	Overlaps	Status
`enterprise-architect`	All phases + orchestration	Preliminary → G	`solution-architect` (implementation)	⚠️ Minor
`togaf-adm-advisor`	Phase guidance	All	none	✅ Clear
`wardley-strategist`	Strategic landscape	A, E	none	✅ Clear
`business-architect`	Capabilities + value streams	B	none	✅ Clear
`capability-planner`	Investment prioritization	B, E, F	none	✅ Clear
`information-architect`	Data architecture	C (Data)	none	✅ Clear
`integration-architect`	Integration patterns	C (App), D	none	✅ Clear
`solution-architect`	ABB→SBB translation	E, F	none	✅ Clear
`security-architect`	Security by design	Cross-cutting	none	✅ Clear
`platform-architect`	IDP + Team Topologies	D	none	✅ Clear
`us-regulatory-architect`	Compliance architecture	Cross-cutting	none	✅ Clear

Findings: Best-in-class MECE structure. Agents are scoped to TOGAF phases with explicit sequencing. Clear inputs/outputs. No confusing overlaps.

Status: MECE-ready. No changes needed.

Cross-Category Critical Overlaps (Summary)

Pair	Category	Severity	Current Boundary	Status / Action
`frontend-developer` vs `react-specialist`	01 vs 02	🔴 High	None	FIX: greenfield vs. optimization
`backend-developer` vs `node-specialist` vs `fastapi-developer`	01 vs 02	🔴 High	None	FIX: architecture vs. language vs. framework
`devops-engineer` vs `deployment-engineer`	03	🔴 High	None	CRITICAL: Both own CI/CD
`debugger` vs `error-detective`	04	🔴 High	None	MERGE or clarify: local vs. distributed
`data-engineer` vs `dlt-engineer`	05	🔴 High	Tool-specific	CLARIFY: dlt is tool-specialist, not replacement
`ml-engineer` vs `machine-learning-engineer`	05	🔴 High	None	RESOLVED: merged into `machine-learning-engineer`; `ml-engineer` removed
`performance-engineer` vs all layer-specialists	04 vs others	🔴 High	None	ADD RULE: diagnose vs. fix pattern
`documentation-engineer` vs `technical-writer`	06 vs 08	⚠️ Medium	Implicit	ADD RULE: systems vs. content

Routing Test: 20 Real-World Tasks

Test methodology: Each task below is marked with the agent(s) that could legitimately claim it.

#	Task	Primary	Secondary	Ambiguity?	Notes
1	"Our React app is slow. Optimize render performance."	`react-specialist`	`performance-engineer`	🔴 Yes	Both own this. Need rule.
2	"Build a new Node.js API from scratch."	`node-specialist`	`backend-developer`	🔴 Yes	Language vs. architecture.
3	"Design an OpenAPI spec for our payment API."	`api-designer`	`backend-developer`	⚠️ Maybe	Designer owns spec; backend owns implementation. Clear?
4	"Set up CI/CD for our Docker containers."	`devops-engineer`	`deployment-engineer`	🔴 Yes	CRITICAL overlap.
5	"Implement a FastAPI REST endpoint."	`fastapi-developer`	`python-pro`	⚠️ Maybe	Framework-specific or language-wide?
6	"Our database queries are slow. Why?"	`performance-engineer`	`database-optimizer`	⚠️ Maybe	Both diagnose. Need rule.
7	"Refactor our legacy monolith into microservices."	`legacy-modernizer`	`microservices-architect`	⚠️ Maybe	Sequencing vs. design.
8	"Audit our code for security vulnerabilities."	`code-reviewer`	`security-auditor`	⚠️ Maybe	Code review vs. security review.
9	"We found a bug. Debug it."	`debugger`	`error-detective`	🔴 Yes	Nearly identical.
10	"Build our ELT pipeline from Salesforce to DuckDB."	`data-engineer`	`dlt-engineer`	🔴 Yes	Tool-specific vs. general.
11	"Write the README for our project."	`technical-writer`	`documentation-engineer`	⚠️ Maybe	Content vs. systems.
12	"We're slow on mobile. Fix it."	`mobile-web-specialist`	`performance-engineer`	⚠️ Maybe	Responsive design vs. performance.
13	"Build the landing page."	`frontend-developer`	`ui-designer`	⚠️ Maybe	Implementation vs. design.
14	"Set up Kubernetes for our microservices."	`kubernetes-specialist`	`platform-engineer`	⚠️ Maybe	Ops vs. developer experience.
15	"Train a machine learning model."	`ml-engineer`	`machine-learning-engineer`	🔴 Yes	Identical.
16	"Analyze why our service is unreliable."	`sre-engineer`	`performance-engineer`	⚠️ Maybe	Reliability vs. performance.
17	"Build a GraphQL API."	`graphql-architect`	`api-designer`	⚠️ Maybe	Graph-specific vs. API-general.
18	"Pen test our application."	`penetration-tester`	`security-auditor`	⚠️ Maybe	Offensive vs. comprehensive.
19	"Plan our Q3 roadmap."	`product-manager`	`business-analyst`	⚠️ Maybe	Strategy vs. requirements.
20	"Trace errors across our microservices."	`error-detective`	`debugger`	🔴 Yes	See #9.

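Ambiguity Rate: 50% (10/20 tasks). By the table's own markers, 7 tasks are 🔴 Yes and 13 are ⚠️ Maybe, and every task lists a secondary agent.
Target: <5% (fewer than 1 task in 20)
Gap: 45 percentage points ❌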
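
The ambiguity figure depends on how the ⚠️ Maybe rows are counted, which the table itself does not specify. As a hedged sketch, the snippet below only tallies the Ambiguity column of the 20 tasks above and reports a strict rate (🔴 Yes only) and a lenient rate (Yes plus Maybe). The one-letter encoding and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Ambiguity column of the routing test, tasks 1-20 in order.
# "Y" = 🔴 Yes, "M" = ⚠️ Maybe (encoding chosen for this sketch).
MARKS = "YYMYMMMMYYMMMMYMMMMY"

counts = Counter(MARKS)
total = len(MARKS)

strict_rate = counts["Y"] / total                     # only 🔴 Yes rows count
lenient_rate = (counts["Y"] + counts["M"]) / total    # ⚠️ Maybe rows count too

print(f"yes={counts['Y']} maybe={counts['M']} total={total}")
print(f"strict: {strict_rate:.0%}, lenient: {lenient_rate:.0%}")
```

Re-running the same tally after each round of boundary-rule fixes gives a comparable before/after number, provided the counting convention is fixed in advance.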

Synthesis: MECE Improvement Roadmap

Phase 1 (Immediate): Fix Critical Overlaps

3–5 day effort. High-impact fixes.

Pair	Action	Affected Agents	Effort
`devops-engineer` ↔ `deployment-engineer`	Document decision rule in both descriptions	2	2 hrs
`debugger` ↔ `error-detective`	Merge into one agent OR split by scope (local vs. distributed)	2	4 hrs
`ml-engineer` ↔ `machine-learning-engineer`	DONE: merged into `machine-learning-engineer`; `ml-engineer` removed	2	2 hrs
`react-specialist` ↔ `frontend-developer`	Add boundary rule (greenfield vs. optimization)	2	2 hrs
`backend-developer` vs others	Clarify architecture-vs-implementation boundary	3+	4 hrs

Outcome: Reduce routing ambiguity from 50% to ~20%.

Phase 2 (Short-term): Add Explicit Decision Rules

1 week effort. Medium-impact clarity.

For each overlapping pair, add to AGENTS.md:

## Routing Rules

### When to use X instead of Y
- X: [condition A]
- Y: [condition B]
- Edge case [scenario]: use [agent name] because [reason]

Pairs to address:

1. data-engineer vs dlt-engineer
2. documentation-engineer vs technical-writer
3. api-designer vs backend-developer
4. performance-engineer vs layer-specialists
5. legacy-modernizer vs refactoring-specialist
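
To keep the AGENTS.md entries uniform across pairs, the template above could be filled in from structured data rather than written by hand. The sketch below is one possible approach, not an existing tool: `RoutingRule` and its field names are hypothetical, and the example conditions paraphrase the data-engineer / dlt-engineer descriptions proposed in category 05.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingRule:
    """One 'When to use X instead of Y' entry (hypothetical structure)."""

    x: str
    y: str
    x_condition: str
    y_condition: str
    # Optional (scenario, agent, reason) triples for edge cases.
    edge_cases: list[tuple[str, str, str]] = field(default_factory=list)

    def render(self) -> str:
        """Render the rule in the shape of the Routing Rules template."""
        lines = [
            f"### When to use {self.x} instead of {self.y}",
            f"- {self.x}: {self.x_condition}",
            f"- {self.y}: {self.y_condition}",
        ]
        for scenario, agent, reason in self.edge_cases:
            lines.append(f"- Edge case {scenario}: use {agent} because {reason}")
        return "\n".join(lines)


rule = RoutingRule(
    x="data-engineer",
    y="dlt-engineer",
    x_condition="design and build ETL/ELT pipelines using any tool",
    y_condition="build and optimize dlt-specific pipelines",
)
print(rule.render())
```

Edge cases are left empty here on purpose; they should come from real routing conflicts rather than be invented up front.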

Phase 3 (Backlog): Assess New Agents

Before adding, run each candidate through the rubric:

- Primary deliverable (distinct from 5+ existing agents?)
- Boundary conditions (vs. overlapping agents)
- Category fit (or new category?)

Use the template in AGENT_MECE_AUDIT_RUBRIC.md.

Recommended Priority for Your Backlog

Based on MECE gaps, prioritize consolidation over new agents:

Do NOT add without addressing:

- Data quality / governance specialist (overlaps with data-engineer)
- Observability specialist (overlaps with performance-engineer, sre-engineer)
- Native iOS/Android specialist (overlaps with mobile-developer)

Safe to add (clear gaps):

- Visual design systems specialist (distinct from ui-designer)
- Edge computing specialist (no current agent)
- Compliance automation specialist (distinct from security-engineer)
- API governance architect (distinct from integration-architect)

Appendix: MECE Scoring Rubric Reminder

Score	Criterion
90–100	Excellent MECE. Descriptions are unambiguous. <5% routing conflicts. Boundary rules documented.
75–89	Good MECE. Minor overlaps exist but boundary rules can resolve them quickly.
60–74	Fair MECE. Diagonal overlaps present. Need consolidation or explicit rules.
<60	Poor MECE. High routing ambiguity. Requires restructuring.

Your current score: 72/100 → Fair → Actionable improvements exist.
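
The rubric bands can be applied mechanically to the per-category scores reported earlier. This is a minimal sketch: the thresholds are copied from the rubric table and the scores from the category scorecards, while `mece_band` and `CATEGORY_SCORES` are hypothetical names.

```python
def mece_band(score: int) -> str:
    """Map a 0-100 MECE score to its rubric band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Fair"
    return "Poor"


# Per-category scores as reported in the scorecards above.
CATEGORY_SCORES = {
    "01": 65, "02": 78, "03": 81, "04": 79, "05": 74, "06": 76,
    "07": 85, "08": 82, "09": 87, "10": 88, "11": 91,
}

# Categories that fall in the Fair band and so need the most attention.
fair = [cat for cat, score in CATEGORY_SCORES.items() if mece_band(score) == "Fair"]
print(fair)  # ['01', '05']
```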

Next steps:

1. Review critical overlaps (🔴 High severity above)
2. Draft boundary rules for Phase 1
3. Run routing test on 10 new tasks to validate fixes
4. Publish revised descriptions
