Multi-Agent Systems
When to Use Multi-Agent Architectures
Multi-agent systems introduce overhead. Every additional agent represents another potential point of failure, another set of prompts to maintain, and another source of unexpected behavior.
Multi-agent systems use 3-10x more tokens than single-agent approaches due to:
- Duplicating context across agents
- Coordination messages between agents
- Summarizing results for handoffs
Start with a Single Agent
A well-designed single agent with appropriate tools can accomplish far more than expected. Use a single agent when:
- Tasks are sequential and context-dependent
- Tool count is under 15-20
- No clear benefit from parallelization
Three Cases Where Multi-Agent Excels
- Context pollution - Subtasks generate >1000 tokens but most info is irrelevant to main task
- Parallelization - Tasks can run independently and explore larger search space
- Specialization - Different tasks need different tools, prompts, or domain expertise
Decision Framework
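The criteria above can be condensed into a simple routing function. This is a minimal sketch; `TaskProfile` and its fields are illustrative assumptions, not part of any SDK, and the thresholds mirror the heuristics in this section:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    # Illustrative fields, not from any SDK
    tool_count: int
    subtask_context_tokens: int   # tokens a typical subtask generates
    parallelizable: bool
    needs_distinct_expertise: bool

def choose_architecture(p: TaskProfile) -> str:
    """Map the three multi-agent cases onto a recommendation,
    defaulting to a single agent when none apply."""
    if p.subtask_context_tokens > 1000:
        return "context-protection subagent"
    if p.parallelizable:
        return "parallel subagents"
    if p.tool_count >= 20 or p.needs_distinct_expertise:
        return "specialized agents"
    return "single agent"
```

The ordering is deliberate: context pollution is checked first because it degrades every downstream decision the main agent makes.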
Context Protection Pattern
Use when subtasks generate large context but only a summary is needed for the main task.
Example: Customer Support
from anthropic import Anthropic

client = Anthropic()

class OrderLookupAgent:
    def lookup_order(self, order_id: str) -> dict:
        messages = [{"role": "user", "content": f"Get essential details for order {order_id}"}]
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            messages=messages, tools=[get_order_details_tool]
        )
        return extract_summary(response)  # Returns 50-100 tokens, not 2000+

class SupportAgent:
    def handle_issue(self, user_message: str):
        if needs_order_info(user_message):
            order_id = extract_order_id(user_message)
            order_summary = OrderLookupAgent().lookup_order(order_id)
            context = f"Order {order_id}: {order_summary['status']}, purchased {order_summary['date']}"
            # Main agent continues with this clean, compact context
Best when:
- Subtask generates >1000 tokens, most irrelevant
- Subtask is well-defined with clear extraction criteria
- Lookup/retrieval operations need filtering before use
Parallelization Pattern
Use when exploring a larger search space or researching independent facets of a question.
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def research_topic(query: str) -> dict:
    facets = await lead_agent.decompose_query(query)
    tasks = [research_subagent(facet) for facet in facets]
    results = await asyncio.gather(*tasks)
    return await lead_agent.synthesize(results)

async def research_subagent(facet: str) -> dict:
    messages = [{"role": "user", "content": f"Research: {facet}"}]
    response = await client.messages.create(
        model="claude-sonnet-4-5", max_tokens=4096,
        messages=messages, tools=[web_search, read_document]
    )
    return extract_findings(response)
Benefit: Thoroughness, not speed. Covers more ground at higher token cost.
Specialization Patterns
Tool Set Specialization
Split tools by domain when a single agent starts to struggle.
Signs you need specialization:
- Quantity: 20+ tools
- Domain confusion: Tools span unrelated domains
- Degraded performance: New tools hurt existing tasks
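When splitting, a flat tool registry can be partitioned mechanically. A minimal sketch, assuming tool names follow a `domain_action` naming convention (an assumption for illustration, not an API requirement):

```python
from collections import defaultdict

def partition_tools_by_domain(tools: list[str]) -> dict[str, list[str]]:
    """Group a flat tool list into per-domain sets, keyed by the
    prefix before the first underscore. Each domain's set becomes
    one specialist agent's tool list."""
    domains = defaultdict(list)
    for name in tools:
        domain = name.split("_", 1)[0]
        domains[domain].append(name)
    return dict(domains)

registry = ["crm_get_contacts", "crm_create_opportunity",
            "marketing_get_campaigns", "marketing_create_lead"]
```

If each resulting partition still holds 20+ tools, the domain itself likely needs further splitting.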
System Prompt Specialization
Different tasks require conflicting behavioral modes:
- Customer support: empathetic, patient
- Code review: precise, critical
- Compliance: rigid rule-following
- Brainstorming: creative flexibility
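One way to keep these modes separate is an explicit task-to-prompt mapping, so no single prompt tries to be all four at once. A hypothetical sketch; the prompt wording is illustrative:

```python
# Hypothetical mapping from task type to behavioral system prompt
SYSTEM_PROMPTS = {
    "support": "You are empathetic and patient. Acknowledge frustration before solving.",
    "review": "You are a precise, critical code reviewer. Flag every defect you find.",
    "compliance": "Follow the stated rules exactly. Never improvise exceptions.",
    "brainstorm": "Generate many diverse ideas. Defer judgment and criticism.",
}

def system_prompt_for(task_type: str) -> str:
    # Fail loudly on an unknown task type rather than falling back
    # to a generic prompt that blurs the behavioral modes
    return SYSTEM_PROMPTS[task_type]
```

A lookup that raises on unknown types is a design choice: a silent generic fallback would reintroduce exactly the mode-mixing this pattern avoids.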
Domain Expertise Specialization
Deep domain context that would overwhelm a generalist:
- Legal analysis: case law, regulatory frameworks
- Medical research: clinical trial methodology
Multi-Platform Integration Example
class CRMAgent:
    system_prompt = """You are a CRM specialist. You manage contacts,
    opportunities, and account records. Always verify record ownership
    before updates and maintain data integrity across related records."""
    tools = [crm_get_contacts, crm_create_opportunity]  # 8-10 CRM tools

class MarketingAgent:
    system_prompt = """You are a marketing automation specialist. You
    manage campaigns, lead scoring, and email sequences."""
    tools = [marketing_get_campaigns, marketing_create_lead]  # 8-10 tools

class OrchestratorAgent:
    def execute(self, user_request: str):
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            system="""Route to appropriate specialist:
            - CRM: Contacts, opportunities, accounts, sales pipeline
            - Marketing: Campaigns, lead nurturing, email sequences""",
            messages=[{"role": "user", "content": user_request}],
            tools=[delegate_to_crm, delegate_to_marketing]
        )
        return response
Context-Centric Decomposition
Problem-centric (counterproductive): Splitting by work type (writer, tester, reviewer) creates coordination overhead and loses context at handoffs.
Context-centric (effective): The agent handling a feature also handles its tests - it already has the necessary context.
Effective Boundaries
- Independent research paths
- Separate components with clean API contracts
- Blackbox verification
Problematic Boundaries
- Sequential phases of same work
- Tightly coupled components
- Work requiring shared state
Verification Subagent Pattern
A dedicated agent tests and validates the main agent's work. This succeeds because verification requires minimal context transfer.
class CodingAgent:
    def implement_feature(self, requirements: str) -> dict:
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=4096,
            messages=[{"role": "user", "content": f"Implement: {requirements}"}],
            tools=[read_file, write_file, list_directory]
        )
        return {"code": response.content, "files_changed": extract_files(response)}

class VerificationAgent:
    def verify_implementation(self, requirements: str, files_changed: list) -> dict:
        messages = [{"role": "user", "content": f"""
        Requirements: {requirements}
        Files changed: {files_changed}

        Run the complete test suite and verify:
        1. All existing tests pass
        2. New functionality works as specified
        3. No obvious errors or security issues

        You MUST run: pytest --verbose
        Only mark as PASSED if ALL tests pass with no failures.
        """}]
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=4096,
            messages=messages, tools=[run_tests, execute_code, read_file]
        )
        return {"passed": extract_pass_fail(response), "issues": extract_issues(response)}
Mitigate "Early Victory Problem"
The verifier marks work as passing without testing it thoroughly. Prevention:
- Concrete criteria: "Run full test suite" not "make sure it works"
- Comprehensive checks: Test multiple scenarios and edge cases
- Negative tests: Confirm inputs that should fail do fail
- Explicit instructions: "You MUST run the complete test suite"
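These prevention rules can be baked into the verifier's prompt programmatically rather than rewritten by hand each time. A sketch with a hypothetical helper; the exact commands and wording are illustrative:

```python
def build_verification_prompt(requirements: str, files_changed: list[str]) -> str:
    """Assemble a verifier prompt that encodes the prevention rules:
    a concrete command, edge cases, and required negative tests."""
    return "\n".join([
        f"Requirements: {requirements}",
        f"Files changed: {', '.join(files_changed)}",
        "You MUST run the complete test suite: pytest --verbose",
        "Test edge cases: empty input, maximum size, malformed data.",
        "Run negative tests: confirm invalid inputs are rejected.",
        "Only mark PASSED if every test passes with zero failures.",
    ])
```

Centralizing the prompt this way means tightening the criteria once tightens every verification run.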
Moving Forward Checklist
Before adding multi-agent complexity:
- Genuine constraints exist (context limits, parallelization, specialization need)
- Decomposition follows context, not problem type
- Clear verification points where subagents can validate
Start with simplest approach that works. Add complexity only when evidence supports it.