LLM Router
Selects the most cost-effective model for each task. Routing is the single biggest cost lever in multi-agent systems: intelligent routing saves 45-85% while maintaining 95%+ of top-model quality.
When to Use
✅ Use for:
- Deciding which model to call for a specific task
- Assigning models to DAG nodes in agent workflows
- Optimizing LLM API costs across a system
- Building cascading try-cheap-first patterns
❌ NOT for:
- Prompt engineering (use prompt-engineer)
- Model fine-tuning or training
- Comparing model architectures (academic research)
Routing Decision Tree
flowchart TD
    A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini ($0.001)"]
    A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o ($0.01)"]
    A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
    T1 --> Q1{Quality sufficient?}
    Q1 -->|Yes| Done1[Use cheap model]
    Q1 -->|No| T2
    T2 --> Q2{Quality sufficient?}
    Q2 -->|Yes| Done2[Use balanced model]
    Q2 -->|No| T3
Tier Assignment Table
| Task Type | Tier | Models | Cost/Call | Why This Tier |
|---|---|---|---|---|
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |
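The table can be read as a plain lookup. The sketch below is one possible encoding, not a prescribed API: the task-type keys, the `Tier` enum, the `route` function, and the fallback to Tier 2 for unknown task types are all assumptions made for illustration. Only the tier assignments and approximate costs come from the table.

```python
from enum import IntEnum


class Tier(IntEnum):
    CHEAP = 1      # Haiku, GPT-4o-mini
    BALANCED = 2   # Sonnet, GPT-4o
    PREMIUM = 3    # Opus, o1


# Hypothetical task-type keys; the tier for each follows the table.
TIER_BY_TASK = {
    "classify": Tier.CHEAP,
    "validate": Tier.CHEAP,
    "format": Tier.CHEAP,
    "extract": Tier.CHEAP,
    "summarize": Tier.CHEAP,      # nuanced summaries may need BALANCED
    "write": Tier.BALANCED,
    "implement": Tier.BALANCED,
    "review": Tier.BALANCED,
    "synthesize": Tier.BALANCED,
    "decompose": Tier.PREMIUM,
    "architect": Tier.PREMIUM,
    "judge": Tier.PREMIUM,
    "plan": Tier.PREMIUM,
}

# Approximate cost per call in dollars, as listed in the table.
COST_PER_CALL = {Tier.CHEAP: 0.001, Tier.BALANCED: 0.01, Tier.PREMIUM: 0.10}


def route(task_type: str, default: Tier = Tier.BALANCED) -> Tier:
    """Return the tier for a task type; unknown types fall back to `default`."""
    return TIER_BY_TASK.get(task_type.strip().lower(), default)
```

Falling back to the middle tier for unrecognized task types is a judgment call: it avoids both silent quality loss and silent overspending until the task type is added to the map.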
Three Routing Strategies
Strategy 1: Static Tier Assignment (Start Here)
Assign model by task type at DAG design time. No runtime logic. Gets 60-70% of possible savings.
nodes:
  - id: classify
    model: claude-haiku-4-5    # Tier 1: $0.001
  - id: implement
    model: claude-sonnet-4-5   # Tier 2: $0.01
  - id: evaluate
    model: claude-opus-4-5     # Tier 3: $0.10
Strategy 2: Cascading (Try Cheap First)
Try the cheap model first; if its output quality falls below a threshold, escalate to the next tier. The extra quality check adds ~1s of latency but saves 50-80% on nodes where the cheap model succeeds.
- Execute with Tier 1 model
- Quick quality check (also Tier 1 — costs ~$0.001)
- If quality ≥ threshold → done
- If quality < threshold → re-execute with Tier 2
Best for nodes where you're genuinely unsure which tier is needed.
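The steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions: `call_model` and `score_output` are hypothetical callables that you would supply (a real API client and a cheap quality checker), and the `0.8` threshold is a placeholder, not a recommended value.

```python
from typing import Callable, Sequence, Tuple


def cascade(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],      # hypothetical: (model, prompt) -> output
    score_output: Callable[[str, str], float],  # hypothetical: (prompt, output) -> 0..1
    threshold: float = 0.8,                     # placeholder value
) -> Tuple[str, str]:
    """Try models from cheapest to most expensive; return (model, output).

    Stops at the first output whose score meets the threshold. The last
    model is not scored, because there is nothing left to escalate to.
    """
    if not models:
        raise ValueError("models must contain at least one model name")
    for model in models[:-1]:
        output = call_model(model, prompt)
        if score_output(prompt, output) >= threshold:
            return model, output
    last = models[-1]
    return last, call_model(last, prompt)
```

Passing the model list in explicitly keeps the function independent of any provider SDK, so the same loop works for a two-tier or three-tier cascade.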
Strategy 3: Adaptive (Learn from History)
Record success/failure per task type per model. Over time, the router learns:
- "Classification nodes always succeed on Haiku" → stay cheap
- "Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
- "Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus
Reaches 75-85% savings once ~100 executions have been recorded.
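One way to implement this bookkeeping is a success counter per (task type, model) pair. The sketch below is an assumption-laden illustration: the `AdaptiveRouter` class, its method names, and the `min_success` and `min_samples` defaults are hypothetical, chosen only to show the idea of picking the cheapest model with a good enough track record.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class AdaptiveRouter:
    """Pick the cheapest model whose observed success rate is high enough."""

    def __init__(self, models: List[str], min_success: float = 0.9, min_samples: int = 10):
        self.models = models              # ordered cheapest -> most expensive
        self.min_success = min_success    # placeholder threshold
        self.min_samples = min_samples    # ignore models with too little history
        # (task_type, model) -> [successes, attempts]
        self.stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, model: str, success: bool) -> None:
        """Store the outcome of one execution."""
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1

    def choose(self, task_type: str, default: str) -> str:
        """Return the cheapest proven model, or `default` if none qualifies."""
        for model in self.models:
            successes, attempts = self.stats.get((task_type, model), (0, 0))
            if attempts >= self.min_samples and successes / attempts >= self.min_success:
                return model
        return default
```

This sketch does no exploration on its own: a cheaper model only accumulates history if something else, such as the cascading strategy, actually tries it.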
Provider Selection
Once model tier is chosen, select the provider:
| Model Class | Provider Options | Selection Criteria |
|---|---|---|
| Haiku-class | Anthropic, AWS Bedrock | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic | Only provider |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0), latency, GPU availability |
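The selection criteria can be applied as a filter followed by a ranking. The following is a rough sketch, assuming you track latency, region coverage, and rate-limit state yourself: the `Provider` dataclass, its fields, and `select_provider` are hypothetical names, and ranking by latency alone is a simplification of the criteria in the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    latency_ms: float          # your own measurement, not a published figure
    regions: frozenset         # regions where you can use this provider
    rate_limited: bool = False # True while quota is exhausted


def select_provider(candidates: List[Provider], region: str) -> Optional[Provider]:
    """Return the lowest-latency provider that serves `region` and has quota left."""
    usable = [p for p in candidates if region in p.regions and not p.rate_limited]
    return min(usable, key=lambda p: p.latency_ms, default=None)
```

Returning `None` when no provider qualifies leaves the fallback decision (wait, retry, or switch model class) to the caller.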
Cost Impact Example
10-node DAG, "refactor a codebase":
| Strategy | Mix | Cost | Savings |
|---|---|---|---|
| All Opus | 10 × $0.10 | $1.00 | baseline |
| All Sonnet | 10 × $0.01 | $0.10 | 90% |
| Static tiers | 4 × Haiku + 4 × Sonnet + 2 × Opus | $0.24 | 76% |
| Cascading | 6 × Haiku + 3 × Sonnet + 1 × Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
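The fixed-mix rows can be reproduced with a few lines of arithmetic, using the approximate per-call costs from the tier table. The helper names `dag_cost` and `savings_vs` are illustrative only.

```python
# Approximate dollars per call, from the tier table.
COST = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}


def dag_cost(mix: dict) -> float:
    """Total cost of a DAG given {model: number_of_nodes}."""
    return sum(COST[model] * count for model, count in mix.items())


def savings_vs(baseline: float, cost: float) -> int:
    """Savings relative to the baseline, as a whole-number percentage."""
    return round(100 * (1 - cost / baseline))


baseline = dag_cost({"opus": 10})  # the "All Opus" row
mixes = {
    "All Sonnet": {"sonnet": 10},
    "Static tiers": {"haiku": 4, "sonnet": 4, "opus": 2},
    "Cascading": {"haiku": 6, "sonnet": 3, "opus": 1},
}
for name, mix in mixes.items():
    cost = dag_cost(mix)
    print(f"{name}: ${cost:.2f}, {savings_vs(baseline, cost)}% saved")
```

The cascading row counts only the model that finally handled each node. Failed cheap attempts and Tier 1 quality checks would add a small amount on top.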
Anti-Patterns
Always Use the Best Model
Wrong: Route everything to Opus/o1 "for quality." Reality: 60%+ of typical DAG nodes are classification, validation, or formatting — tasks where Haiku performs identically to Opus. You're burning money.
Always Use the Cheapest Model
Wrong: Route everything to Haiku "for cost." Reality: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.
Ignoring Latency
Wrong: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku. Reality: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.
No Feedback Loop
Wrong: Setting model tiers once and never adjusting. Reality: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.