llm-router

LLM Router

Selects the optimal LLM model for each task. The single biggest cost lever in multi-agent systems — intelligent routing saves 45-85% while maintaining 95%+ of top-model quality.

When to Use

✅ Use for:

Deciding which model to call for a specific task
Assigning models to DAG nodes in agent workflows
Optimizing LLM API costs across a system
Building cascading try-cheap-first patterns

❌ NOT for:

Prompt engineering (use prompt-engineer )
Model fine-tuning or training
Comparing model architectures (academic research)

Routing Decision Tree

flowchart TD A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~~$0.001)"] A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~~$0.01)"] A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]

T1 --> Q1{Quality sufficient?} Q1 -->|Yes| Done1[Use cheap model] Q1 -->|No| T2

T2 --> Q2{Quality sufficient?} Q2 -->|Yes| Done2[Use balanced model] Q2 -->|No| T3

Tier Assignment Table

Task Type Tier Models Cost/Call Why This Tier

Classify input type 1 Haiku, GPT-4o-mini ~$0.001 Deterministic categorization

Validate schema/format 1 Haiku, GPT-4o-mini ~$0.001 Mechanical checking

Format output / template 1 Haiku, GPT-4o-mini ~$0.001 Structured transformation

Extract structured data 1 Haiku, GPT-4o-mini ~$0.001 Pattern matching

Summarize text 1-2 Haiku → Sonnet ~$0.001-0.01 Short summaries: Haiku; nuanced: Sonnet

Write content/docs 2 Sonnet, GPT-4o ~$0.01 Creative quality matters

Implement code 2 Sonnet, GPT-4o ~$0.01 Correctness + style

Review code/diffs 2 Sonnet, GPT-4o ~$0.01 Needs judgment, not just pattern matching

Research synthesis 2 Sonnet, GPT-4o ~$0.01 Multi-source reasoning

Decompose ambiguous problem 3 Opus, o1 ~$0.10 Requires deep understanding

Design architecture 3 Opus, o1 ~$0.10 Complex system reasoning

Judge output quality 3 Opus, o1 ~$0.10 Meta-reasoning about quality

Plan multi-step strategy 3 Opus, o1 ~$0.10 Long-horizon planning

Three Routing Strategies

Strategy 1: Static Tier Assignment (Start Here)

Assign model by task type at DAG design time. No runtime logic. Gets 60-70% of possible savings.

nodes:

id: classify model: claude-haiku-4-5 # Tier 1: $0.001
id: implement model: claude-sonnet-4-5 # Tier 2: $0.01
id: evaluate model: claude-opus-4-5 # Tier 3: $0.10

Strategy 2: Cascading (Try Cheap First)

Try the cheap model; if quality is below threshold, escalate. Adds ~1s latency but saves 50-80% on nodes where cheap succeeds.

Execute with Tier 1 model
Quick quality check (also Tier 1 — costs ~$0.001)
If quality ≥ threshold → done
If quality < threshold → re-execute with Tier 2

Best for nodes where you're genuinely unsure which tier is needed.

Strategy 3: Adaptive (Learn from History)

Record success/failure per task type per model. Over time, the router learns:

"Classification nodes always succeed on Haiku" → stay cheap
"Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
"Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus

Gets 75-85% savings after ~100 executions of training data.

Provider Selection

Once model tier is chosen, select the provider:

Model Class Provider Options Selection Criteria

Haiku-class Anthropic, AWS Bedrock Latency, regional availability

Sonnet-class Anthropic, AWS Bedrock, GCP Vertex Cost, rate limits

Opus-class Anthropic Only provider

GPT-4o-class OpenAI, Azure OpenAI Rate limits, compliance

Open-source Ollama (local), Together.ai, Fireworks Cost ($0), latency, GPU availability

Cost Impact Example

10-node DAG, "refactor a codebase":

Strategy Mix Cost Savings

All Opus 10× $0.10 $1.00 —

All Sonnet 10× $0.01 $0.10 90%

Static tiers 4× Haiku + 4× Sonnet + 2× Opus $0.24 76%

Cascading 6× Haiku + 3× Sonnet + 1× Opus $0.14 86%

Adaptive (trained) Dynamic ~$0.08 92%

Anti-Patterns

Always Use the Best Model

Wrong: Route everything to Opus/o1 "for quality." Reality: 60%+ of typical DAG nodes are classification, validation, or formatting — tasks where Haiku performs identically to Opus. You're burning money.

Always Use the Cheapest Model

Wrong: Route everything to Haiku "for cost." Reality: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.

Ignoring Latency

Wrong: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku. Reality: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.

No Feedback Loop

Wrong: Setting model tiers once and never adjusting. Reality: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

agent-creator

test-automation-expert

chatbot-analytics

dag-task-scheduler