super-router

LangGraph-based intelligent task router that splits work between PRO (heavy reasoning) and FLASH (fast) models using 5-dimension complexity scoring, configurable model defaults, and FLASH→PRO escalation.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "super-router" with this command: npx skills add fanyadan/super-router

Super Router (LangGraph Edition)

Intelligent task decomposition and model routing using LangGraph StateGraph. Automatically routes subtasks between PRO (heavy reasoning) and FLASH (fast) models based on structured complexity assessment.

When to Use This Skill

Use super-router when you need:

  • Intelligent model routing — automatically choose between heavy (PRO) and fast (FLASH) models per subtask
  • Task decomposition — break complex tasks into structured subtasks with independent routing
  • Cost optimization — use fast models for simple work, heavy models only when needed
  • Configurable models — use deterministic defaults, with environment-variable overrides for each role
  • Failure escalation — FLASH retry on infra failures, escalate to PRO on capability failures
  • Audit trail — full logging of planned vs actual routes, retries, and failure classifications

Not needed for: Simple single-turn tasks, tasks where you already know which model to use, or when you want manual control over every routing decision.

Core Architecture (LangGraph StateGraph)

| Node | Function |
| --- | --- |
| Planner | Receives the original task and calls the local Ollama planner model to generate an ordered subtask array |
| Judge | Scores each subtask on 5 dimensions (reasoning_depth, code_change_scope, ambiguity, risk, io_heaviness); combines scores with thresholds + confidence to decide PRO/FLASH |
| Dispatcher | Reads RouterState.current_step and routes via conditional edge to pro_executor or flash_executor |
| PRO Executor | Heavy reasoning model (default: Gemini CLI preview model; override via ROUTER_PRO_MODEL) |
| FLASH Executor | Fast model with review/retry logic (default: Gemini CLI preview model; override via ROUTER_FLASH_MODEL) |
| FLASH Review | Validates output quality; distinguishes infra failures (timeout, network) from capability failures; retries FLASH or escalates to PRO |
| Metadata Extractor | Extracts "Technical Gold" (atomic high-precision facts) from step output to prevent finalizer timeouts and loss of detail |
| Recorder/Finalizer | Logs every step; compiles the final report from a hybrid of Technical Gold and full audit trails; supports the FLASH→PRO→deterministic fallback chain |
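
The Dispatcher's conditional edge can be sketched in a few lines. This is a hypothetical stand-in for the edge function `router.py` registers on the StateGraph, not the actual implementation; field names follow RouterState as described above:

```python
def dispatch(state):
    # Conditional-edge sketch: look up the subtask for the current step and
    # return the name of the executor node to run next. The real edge
    # function lives inside scripts/router.py.
    subtask = state["subtasks"][state["current_step"]]
    return "pro_executor" if subtask["model"] == "PRO" else "flash_executor"
```

In LangGraph terms, a function like this would be wired in via `add_conditional_edges` so the graph branches per subtask instead of hard-coding a route.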

Installation

# Required: LangGraph + Ollama
pip install langgraph

# Ensure Ollama is running locally
ollama serve

# Pull recommended models if you use Ollama-backed roles
ollama pull gemma4:26b     # Planner or PRO executor (high quality, slow)
ollama pull llama3.1:8b    # Judge (fast scoring, recommended)
ollama pull qwen3         # PRO executor
ollama pull qwen2.5:7b    # FLASH executor

Note: If you prefer gemma4:26b as the Planner, keep it there. For speed, the Judge should usually be llama3.1:8b or another 7B-14B model:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=llama3.1:8b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b

If you intentionally want an all-gemma4:26b Planner/Judge/PRO setup, use longer timeouts and serialized graph execution:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=gemma4:26b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
export ROUTER_JUDGE_TIMEOUT=600
export ROUTER_MAX_CONCURRENCY=1

Security Boundaries

  • The router only consumes task text, model names, and documented ROUTER_* settings.
  • It has no install hook, background persistence, arbitrary local file scanning, or destructive file operations.
  • Ollama traffic is local by default. Remote ROUTER_OLLAMA_URL values are refused unless ROUTER_ALLOW_REMOTE_OLLAMA=1 is set.
  • Gemini CLI execution is restricted to an executable named gemini and receives only a minimal allowlisted environment.
  • Provider prompts and outputs may leave the machine when using Gemini CLI or an explicitly trusted remote Ollama endpoint. Use local Ollama for sensitive work.

Usage

Basic Usage (via exec)

When user says "走 super-router", "use super-router", or asks for router analysis:

# Direct execution with task as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py '分析 K8s YAML 错误并重写配置'")

With Streaming (Node-Level Progress)

terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py --stream 'Your complex task'")

Via Environment Variable (Agent Compatibility)

For agents that struggle with non-ASCII arguments:

# Normalize task to short ASCII English, then pass as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py 'Analyze K8s YAML errors and fix'")

# Or via env var (if agent supports it)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py", 
         env={"ROUTER_TASK": "Your complex task description"})

Handling Long-Running Execution

If exec returns "Command still running":

# Continue polling with process tool
process(action="poll", session_id="<session_id_from_exec>")

# Wait for completion
process(action="wait", session_id="<session_id_from_exec>", timeout=300)

Important: Once process shows completion, your next assistant message MUST start with Router result: or Router failed: and include at least one real detail from the output (e.g., "Planner fallback", "Ollama timed out", "BTC"). Never reply with just ---, punctuation, or empty lines.

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| ROUTER_PLANNER_MODEL | Task decomposition model | gemma4:26b |
| ROUTER_JUDGE_MODEL | Complexity scoring model | llama3.1:8b |
| ROUTER_PRO_MODEL | Heavy reasoning executor | google-gemini-cli/gemini-3-pro-preview |
| ROUTER_FLASH_MODEL | Fast executor | google-gemini-cli/flash |
| ROUTER_PRO_FALLBACK_MODELS | Comma-separated PRO fallback list | None |
| ROUTER_FLASH_FALLBACK_MODELS | Comma-separated FLASH fallback list | None |
| ROUTER_FLASH_RETRY_BUDGET | Max FLASH retries before escalation | 1 |
| ROUTER_RECURSION_LIMIT | Python recursion limit | 128 |
| ROUTER_JUDGE_TIMEOUT | Timeout for Judge node LLM calls (seconds); raise up to 6000 for extremely complex tasks with large models | 300 |
| ROUTER_MAX_CONCURRENCY | LangGraph max node concurrency; set 1 for local 26B+ Judge models | Auto (1 for large Judge models) |
| ROUTER_GEMINI_CLI | Path to Gemini CLI (if using it instead of Ollama) | /opt/homebrew/bin/gemini |
| ROUTER_OLLAMA_URL | Ollama API endpoint | http://localhost:11434/api/generate |
| ROUTER_ALLOW_REMOTE_OLLAMA | Opt in to non-local Ollama endpoints after trusting them | Off |
| ROUTER_FINALIZER_TIMEOUT | Timeout for final report synthesis (seconds); keep high (e.g., 600) for complex tasks to avoid timeouts during context assembly | 600 |
| ROUTER_DEBUG | Print raw planner/judge/Ollama diagnostic snippets | Off |
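
These variables can be consumed with a plain environment lookup. The loader below is a hypothetical helper (not code from scripts/router.py) that copies a few defaults from the table and enforces the documented remote-Ollama opt-in:

```python
import os

# Defaults copied from the table above; the loader itself is an illustrative
# helper, not the skill's own code.
DEFAULTS = {
    "ROUTER_PLANNER_MODEL": "gemma4:26b",
    "ROUTER_JUDGE_MODEL": "llama3.1:8b",
    "ROUTER_FLASH_RETRY_BUDGET": "1",
    "ROUTER_JUDGE_TIMEOUT": "300",
    "ROUTER_OLLAMA_URL": "http://localhost:11434/api/generate",
}

def load_router_config(env=None):
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Enforce the documented security boundary: refuse a non-local Ollama
    # endpoint unless the user has explicitly opted in.
    url = cfg["ROUTER_OLLAMA_URL"]
    if "localhost" not in url and "127.0.0.1" not in url:
        if env.get("ROUTER_ALLOW_REMOTE_OLLAMA") != "1":
            raise ValueError(
                "remote ROUTER_OLLAMA_URL requires ROUTER_ALLOW_REMOTE_OLLAMA=1")
    return cfg
```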

For large models (20B+ like gemma4:26b):

  • Prefer ROUTER_PLANNER_MODEL=gemma4:26b with ROUTER_JUDGE_MODEL=llama3.1:8b
  • If using ROUTER_JUDGE_MODEL=gemma4:26b, set ROUTER_JUDGE_TIMEOUT=600 and keep ROUTER_MAX_CONCURRENCY=1
  • Planner timeout is auto-set to 300s for large models
  • Expect 2-5 minute wait times per LLM call
  • Model warmup adds ~30-60s upfront but prevents timeouts.
  • Crucial: A 60s terminal timeout can still kill the run even if internal router timeouts are higher. Use --stream, process polling via process(action='poll'), and a longer terminal/process wait timeout for large Planner/Judge runs.

Complexity Routing Rules

5-Dimension Scoring

The Judge scores each subtask on:

  1. reasoning_depth (1-10): How much logical inference is needed?
  2. code_change_scope (1-10): How many files/lines of code to modify?
  3. ambiguity (1-10): How unclear is the task specification?
  4. risk (1-10): What's the impact of getting this wrong?
  5. io_heaviness (1-10): How much reading/writing vs. thinking?

Routing Thresholds

| Condition | Route |
| --- | --- |
| complexity_score >= 5 | PRO |
| complexity_score <= 2 | FLASH |
| Summary-like task (no deep work) | FLASH |
| High-risk incident diagnosis | PRO |
| High-risk evidence gathering | PRO |
| High-risk decision/rollback evaluation | PRO |
| Boundary case + low confidence | PRO (safe default) |
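
The threshold table can be expressed as a small decision function. This is an illustrative sketch only: the real Judge combines LLM scores, heuristics, and confidence inside router.py, and both the 0.6 boundary-confidence cutoff and the ordering of the summary check here are assumptions:

```python
def decide_route(complexity_score, confidence=1.0, summary_like=False,
                 high_risk=False, boundary_confidence=0.6):
    # Illustrative routing decision following the threshold table above.
    if high_risk:
        return "PRO"  # incident diagnosis / evidence gathering / rollback calls
    if summary_like and complexity_score < 5:
        return "FLASH"  # summary-like work with no deep reasoning required
    if complexity_score >= 5:
        return "PRO"
    if complexity_score <= 2:
        return "FLASH"
    # Boundary case (score between 2 and 5): PRO is the safe default when
    # the Judge's confidence is low.
    return "PRO" if confidence < boundary_confidence else "FLASH"
```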

Contextual Score Biases

The router applies automatic adjustments:

  • High-risk context (production, billing, security): boosts reasoning_depth, risk, ambiguity
  • Evidence gathering in incident: keeps on PRO (not mere IO)
  • Communication/summary subtasks: routed to FLASH unless deep work is also required
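
The high-risk bias can be pictured as a score adjustment pass before routing. Both the marker list and the +2 boost (capped at 10) below are assumed values for illustration; router.py's actual adjustment magnitudes are not documented here:

```python
HIGH_RISK_MARKERS = ("production", "billing", "security")  # assumed marker list

def apply_context_bias(scores, task_text):
    # Sketch of the documented high-risk bias: boost reasoning_depth, risk,
    # and ambiguity when the task mentions a high-risk context.
    biased = dict(scores)
    if any(marker in task_text.lower() for marker in HIGH_RISK_MARKERS):
        for dim in ("reasoning_depth", "risk", "ambiguity"):
            biased[dim] = min(10, biased[dim] + 2)  # cap at the 10-point scale
    return biased
```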

FLASH Review & Escalation Logic

When FLASH execution fails or produces questionable output:

  1. Classify the failure type:
     • infra_transient: timeout, network, rate limit, service unavailable
     • capability_quality: "need more info", empty output, too short, repeated task
  2. Decide:
     • Infra failure → retry FLASH (up to ROUTER_FLASH_RETRY_BUDGET)
     • Capability failure → escalate to PRO immediately
     • Unknown → retry once, then escalate
  3. Verify after execution:
     • Empty output → escalate
     • Output < 48 chars (non-summary) → escalate
     • Output explicitly says "can't complete" → escalate
     • Output merely repeats the task description → escalate
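
The rules above can be sketched as a single review function. The marker strings and 48-char floor come from this section; everything else about the real reviewer is internal to router.py, and the unknown-failure "retry once, then escalate" branch is omitted for brevity:

```python
INFRA_MARKERS = ("timeout", "network", "rate limit", "unavailable")
CAPABILITY_MARKERS = ("need more info", "can't complete", "cannot complete")

def review_flash_output(output, error="", retries_used=0, retry_budget=1,
                        summary_task=False):
    # Illustrative FLASH review: classify the failure, then retry or escalate.
    text = (output or "").strip()
    if any(marker in error.lower() for marker in INFRA_MARKERS):
        # Infra failure: retry FLASH until the budget is spent, then escalate.
        return "retry_flash" if retries_used < retry_budget else "escalate_pro"
    if not text or any(marker in text.lower() for marker in CAPABILITY_MARKERS):
        return "escalate_pro"  # capability failure: go straight to PRO
    if len(text) < 48 and not summary_task:
        return "escalate_pro"  # suspiciously short non-summary output
    return "record"
```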

Finalizer Fallback Chain

Final report generation follows:

FLASH finalizer → (if fails) → PRO finalizer → (if fails) → Deterministic template
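
The chain amounts to a try-each-stage loop. The three callables below are hypothetical stand-ins for the FLASH model, the PRO model, and the deterministic template renderer:

```python
def run_finalizer(flash_finalizer, pro_finalizer, deterministic_template):
    # Sketch of the FLASH -> PRO -> deterministic fallback chain above.
    for route, finalize in (("FLASH", flash_finalizer), ("PRO", pro_finalizer)):
        try:
            return {"route": route, "report": finalize()}
        except Exception:
            continue  # fall through to the next stage in the chain
    # Deterministic template: the last resort that cannot fail on a model call.
    return {"route": "DETERMINISTIC", "report": deterministic_template()}
```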

Output Structure

The router returns a JSON-serializable state. When summarizing these results in reports or documentation, always use ASCII/terminal-style arrows (e.g., '-->', '->') rather than mathematical arrows (e.g., '→', '$\rightarrow$') for diagrams and flow representations; this is a high-priority stylistic requirement.
{
  "task": "original task string",
  "planner_model": "model name used for planning",
  "judge_model": "model name used for complexity scoring",
  "pro_model": "primary PRO model",
  "flash_model": "primary FLASH model",
  "planned_subtasks": [{"desc": "..."}],
  "subtasks": [
    {
      "desc": "...",
      "model": "PRO|FLASH",
      "assessment": {
        "scores": {"reasoning_depth": 5, "code_change_scope": 3, "ambiguity": 2, "risk": 4, "io_heaviness": 1},
        "complexity_score": 15,
        "suggested_route": "PRO",
        "final_route": "PRO",
        "confidence": 0.85,
        "reason": "...",
        "judge_source": "llm|heuristic"
      }
    }
  ],
  "results": [
    {
      "step": 1,
      "planned_route": "PRO",
      "route": "PRO",
      "model_name": "qwen3",
      "desc": "...",
      "output": "...",
      "status": "success|failed",
      "attempt_count": 1,
      "retry_count": 0,
      "escalated_from_flash": false,
      "used_provider_fallback": false,
      "flash_review": {"decision": "record", "failure_type": "none", "reason": "..."},
      "attempt_log": ["..."]
    }
  ],
  "final_report": "...",
  "finalizer_outcome": {
    "route": "FLASH|PRO|DETERMINISTIC",
    "model_name": "...",
    "status": "...",
    "used_provider_fallback": false,
    "reason": "...",
    "attempt_log": ["..."]
  }
}
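
A consumer of this state can build the planned-vs-actual audit view from the fields shown above. The helper below is a hypothetical example (not part of router.py) and uses ASCII arrows per the stylistic requirement:

```python
def summarize_routes(state):
    # Walk the results array from the router state and report planned vs
    # actual routes, flagging escalations.
    lines = []
    for result in state.get("results", []):
        mark = "" if result["planned_route"] == result["route"] else " (escalated)"
        lines.append(
            f"step {result['step']}: {result['planned_route']} -> {result['route']}{mark}")
    return "\n".join(lines)
```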

Example Workflows

Example 1: K8s Incident Triage

router.py "Production K8s Pods are restarting frequently: analyze the logs to find the root cause, propose a fix, and write a short action summary for the on-call teammate"

Expected routing:

  1. "Analyze the Pod restart logs and identify error patterns" → PRO (high-risk diagnosis)
  2. "Determine the root cause (resource exhaustion / misconfiguration / dependency failure)" → PRO (high-risk decision)
  3. "Draft a fix plan (YAML changes / rollback / scale-up)" → PRO (high-risk repair plan)
  4. "Write a short action summary for the on-call teammate" → FLASH (communication/summary)

Example 2: Code Refactoring

router.py "Refactor auth module to use JWT, add unit tests, update docs"

Expected routing:

  1. "Analyze current auth implementation" → PRO (deep inspection)
  2. "Design JWT claims model" → PRO (design logic)
  3. "Implement JWT encoding/decoding" → PRO (implementation)
  4. "Add unit tests for JWT functions" → PRO (test logic)
  5. "Update README with JWT usage examples" → FLASH (documentation)

Example 3: Simple Summary

router.py "Summarize the last 10 git commits"

Expected routing:

  • Single subtask → FLASH (summary-like, low complexity)

Maintenance

| File | Purpose |
| --- | --- |
| scripts/router.py | Main LangGraph router script |
| SKILL.md | This documentation |

Troubleshooting

"Router timed out" / "Ollama returned an empty response"

  • Best fix when keeping a large Planner: keep ROUTER_PLANNER_MODEL=gemma4:26b, but set ROUTER_JUDGE_MODEL=llama3.1:8b.
  • All-gemma mode: set ROUTER_JUDGE_MODEL=gemma4:26b, ROUTER_JUDGE_TIMEOUT=600, and ROUTER_MAX_CONCURRENCY=1; expect much longer runs.
  • Use --stream and increase the terminal/process timeout if the Planner itself may take longer than 60s.
  • Set ROUTER_JUDGE_TIMEOUT=300 or higher only when intentionally using a 20B+ Judge.
  • Alternative: use Gemini CLI for planning: ROUTER_PLANNER_MODEL=google-gemini-cli/gemini-3-pro-preview.

"Planner timed out after 30s" (or 90s)

  • Model is too large or not loaded. Warmup helps but large models may still timeout.
  • Use --stream plus a longer terminal/process timeout, or choose a smaller planner model.
  • Check the Ollama logs (the ollama serve output) for errors

"FLASH kept escalating to PRO"

  • Task may genuinely require heavy reasoning
  • Check if FLASH model is too small for your tasks
  • Try setting ROUTER_FLASH_MODEL to a larger model

"Gemini CLI AbortError or Auth Failures"

  • If gemini-cli returns AbortError or authentication errors in non-interactive sessions, this is often an infrastructure/API timeout or session issue.
  • Use --stream to monitor real-time progress and ensure ROUTER_JUDGE_TIMEOUT and terminal timeouts are sufficiently high to prevent external process termination.

"Planner produced only one subtask"

  • Task may be simple enough to not need decomposition
  • Planner model may be too small; try ROUTER_PLANNER_MODEL=gemma4:31b (if you have the patience for 90s+ waits)

Related Skills

  • dspy — Declarative LM programming with automatic prompt optimization (Python framework alternative)
  • subagent-driven-development — Task decomposition with OpenClaw-native delegation + two-stage review
  • llama-cpp — Run LLM inference locally (alternative to Ollama backend)

