super-router

LangGraph-based intelligent task router that splits work between PRO (heavy reasoning) and FLASH (fast) models using 5-dimension complexity scoring, configurable model defaults, and FLASH→PRO escalation.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "super-router" with this command: npx skills add fanyadan/super-router

Super Router (LangGraph Edition)

Intelligent task decomposition and model routing using LangGraph StateGraph. Automatically routes subtasks between PRO (heavy reasoning) and FLASH (fast) models based on structured complexity assessment.

When to Use This Skill

Use super-router when you need:

  • Intelligent model routing — automatically choose between heavy (PRO) and fast (FLASH) models per subtask
  • Task decomposition — break complex tasks into structured subtasks with independent routing
  • Cost optimization — use fast models for simple work, heavy models only when needed
  • Configurable models — use deterministic defaults, with environment-variable overrides for each role
  • Failure escalation — FLASH retry on infra failures, escalate to PRO on capability failures
  • Audit trail — full logging of planned vs actual routes, retries, and failure classifications

Not needed for: Simple single-turn tasks, tasks where you already know which model to use, or when you want manual control over every routing decision.

Core Architecture (LangGraph StateGraph)

| Node | Function |
| --- | --- |
| Planner | Receives the original task and calls the local Ollama planner model to generate an ordered subtask array |
| Judge | Scores each subtask on 5 dimensions (reasoning_depth, code_change_scope, ambiguity, risk, io_heaviness); combines scores with thresholds + confidence to decide PRO/FLASH |
| Dispatcher | Reads RouterState.current_step and routes via conditional edge to pro_executor or flash_executor |
| PRO Executor | Heavy reasoning model (default: Gemini CLI preview model; override via ROUTER_PRO_MODEL) |
| FLASH Executor | Fast model with review/retry logic (default: Gemini CLI preview model; override via ROUTER_FLASH_MODEL) |
| FLASH Review | Validates output quality; distinguishes infra failures (timeout, network) from capability failures; retries FLASH or escalates to PRO |
| Metadata Extractor | Extracts "Technical Gold" (atomic high-precision facts) from step output to prevent finalizer timeouts and loss of detail |
| Recorder/Finalizer | Logs every step; compiles the final report from a hybrid of Technical Gold and full audit trails; supports the FLASH→PRO→deterministic fallback chain |
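
The Dispatcher's conditional edge can be sketched in a few lines. This is a hypothetical stand-in for the edge function `router.py` registers on the StateGraph, not the actual implementation; field names follow RouterState as described above:

```python
def dispatch(state):
    # Conditional-edge sketch: look up the subtask for the current step and
    # return the name of the executor node to run next. The real edge
    # function lives inside scripts/router.py.
    subtask = state["subtasks"][state["current_step"]]
    return "pro_executor" if subtask["model"] == "PRO" else "flash_executor"
```

In LangGraph terms, a function like this would be wired in via `add_conditional_edges` so the graph branches per subtask instead of hard-coding a route.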

Installation

# Required: LangGraph + Ollama
pip install langgraph

# Ensure Ollama is running locally
ollama serve

# Pull recommended models if you use Ollama-backed roles
ollama pull gemma4:26b     # Planner or PRO executor (high quality, slow)
ollama pull llama3.1:8b    # Judge (fast scoring, recommended)
ollama pull qwen3         # PRO executor
ollama pull qwen2.5:7b    # FLASH executor

Note: If you prefer gemma4:26b as the Planner, keep it there. For speed, the Judge should usually be llama3.1:8b or another 7B-14B model:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=llama3.1:8b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b

If you intentionally want an all-gemma4:26b Planner/Judge/PRO setup, use longer timeouts and serialized graph execution:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=gemma4:26b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
export ROUTER_JUDGE_TIMEOUT=600
export ROUTER_MAX_CONCURRENCY=1

Security Boundaries

  • The router only consumes task text, model names, and documented ROUTER_* settings.
  • It has no install hook, background persistence, arbitrary local file scanning, or destructive file operations.
  • Ollama traffic is local by default. Remote ROUTER_OLLAMA_URL values are refused unless ROUTER_ALLOW_REMOTE_OLLAMA=1 is set.
  • Gemini CLI execution is restricted to an executable named gemini and receives only a minimal allowlisted environment.
  • Provider prompts and outputs may leave the machine when using Gemini CLI or an explicitly trusted remote Ollama endpoint. Use local Ollama for sensitive work.

Usage

Basic Usage (via exec)

When user says "走 super-router", "use super-router", or asks for router analysis:

# Direct execution with task as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py '分析 K8s YAML 错误并重写配置'")

With Streaming (Node-Level Progress)

terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py --stream 'Your complex task'")

Via Environment Variable (Agent Compatibility)

For agents that struggle with non-ASCII arguments:

# Normalize task to short ASCII English, then pass as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py 'Analyze K8s YAML errors and fix'")

# Or via env var (if agent supports it)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py", 
         env={"ROUTER_TASK": "Your complex task description"})

Handling Long-Running Execution

If exec returns "Command still running":

# Continue polling with process tool
process(action="poll", session_id="<session_id_from_exec>")

# Wait for completion
process(action="wait", session_id="<session_id_from_exec>", timeout=300)

Important: Once process shows completion, your next assistant message MUST start with Router result: or Router failed: and include at least one real detail from the output (e.g., "Planner fallback", "Ollama timed out", "BTC"). Never reply with just ---, punctuation, or empty lines.

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| ROUTER_PLANNER_MODEL | Task decomposition model | gemma4:26b |
| ROUTER_JUDGE_MODEL | Complexity scoring model | llama3.1:8b |
| ROUTER_PRO_MODEL | Heavy reasoning executor | google-gemini-cli/gemini-3-pro-preview |
| ROUTER_FLASH_MODEL | Fast executor | google-gemini-cli/flash |
| ROUTER_PRO_FALLBACK_MODELS | Comma-separated PRO fallback list | None |
| ROUTER_FLASH_FALLBACK_MODELS | Comma-separated FLASH fallback list | None |
| ROUTER_FLASH_RETRY_BUDGET | Max FLASH retries before escalation | 1 |
| ROUTER_RECURSION_LIMIT | Python recursion limit | 128 |
| ROUTER_JUDGE_TIMEOUT | Timeout for Judge node LLM calls (seconds); raise up to 6000 for extremely complex tasks with large models | 300 |
| ROUTER_MAX_CONCURRENCY | LangGraph max node concurrency; set 1 for local 26B+ Judge models | Auto (1 for large Judge models) |
| ROUTER_GEMINI_CLI | Path to Gemini CLI (if using it instead of Ollama) | /opt/homebrew/bin/gemini |
| ROUTER_OLLAMA_URL | Ollama API endpoint | http://localhost:11434/api/generate |
| ROUTER_ALLOW_REMOTE_OLLAMA | Opt in to non-local Ollama endpoints after trusting them | Off |
| ROUTER_FINALIZER_TIMEOUT | Timeout for final report synthesis (seconds); keep high (e.g., 600) for complex tasks to avoid timeouts during context assembly | 600 |
| ROUTER_DEBUG | Print raw planner/judge/Ollama diagnostic snippets | Off |
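
These variables can be consumed with a plain environment lookup. The loader below is a hypothetical helper (not code from scripts/router.py) that copies a few defaults from the table and enforces the documented remote-Ollama opt-in:

```python
import os

# Defaults copied from the table above; the loader itself is an illustrative
# helper, not the skill's own code.
DEFAULTS = {
    "ROUTER_PLANNER_MODEL": "gemma4:26b",
    "ROUTER_JUDGE_MODEL": "llama3.1:8b",
    "ROUTER_FLASH_RETRY_BUDGET": "1",
    "ROUTER_JUDGE_TIMEOUT": "300",
    "ROUTER_OLLAMA_URL": "http://localhost:11434/api/generate",
}

def load_router_config(env=None):
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Enforce the documented security boundary: refuse a non-local Ollama
    # endpoint unless the user has explicitly opted in.
    url = cfg["ROUTER_OLLAMA_URL"]
    if "localhost" not in url and "127.0.0.1" not in url:
        if env.get("ROUTER_ALLOW_REMOTE_OLLAMA") != "1":
            raise ValueError(
                "remote ROUTER_OLLAMA_URL requires ROUTER_ALLOW_REMOTE_OLLAMA=1")
    return cfg
```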

For large models (20B+ like gemma4:26b):

  • Prefer ROUTER_PLANNER_MODEL=gemma4:26b with ROUTER_JUDGE_MODEL=llama3.1:8b
  • If using ROUTER_JUDGE_MODEL=gemma4:26b, set ROUTER_JUDGE_TIMEOUT=600 and keep ROUTER_MAX_CONCURRENCY=1
  • Planner timeout is auto-set to 300s for large models
  • Expect 2-5 minute wait times per LLM call
  • Model warmup adds ~30-60s upfront but prevents timeouts.
  • Crucial: A 60s terminal timeout can still kill the run even if internal router timeouts are higher. Use --stream, process polling via process(action='poll'), and a longer terminal/process wait timeout for large Planner/Judge runs.

Complexity Routing Rules

5-Dimension Scoring

The Judge scores each subtask on:

  1. reasoning_depth (1-10): How much logical inference is needed?
  2. code_change_scope (1-10): How many files/lines of code to modify?
  3. ambiguity (1-10): How unclear is the task specification?
  4. risk (1-10): What's the impact of getting this wrong?
  5. io_heaviness (1-10): How much reading/writing vs. thinking?

Routing Thresholds

| Condition | Route |
| --- | --- |
| complexity_score >= 5 | PRO |
| complexity_score <= 2 | FLASH |
| Summary-like task (no deep work) | FLASH |
| High-risk incident diagnosis | PRO |
| High-risk evidence gathering | PRO |
| High-risk decision/rollback evaluation | PRO |
| Boundary case + low confidence | PRO (safe default) |
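
The threshold table can be expressed as a small decision function. This is an illustrative sketch only: the real Judge combines LLM scores, heuristics, and confidence inside router.py, and both the 0.6 boundary-confidence cutoff and the ordering of the summary check here are assumptions:

```python
def decide_route(complexity_score, confidence=1.0, summary_like=False,
                 high_risk=False, boundary_confidence=0.6):
    # Illustrative routing decision following the threshold table above.
    if high_risk:
        return "PRO"  # incident diagnosis / evidence gathering / rollback calls
    if summary_like and complexity_score < 5:
        return "FLASH"  # summary-like work with no deep reasoning required
    if complexity_score >= 5:
        return "PRO"
    if complexity_score <= 2:
        return "FLASH"
    # Boundary case (score between 2 and 5): PRO is the safe default when
    # the Judge's confidence is low.
    return "PRO" if confidence < boundary_confidence else "FLASH"
```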

Contextual Score Biases

The router applies automatic adjustments:

  • High-risk context (production, billing, security): boosts reasoning_depth, risk, ambiguity
  • Evidence gathering in incident: keeps on PRO (not mere IO)
  • Communication/summary subtasks: routed to FLASH unless deep work is also required
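
The high-risk bias can be pictured as a score adjustment pass before routing. Both the marker list and the +2 boost (capped at 10) below are assumed values for illustration; router.py's actual adjustment magnitudes are not documented here:

```python
HIGH_RISK_MARKERS = ("production", "billing", "security")  # assumed marker list

def apply_context_bias(scores, task_text):
    # Sketch of the documented high-risk bias: boost reasoning_depth, risk,
    # and ambiguity when the task mentions a high-risk context.
    biased = dict(scores)
    if any(marker in task_text.lower() for marker in HIGH_RISK_MARKERS):
        for dim in ("reasoning_depth", "risk", "ambiguity"):
            biased[dim] = min(10, biased[dim] + 2)  # cap at the 10-point scale
    return biased
```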

FLASH Review & Escalation Logic

When FLASH execution fails or produces questionable output:

  1. Classify the failure type:
     • infra_transient: timeout, network, rate limit, service unavailable
     • capability_quality: "need more info", empty output, too short, repeated task
  2. Decide:
     • Infra failure → retry FLASH (up to ROUTER_FLASH_RETRY_BUDGET)
     • Capability failure → escalate to PRO immediately
     • Unknown → retry once, then escalate
  3. Verify after execution:
     • Empty output → escalate
     • Output < 48 chars (non-summary) → escalate
     • Output explicitly says "can't complete" → escalate
     • Output merely repeats the task description → escalate
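
The rules above can be sketched as a single review function. The marker strings and 48-char floor come from this section; everything else about the real reviewer is internal to router.py, and the unknown-failure "retry once, then escalate" branch is omitted for brevity:

```python
INFRA_MARKERS = ("timeout", "network", "rate limit", "unavailable")
CAPABILITY_MARKERS = ("need more info", "can't complete", "cannot complete")

def review_flash_output(output, error="", retries_used=0, retry_budget=1,
                        summary_task=False):
    # Illustrative FLASH review: classify the failure, then retry or escalate.
    text = (output or "").strip()
    if any(marker in error.lower() for marker in INFRA_MARKERS):
        # Infra failure: retry FLASH until the budget is spent, then escalate.
        return "retry_flash" if retries_used < retry_budget else "escalate_pro"
    if not text or any(marker in text.lower() for marker in CAPABILITY_MARKERS):
        return "escalate_pro"  # capability failure: go straight to PRO
    if len(text) < 48 and not summary_task:
        return "escalate_pro"  # suspiciously short non-summary output
    return "record"
```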

Finalizer Fallback Chain

Final report generation follows:

FLASH finalizer → (if fails) → PRO finalizer → (if fails) → Deterministic template
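
The chain amounts to a try-each-stage loop. The three callables below are hypothetical stand-ins for the FLASH model, the PRO model, and the deterministic template renderer:

```python
def run_finalizer(flash_finalizer, pro_finalizer, deterministic_template):
    # Sketch of the FLASH -> PRO -> deterministic fallback chain above.
    for route, finalize in (("FLASH", flash_finalizer), ("PRO", pro_finalizer)):
        try:
            return {"route": route, "report": finalize()}
        except Exception:
            continue  # fall through to the next stage in the chain
    # Deterministic template: the last resort that cannot fail on a model call.
    return {"route": "DETERMINISTIC", "report": deterministic_template()}
```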

Output Structure

The router returns a JSON-serializable state. When summarizing these results in reports or documentation, always use ASCII/terminal-style arrows (e.g., '-->', '->') rather than mathematical arrows (e.g., '→', '$\rightarrow$') for diagrams and flow representations; this is a high-priority stylistic requirement.
{
  "task": "original task string",
  "planner_model": "model name used for planning",
  "judge_model": "model name used for complexity scoring",
  "pro_model": "primary PRO model",
  "flash_model": "primary FLASH model",
  "planned_subtasks": [{"desc": "..."}],
  "subtasks": [
    {
      "desc": "...",
      "model": "PRO|FLASH",
      "assessment": {
        "scores": {"reasoning_depth": 5, "code_change_scope": 3, "ambiguity": 2, "risk": 4, "io_heaviness": 1},
        "complexity_score": 15,
        "suggested_route": "PRO",
        "final_route": "PRO",
        "confidence": 0.85,
        "reason": "...",
        "judge_source": "llm|heuristic"
      }
    }
  ],
  "results": [
    {
      "step": 1,
      "planned_route": "PRO",
      "route": "PRO",
      "model_name": "qwen3",
      "desc": "...",
      "output": "...",
      "status": "success|failed",
      "attempt_count": 1,
      "retry_count": 0,
      "escalated_from_flash": false,
      "used_provider_fallback": false,
      "flash_review": {"decision": "record", "failure_type": "none", "reason": "..."},
      "attempt_log": ["..."]
    }
  ],
  "final_report": "...",
  "finalizer_outcome": {
    "route": "FLASH|PRO|DETERMINISTIC",
    "model_name": "...",
    "status": "...",
    "used_provider_fallback": false,
    "reason": "...",
    "attempt_log": ["..."]
  }
}
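
A consumer of this state can build the planned-vs-actual audit view from the fields shown above. The helper below is a hypothetical example (not part of router.py) and uses ASCII arrows per the stylistic requirement:

```python
def summarize_routes(state):
    # Walk the results array from the router state and report planned vs
    # actual routes, flagging escalations.
    lines = []
    for result in state.get("results", []):
        mark = "" if result["planned_route"] == result["route"] else " (escalated)"
        lines.append(
            f"step {result['step']}: {result['planned_route']} -> {result['route']}{mark}")
    return "\n".join(lines)
```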

Example Workflows

Example 1: K8s Incident Triage

router.py "Production K8s Pods are restarting frequently: analyze the logs to find the root cause, propose a fix, and write a short action summary for the on-call teammate"

Expected routing:

  1. "Analyze the Pod restart logs and identify error patterns" → PRO (high-risk diagnosis)
  2. "Determine the root cause (resource exhaustion / misconfiguration / dependency failure)" → PRO (high-risk decision)
  3. "Draft a fix plan (YAML changes / rollback / scale-up)" → PRO (high-risk repair plan)
  4. "Write a short action summary for the on-call teammate" → FLASH (communication/summary)

Example 2: Code Refactoring

router.py "Refactor auth module to use JWT, add unit tests, update docs"

Expected routing:

  1. "Analyze current auth implementation" → PRO (deep inspection)
  2. "Design JWT claims model" → PRO (design logic)
  3. "Implement JWT encoding/decoding" → PRO (implementation)
  4. "Add unit tests for JWT functions" → PRO (test logic)
  5. "Update README with JWT usage examples" → FLASH (documentation)

Example 3: Simple Summary

router.py "Summarize the last 10 git commits"

Expected routing:

  • Single subtask → FLASH (summary-like, low complexity)

Maintenance

| File | Purpose |
| --- | --- |
| scripts/router.py | Main LangGraph router script |
| SKILL.md | This documentation |

Troubleshooting

"Router timed out" / "Ollama returned an empty response"

  • Best fix when keeping a large Planner: keep ROUTER_PLANNER_MODEL=gemma4:26b, but set ROUTER_JUDGE_MODEL=llama3.1:8b.
  • All-gemma mode: set ROUTER_JUDGE_MODEL=gemma4:26b, ROUTER_JUDGE_TIMEOUT=600, and ROUTER_MAX_CONCURRENCY=1; expect much longer runs.
  • Use --stream and increase the terminal/process timeout if the Planner itself may take longer than 60s.
  • Set ROUTER_JUDGE_TIMEOUT=300 or higher only when intentionally using a 20B+ Judge.
  • Alternative: use Gemini CLI for planning: ROUTER_PLANNER_MODEL=google-gemini-cli/gemini-3-pro-preview.

"Planner timed out after 30s" (or 90s)

  • Model is too large or not loaded. Warmup helps but large models may still timeout.
  • Use --stream plus a longer terminal/process timeout, or choose a smaller planner model.
  • Check the Ollama logs (the ollama serve output) for errors

"FLASH kept escalating to PRO"

  • Task may genuinely require heavy reasoning
  • Check if FLASH model is too small for your tasks
  • Try setting ROUTER_FLASH_MODEL to a larger model

"Gemini CLI AbortError or Auth Failures"

  • If gemini-cli returns AbortError or authentication errors in non-interactive sessions, this is often an infrastructure/API timeout or session issue.
  • Use --stream to monitor real-time progress and ensure ROUTER_JUDGE_TIMEOUT and terminal timeouts are sufficiently high to prevent external process termination.

"Planner produced only one subtask"

  • Task may be simple enough to not need decomposition
  • Planner model may be too small; try ROUTER_PLANNER_MODEL=gemma4:31b (if you have the patience for 90s+ waits)

Related Skills

  • dspy — Declarative LM programming with automatic prompt optimization (Python framework alternative)
  • subagent-driven-development — Task decomposition with OpenClaw-native delegation + two-stage review
  • llama-cpp — Run LLM inference locally (alternative to Ollama backend)

