langgraph-error-handling

Implement LangGraph error handling with current v1 patterns. Use when users need to classify failures, add RetryPolicy for transient issues, build LLM recovery loops with Command routing, add human-in-the-loop with interrupt()/resume, handle ToolNode errors, or choose a safe strategy between retry, recovery, and escalation.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "langgraph-error-handling" with this command: npx skills add lubu-labs/langchain-agent-skills/lubu-labs-langchain-agent-skills-langgraph-error-handling

LangGraph Error Handling

Use This Skill For

  • Adding RetryPolicy to flaky nodes (API, DB, model/tool calls)
  • Designing LLM recovery loops (Command + error state + retry counters)
  • Adding human approval/escalation with interrupt() and resume
  • Handling prebuilt ToolNode failures
  • Debugging transactional failure behavior in parallel supersteps

Strategy Selection

Use this order:

  1. Transient/infrastructure issue (429, timeout, 5xx, temporary DB lock) -> RetryPolicy
  2. Recoverable by model/tool args correction -> store error in state and route back with Command
  3. Needs user approval or missing info -> interrupt() + resume
  4. Unknown/programming bug -> let it bubble up and debug

Error Type      | Owner     | Primary Mechanism
Transient       | System    | RetryPolicy
LLM-recoverable | LLM       | State update + Command(goto=...)
User-fixable    | Human     | interrupt() + Command(resume=...)
Unexpected      | Developer | Raise/log/debug

For full taxonomy, load references/error-types.md.
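The selection order above can be sketched as a small classifier. This is a minimal illustration, not the skill's scripts/classify_error.py: the `Strategy` enum, `choose_strategy`, and the two marker exceptions (`ToolArgumentError`, `NeedsHumanInput`) are hypothetical names, and the mapping from exception types to categories is an assumption you would adapt to your own stack.

```python
from enum import Enum


class Strategy(Enum):
    RETRY = "RetryPolicy"
    LLM_RECOVERY = "state update + Command(goto=...)"
    HUMAN = "interrupt() + Command(resume=...)"
    RAISE = "raise/log/debug"


class ToolArgumentError(ValueError):
    """Hypothetical marker: the model passed arguments it could correct."""


class NeedsHumanInput(Exception):
    """Hypothetical marker: only a person can approve or supply the data."""


def choose_strategy(exc: Exception) -> Strategy:
    # 1. Transient/infrastructure failures are safe to retry unchanged.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return Strategy.RETRY
    # 2. Bad model/tool arguments: feed the error back to the model.
    if isinstance(exc, ToolArgumentError):
        return Strategy.LLM_RECOVERY
    # 3. Missing approval or information: escalate to a human.
    if isinstance(exc, NeedsHumanInput):
        return Strategy.HUMAN
    # 4. Anything else is treated as a bug and should surface.
    return Strategy.RAISE
```

Checking the transient case first mirrors the list: a timeout inside a tool call is retried before any model-driven recovery is attempted.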

Minimal Patterns

1) Retry Transient Failures

from langgraph.types import RetryPolicy

builder.add_node(
    "call_api",
    call_api,
    retry_policy=RetryPolicy(max_attempts=3, initial_interval=1.0),
)

// JS/TS equivalent. In LangGraph.js, initialInterval is given in milliseconds.
builder.addNode("callApi", callApi, {
  retryPolicy: { maxAttempts: 3, initialInterval: 1000 },
});

Notes:

  • Python and JS differ in which exception types they retry by default.
  • Pass an explicit retry_on (Python) / retryOn (JS) filter when the default does not match the failures your node can actually hit.
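A targeted retry filter can be written as a plain predicate. The sketch below is an illustration under assumptions: `HttpError` is a hypothetical exception type standing in for whatever your HTTP client raises, and the set of statuses treated as transient (429 and 5xx) follows the strategy list above.

```python
class HttpError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_transient(exc: Exception) -> bool:
    """Return True only for failures that may succeed on a later attempt."""
    # Network-level problems are usually worth retrying.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    # Rate limits and server errors are transient; other 4xx codes are not.
    if isinstance(exc, HttpError):
        return exc.status == 429 or 500 <= exc.status < 600
    return False

# Assumed wiring (not executed here):
#   RetryPolicy(max_attempts=3, retry_on=is_transient)
```

Keeping the predicate free of LangGraph imports makes it easy to unit test before attaching it to a node.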

2) LLM Recovery Loop

Use MessagesState in Python for message state.

from typing import Literal
from typing_extensions import NotRequired
from langgraph.graph import MessagesState
from langgraph.types import Command

class State(MessagesState):
    error: NotRequired[str]
    retry_count: NotRequired[int]

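def agent(state: State) -> Command[Literal["tool", "__end__"]]:
    retries = state.get("retry_count", 0)
    if retries >= 3:
        # Retry budget exhausted: stop instead of looping forever.
        return Command(goto="__end__")
    if state.get("error"):
        # Count the failed attempt and clear the error before retrying the tool.
        return Command(update={"retry_count": retries + 1, "error": ""}, goto="tool")
    return Command(goto="tool")

import { StateGraph, Command, END } from "@langchain/langgraph";

// If a node returns Command in JS, add `ends` on addNode.
builder.addNode("agent", agentNode, { ends: ["tool", END] });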
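To see why the counter matters, the loop can be simulated without LangGraph. This is a sketch under assumptions: plain dicts stand in for graph state, the returned `(next_node, state)` tuple stands in for `Command(goto=..., update=...)`, and the "correction" step is a hard-coded stand-in for the model fixing its tool arguments. All names here are hypothetical.

```python
MAX_RETRIES = 3  # assumed limit, matching the counter check in the pattern above


def tool(state: dict) -> dict:
    """Hypothetical tool step: needs a positive integer argument."""
    try:
        value = int(state["arg"])
        if value <= 0:
            raise ValueError("arg must be positive")
        return {**state, "result": value * 2, "error": ""}
    except ValueError as exc:
        # Store the error in state instead of raising, so the agent can react.
        return {**state, "error": str(exc)}


def agent(state: dict) -> tuple[str, dict]:
    """Return (next_node, new_state)."""
    if "result" in state:
        return "__end__", state
    retries = state.get("retry_count", 0)
    if retries >= MAX_RETRIES:
        # Terminal branch: give up rather than loop forever.
        return "__end__", state
    if state.get("error"):
        # Stand-in for the model correcting its tool arguments.
        fixed = {**state, "arg": "21", "retry_count": retries + 1, "error": ""}
        return "tool", fixed
    return "tool", state


def run(state: dict) -> dict:
    node = "agent"
    while node != "__end__":
        if node == "agent":
            node, state = agent(state)
        else:
            state = tool(state)
            node = "agent"
    return state
```

Calling `run({"arg": "abc"})` fails once, records the error, retries with corrected arguments, and ends with `retry_count` equal to 1. A state that already carries `retry_count` of 3 ends immediately without calling the tool.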

3) Human-In-The-Loop Escalation

from langgraph.types import interrupt, Command

def human_review(state):
    approved = interrupt({
        "question": "Proceed?",
        "payload": state["pending_action"],
    })
    return Command(goto="execute" if approved else "cancel")

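# Resume with the same thread_id that was used when the graph paused.
graph.invoke(Command(resume=True), config={"configurable": {"thread_id": "t-1"}})

import { Command, interrupt } from "@langchain/langgraph";

// Inside a node function: pauses the run; the resume value is returned here.
const approved = interrupt({ question: "Proceed?" });
// Later, resume the same thread:
await graph.invoke(new Command({ resume: true }), {
  configurable: { thread_id: "t-1" },
});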

Requirements:

  • Compile with a checkpointer for interrupt flows.
  • Reuse the same thread_id on resume.

For deep HITL patterns, load references/human-escalation.md.

ToolNode Error Handling

from langgraph.prebuilt import ToolNode

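# Option 1: catch every tool exception and return a default error message to the model.
tool_node = ToolNode(tools, handle_tool_errors=True)
# Option 2: catch every tool exception and return this fixed message instead.
tool_node = ToolNode(tools, handle_tool_errors="Please try again.")
# Option 3: catch only these exception types; anything else propagates.
tool_node = ToolNode(tools, handle_tool_errors=(ValueError, TypeError))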

Use custom handlers when you need deterministic error shaping for model recovery. For broader tool-recovery design, load references/llm-recovery.md.
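A custom handler can be as small as a function that maps an exception to a short, stable string. The sketch below assumes it would be passed as `handle_tool_errors=format_tool_error`; the function name, message wording, and 200-character cap are illustrative choices, not part of the skill.

```python
def format_tool_error(exc: Exception) -> str:
    """Turn an exception into a short, predictable message for the model."""
    kind = type(exc).__name__
    text = str(exc).strip()
    # Keep only the first line so tracebacks or long payloads never leak through.
    detail = text.splitlines()[0][:200] if text else "no details"
    return f"Tool call failed ({kind}): {detail}. Fix the arguments and try again."
```

Because the output depends only on the exception type and its first line, the model sees the same wording for the same failure, which makes recovery prompts easier to tune.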

Critical Behavior (Do Not Skip)

  1. Supersteps are transactional: if one parallel branch fails, none of that superstep's state updates are applied.
  2. RetryPolicy retries failing branches, not successful siblings.
  3. On resume, the node that called interrupt() re-runs from its start, so side effects before interrupt() must be idempotent, moved after interrupt(), or moved to a separate node.
  4. JS Command routing requires ends metadata on addNode(...).
  5. Use explicit retry limits (max_attempts, plus state counters for recovery loops).
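The re-run behavior in item 3 is easy to get wrong, so here is a plain-Python simulation of it. Nothing below uses LangGraph: the `Paused` exception stands in for the pause that interrupt() causes, the `resume` argument stands in for the resume value, and the in-memory set stands in for a durable idempotency record. All names are hypothetical.

```python
class Paused(Exception):
    """Stand-in for the pause that interrupt() triggers on first execution."""


sent_emails: list[str] = []   # simulated external side effect
_sent_keys: set[str] = set()  # idempotency record (durable storage in practice)


def send_email_once(key: str, body: str) -> None:
    # Skip the send if this key was already processed on an earlier run.
    if key in _sent_keys:
        return
    _sent_keys.add(key)
    sent_emails.append(body)


def review_node(state: dict, resume=None) -> dict:
    # This line runs on the first execution and again on resume.
    send_email_once(f"review-request:{state['order_id']}", "Please review order")
    if resume is None:
        raise Paused()  # first pass: wait for a human decision
    return {**state, "approved": bool(resume)}


state = {"order_id": "o-1"}
try:
    review_node(state)  # first execution pauses
except Paused:
    pass
result = review_node(state, resume=True)  # resume re-runs from the top
```

Without the key check, the email would be sent twice: once before the pause and once on resume. Moving the send after the pause point, or into its own node, avoids the problem without needing a key.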

Local Assets In This Skill

Scripts

  • scripts/classify_error.py: classify exception category and recommended handling
  • scripts/wrap_with_retry.py: generate boilerplate node wrappers with retry/recovery/escalation options

Run from repo root:

uv run skills/langgraph-error-handling/scripts/classify_error.py TimeoutError --verbose
uv run skills/langgraph-error-handling/scripts/wrap_with_retry.py call_llm --with-llm-recovery

Examples

  • assets/examples/retry-example/: retry + recovery loop (Python and JS)
  • assets/examples/human-loop-example/: interrupt/resume approval flow (Python and JS)

Load References On Demand

  • references/error-types.md: error taxonomy and classification rules
  • references/retry-strategies.md: retry tuning, backoff, circuit-breaker-style patterns
  • references/llm-recovery.md: recovery-loop and ToolNode strategies
  • references/human-escalation.md: human approval, interrupts, and escalation patterns

Common Failure Modes

Symptom                      | Root Cause                         | Fix
interrupt() fails at runtime | no checkpointer                    | compile with checkpointer
Resume starts new run        | different thread_id                | reuse same thread_id
JS Command route not taken   | missing ends                       | add ends to addNode
Infinite loop                | no termination counter/condition   | add retry counter + terminal branch
Retry never triggers         | exception excluded by retry filter | set explicit retry_on/retryOn

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
