langgraph-checkpoints

LangGraph Checkpointing

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "langgraph-checkpoints" with this command: npx skills add yonatangross/orchestkit/yonatangross-orchestkit-langgraph-checkpoints

Persist workflow state for recovery and debugging.

Checkpointer Options

from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver      # pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

Development: In-memory

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Production: SQLite

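# from_conn_string is a context manager in current releases
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    app = workflow.compile(checkpointer=checkpointer)
    # invoke app inside this block, while the connection is open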

Production: PostgreSQL

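# from_conn_string is a context manager in current releases
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    app = workflow.compile(checkpointer=checkpointer)
    # invoke app inside this block, while the connection is open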

Using Thread IDs

Start new workflow

config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(initial_state, config=config)

Resume interrupted workflow

config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(None, config=config)  # Resumes from checkpoint
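Resuming only works if the second call uses the same thread_id as the first, so it helps to derive the ID from something stable rather than generating it at random. A minimal sketch, assuming a hypothetical helper `thread_config` and an arbitrary namespace string `analysis-workflows` (neither is part of LangGraph):

```python
import uuid

# Hypothetical namespace for this application's workflows; any fixed UUID works.
WORKFLOW_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "analysis-workflows")


def thread_config(workflow_key: str) -> dict:
    """Build a run config whose thread_id is stable for a given workflow key."""
    thread_id = str(uuid.uuid5(WORKFLOW_NAMESPACE, workflow_key))
    return {"configurable": {"thread_id": thread_id}}


first = thread_config("analysis-123")
second = thread_config("analysis-123")  # same key, so the same thread_id
other = thread_config("analysis-456")   # different key, different thread_id
```

A plain string such as the workflow's own ID works just as well; hashing is only useful when the natural key is long or contains characters you would rather not store.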

PostgreSQL Setup

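from contextlib import contextmanager

@contextmanager
def create_checkpointer():
    """Yield a PostgreSQL checkpointer for production."""
    # from_conn_string is a context manager in current releases. A checkpoint
    # is written after every step by default, so no save-frequency argument is passed.
    with PostgresSaver.from_conn_string(settings.DATABASE_URL) as checkpointer:
        checkpointer.setup()  # creates the checkpoint tables on first use
        yield checkpointer

Compile with checkpointing

with create_checkpointer() as checkpointer:
    app = workflow.compile(
        checkpointer=checkpointer,
        interrupt_before=["quality_gate"],  # Manual review point
    )
    # invoke app inside this block, while the connection is open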

Inspecting Checkpoints

Get all checkpoints for a workflow

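checkpoints = app.get_state_history(config)  # newest first

for checkpoint in checkpoints:
    print(f"Step: {checkpoint.metadata['step']}")
    print(f"Source: {checkpoint.metadata['source']}")  # e.g. "input" or "loop", not a node name
    print(f"Next: {checkpoint.next}")  # nodes scheduled to run after this checkpoint
    print(f"State: {checkpoint.values}")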

Get current state

current = app.get_state(config)
print(current.values)
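When reading a long history it is often easier to see what changed between two snapshots than to print each full state. A minimal sketch, assuming each snapshot's `.values` is a plain dict; `diff_values` is a hypothetical helper, and the two sample dicts stand in for `history[i + 1].values` (older) and `history[i].values` (newer):

```python
def diff_values(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for keys that were added, removed, or changed."""
    missing = object()  # sentinel so a key stored as None still differs from an absent key
    changes = {}
    for key in before.keys() | after.keys():
        old = before.get(key, missing)
        new = after.get(key, missing)
        if old != new:
            # Absent values are reported as None in the result.
            changes[key] = (
                None if old is missing else old,
                None if new is missing else new,
            )
    return changes


older = {"topic": "langgraph", "draft": "v1"}
newer = {"topic": "langgraph", "draft": "v2", "score": 0.9}
changes = diff_values(older, newer)
```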

Resuming After Crash

import logging

def run_with_recovery(workflow_id: str, initial_state: dict):
    """Run workflow with automatic recovery."""
    config = {"configurable": {"thread_id": workflow_id}}

    # get_state returns an empty snapshot when the thread has no checkpoint yet
    state = app.get_state(config)
    if state.values and state.next:
        # Nodes are still pending, so continue from the last checkpoint
        logging.info(f"Resuming workflow {workflow_id}")
        return app.invoke(None, config=config)

    # Start fresh
    logging.info(f"Starting new workflow {workflow_id}")
    return app.invoke(initial_state, config=config)
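The resume-or-start decision can be isolated into a small pure function, which makes it easy to test without a database. A sketch under assumptions: `choose_input` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the snapshot returned by `app.get_state(config)`, using only its `values` and `next` attributes.

```python
from types import SimpleNamespace


def choose_input(snapshot, initial_state: dict):
    """Return None to resume a paused thread, else the state to start with."""
    # A thread with saved values and pending nodes stopped mid-run: resume it.
    if snapshot.values and snapshot.next:
        return None
    # No checkpoint yet, or the previous run already finished: start fresh.
    return initial_state


paused = SimpleNamespace(values={"step": 2}, next=("quality_gate",))
empty = SimpleNamespace(values={}, next=())
finished = SimpleNamespace(values={"step": 5}, next=())

initial = {"step": 0}
decisions = [choose_input(s, initial) for s in (paused, empty, finished)]
```

Checking `next` as well as `values` is a design choice: it avoids passing `None` to a thread that has already run to completion. Whether a finished thread should be restarted or left alone depends on the workflow.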

Step-by-Step Debugging

Execute one node at a time

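# In "updates" mode each item maps a node name to the update that node returned
for step in app.stream(initial_state, config, stream_mode="updates"):
    for node_name, update in step.items():
        print(f"After {node_name}: {update}")
    input("Press Enter to continue...")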

Rollback to previous checkpoint

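history = list(app.get_state_history(config))  # newest first
previous_state = history[1]  # One step back
# Re-run from that checkpoint; its config carries the checkpoint_id to fork from.
# (update_state would write a new checkpoint on top instead of going back.)
result = app.invoke(None, config=previous_state.config)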

Store vs Checkpointer (2026 Best Practice)

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

Checkpointer = SHORT-TERM memory (thread-scoped)

- Conversation history within a session

- Workflow state for resume/recovery

- Scoped to thread_id

Store = LONG-TERM memory (cross-thread)

- User preferences across sessions

- Learned facts about users

- Shared across ALL threads for a user

Compile with BOTH for full memory support

# Both from_conn_string helpers are context managers in current releases
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer, PostgresStore.from_conn_string(DATABASE_URL) as store:
    checkpointer.setup()  # create tables on first use
    store.setup()
    app = workflow.compile(
        checkpointer=checkpointer,  # Thread-scoped state
        store=store,  # Cross-thread memory
    )
    # invoke app inside this block, while the connections are open

Using Store for Cross-Thread Memory

from datetime import datetime

from langgraph.store.base import BaseStore

async def agent_with_memory(state: AgentState, *, store: BaseStore):
    """Agent that remembers across conversations."""
    user_id = state["user_id"]

    # Read cross-thread memory (user preferences)
    memories = await store.aget(namespace=("users", user_id), key="preferences")

    # Use memories in agent logic
    if memories and memories.value.get("prefers_concise"):
        state["system_prompt"] += "\nBe concise in responses."

    # Update cross-thread memory (learned facts)
    await store.aput(
        namespace=("users", user_id),
        key="last_topic",
        value={"topic": state["current_topic"], "timestamp": datetime.now().isoformat()},
    )

    return state

Register node with store access

workflow.add_node("agent", agent_with_memory)

Memory Architecture

┌─────────────────────────────────────────────────────────────┐
│                         User: alice                         │
├─────────────────────────────────────────────────────────────┤
│  Thread 1 (chat-001)        │  Thread 2 (chat-002)          │
│  ┌─────────────────┐        │  ┌─────────────────┐          │
│  │  Checkpointer   │        │  │  Checkpointer   │          │
│  │  - msg history  │        │  │  - msg history  │          │
│  │  - workflow pos │        │  │  - workflow pos │          │
│  └─────────────────┘        │  └─────────────────┘          │
├─────────────────────────────────────────────────────────────┤
│                    Store (cross-thread)                     │
│  namespace=("users", "alice")                               │
│  - preferences: {prefers_concise: true}                     │
│  - last_topic: {topic: "langgraph", timestamp: "..."}       │
└─────────────────────────────────────────────────────────────┘
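The scoping rules in the diagram can be modelled with two plain dictionaries. This is a toy illustration only, not how LangGraph stores data; `checkpoints`, `store`, `save_checkpoint`, `put_memory`, and `get_memory` are hypothetical names.

```python
from collections import defaultdict

# Thread-scoped: each thread_id has its own list of state snapshots.
checkpoints = defaultdict(list)
# Cross-thread: values are keyed by (namespace tuple, key), independent of any thread.
store = {}


def save_checkpoint(thread_id: str, state: dict) -> None:
    """Append a copy of the state to this thread's history."""
    checkpoints[thread_id].append(dict(state))


def put_memory(namespace: tuple, key: str, value: dict) -> None:
    """Write a value that every thread can read under the same namespace."""
    store[(namespace, key)] = value


def get_memory(namespace: tuple, key: str):
    """Return the stored value, or None if nothing was written there."""
    return store.get((namespace, key))


save_checkpoint("chat-001", {"messages": ["hi"]})
put_memory(("users", "alice"), "preferences", {"prefers_concise": True})
save_checkpoint("chat-002", {"messages": ["hello again"]})

history_001 = checkpoints["chat-001"]                                   # only chat-001's state
prefs_seen_from_002 = get_memory(("users", "alice"), "preferences")   # visible from any thread
bob_prefs = get_memory(("users", "bob"), "preferences")               # separate namespace
```

Thread chat-002 never sees chat-001's messages, but both can read alice's preferences; a different user namespace stays empty until something is written to it.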

Graph Migrations (2026 Feature)

LangGraph handles topology changes automatically:

Safe changes (handled automatically):

- Adding new nodes

- Removing nodes

- Renaming nodes

- Adding state keys

- Removing state keys

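Works for both active and completed threads, with one exception for interrupted threads

Limitation: a node cannot be removed or renamed while a thread is interrupted at it, because that thread may be about to enter a node that no longer exists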
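Before deploying a graph that drops nodes, it can help to check whether any thread is paused at one of them. A minimal sketch, assuming you have already collected each thread's pending nodes (for example from `app.get_state(config).next`); `blocked_threads` and the sample data are hypothetical.

```python
def blocked_threads(pending: dict, removed_nodes: set) -> dict:
    """Map thread_id -> removed nodes that the thread is still waiting to enter."""
    blocked = {}
    for thread_id, next_nodes in pending.items():
        hits = sorted(set(next_nodes) & removed_nodes)
        if hits:
            blocked[thread_id] = hits
    return blocked


# Pending nodes per thread, as they might be gathered from each thread's snapshot.
pending = {
    "analysis-123": ("quality_gate",),
    "analysis-456": (),
    "analysis-789": ("summarize",),
}
result = blocked_threads(pending, removed_nodes={"quality_gate"})
```

An empty result suggests the removal is safe for the threads that were checked; any listed thread would need to be resumed or finished first.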

Checkpoint Cleanup Strategies

from datetime import datetime, timedelta

Option 1: TTL-based cleanup (configure at DB level)

CREATE INDEX idx_checkpoints_created ON checkpoints(created_at);

DELETE FROM checkpoints WHERE created_at < NOW() - INTERVAL '30 days';

Option 2: Manual cleanup

async def cleanup_old_checkpoints(db, days: int = 30):
    """Remove checkpoints older than N days."""
    # Table and column names are assumptions; match them to your checkpoint schema.
    cutoff = datetime.now() - timedelta(days=days)
    await db.execute(
        "DELETE FROM langgraph_checkpoints WHERE created_at < $1",
        cutoff,
    )
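`datetime.now()` returns a naive local time, which can be off by hours if the database stores UTC timestamps. A small sketch of computing the cutoff in UTC instead; `ttl_cutoff` is a hypothetical helper, and the `now` parameter exists only so the calculation can be tested with a fixed clock.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return the timezone-aware instant before which checkpoints count as expired."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


fixed_now = datetime(2026, 1, 31, 12, 0, tzinfo=timezone.utc)
cutoff = ttl_cutoff(30, now=fixed_now)
```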

Option 3: Per-thread cleanup

async def cleanup_thread(db, thread_id: str, keep_latest: int = 10):
    """Keep only latest N checkpoints per thread."""
    # Table and column names are assumptions; match them to your checkpoint schema.
    await db.execute("""
        DELETE FROM langgraph_checkpoints
        WHERE thread_id = $1
          AND id NOT IN (
              SELECT id FROM langgraph_checkpoints
              WHERE thread_id = $1
              ORDER BY created_at DESC
              LIMIT $2
          )
    """, thread_id, keep_latest)
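The keep-latest-N query can be tried against an in-memory SQLite database before it is run anywhere important. This sketch assumes a hypothetical table `demo_checkpoints` with `id`, `thread_id`, and `created_at` columns; a real checkpointer schema will differ, so treat the names as placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_checkpoints (id INTEGER PRIMARY KEY, thread_id TEXT, created_at INTEGER)"
)
# Five checkpoints for one thread and one checkpoint for another.
rows = [(i, "analysis-123", i) for i in range(1, 6)] + [(6, "analysis-456", 1)]
conn.executemany("INSERT INTO demo_checkpoints VALUES (?, ?, ?)", rows)


def cleanup_thread(conn, thread_id: str, keep_latest: int) -> int:
    """Delete all but the newest keep_latest rows for one thread; return rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM demo_checkpoints
        WHERE thread_id = ?
          AND id NOT IN (
              SELECT id FROM demo_checkpoints
              WHERE thread_id = ?
              ORDER BY created_at DESC
              LIMIT ?
          )
        """,
        (thread_id, thread_id, keep_latest),
    )
    conn.commit()
    return cur.rowcount


deleted = cleanup_thread(conn, "analysis-123", keep_latest=2)
remaining = [r[0] for r in conn.execute("SELECT id FROM demo_checkpoints ORDER BY id")]
```

The other thread's row is untouched because both the outer `DELETE` and the subquery filter on the same thread_id.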

Key Decisions

Decision             Recommendation
Development          MemorySaver (fast, no setup)
Production           PostgresSaver (shared, durable)
Thread ID            Use deterministic ID (workflow_id)
Short-term memory    Checkpointer (thread-scoped)
Long-term memory     Store (cross-thread, namespaced)
Cleanup              TTL-based or keep-latest-N per thread
Migrations           Automatic for topology changes

Common Mistakes

  • No checkpointer in production (lose progress)

  • Random thread IDs (can't resume)

  • Not handling missing checkpoints

  • Using only checkpointer for user preferences (lost across threads)

  • Not using namespaces in Store (data collisions)

  • Not cleaning up old checkpoints (database bloat)

  • Removing nodes while threads are interrupted at them

Evaluations

See references/evaluations.md for test cases.

Related Skills

  • langgraph-state: State schemas that persist well with checkpointing

  • langgraph-human-in-loop: Interrupt patterns that leverage checkpoints

  • langgraph-supervisor: Checkpoint supervisor progress for fault tolerance

  • langgraph-streaming: Stream checkpoint updates to clients

  • langgraph-functional: Functional API with automatic checkpointing

  • database-schema-designer: PostgreSQL checkpoint table setup

Capability Details

checkpoint-saving

Keywords: save checkpoint, checkpoint, persist state, save state

Solves:

  • Save workflow state at key points

  • Implement checkpoint strategies

  • Handle checkpoint serialization

checkpoint-loading

Keywords: load checkpoint, restore, resume, recovery

Solves:

  • Resume workflows from checkpoints

  • Implement state recovery

  • Handle checkpoint versioning

memory-backends

Keywords: memory backend, MemorySaver, SqliteSaver, PostgresSaver

Solves:

  • Configure checkpoint storage backends

  • Choose between memory/SQLite/Postgres

  • Implement custom checkpoint storage

async-checkpoints

Keywords: async checkpoint, AsyncSqliteSaver, async persistence

Solves:

  • Implement async checkpoint operations

  • Handle concurrent checkpoint access

  • Optimize checkpoint performance

conversation-history

Keywords: conversation, history, message history, thread

Solves:

  • Persist conversation history

  • Implement thread-based checkpoints

  • Manage conversation state
