restart-recovery

Make OpenClaw agent workflows restart-safe using checkpoint files, idempotent step tracking, wake/resume handoff, and stale-checkpoint monitoring. Use it when users ask to recover from restarts, preserve progress across updates and config-triggered restarts, or implement checkpoint → restart → wake → resume patterns.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and the scripts it references before running them.


Install skill "restart-recovery" with this command: npx skills add stanrails/restart-recovery

Restart Recovery

Implement restart-safe execution with this sequence:

  1. checkpoint
  2. restart
  3. wake
  4. resume from file

Use bundled scripts

  • Use scripts/checkpoint_tool.py for a deterministic checkpoint lifecycle, through these subcommands:
    • start, update, resume, complete, list
  • Use scripts/checkpoint_selfcheck.py to alert on stale, unfinished checkpoints without spending LLM or tool tokens.

Required operating rules

  • Write checkpoints before any restart-prone operation (config patch/apply, update, service restart, long multi-step jobs).
  • Write checkpoints atomically: write to a .tmp file first, then rename it over the checkpoint.
  • Track completed and remaining steps explicitly.
  • Include an idempotency key per workflow to avoid duplicate side effects after resume.
  • Never write secrets/tokens to checkpoint files.
  • Acquire a resume lock before continuing unfinished work, so that two sessions cannot resume the same workflow.
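The operating rules above can be sketched in Python as follows. This is a minimal illustration, not the implementation of scripts/checkpoint_tool.py: the field names (`workflow`, `idempotency_key`, `completed`, `remaining`, `status`, `updated_at`), the helpers `new_checkpoint`, `complete_step`, `write_atomic`, and `acquire_resume_lock`, and the `<checkpoint>.lock` file convention are all hypothetical. The lock uses POSIX `fcntl.flock`, which the kernel releases when the holding process exits, so a crash during resume does not leave the workflow locked.

```python
import fcntl
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone
from typing import Optional, TextIO


def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def new_checkpoint(workflow: str, steps: list[str]) -> dict:
    """Fresh checkpoint with explicit completed/remaining steps and an idempotency key."""
    return {
        "workflow": workflow,
        "idempotency_key": str(uuid.uuid4()),  # reused for every side effect of this run
        "completed": [],
        "remaining": list(steps),
        "status": "in_progress",
        "updated_at": now_iso(),
        # Never store secrets or tokens here; store a reference such as an env var name.
    }


def complete_step(cp: dict, step: str) -> dict:
    """Move a step from remaining to completed; mark the workflow done when nothing is left."""
    cp["remaining"].remove(step)
    cp["completed"].append(step)
    if not cp["remaining"]:
        cp["status"] = "done"
    cp["updated_at"] = now_iso()
    return cp


def write_atomic(path: str, data: dict) -> None:
    """Write JSON to a .tmp file in the target's directory, fsync it, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename: readers see the old or the new file, never half of one
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def acquire_resume_lock(checkpoint_path: str) -> Optional[TextIO]:
    """Exclusive, non-blocking flock on <checkpoint>.lock (POSIX only).

    Returns the open lock file, which must stay open while resuming, or None if
    another process holds the lock. The kernel releases the lock when the holder
    exits, so a crash during resume cannot leave the workflow locked forever.
    """
    lock = open(checkpoint_path + ".lock", "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock.close()
        return None
    return lock
```

Because the temporary file is created in the same directory as the checkpoint, the final `os.replace` stays on one filesystem, which is what makes the rename atomic.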

Recommended checkpoint location

  • Per-agent workflows: memory/checkpoints/*.json in that agent's workspace
  • Shared or default-workspace workflows: memory/checkpoints/*.json at the workspace root

Startup instruction to add to AGENTS.md

Add this exact section:

## Restart-safe workflow rule
On startup, check `memory/checkpoints/*.json` for unfinished workflows. If found, acquire resume lock, validate checkpoint schema/hash, and continue from the last completed idempotent step.
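A minimal sketch of that startup check, assuming each checkpoint is a JSON object containing the fields in `REQUIRED_FIELDS` plus a `sha256` field that the writer computed with the same `payload_hash` function before saving. The schema, the hash convention, and the function names are illustrative assumptions, not what the skill's scripts actually use.

```python
import glob
import hashlib
import json
import os

# Assumed schema; adjust to whatever your checkpoint writer actually emits.
REQUIRED_FIELDS = {"workflow", "idempotency_key", "completed", "remaining", "status"}


def payload_hash(cp: dict) -> str:
    """SHA-256 over the canonical JSON of every field except the stored hash itself."""
    body = {k: v for k, v in cp.items() if k != "sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_valid(cp: object) -> bool:
    """Schema check plus integrity check against the stored hash."""
    return (
        isinstance(cp, dict)
        and REQUIRED_FIELDS.issubset(cp)
        and isinstance(cp["completed"], list)
        and isinstance(cp["remaining"], list)
        and cp.get("sha256") == payload_hash(cp)
    )


def find_unfinished(root: str) -> tuple[list[tuple[str, dict]], list[str]]:
    """Return (resumable, rejected): valid unfinished checkpoints, and files that failed validation."""
    resumable, rejected = [], []
    pattern = os.path.join(root, "memory", "checkpoints", "*.json")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, encoding="utf-8") as f:
                cp = json.load(f)
        except (OSError, ValueError):  # unreadable, bad encoding, or truncated JSON
            rejected.append(path)
            continue
        if not is_valid(cp):
            rejected.append(path)
        elif cp["status"] != "done":
            resumable.append((path, cp))
    return resumable, rejected
```

Rejected files are returned rather than silently dropped, so a corrupted or tampered checkpoint can be reported instead of causing a lost or half-repeated workflow.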

No-LLM stale checkpoint monitor

Schedule the monitor with the host scheduler (launchd, systemd, or cron), not with LLM cron jobs.

  • Run it every 10 minutes.
  • Alert only when unfinished checkpoints are older than a configured threshold.
  • Log each run to a local file for auditing.
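A minimal sketch of such a monitor, written as a hypothetical stand-in rather than the actual contents of scripts/checkpoint_selfcheck.py. It assumes a checkpoint is finished when its `status` field is `"done"`, measures age from the file's modification time, and uses an assumed 30-minute threshold. It makes no LLM or network calls: `run_check` only appends a line to a local audit log and returns the paths that should trigger an alert.

```python
import glob
import json
import os
import time

STALE_AFTER_SECONDS = 30 * 60  # assumed threshold; tune to your longest expected step


def stale_checkpoints(checkpoint_dir: str, now: float, threshold: float = STALE_AFTER_SECONDS) -> list[str]:
    """Return unfinished checkpoint files whose last write is older than threshold seconds."""
    stale = []
    for path in sorted(glob.glob(os.path.join(checkpoint_dir, "*.json"))):
        try:
            with open(path, encoding="utf-8") as f:
                status = json.load(f).get("status")
            age = now - os.path.getmtime(path)
        except (OSError, ValueError, AttributeError):
            continue  # unreadable or non-object JSON: not classified here
        if status != "done" and age > threshold:
            stale.append(path)
    return stale


def run_check(checkpoint_dir: str, log_path: str) -> list[str]:
    """One scheduler tick: append an audit line, return the checkpoints to alert on."""
    now = time.time()
    stale = stale_checkpoints(checkpoint_dir, now)
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
        log.write(f"{stamp} stale={len(stale)} {' '.join(stale)}\n")
    return stale
```

To run it every 10 minutes, call `run_check` from a small entry-point script and register that script with the host scheduler, for example with a crontab schedule of `*/10 * * * *`, a systemd timer, or a launchd `StartInterval` of 600 seconds.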

Suggested execution flow

  1. Run checkpoint_tool.py start before the risky step.
  2. Perform the step.
  3. Run checkpoint_tool.py update --complete <step> --step <next>.
  4. If a restart happens, wake the session or process.
  5. On startup or re-entry, run checkpoint_tool.py resume and continue.
  6. Run checkpoint_tool.py complete when done.
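The flow above can be sketched in Python, with the hypothetical functions `start` and `run` standing in for the start, update, resume, and complete subcommands of checkpoint_tool.py (their real flags and output are not reproduced here). Each action receives a token built from the workflow's idempotency key and the step name. The sketch assumes the downstream side effect deduplicates on that token, because a crash between performing a step and saving the checkpoint means that step runs again on resume.

```python
import json
import os
from typing import Callable


def load(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save(path: str, cp: dict) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(cp, f)
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a half-written checkpoint


def start(path: str, key: str, steps: list[str]) -> dict:
    """Write the checkpoint before any risky work; an existing checkpoint wins (resume case)."""
    if os.path.exists(path):
        return load(path)
    cp = {"idempotency_key": key, "completed": [], "remaining": list(steps), "status": "in_progress"}
    save(path, cp)
    return cp


def run(path: str, actions: dict[str, Callable[[str], None]]) -> dict:
    """Resume from the file and run each remaining step, checkpointing after every step."""
    cp = load(path)
    while cp["remaining"]:
        step = cp["remaining"][0]
        actions[step](f'{cp["idempotency_key"]}:{step}')  # per-step idempotency token
        cp["completed"].append(step)
        cp["remaining"].pop(0)
        save(path, cp)  # plays the role of `update --complete <step> --step <next>`
    cp["status"] = "done"  # plays the role of `complete`
    save(path, cp)
    return cp
```

Saving after every step limits the replay to at most one step per restart; the idempotency token is what makes that single replay harmless.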

Validation checklist

  • Simulate a restart mid-workflow and verify that work resumes from the last completed step.
  • Confirm idempotency (no duplicate sends, writes, or actions).
  • Confirm that the stale-check script alerts only after the threshold is exceeded.
  • Confirm that a cleanup policy expires old checkpoints.
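For the expiry item, one possible cleanup policy is sketched below: delete checkpoints that are finished (`status` is `"done"`) and whose file is older than an assumed seven-day retention window. The function name, the status convention, and the retention period are assumptions. Unfinished checkpoints are never deleted here, however old they are, so they stay visible to the stale-checkpoint monitor.

```python
import glob
import json
import os
import time
from typing import Optional

EXPIRE_AFTER_SECONDS = 7 * 24 * 3600  # assumed retention for finished checkpoints


def expire_finished(checkpoint_dir: str, now: Optional[float] = None,
                    max_age: float = EXPIRE_AFTER_SECONDS) -> list[str]:
    """Delete finished checkpoints older than max_age seconds; never touch unfinished ones."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(glob.glob(os.path.join(checkpoint_dir, "*.json"))):
        try:
            with open(path, encoding="utf-8") as f:
                finished = json.load(f).get("status") == "done"
            old = now - os.path.getmtime(path) > max_age
        except (OSError, ValueError, AttributeError):
            continue  # unreadable files are left for a human to inspect
        if finished and old:
            os.remove(path)
            removed.append(path)
    return removed
```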

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

handdraw-flowchart

Create hand-drawn workflow diagrams from natural-language process descriptions by generating strictly validated Mermaid flowchart, sequenceDiagram, or classD...

Registry Source · Recently Updated
Automation

Find Agent

OceanBus-powered agent and service discovery via Yellow Pages. Use when users want to find someone, look for a service, reach out to an expert, discover anot...

Registry Source · Recently Updated
Automation

Qwen Web Agent

Browser automation for 通义千问 (Qwen) web interface at qianwen.com. Use when the agent needs to ask questions to Qwen AI and get back responses via browser auto...

Registry Source · Recently Updated
Automation

bot File Processor

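A general-purpose file-processing skill for batch renaming and format conversion. Use it when the user needs to batch-rename files (add prefixes or suffixes, replace text, rename with numbering, rename with regular expressions) or convert file formats (image format conversion, PDF-to-image and image-to-PDF conversion, DOCX to PDF, Markdown to PDF).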

Registry Source · Recently Updated