State Management Patterns Skill
Standardized state management and persistence patterns for the autonomous-dev plugin ecosystem. Ensures reliable, crash-resistant state persistence across Claude restarts and system failures.
When This Skill Activates
-
Implementing state persistence
-
Managing crash recovery
-
Handling concurrent state access
-
Versioning state schemas
-
Tracking batch operations
-
Managing user preferences
-
Keywords: "state", "persistence", "JSON", "atomic", "crash recovery", "checkpoint"
Core Patterns
- JSON Persistence with Atomic Writes
Definition: Store state in JSON files with atomic writes to prevent corruption on crash.
Pattern:
import json from pathlib import Path from typing import Dict, Any import tempfile import os
def save_state_atomic(state: Dict[str, Any], state_file: Path) -> None: """Save state with atomic write to prevent corruption.
Args:
state: State dictionary to persist
state_file: Target state file path
Security:
- Atomic Write: Prevents partial writes on crash
- Temp File: Write to temp, then rename (atomic operation)
- Permissions: Preserves file permissions
"""
# Write to temporary file first
temp_fd, temp_path = tempfile.mkstemp(
dir=state_file.parent,
prefix=f".{state_file.name}.",
suffix=".tmp"
)
try:
# Write JSON to temp file
with os.fdopen(temp_fd, 'w') as f:
json.dump(state, f, indent=2)
# Atomic rename (overwrites target)
os.replace(temp_path, state_file)
except Exception:
# Clean up temp file on failure
if Path(temp_path).exists():
Path(temp_path).unlink()
raise
See: docs/json-persistence.md , examples/batch-state-example.py
- File Locking for Concurrent Access
Definition: Use file locks to prevent concurrent modification of state files.
Pattern:
import fcntl import json from pathlib import Path from contextlib import contextmanager
@contextmanager def file_lock(filepath: Path): """Acquire exclusive file lock for state file.
Args:
filepath: Path to file to lock
Yields:
Open file handle with exclusive lock
Example:
>>> with file_lock(state_file) as f:
... state = json.load(f)
... state['count'] += 1
... f.seek(0)
... f.truncate()
... json.dump(state, f)
"""
with filepath.open('r+') as f:
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
try:
yield f
finally:
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
See: docs/file-locking.md , templates/file-lock-template.py
- Crash Recovery Pattern
Definition: Design state to enable recovery after crashes or interruptions.
Principles:
-
State includes enough context to resume operations
-
Progress tracking enables "resume from last checkpoint"
-
State validation detects corruption
-
Migration paths handle schema changes
Example:
@dataclass class BatchState: """Batch processing state with crash recovery support.
Attributes:
batch_id: Unique batch identifier
features: List of all features to process
current_index: Index of current feature
completed: List of completed feature names
failed: List of failed feature names
created_at: State creation timestamp
last_updated: Last update timestamp
"""
batch_id: str
features: List[str]
current_index: int = 0
completed: List[str] = None
failed: List[str] = None
created_at: str = None
last_updated: str = None
def __post_init__(self):
if self.completed is None:
self.completed = []
if self.failed is None:
self.failed = []
if self.created_at is None:
self.created_at = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
See: docs/crash-recovery.md , examples/crash-recovery-example.py
- State Versioning and Migration
Definition: Version state schemas to enable graceful upgrades.
Pattern:
STATE_VERSION = "2.0.0"
def migrate_state(state: Dict[str, Any]) -> Dict[str, Any]: """Migrate state from old version to current.
Args:
state: State dictionary (any version)
Returns:
Migrated state (current version)
"""
version = state.get("version", "1.0.0")
if version == "1.0.0":
# Migrate 1.0.0 → 1.1.0
state = _migrate_1_0_to_1_1(state)
version = "1.1.0"
if version == "1.1.0":
# Migrate 1.1.0 → 2.0.0
state = _migrate_1_1_to_2_0(state)
version = "2.0.0"
state["version"] = STATE_VERSION
return state
See: docs/state-versioning.md , templates/state-manager-template.py
Real-World Examples
BatchStateManager Pattern
From plugins/autonomous-dev/lib/batch_state_manager.py :
Features:
-
JSON persistence with atomic writes
-
Crash recovery via --resume flag
-
Progress tracking (completed/failed features)
-
Automatic context management via Claude Code (200K token budget)
-
State versioning for schema upgrades
Note (Issue #218): Deprecated context clearing functions (should_clear_context() , pause_batch_for_clear() , get_clear_notification_message() ) have been removed as Claude Code v2.0+ handles context automatically with 200K token budget.
Usage:
Create batch state
state = create_batch_state(features=["feat1", "feat2", "feat3"]) state.batch_id # "batch-20251116-123456"
Process features
for feature in state.features: try: # Process feature result = process_feature(feature) # Feature implementation updates context automatically
except Exception as e:
# Track failures for audit trail
mark_failed(state, feature, str(e))
save_batch_state(state_file, state) # Atomic write
Resume after crash
state = load_batch_state(state_file) next_feature = get_next_pending_feature(state) # Skips completed
Context Management: Claude Code automatically manages the 200K token budget. No manual context clearing required.
Checkpoint Integration (Issue #79)
Agents save checkpoints using the portable pattern:
Portable Pattern (Works Anywhere)
from pathlib import Path import sys
Portable path detection
current = Path.cwd() while current != current.parent: if (current / ".git").exists(): project_root = current break current = current.parent
Add lib to path
lib_path = project_root / "plugins/autonomous-dev/lib" if lib_path.exists(): sys.path.insert(0, str(lib_path))
try:
from agent_tracker import AgentTracker
success = AgentTracker.save_agent_checkpoint(
agent_name='my-agent',
message='Task completed - found 5 patterns',
tools_used=['Read', 'Grep', 'WebSearch']
)
print(f"Checkpoint: {'saved' if success else 'skipped'}")
except ImportError:
print("ℹ️ Checkpoint skipped (user project)")
Features
-
Portable: Works from any directory (user projects, subdirectories, fresh installs)
-
No hardcoded paths: Uses dynamic project root detection
-
Graceful degradation: Returns False, doesn't block workflow
-
Security validated: Path validation (CWE-22), no subprocess (CWE-78)
Design Patterns
-
Progressive Enhancement: Works with or without tracking infrastructure
-
Non-blocking: Never raises exceptions
-
Two-tier: Library imports instead of subprocess calls
See: LIBRARIES.md Section 24 (agent_tracker.py), DEVELOPMENT.md Scenario 2.5, docs/LIBRARIES.md for API
Usage Guidelines
For Library Authors
When implementing stateful features:
-
Use JSON persistence with atomic writes
-
Add file locking for concurrent access protection
-
Design for crash recovery with resumable state
-
Version your state for schema evolution
-
Validate on load to detect corruption
For Claude
When creating or analyzing stateful libraries:
-
Load this skill when keywords match ("state", "persistence", etc.)
-
Follow persistence patterns for reliability
-
Implement crash recovery for long-running operations
-
Use atomic operations to prevent corruption
-
Reference templates in templates/ directory
Token Savings
By centralizing state management patterns in this skill:
-
Before: ~50 tokens per library for inline state management docs
-
After: ~10 tokens for skill reference comment
-
Savings: ~40 tokens per library
-
Total: ~400 tokens across 10 libraries (4-5% reduction)
Progressive Disclosure
This skill uses Claude Code 2.0+ progressive disclosure architecture:
-
Metadata (frontmatter): Always loaded (~180 tokens)
-
Full content: Loaded only when keywords match
-
Result: Efficient context usage, scales to 100+ skills
When you use terms like "state management", "persistence", "crash recovery", or "atomic writes", Claude Code automatically loads the full skill content.
Templates and Examples
Templates (reusable code structures)
-
templates/state-manager-template.py : Complete state manager class
-
templates/atomic-write-template.py : Atomic write implementation
-
templates/file-lock-template.py : File locking utilities
Examples (real implementations)
-
examples/batch-state-example.py : BatchStateManager pattern
-
examples/user-state-example.py : UserStateManager pattern
-
examples/crash-recovery-example.py : Crash recovery demonstration
Documentation (detailed guides)
-
docs/json-persistence.md : JSON storage patterns
-
docs/atomic-writes.md : Atomic write implementation
-
docs/file-locking.md : Concurrent access protection
-
docs/crash-recovery.md : Recovery strategies
Cross-References
This skill integrates with other autonomous-dev skills:
-
library-design-patterns: Two-tier design, progressive enhancement
-
error-handling-patterns: Exception handling and recovery
-
security-patterns: File permissions and path validation
See: skills/library-design-patterns/ , skills/error-handling-patterns/
Maintenance
This skill should be updated when:
-
New state management patterns emerge
-
State schema versioning needs change
-
Concurrency patterns evolve
-
Performance optimizations discovered
Last Updated: 2025-11-16 (Phase 8.8 - Initial creation) Version: 1.0.0
Hard Rules
FORBIDDEN:
-
Storing state without a defined schema or version field
-
Direct file writes without atomic operations (write-then-rename pattern)
-
State files without backup/recovery mechanism
-
Unbounded state growth (MUST have cleanup/rotation strategy)
REQUIRED:
-
All state files MUST include a schema version for migration support
-
State mutations MUST be atomic (no partial writes on failure)
-
State MUST be recoverable from corruption (fallback to defaults)
-
All state access MUST go through a single module (no scattered file reads)