# Systematic Debugging Workflow

I'll help you debug issues systematically using the scientific method: hypothesis formation, testing, and iterative refinement.

Arguments: `$ARGUMENTS` - error description, reproduction steps, or context
## Token Optimization

Target: 50% reduction (4,000-6,000 → 1,500-3,000 tokens)
### Core Optimization Strategies

**1. Hypothesis-Driven Debugging (Not Exhaustive Analysis)**
- ❌ AVOID: Reading entire codebase to find bugs
- ✅ DO: Form hypotheses about likely causes, test top 2-3 first
- Token savings: 90% (200 tokens vs 2,000+ tokens)
- Pattern: Prioritize recently changed files, common failure patterns
**2. Git Diff for Recently Changed Files (Likely Bug Source)**
- ❌ AVOID: `ls -R` then reading all files
- ✅ DO: `git diff --name-only HEAD~3..HEAD` to find changed files
- ✅ DO: `git log --oneline --since="3 days ago"` for recent commits
- Token savings: 85% (300 tokens vs 2,000+ tokens)
- Pattern: Bugs are often introduced in recent changes
**3. Stack Trace Parsing with Grep**
- ❌ AVOID: Reading entire log files with the Read tool
- ✅ DO: `grep -i "error\|exception\|fatal" logs/*.log | tail -20`
- ✅ DO: Parse stack traces to extract file paths and line numbers
- Token savings: 95% (100 tokens vs 2,000+ tokens for large logs)
- Pattern: Stack traces reveal exact failure locations
**4. Test Failure Analysis Caching**
- ✅ Cache test results in `debug/state.json`
- ✅ Cache hypothesis outcomes to avoid retesting
- ✅ Cache reproduction steps once confirmed
- Token savings: 70% on subsequent debugging turns
- Pattern: Multi-turn debugging sessions benefit from persisted state (see the sketch below)
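For illustration, a minimal sketch of what the cached session state might look like; the exact field names and the use of `jq` are assumptions, not a fixed schema:

```bash
#!/bin/bash
# Hypothetical debug/state.json layout -- field names are illustrative.
mkdir -p debug
cat > debug/state.json << 'EOF'
{
  "status": "in_progress",
  "issue": "API returns 500 on POST /users",
  "hypotheses": [
    {"id": 1, "theory": "missing dependency", "result": "disproved"},
    {"id": 2, "theory": "race condition", "result": "pending"}
  ],
  "reproduction_confirmed": true
}
EOF

# On a later turn, read the cached status instead of re-analyzing (requires jq)
jq -r '.status' debug/state.json
```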
**5. Progressive Investigation (Narrow Before Deep)**
- ✅ Start with stack trace → identify file → read specific function
- ✅ Hypothesis testing: test most likely causes first
- ✅ Binary search through git history when needed
- Token savings: 60% (stop early when cause found)
- Pattern: Most bugs have obvious causes in changed code (see the narrowing sketch below)
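As a sketch of the narrowing step, the snippet below pulls the topmost frame out of a Node-style stack trace (the `(file:line:col)` format is an assumption) and prints only the surrounding lines instead of reading the whole file:

```bash
#!/bin/bash
# Hypothetical narrowing: locate the failing line, then read just its context.
frame=$(grep -oE '\([^)]+:[0-9]+:[0-9]+\)' error.log | head -1 | tr -d '()')
file=${frame%%:*}                      # e.g. src/app.js
line=$(echo "$frame" | cut -d: -f2)    # e.g. 42

if [ -n "$file" ] && [ -f "$file" ]; then
  # Show ~10 lines of context around the failing line only
  sed -n "$((line > 10 ? line - 10 : 1)),$((line + 10))p" "$file"
fi
```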
**6. Session State Tracking for Multi-Turn Debugging**
- ✅ Session files in the `debug/` directory
- ✅ Track tested hypotheses to avoid repetition
- ✅ Resume from last checkpoint on subsequent runs
- Token savings: 80% on resumed sessions (skip completed work)
- Pattern: Complex bugs require multiple debugging turns
## Token Usage by Operation
| Operation | Unoptimized | Optimized | Savings |
|---|---|---|---|
| Initial bug analysis | 2,000-3,000 | 500-1,000 | 60-75% |
| Hypothesis formation | 1,500-2,000 | 400-800 | 60-73% |
| Stack trace parsing | 2,000+ | 100-200 | 90-95% |
| File investigation | 2,000+ | 300-600 | 70-85% |
| Test reproduction | 1,000-1,500 | 200-400 | 73-80% |
| Session resume | 2,000-3,000 | 300-600 | 80-85% |
**Average Reduction**: 50% (4,000-6,000 → 1,500-3,000 tokens)
## Debugging-Specific Patterns
**Stack Trace Analysis:**

```bash
# Extract file paths and line numbers from stack traces
grep -E "at .+ \(.+:[0-9]+:[0-9]+\)" error.log | head -10
# Focus investigation on these specific files/lines
```
**Recent Changes Focus:**

```bash
# Find files changed in the last 10 commits (likely bug sources)
git diff --name-only HEAD~10..HEAD
# Only read files that changed recently
```
**Hypothesis Prioritization** (see the triage sketch below):
- Recent changes (80% of bugs) - Check `git diff` first
- Stack trace files (90% reliability) - Read exact failure locations
- Error message patterns (70% of bugs) - Grep for similar errors
- Environment/config (20% of bugs) - Check if configs changed
- External dependencies (10% of bugs) - Check for updates
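A quick triage sketch that walks this priority order, assuming a git repository and a `logs/` directory; each check is cheap, so the expensive deep dive happens only after the first promising lead:

```bash
#!/bin/bash
# Hypothetical triage in priority order -- stop at the first hit.
echo "1. Recently changed files:"
git diff --name-only HEAD~3..HEAD 2>/dev/null | head -10

echo "2. Hot spots in stack traces:"
grep -ohE '\([^)]+:[0-9]+:[0-9]+\)' logs/*.log 2>/dev/null | sort | uniq -c | sort -rn | head -5

echo "3. Recurring error messages:"
grep -ih "error\|exception" logs/*.log 2>/dev/null | sort | uniq -c | sort -rn | head -5
```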
**Binary Search for Regressions:**

```bash
# Use git bisect to find the regression commit
git bisect start HEAD v1.2.3   # HEAD is bad, v1.2.3 is known good
git bisect run npm test        # Automated testing
# Saves 95% tokens vs manually testing each commit
```
## Caching Behavior

**Session Location**: `debug/` (in project root)
- `debug/plan.md` - Debugging plan with hypotheses and results
- `debug/state.json` - Session state and test results
- `debug/reproduction.log` - Issue reproduction steps and logs

**Cache Location**: `.claude/cache/debug/`
- `hypotheses.json` - Tested hypotheses and outcomes
- `stack-traces.json` - Parsed stack trace information
- `changed-files.json` - Recently changed files analysis

**Cache Validity:**
- Until issue resolved (status: "solved" in `state.json`)
- Until source files change (checksum-based; see the sketch below)
- 7 days maximum for stale sessions
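A minimal sketch of the checksum-based check; the cache file name `checksums.txt` and the use of `sha256sum` are assumptions:

```bash
#!/bin/bash
# Hypothetical invalidation: if recently changed files differ from the
# checksums recorded when the cache was built, drop the cache.
CACHE_DIR=".claude/cache/debug"
mkdir -p "$CACHE_DIR"

current=$(git diff --name-only HEAD~3..HEAD | xargs -r sha256sum 2>/dev/null)

if [ -f "$CACHE_DIR/checksums.txt" ] && \
   [ "$current" != "$(cat "$CACHE_DIR/checksums.txt")" ]; then
  echo "Source files changed - invalidating debug cache"
  rm -f "$CACHE_DIR"/*.json
fi
echo "$current" > "$CACHE_DIR/checksums.txt"
```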
**Shared With:**
- `/debug-root-cause` - Root cause analysis skill
- `/debug-session` - Debug session documentation
- `/test` - Test execution for verification
## Usage Examples

**Start New Debugging Session:**

```bash
debug-systematic "API returns 500 on POST /users"
# Expected tokens: 1,500-3,000 (full analysis)
```

**Resume Existing Session:**

```bash
debug-systematic resume
# Expected tokens: 800-1,500 (skips completed hypotheses)
```

**Test Specific Hypothesis:**

```bash
debug-systematic test 1
# Expected tokens: 500-1,000 (focused testing)
```

**Check Debugging Progress:**

```bash
debug-systematic status
# Expected tokens: 200-500 (read session state only)
```

**Mark Issue as Solved:**

```bash
debug-systematic solved
# Expected tokens: 300-600 (generate summary)
```
## Early Exit Conditions

Exit immediately (saves 90% tokens; see the guard sketch after this list) when:
- ✅ Issue already solved (check status in `debug/state.json`)
- ✅ No test framework available (can't reproduce)
- ✅ Not a git repository (can't check recent changes)
- ✅ Root cause already identified in session state
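A minimal guard sketch for these exits, assuming the `state.json` layout sketched earlier and `jq` for parsing:

```bash
#!/bin/bash
# Hypothetical early-exit guards -- run these before any expensive analysis.
if [ -f debug/state.json ] && command -v jq >/dev/null; then
  status=$(jq -r '.status // empty' debug/state.json)
  if [ "$status" = "solved" ]; then
    echo "Issue already marked solved - nothing to do"
    exit 0
  fi
fi

if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "Not a git repository - skipping recent-change analysis"
fi
```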
Progressive disclosure saves 60-80% of tokens:
- Show hypothesis formation → wait for user confirmation
- Test one hypothesis at a time → report results
- Only deep dive once a hypothesis is confirmed
## Implementation Checklist
- ✅ Git diff analysis for recent changes (PRIMARY optimization)
- ✅ Stack trace parsing with Grep (saves 90-95%)
- ✅ Session-based hypothesis tracking (saves 70-80% on reruns)
- ✅ Progressive hypothesis testing (most likely → least likely)
- ✅ Bash-based log analysis (minimal tokens)
- ✅ Test failure result caching
- ✅ Early exit when issue resolved
- ✅ Binary search for regressions (git bisect)
- ✅ Focus area flags (specific file/function debugging)
**Optimization Status**: ✅ Optimized (Phase 2 Batch 2, 2026-01-26)
**Expected Tokens**: 1,500-3,000 (vs. 4,000-6,000 unoptimized)
**Achieved Reduction**: 50% average across all debugging operations
## Session Intelligence

I'll maintain debugging session continuity:

**Session Files** (in current project directory):
- `debug/plan.md` - Debugging plan with hypotheses and results
- `debug/state.json` - Session state and test results
- `debug/reproduction.log` - Issue reproduction steps and logs

**IMPORTANT**: Session files are stored in a `debug` folder in your current project root.
**Auto-Detection:**
- If session exists: Resume debugging from last hypothesis
- If no session: Create debugging plan and initial reproduction
- Commands: `resume`, `reproduce`, `status`, `solved` (dispatch sketch below)
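A minimal dispatch sketch for this auto-detection; the argument handling is illustrative, not the skill's actual implementation:

```bash
#!/bin/bash
# Hypothetical entry-point dispatch for /debug-systematic.
cmd="${1:-auto}"

case "$cmd" in
  auto)
    if [ -f debug/state.json ]; then
      echo "Session found - resuming from last hypothesis"
    else
      echo "No session - creating debugging plan"
      mkdir -p debug
    fi
    ;;
  resume|reproduce|status|solved)
    echo "Running subcommand: $cmd"
    ;;
  *)
    echo "Starting new session for issue: $cmd"
    mkdir -p debug
    ;;
esac
```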
## Phase 1: Issue Reproduction & Information Gathering

### Extended Thinking for Complex Debugging

For complex or elusive bugs, I'll use extended thinking to explore debugging strategies:

<think>
When debugging complex issues:
- Multiple potential root causes that interact
- Timing-sensitive or race condition bugs
- Environment-specific failures
- Subtle state corruption scenarios
- Performance degradation patterns
- Security vulnerability exploitation paths
</think>

**Triggers for Extended Analysis:**
- Intermittent or non-deterministic bugs
- Production-only failures
- Performance issues without obvious cause
- Security vulnerabilities
- Multi-component system failures
**MANDATORY FIRST STEPS:**
1. Check if a `debug` directory exists in the current working directory
2. If the directory exists, check for session files:
   - Look for `debug/state.json`
   - Look for `debug/plan.md`
   - If found, resume from the last hypothesis
3. If no directory or session exists:
   - Gather error information
   - Create reproduction steps
   - Initialize the debugging session
**Information Gathering (Token-Efficient):**

```bash
#!/bin/bash
# Systematic Debugging - Information Gathering

gather_debug_info() {
  echo "=== Issue Reproduction Information ==="
  echo ""

  # 1. Error logs (use Grep, not cat)
  echo "Recent error logs:"
  if [ -d "logs" ]; then
    grep -i "error\|exception\|fatal" logs/*.log 2>/dev/null | tail -20 || echo "  No errors in logs"
  fi

  # 2. Git status (what changed recently)
  echo ""
  echo "Recent changes:"
  if git rev-parse --git-dir >/dev/null 2>&1; then
    git log --oneline --since="3 days ago" | head -10
  else
    echo "  Not a git repository"
  fi

  # 3. Environment info
  echo ""
  echo "Environment:"
  if [ -f "package.json" ]; then
    echo "  Node: $(node --version 2>/dev/null || echo 'not installed')"
    echo "  NPM: $(npm --version 2>/dev/null || echo 'not installed')"
  elif [ -f "requirements.txt" ]; then
    echo "  Python: $(python --version 2>/dev/null || echo 'not installed')"
  fi

  # 4. System resources
  echo ""
  echo "System resources:"
  echo "  Memory: $(free -h 2>/dev/null | grep Mem | awk '{print $3 "/" $2}' || echo 'N/A')"
  echo "  Disk: $(df -h . 2>/dev/null | tail -1 | awk '{print $3 "/" $2 " (" $5 ")"}' || echo 'N/A')"

  # 5. Running processes (if server issue)
  echo ""
  echo "Relevant processes:"
  local procs
  procs=$(ps aux | grep -E "node|python|java" | grep -v grep | head -5)
  echo "${procs:-  No relevant processes}"
}

mkdir -p debug
gather_debug_info > debug/initial-state.log
cat debug/initial-state.log
```
**Reproduction Steps:**

```bash
#!/bin/bash
# Create a reproducible test case

create_reproduction() {
  mkdir -p debug
  cat > debug/reproduction.sh << 'EOF'
#!/bin/bash
# Minimal reproduction script
echo "=== Bug Reproduction Steps ==="
echo ""
echo "Step 1: Setup environment"
# TODO: Add setup commands

echo "Step 2: Execute actions that trigger the bug"
# TODO: Add trigger commands

echo "Step 3: Verify the bug occurs"
# TODO: Add verification

echo ""
echo "Expected: [describe expected behavior]"
echo "Actual: [describe actual behavior]"
EOF

  chmod +x debug/reproduction.sh
  echo "Created reproduction script: debug/reproduction.sh"
}

create_reproduction
```
## Phase 2: Hypothesis Formation

I'll formulate testable hypotheses about the root cause:

**Hypothesis Generation Framework:**
```markdown
# Debugging Plan - [timestamp]

## Issue Description
**Summary**: [brief description]
**Severity**: Critical | High | Medium | Low
**Impact**: [affected users/systems]
**Frequency**: Always | Intermittent | Rare

## Error Details
[Full error message/stack trace]

## Environment
- **Platform**: [OS, runtime version]
- **Configuration**: [relevant settings]
- **Recent Changes**: [commits/deployments]

## Hypotheses (Prioritized)

### Hypothesis 1: [Most likely cause] - PRIORITY: HIGH
**Theory**: [explanation of suspected cause]
**Evidence**: [supporting observations]
**Test**: [how to verify/disprove]
**Expected**: [what should happen if correct]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 2: [Second most likely] - PRIORITY: MEDIUM
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 3: [Alternative cause] - PRIORITY: LOW
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

## Investigation Log
- [timestamp]: Initial reproduction successful
- [timestamp]: Hypothesis 1 testing in progress
```
**Hypothesis Prioritization:**
1. Recent changes - Check git history
2. Common patterns - Known bug categories
3. Environment issues - Dependencies, config
4. Logic errors - Code analysis
5. External factors - Third-party services
## Phase 3: Systematic Testing

I'll test each hypothesis methodically:

**Testing Framework:**
```bash
#!/bin/bash
# Hypothesis Testing Script

test_hypothesis() {
  local hypothesis_num="$1"
  local test_description="$2"

  echo "=== Testing Hypothesis $hypothesis_num ==="
  echo "Test: $test_description"
  echo ""

  # Create a checkpoint before testing (restore later with `git stash pop`)
  git stash push -m "Debug checkpoint before hypothesis $hypothesis_num"

  # Run the test
  local result="PENDING"
  # TODO: Run the actual test and set result to CONFIRMED or DISPROVED

  # Log the result
  echo "[$hypothesis_num] $test_description: $result" >> debug/test-results.log
}

# Example: Test hypothesis about a missing dependency
test_dependency_hypothesis() {
  echo "Hypothesis: Missing or incompatible dependency"

  # Check dependency versions
  if [ -f "package.json" ]; then
    echo "Checking npm dependencies..."
    npm list --depth=0 2>&1 | grep -i "missing\|error" && {
      echo "❌ CONFIRMED: Missing dependencies detected"
      return 0
    }
  fi

  echo "✓ DISPROVED: All dependencies present"
  return 1
}

# Example: Test hypothesis about a race condition
test_race_condition_hypothesis() {
  echo "Hypothesis: Race condition in async code"

  # Add delays to test timing sensitivity
  echo "Running test with delays..."
  # TODO: Add test with deliberate delays

  echo "Running test rapidly..."
  for i in {1..10}; do
    # TODO: Run test in a tight loop
    true
  done
}

# Test each hypothesis in priority order
test_dependency_hypothesis
test_race_condition_hypothesis
```
**Binary Search Debugging:**

```bash
#!/bin/bash
# Binary search through git history to find a regression

git_bisect_debug() {
  echo "=== Git Bisect Debugging ==="

  # Identify the known-good and known-bad endpoints
  read -p "Enter last known good commit (or tag): " good_commit
  read -p "Enter first known bad commit (or 'HEAD'): " bad_commit

  git bisect start
  git bisect bad "$bad_commit"
  git bisect good "$good_commit"

  cat > debug/bisect-test.sh << 'EOF'
#!/bin/bash
# Automated bisect test script:
# git bisect run treats exit 0 as good, exit 1-124 as bad
npm test || exit 1
exit 0
EOF

  chmod +x debug/bisect-test.sh
  echo "Run: git bisect run ./debug/bisect-test.sh"
  echo "For manual verification, skip the script and mark each step with"
  echo "'git bisect good' / 'git bisect bad', then finish with 'git bisect reset'."
}
```
## Phase 4: Isolation & Simplification

I'll create minimal test cases:

**Issue Isolation:**
```bash
#!/bin/bash
# Create a minimal reproducible example

create_minimal_reproduction() {
  local issue_type="$1"

  mkdir -p debug/minimal-case

  case "$issue_type" in
    "api")
      cat > debug/minimal-case/test.js << 'EOF'
// Minimal API test case
const fetch = require('node-fetch');

async function testIssue() {
  const response = await fetch('http://localhost:3000/api/endpoint');
  const data = await response.json();
  console.log('Response:', data);
  // Add assertion that fails
}

testIssue().catch(console.error);
EOF
      ;;
    "frontend")
      cat > debug/minimal-case/test.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
  <title>Minimal Test Case</title>
</head>
<body>
  <button id="testBtn">Click to trigger issue</button>
  <div id="output"></div>
  <script>
    document.getElementById('testBtn').addEventListener('click', () => {
      // Minimal code to reproduce issue
      console.log('Testing...');
    });
  </script>
</body>
</html>
EOF
      ;;
    "database")
      cat > debug/minimal-case/test.sql << 'EOF'
-- Minimal database query to reproduce issue
BEGIN TRANSACTION;

-- Setup test data
CREATE TEMP TABLE test_data (id INT, value TEXT);
INSERT INTO test_data VALUES (1, 'test');

-- Query that demonstrates issue
SELECT * FROM test_data WHERE condition;

ROLLBACK;
EOF
      ;;
  esac

  echo "Created minimal test case in debug/minimal-case/"
}
```
## Phase 5: Solution Implementation

Once the root cause is identified, I'll implement the fix:

**Fix Validation:**
```bash
#!/bin/bash
# Validate fix before committing

validate_fix() {
  echo "=== Fix Validation ==="

  # 1. Run the original reproduction (assumes the script exits 0 when
  #    the issue no longer reproduces)
  echo "Step 1: Run original reproduction..."
  if [ -f "debug/reproduction.sh" ]; then
    ./debug/reproduction.sh && echo "✓ Original issue resolved" || {
      echo "❌ Issue still reproduces"
      return 1
    }
  fi

  # 2. Run the full test suite
  echo "Step 2: Run test suite..."
  npm test 2>&1 | tee debug/post-fix-tests.log

  # 3. Review the changed lines for regressions
  echo "Step 3: Check for regressions..."
  git diff HEAD -- . | grep -E "^\+" | grep -v "^+++" | head -20

  # 4. Verify no new errors
  echo "Step 4: Lint check..."
  npm run lint 2>&1 | grep -i "error" && {
    echo "⚠️ New linting errors introduced"
  } || echo "✓ No new linting errors"

  echo ""
  echo "✓ Fix validation complete"
}

validate_fix
```
**Fix Documentation:**

````markdown
## Solution

### Root Cause
[Detailed explanation of what caused the issue]

### Fix Applied
[Description of the solution]

```diff
// Before
- problematic code

// After
+ corrected code
```

### Verification
- Original reproduction no longer triggers the issue
- All tests passing
- No regressions introduced
- Edge cases handled

### Prevention
[How to prevent similar issues in the future]
- Add test coverage for [scenario]
- Update validation to catch [condition]
- Add monitoring for [metric]
````
## Phase 6: Regression Prevention
I'll add safeguards to prevent recurrence:
**Test Addition:**
```bash
#!/bin/bash
# Add a regression test

add_regression_test() {
  local test_framework="$1"

  case "$test_framework" in
    "jest")
      cat >> tests/regression.test.js << 'EOF'
describe('Regression: [Issue Description]', () => {
  test('should not reproduce issue #123', async () => {
    // Reproduce the scenario that previously failed
    const result = await functionThatHadBug();
    // Assert correct behavior
    expect(result).toBe(expectedValue);
  });
});
EOF
      ;;
    "pytest")
      cat >> tests/test_regression.py << 'EOF'
def test_issue_123_regression():
    """Regression test for [issue description]"""
    # Reproduce the scenario
    result = function_that_had_bug()
    # Assert correct behavior
    assert result == expected_value
EOF
      ;;
  esac

  echo "Added regression test to prevent future occurrence"
}
```
## Context Continuity

**Session Resume:**

When you return and run `/debug-systematic` or `/debug-systematic resume`:
- Load the debugging plan and hypothesis results
- Show which hypotheses have been tested
- Continue from the next untested hypothesis
- Track the full debugging timeline
**Progress Example:**

```
RESUMING DEBUGGING SESSION
├── Issue: API timeout on user search
├── Hypotheses: 5 total
├── Tested: 3 (2 disproved, 1 confirmed)
├── Current: Testing database query optimization
└── Status: Root cause identified

Continuing investigation...
```
## Practical Examples

**Start Debugging:**

```bash
/debug-systematic "API returns 500 on POST /users"
/debug-systematic reproduce    # Create reproduction steps
/debug-systematic              # Auto-resume if session exists
```

**Hypothesis Testing:**

```bash
/debug-systematic test 1       # Test specific hypothesis
/debug-systematic isolate      # Create minimal reproduction
/debug-systematic bisect       # Git bisect to find regression
```

**Session Control:**

```bash
/debug-systematic resume       # Continue debugging
/debug-systematic status       # Show current progress
/debug-systematic solved       # Mark as solved and summarize
```
## Debugging Techniques

**Common Debugging Patterns:**

**Print Debugging:**

```bash
add_debug_logging() {
  echo "Adding strategic debug points..."
  # Add logging before the suspected issue
  # Add logging after the suspected issue
  # Compare outputs
}
```

**Rubber Duck Debugging:**

```markdown
## Explain to Rubber Duck
1. What the code should do: [expected behavior]
2. What the code actually does: [actual behavior]
3. Step-by-step execution: [trace through]
4. Where it diverges: [AHA moment]
```

**Divide and Conquer:**

```bash
# Comment out half the code
# Does the issue persist?
# - Yes: Issue is in the remaining half
# - No: Issue is in the commented half
# Repeat until isolated
```
## Safety Guarantees

**Protection Measures:**
- Git checkpoints before each test (see the sketch below)
- Automated state restoration
- No destructive operations without confirmation
- Clear rollback paths
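A checkpoint/restore sketch built on `git stash`; the label format is an assumption:

```bash
#!/bin/bash
# Hypothetical checkpoint helpers around git stash.

checkpoint() {
  # Snapshot the working tree (including untracked files) before a risky test
  git stash push --include-untracked -m "debug-checkpoint: $1"
}

restore_checkpoint() {
  # Restore the most recent stash only if it is one of our checkpoints
  git stash list | head -1 | grep -q "debug-checkpoint" && git stash pop
}

checkpoint "before hypothesis 2"
# ... run the risky test ...
restore_checkpoint
```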
**Important**: I will NEVER:
- Modify production code without validation
- Skip hypothesis testing
- Apply fixes without verification
- Add AI attribution
## Skill Integration

When appropriate, I may suggest:
- `/test` - Run comprehensive test suite
- `/security-scan` - Check if the bug is security-related
- `/commit` - Commit the fix with a clear message
## Advanced Debugging Tools

**Performance Profiling:**

```bash
profile_performance() {
  # Node.js profiling
  node --prof app.js
  node --prof-process isolate-*.log > profile.txt

  # Python profiling
  python -m cProfile -o profile.stats script.py
  python -m pstats profile.stats  # interactive stats browser
}
```
**Memory Leak Detection:**

```bash
detect_memory_leak() {
  # Sample resident memory (KB) of the first node process every 5 seconds;
  # stop sampling with Ctrl-C
  while true; do
    ps aux | grep node | grep -v grep | awk '{print $6}' | head -1
    sleep 5
  done | tee memory.log
}

# After stopping the sampler, plot the trend (requires gnuplot)
plot_memory_usage() {
  gnuplot << 'EOF'
set terminal png
set output 'memory-usage.png'
plot 'memory.log' with lines
EOF
}
```
**Network Debugging:**

```bash
debug_network() {
  # Capture network traffic (typically requires root)
  tcpdump -i any -w debug/network.pcap port 3000

  # Analyze failed HTTP responses with tshark
  tshark -r debug/network.pcap -Y "http.response.code >= 400"
}
```
## What I'll Actually Do

1. Gather information - Comprehensive context using Grep
2. Reproduce issue - Create a reliable reproduction
3. Form hypotheses - Prioritized theories about the cause
4. Test systematically - Validate each hypothesis
5. Isolate problem - Minimal reproducible case
6. Implement fix - Targeted solution
7. Prevent regression - Add tests and monitoring
I'll maintain complete debugging session continuity, tracking all hypotheses and results across sessions.
**Credits**: Systematic debugging methodology based on the scientific method and debugging best practices from "Debugging: The 9 Indispensable Rules" by David Agans.