# Systematic Debugging Skill

## Overview

This skill provides a structured four-phase debugging framework emphasizing root cause discovery before attempting fixes. Core principle: "Random fixes waste time and create new bugs. Quick patches mask underlying issues."
## Quick Start

- Investigate - Gather evidence, reproduce consistently
- Analyze - Compare with working patterns
- Hypothesize - Form and test specific theories
- Implement - Fix with test coverage
## When to Use

- Bug reports requiring investigation
- Test failures with unclear causes
- Production incidents
- Performance regressions
- Integration failures
- Any debugging that requires more than 5 minutes
## The Four Phases

### Phase 1: Root Cause Investigation

Objective: Understand the problem completely before attempting any fix.

Steps:

- Examine error messages thoroughly
- Reproduce the issue consistently
- Review recent changes (commits, configs, dependencies)
- Gather diagnostic evidence (logs, traces, metrics); see the sketch after the questions below
- For multi-component systems, add instrumentation at each boundary

Questions to answer:

- What exactly is failing?
- When did it start failing?
- What changed recently?
- Can I reproduce it reliably?
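For example, the evidence-gathering step can be scripted so facts are recorded rather than remembered. This is a minimal sketch, assuming Python 3 and a Git checkout; the reproduction command and the `collect_evidence` helper are illustrative names, not part of this skill:

```python
# Minimal evidence-gathering sketch (illustrative; adapt commands and paths).
# Records the failure output, environment, and recent changes in one place.
import json
import platform
import subprocess
from datetime import datetime, timezone


def collect_evidence(repro_command):
    """Run the reproduction command and capture context around the failure."""
    repro = subprocess.run(repro_command, capture_output=True, text=True)
    recent = subprocess.run(["git", "log", "--oneline", "-10"],
                            capture_output=True, text=True)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "python": platform.python_version(),
        "repro_exit_code": repro.returncode,
        "repro_stderr_tail": repro.stderr[-2000:],  # keep the end of the error output
        "recent_commits": recent.stdout.splitlines(),
    }


if __name__ == "__main__":
    # The test path below is a placeholder for your actual reproduction command.
    evidence = collect_evidence(["pytest", "tests/test_example.py", "-x"])
    print(json.dumps(evidence, indent=2))
```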
### Phase 2: Pattern Analysis

Objective: Find working examples and understand differences.

Steps:

- Locate working examples in the codebase
- Compare against reference implementations completely
- Identify differences systematically (see the sketch after the comparisons below)
- Understand all dependencies

Key comparisons:

- Working vs. broken code paths
- Expected vs. actual behavior
- Known good state vs. current state
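Where the working and broken states are captured in files (configs, fixtures, data dumps), the comparison can be mechanical instead of by eye. A minimal sketch using Python's standard `difflib`; both file names are placeholders:

```python
# Sketch: enumerate differences between a known-good config and the current
# one so nothing is missed. The file names below are placeholders.
import difflib
from pathlib import Path

known_good = Path("config.known_good.yaml").read_text().splitlines()
current = Path("config.yaml").read_text().splitlines()

for line in difflib.unified_diff(known_good, current,
                                 fromfile="known good", tofile="current",
                                 lineterm=""):
    print(line)
```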
### Phase 3: Hypothesis and Testing

Objective: Form and validate theories before changing code.

Steps:

- Formulate a specific hypothesis
- Design a test for the hypothesis
- Test with minimal changes (one variable at a time)
- Verify results before proceeding

Hypothesis format: "The bug occurs because [condition] when [trigger], which causes [symptom]."
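For example, a hypothesis in that format might read: "The bug occurs because `round()` rounds half to even when a total lands exactly on .5, which causes some totals to come out one unit lower than the legacy system's." The matching single-variable test is tiny; the surrounding scenario is invented for illustration, though the rounding behavior itself is standard Python 3:

```python
# Hypothesis test: one variable (the rounding of exact halves), one check.
def test_round_half_to_even_hypothesis():
    # Passing confirms the hypothesis: Python rounds 2.5 down to 2, not up to 3.
    assert round(2.5) == 2
    assert round(3.5) == 4  # half rounds toward the even neighbor
```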
### Phase 4: Implementation

Objective: Fix the root cause with proper verification.

Steps:

- Create a failing test case reproducing the bug (see the sketch after this list)
- Implement a single fix addressing the root cause
- Verify the test passes
- Verify no other tests broke
- Document the fix
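A minimal sketch of the first three steps on an invented bug; the `slugify` function and the bug report are illustrative, not part of this skill. The regression test is written first, fails against the old implementation, and passes once the root cause is fixed:

```python
import re


def slugify(title: str) -> str:
    """Turn a title into a URL slug.

    Root cause of the (invented) bug: the old version only replaced spaces,
    so punctuation leaked into slugs. The fix extracts word runs instead of
    patching individual symptoms.
    """
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def test_slugify_punctuation_regression():
    # Written first to reproduce the report, kept afterwards as a regression test.
    assert slugify("Hello,  World!") == "hello-world"
    assert slugify("") == ""
```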
## Critical Safeguards

### Hard Stop Rule

If >= 3 fixes fail: STOP and question the architecture.

When multiple fixes fail, the issue indicates deeper structural problems requiring discussion rather than continued symptom-patching.

### Red Flags (Restart Process)

- Proposing solutions before investigation
- Attempting multiple simultaneous fixes
- Assuming without verification
- Skipping the reproduction step
- "It should work" without evidence
Debugging Anti-Patterns
Anti-Pattern Problem Correct Approach
Shotgun debugging Random changes hoping something works Systematic investigation
Printf debugging only Incomplete picture Structured instrumentation
Blame the framework Avoids understanding Verify framework behavior
"Works on my machine" Environment assumptions Document exact repro steps
Quick patch Hides root cause Find and fix actual cause
## Instrumentation Strategies

### Logging Strategy
- Entry/exit of suspected functions
- Input/output values at boundaries
- State changes at key points
- Timing information for performance issues
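A minimal sketch of that kind of structured instrumentation, using only Python's standard library; the decorator name is illustrative:

```python
# Sketch: log entry/exit, inputs/outputs, and timing of suspected functions
# instead of scattering ad-hoc prints.
import functools
import logging
import time

log = logging.getLogger("debug.instrumentation")


def traced(func):
    """Log entry/exit, arguments, results, and duration of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("raise %s after %.3fs", func.__name__,
                          time.perf_counter() - start)
            raise
        log.debug("exit  %s -> %r (%.3fs)", func.__name__, result,
                  time.perf_counter() - start)
        return result
    return wrapper
```

Apply the decorator only to functions under suspicion and remove it once the root cause is confirmed.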
### Boundary Tracing

For multi-component systems:

    [Input] -> [Component A] -> [Component B] -> [Output]
       ^             ^                ^             ^
       |             |                |             |
    Check 1       Check 2          Check 3       Check 4

Add verification at each boundary to isolate the failure point.
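A minimal sketch of those checks, with Python stand-ins for the real components:

```python
# Sketch: wrap each boundary from the diagram above with a check so the first
# failing stage is obvious from the log. The component functions are stand-ins.
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("debug.boundaries")


def checked(label, value):
    """Log and sanity-check whatever crosses a boundary, then pass it through."""
    log.debug("%s: %r", label, value)
    assert value not in (None, ""), f"{label} produced an empty value"
    return value


def component_a(text):
    return text.strip()  # stand-in for the real first component


def component_b(text):
    return text.upper()  # stand-in for the real second component


def pipeline(raw):
    raw = checked("Check 1: input", raw)
    a_out = checked("Check 2: after Component A", component_a(raw))
    b_out = checked("Check 3: after Component B", component_b(a_out))
    return checked("Check 4: output", b_out)


if __name__ == "__main__":
    print(pipeline("  hello "))
```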
## Best Practices

### Do

- Reproduce before investigating
- Document investigation steps
- Test one hypothesis at a time
- Write a regression test for every bug fix
- Share findings with the team
- Update documentation when the cause is environment-related

### Don't

- Jump to conclusions
- Make multiple changes at once
- Fix symptoms instead of causes
- Skip the hypothesis step
- Merge fixes without tests
- Ignore intermittent failures
## Error Handling

| Situation | Action |
| --- | --- |
| Cannot reproduce | Gather more context, check environment differences |
| Multiple potential causes | Isolate and test each separately |
| Fix breaks other things | Revert, investigate dependencies |
| Root cause unclear after investigation | Escalate, add more instrumentation |
## Metrics

| Metric | Target | Description |
| --- | --- | --- |
| First-fix success rate | 80% | Fixes that resolve the issue first time |
| Regression rate | <5% | Bug fixes causing new bugs |
| Investigation time ratio | 60% | Time spent investigating vs. coding |
| Documentation rate | 100% | Bugs documented with root cause |
## Debugging Checklist

- Issue reproduced consistently
- Recent changes reviewed
- Error messages fully understood
- Working comparison found
- Hypothesis documented
- Single-variable test performed
- Root cause identified
- Failing test written
- Fix implemented
- All tests pass
- Fix documented
## Related Skills

- tdd-obra - Test-first development
- writing-plans - Plan implementations
- code-reviewer - Code quality review
## Version History
- 1.0.0 (2026-01-19): Initial release adapted from obra/superpowers