Systematic Debugging
Core Principle
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Never apply symptom-focused patches that mask underlying problems. Understand WHY something fails before attempting to fix it.
The Four-Phase Framework
Phase 1: Root Cause Investigation
Before touching any code:
- Read error messages thoroughly - Every word matters
- Reproduce the issue consistently - If you can't reproduce it, you can't verify a fix
- Examine recent changes - What changed before this started failing?
- Gather diagnostic evidence - Logs, stack traces, state dumps
- Trace data flow - Follow the call chain to find where bad values originate
Root Cause Tracing Technique:
- Observe the symptom - Where does the error manifest?
- Find immediate cause - Which code directly produces the error?
- Ask "What called this?" - Map the call chain upward
- Keep tracing up - Follow invalid data backward through the stack
- Find original trigger - Where did the problem actually start?
Key principle: Never fix problems solely where errors appear—always trace to the original trigger.
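A minimal Python sketch of this tracing, using hypothetical functions (`load_price`, `line_total`, `render_invoice`) invented purely for illustration. The traceback shows where the error surfaced, but the function that produced the bad value has already returned and does not appear on the stack, which is why the data has to be followed backward by hand.

```python
import traceback


def load_price(catalog, sku):
    # Original trigger (hypothetical): an unknown SKU silently becomes None.
    return catalog.get(sku)


def line_total(catalog, sku, qty):
    # Immediate cause: None * int raises TypeError on this line.
    return load_price(catalog, sku) * qty


def render_invoice(catalog, items):
    totals = []
    for sku, qty in items:
        totals.append(line_total(catalog, sku, qty))
    return totals


def call_chain(exc):
    """Function names from the outermost caller down to the frame that raised."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


try:
    render_invoice({"apple": 3}, [("apple", 2), ("pear", 1)])
except TypeError as exc:
    chain = call_chain(exc)
    print("error surfaced in:", chain[-1])
    print("call chain:", " -> ".join(chain))
    # The origin of the bad value is not on the stack any more.
    print("load_price on the stack?", "load_price" in chain)
```

Patching `line_total` to skip `None` would hide the symptom; in this sketch the root cause is that `load_price` accepts an unknown SKU without complaint.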
Phase 2: Pattern Analysis
- Locate working examples - Find similar code that works correctly
- Compare implementations completely - Don't just skim
- Identify differences - What's different between working and broken?
- Understand dependencies - What does this code depend on?
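One way to make the comparison complete rather than a skim is to diff the two cases mechanically. The sketch below assumes the working and broken cases can each be summarized as a flat dictionary of settings; the keys and values (`timeout_s`, `retries`, and so on) are made up for the example.

```python
MISSING = "<missing>"  # placeholder shown when a key exists on one side only


def diff_settings(working: dict, broken: dict) -> dict:
    """Map each differing key to its (working, broken) pair of values."""
    differences = {}
    for key in sorted(working.keys() | broken.keys()):
        left = working.get(key, MISSING)
        right = broken.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical settings for a job that works and one that fails.
working_job = {"timeout_s": 30, "retries": 3, "region": "eu-west-1"}
broken_job = {"timeout_s": 30, "retries": 0, "tls_verify": False}

for key, (left, right) in diff_settings(working_job, broken_job).items():
    print(f"{key}: working={left!r} broken={right!r}")
```

Every difference gets listed, including keys that exist on only one side, so nothing is dismissed before it has been looked at.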
Phase 3: Hypothesis and Testing
Apply the scientific method:
- Formulate ONE clear hypothesis - "The error occurs because X"
- Design minimal test - Change ONE variable at a time
- Predict the outcome - What should happen if hypothesis is correct?
- Run the test - Execute and observe
- Verify results - Did it behave as predicted?
- Iterate or proceed - Refine hypothesis if wrong, implement if right
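The loop above can be sketched as a small experiment. Here `parse_amount` is a hypothetical function under investigation, and the hypothesis is "parsing fails because the input contains a thousands separator". Each variant changes exactly one thing relative to a known-good baseline, and the prediction is written down before the run.

```python
def parse_amount(text: str) -> float:
    """Hypothetical function under investigation."""
    return float(text)


def fails(text: str) -> bool:
    """Return True if parse_amount rejects the input."""
    try:
        parse_amount(text)
        return False
    except ValueError:
        return True


baseline = "1234.50"  # known-good input

# Each variant differs from the baseline in exactly one respect.
variants = {
    "thousands separator": "1,234.50",
    "leading whitespace": " 1234.50",
    "trailing newline": "1234.50\n",
}

# Prediction if the hypothesis holds: only the separator variant fails.
predicted = {
    "thousands separator": True,
    "leading whitespace": False,
    "trailing newline": False,
}

observed = {name: fails(value) for name, value in variants.items()}
print("baseline fails:", fails(baseline))
print("hypothesis supported:", observed == predicted)
```

If `observed` had differed from `predicted`, the next step would be a refined hypothesis, not a fix.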
Phase 4: Implementation
- Create failing test case - Captures the bug behavior
- Implement single fix - Address root cause, not symptoms
- Verify test passes - Confirms fix works
- Run full test suite - Ensure no regressions
- If fix fails, STOP - Re-evaluate hypothesis
Critical rule: If THREE or more fixes fail consecutively, STOP. This signals architectural problems requiring discussion, not more patches.
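A minimal sketch of the test-first step using Python's `unittest`. The function `normalize_email` and its bug (surrounding whitespace never stripped) are assumptions made up for the example; the buggy version is kept alongside only to show what the test was written against.

```python
import unittest


def normalize_email_buggy(raw: str) -> str:
    # Hypothetical pre-fix behavior: whitespace survives normalization.
    return raw.lower()


def normalize_email(raw: str) -> str:
    # Single fix at the root cause: strip before lowercasing.
    return raw.strip().lower()


class NormalizeEmailRegressionTest(unittest.TestCase):
    def test_strips_surrounding_whitespace(self):
        # Written first; it fails against normalize_email_buggy.
        self.assertEqual(normalize_email("  Ada@Example.com \n"), "ada@example.com")


# Run the regression test without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("regression test passed:", result.wasSuccessful())
```

The test stays in the suite after the fix, so the same bug cannot return unnoticed.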
Red Flags - Process Violations
Stop immediately if you catch yourself thinking:
- "Quick fix for now, investigate later"
- "One more fix attempt" (after multiple failures)
- "This should work" (without understanding why)
- "Let me just try..." (without hypothesis)
- "It works on my machine" (without investigating difference)
Warning Signs of Deeper Problems
When consecutive fixes keep revealing new problems in different areas, the issue is likely architectural:
- Stop patching
- Document what you've found
- Discuss with team before proceeding
- Consider if the design needs rethinking
Common Debugging Scenarios
Test Failures
- Read the FULL error message and stack trace
- Identify which assertion failed and why
- Check test setup - is the test environment correct?
- Check test data - are mocks/fixtures correct?
- Trace to the source of unexpected value
Runtime Errors
- Capture the full stack trace
- Identify the line that throws
- Check what values are undefined/null
- Trace backward to find where bad value originated
- Add validation at the source
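As an illustration of validating at the source, the sketch below rejects a bad record at the boundary where it enters the program, so a missing value never travels far enough to surface later as a confusing `None` error. The `User` type and `parse_user` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    email: str


def parse_user(record: dict) -> User:
    """Validate external data once, where it enters the system."""
    missing = [field for field in ("name", "email") if not record.get(field)]
    if missing:
        # Fail loudly here, with a message that names the actual problem.
        raise ValueError(f"user record missing required field(s): {', '.join(missing)}")
    return User(name=record["name"], email=record["email"])


print(parse_user({"name": "Ada", "email": "ada@example.com"}))

try:
    parse_user({"name": "Ada", "email": None})
except ValueError as exc:
    print("rejected at the source:", exc)
```

Code downstream of `parse_user` can then rely on both fields being present instead of re-checking them everywhere.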
"It worked before"
- Use git bisect to find the breaking commit
- Compare the change with previous working version
- Identify what assumption changed
- Fix at the source of the assumption violation
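`git bisect run` can automate the search by calling a script on each candidate commit and reading its exit status: 0 marks the commit good, 125 means "skip, cannot be tested", and any other code from 1 to 127 marks it bad. A minimal Python sketch of such a script follows; the build and test commands and their paths (`src`, `tests/test_regression.py`) are placeholders to be replaced with the project's own.

```python
import subprocess
import sys

GOOD = 0    # git bisect run: this commit is good
BAD = 1     # any code from 1 to 127 except 125 marks the commit bad
SKIP = 125  # this commit cannot be tested (for example, it does not build)


def bisect_exit_code(build_ok: bool, bug_present: bool) -> int:
    """Translate what was observed on a commit into a bisect exit status."""
    if not build_ok:
        return SKIP
    return BAD if bug_present else GOOD


def command_succeeds(cmd: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd, check=False).returncode == 0


def main() -> int:
    # Hypothetical commands: substitute the project's real build and test steps.
    build_ok = command_succeeds([sys.executable, "-m", "compileall", "-q", "src"])
    if not build_ok:
        return SKIP
    tests_pass = command_succeeds(
        [sys.executable, "-m", "pytest", "-q", "tests/test_regression.py"]
    )
    return bisect_exit_code(build_ok=True, bug_present=not tests_pass)
```

Saved as a script (say `bisect_check.py`, a hypothetical name) that ends with `sys.exit(main())`, this could be driven with `git bisect start`, `git bisect bad`, `git bisect good <known-good-commit>`, and then `git bisect run python bisect_check.py`.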
Intermittent Failures
- Look for race conditions
- Check for shared mutable state
- Examine async operation ordering
- Look for timing dependencies
- Add deterministic waits or proper synchronization
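A small sketch of the last two points, assuming the shared mutable state is a simple counter updated from several threads. The lock makes each read-modify-write atomic, and `join()` waits for the workers to finish instead of sleeping for a guessed duration. The `Counter` class and `hammer` helper are illustrative names.

```python
import threading


class Counter:
    """Shared counter whose updates are guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # serialize the read-modify-write
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: Counter, workers: int = 8, increments: int = 10_000) -> int:
    """Increment the counter from several threads and return the final value."""

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # deterministic wait: returns when the thread is done
    return counter.value


total = hammer(Counter())
print("final count:", total)
```

Because every increment happens under the lock and every thread is joined, the final count equals `workers * increments` on every run.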
Debugging Checklist
Before claiming a bug is fixed:
- Root cause identified and documented
- Hypothesis formed and tested
- Fix addresses root cause, not symptoms
- Failing test created that reproduces bug
- Test now passes with fix
- Full test suite passes
- No "quick fix" rationalization used
- Fix is minimal and focused
Success Metrics
Systematic debugging achieves a first-time fix rate of roughly 95%, versus roughly 40% with ad-hoc approaches.
Signs you're doing it right:
- Fixes don't create new bugs
- You can explain WHY the bug occurred
- Similar bugs don't recur
- Code is better after the fix, not just "working"
Integration with Other Skills
- testing-patterns: Create test that reproduces the bug before fixing