Debugging Skill
Systematic approaches to investigating and diagnosing bugs.
Core Principle
Understand before fixing. A proper diagnosis leads to a proper fix.
Name
han-core:debug - Investigate and diagnose issues without necessarily fixing them
Synopsis
/debug [arguments]
Debug vs Fix
Use /debug when:
-
Investigating an issue to understand it
-
Need to gather information before fixing
-
Want to identify root cause without implementing solution
-
Triaging to determine severity/priority
-
Research phase before fix
Use /fix when:
-
Ready to implement the solution
-
Debugging AND fixing in one go
-
Issue is understood, just needs fixing
The Scientific Method for Debugging
- Observe
Gather all the facts:
-
What's the symptom? (What's happening that shouldn't?)
-
When does it happen? (Always, sometimes, specific conditions?)
-
Who's affected? (All users, some users, specific scenarios?)
-
Error messages? (Exact text, stack traces, error codes?)
-
Recent changes? (What changed before this started?)
Evidence to collect:
-
Error messages and stack traces
-
Application logs
-
User reports
-
Reproduction steps
-
Environment details (browser, OS, versions)
-
Network requests/responses
-
Database query logs
- Form Hypothesis
Based on symptoms, what could cause this?
Common categories:
-
Logic error: Code does wrong thing
-
State management: State gets out of sync
-
Async/timing: Race condition, callback hell
-
Data issue: Unexpected input format
-
Integration: API change, service down
-
Environment: Config, permissions, network
-
Resource: Memory leak, connection pool exhausted
Prioritize hypotheses:
-
Most likely causes first
-
Easiest to test first (when equal likelihood)
-
Most impactful if true
- Test Hypothesis
Design experiment to prove/disprove:
-
Add logging to see values
-
Add breakpoints to pause execution
-
Modify input to isolate variable
-
Disable feature to rule out
-
Compare with working version
Keep notes:
Hypothesis: Database query timeout Test: Add query timing logs Result: Query completes in 50ms Conclusion: Not the database
Hypothesis: Network latency Test: Check network tab, add timing Result: API call takes 5 seconds Conclusion: Found the issue
- Analyze Results
What did you learn?
-
Hypothesis confirmed or rejected?
-
New questions raised?
-
Unexpected findings?
-
Root cause identified?
- Repeat or Conclude
If root cause found:
-
Document findings
-
Estimate impact
-
Plan fix
If not found:
-
Form new hypothesis
-
Repeat cycle
Debugging Strategies
Strategy 1: Add Logging
Most universally useful technique:
// Strategic console.log placement function processOrder(order) { console.log('processOrder START:', { orderId: order.id })
const items = order.items console.log('items:', items.length)
const validated = validate(items) console.log('validation result:', validated)
if (!validated.success) { console.log('validation failed:', validated.errors) throw new Error('Invalid order') }
const total = calculateTotal(items) console.log('total calculated:', total)
console.log('processOrder END') return total }
Logging guidelines:
-
Log function entry/exit
-
Log branching decisions
-
Log external calls (API, database)
-
Log unexpected values
-
Include context (IDs, user info)
Strategy 2: Use Debugger
Interactive debugging:
// Browser function buggyFunction(input) { debugger; // Execution pauses here const result = transform(input) debugger; // And here return result }
// Node.js node --inspect app.js
Then open chrome://inspect in Chrome
Debugger features:
-
Step over (next line)
-
Step into (into function call)
-
Step out (back to caller)
-
Watch expressions
-
Call stack inspection
-
Variable inspection
Strategy 3: Binary Search
Isolate the problem area:
// 100 lines of code, bug somewhere
// Comment out lines 50-100 // Bug still happens? It's in lines 1-50
// Comment out lines 25-50 // Bug disappears? It's in lines 25-50
// Comment out lines 37-50 // Bug still happens? It's in lines 25-37
// Continue until isolated to specific lines
Strategy 4: Rubber Duck Debugging
Explain the problem out loud:
-
"This function is supposed to calculate shipping cost"
-
"It takes the weight and destination"
-
"First it... wait, it's using price instead of weight!"
-
(Bug found)
Why this works: Forces you to examine assumptions.
Strategy 5: Compare Working vs Broken
What's different?
Version comparison:
Find which commit broke it
git bisect start git bisect bad HEAD git bisect good v1.0.0
Git checks out middle commit
npm test git bisect good/bad
Repeat until found
Environment comparison:
-
Works locally but not production?
-
Works for some users but not others?
-
Worked yesterday but not today?
What changed?
Strategy 6: Simplify
Reduce to minimal reproduction:
// Complex case with bug processUserOrderWithDiscountsAndShipping(user, cart, promo, address)
// Simplify inputs one at a time processUserOrderWithDiscountsAndShipping(user, [], null, null) // Still breaks? Not discount or address
processUserOrderWithDiscountsAndShipping(null, [], null, null) // Works now? It's the user object
// What about the user object causes it?
Strategy 7: Check Assumptions
Question everything:
// Assumption: API returns array const users = await api.getUsers() users.forEach(...) // Crashes
// Check assumption console.log(typeof users) // "undefined" console.log(users) // undefined
// Assumption was wrong!
Common wrong assumptions:
-
Function returns expected type
-
Variable is defined
-
Array is not empty
-
API will always respond
-
Async operation has completed
-
State is up to date
Debugging by Symptom
"Intermittent failure"
Likely causes:
-
Race condition (timing-dependent)
-
Data-dependent (certain inputs trigger it)
-
Resource leak (happens after N operations)
-
External service flakiness
Investigation:
-
Add extensive logging
-
Look for async operations
-
Check timing between operations
-
Look for shared state
-
Run many times to see pattern
"Works locally, fails in production"
Check differences:
-
Environment variables
-
Data (production has different/more data)
-
Network (CORS, SSL, proxies)
-
Dependencies (versions, OS)
-
Resources (memory, connections)
"Slow performance"
Don't guess - profile:
Frontend:
-
Chrome DevTools > Performance tab
-
Look for long tasks (> 50ms)
-
Check for layout thrashing
-
Look for memory leaks
Backend:
-
Add timing logs around operations
-
Check database query time (EXPLAIN ANALYZE)
-
Check external API call time
-
Profile with APM tool
"Memory leak"
Investigation:
// Take heap snapshot // Do operation that leaks // Take another heap snapshot // Compare - what increased?
Common causes:
-
Event listeners not removed
-
Closures holding references
-
Global variables accumulating
-
Intervals not cleared
-
Cache growing unbounded
"Crash/Exception"
Read the stack trace:
Error: Cannot read property 'map' of undefined at processUsers (app.js:42:15) at handleRequest (app.js:23:3) at Server.<anonymous> (server.js:12:5)
Stack trace tells you:
-
Line 42: Where it crashed
-
Line 23: Where it was called from
-
Line 12: Origin of the request
Then:
-
Go to line 42
-
Check what's undefined
-
Trace back why it's undefined
"It works sometimes"
-
Race condition?
-
Timing issue?
-
Data-dependent?
-
Check for async issues
Common Bug Patterns
Null/Undefined
// Bug function process(user) { return user.name.toUpperCase() // Crashes if user is null }
// Investigation console.log('user:', user) // undefined - why? // Trace back to where user comes from
Off-by-One
// Bug for (let i = 0; i <= array.length; i++) { // <= instead of < process(array[i]) // Crashes on last iteration }
// Investigation console.log('i:', i, 'length:', array.length) // Notice i === array.length causes array[i] === undefined
Async Timing
// Bug let data fetchData().then(result => { data = result }) console.log(data) // undefined - async not complete
// Investigation console.log('1. Before fetch') fetchData().then(result => { console.log('3. Got result') data = result }) console.log('2. After fetch call') // Output: 1, 2, 3 - async completes later
State Mutation
// Bug function addItem(cart, item) { cart.items.push(item) // Mutates input! return cart }
const originalCart = { items: [] } const newCart = addItem(originalCart, item) // originalCart was modified - unexpected!
// Investigation console.log('before:', originalCart) const newCart = addItem(originalCart, item) console.log('after:', originalCart) // Changed!
Scope Issues
// Bug for (var i = 0; i < 3; i++) { setTimeout(() => console.log(i), 100) } // Prints: 3, 3, 3 (expected 0, 1, 2)
// Investigation // var is function-scoped, i is shared // By time timeout fires, loop is done, i === 3
// Fix: Use let (block-scoped) or capture i
Investigation Report Format
Investigation: [Issue description]
Symptoms
[What's happening that's wrong?]
Evidence
- Error message: [exact text]
- When it happens: [conditions]
- Frequency: [always/sometimes/rarely]
- Affected users: [all/some/specific group]
Reproduction Steps
- [Step 1]
- [Step 2]
- [Observe error]
Investigation Timeline
Hypothesis 1: [What I thought might be wrong]
- Tested by: [What I did to test]
- Result: [What I found]
- Conclusion: [Ruled out / Confirmed]
Hypothesis 2: [Next theory]
- Tested by: [What I did]
- Result: [What I found]
- Conclusion: [Ruled out / Confirmed]
Root Cause
[What's actually causing the issue]
Evidence:
- [Log showing the problem]
- [Stack trace pointing to source]
- [Data showing the pattern]
Impact
- Severity: [Critical/High/Medium/Low]
- Scope: [How many users/scenarios affected]
- Workaround: [Any temporary solutions]
Next Steps
- [What should be done to fix]
- [Any additional investigation needed]
- [Related issues to check]
Debugging Tools
Browser Developer Tools
Console:
-
console.log()
-
Print values
-
console.table()
-
Display arrays/objects as table
-
console.trace()
-
Print stack trace
-
console.time() / console.timeEnd()
-
Measure duration
Debugger:
-
Set breakpoints
-
Step through code
-
Inspect variables
-
Watch expressions
-
Call stack
Network:
-
View all requests
-
See request/response headers and bodies
-
Measure timing
-
Replay requests
Performance:
-
Record profile
-
See function call tree
-
Identify bottlenecks
-
Check memory usage
Command Line Tools
Search for text in files
grep -r "error" logs/
Follow log file
tail -f logs/app.log
Search with context
grep -B 5 -A 5 "ERROR" logs/app.log
Check disk space
df -h
Check memory
free -m
Check running processes
ps aux | grep node
Database Debugging
-- PostgreSQL EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Show slow queries SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
-- Check table size SELECT pg_size_pretty(pg_total_relation_size('users'));
-- Check indexes \d users
Debugging Checklist
Before Starting
-
Can reproduce the issue reliably?
-
Have error message or symptom description?
-
Know when it started happening?
-
Checked if recent changes related?
-
Checked logs for clues?
During Investigation
-
Formed clear hypothesis?
-
Testing hypothesis systematically?
-
Taking notes on findings?
-
Not making random changes hoping to fix?
-
Questioning assumptions?
After Finding Root Cause
-
Understand WHY it happens?
-
Can explain it to someone else?
-
Documented findings?
-
Estimated impact?
-
Identified proper fix?
Anti-Patterns
Random Code Changes
BAD: "Maybe if I change this... nope, try this... nope, try this..." GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens. Conclusion: X is not the cause."
Assuming Without Verifying
BAD: "The API must be returning valid data" GOOD: "Let me log the API response to see what it actually returns"
Stopping at Symptoms
BAD: "The page is blank. Fixed by adding a null check." GOOD: "The page is blank because user is null. User is null because authentication token expired. Root cause: token not being refreshed."
Debugging in Production
BAD: "Let me add console.log to production to see..." GOOD: "Let me reproduce locally and debug there, or use proper logging"
No Reproduction Steps
BAD: "It crashed once, let me guess why" GOOD: "Let me find reliable way to reproduce it first"
Examples
When the user says:
-
"Why is this page loading slowly?"
-
"Investigate this intermittent test failure"
-
"Figure out why users are seeing this error"
-
"Debug the memory leak in production"
-
"What's causing the database timeouts?"
Integration with Other Skills
-
Use proof-of-work skill to document evidence
-
Use test-driven-development skill to add regression test after fix
-
Use explain skill when explaining bug to others
-
Use boy-scout-rule skill while fixing (improve surrounding code)
Notes
-
Use TaskCreate to track investigation steps
-
Document findings even if not fixing immediately
-
Create minimal reproduction case
-
Consider using /fix once root cause is found
-
Add logging/metrics to prevent future issues
Remember
-
Reproduce first - If you can't reproduce, you can't debug
-
Gather evidence - Don't guess, look at data
-
Form hypothesis - What do you think is wrong?
-
Test systematically - Prove or disprove hypothesis
-
Find root cause - Not just symptoms
-
Document - Help future you and others
Debugging is detective work. Be methodical, not random.