Debugging Skill

Systematic approaches to investigating and diagnosing bugs.

Core Principle

Understand before fixing. A proper diagnosis leads to a proper fix.

Name

han-core:debug - Investigate and diagnose issues without necessarily fixing them

Synopsis

/debug [arguments]

Debug vs Fix

Use /debug when:

Investigating an issue to understand it
Need to gather information before fixing
Want to identify root cause without implementing solution
Triaging to determine severity/priority
Research phase before fix

Use /fix when:

Ready to implement the solution
Debugging AND fixing in one go
Issue is understood, just needs fixing

The Scientific Method for Debugging

Observe

Gather all the facts:

What's the symptom? (What's happening that shouldn't?)
When does it happen? (Always, sometimes, specific conditions?)
Who's affected? (All users, some users, specific scenarios?)
Error messages? (Exact text, stack traces, error codes?)
Recent changes? (What changed before this started?)

Evidence to collect:

Error messages and stack traces
Application logs
User reports
Reproduction steps
Environment details (browser, OS, versions)
Network requests/responses
Database query logs

Form Hypothesis

Based on symptoms, what could cause this?

Common categories:

Logic error: Code does wrong thing
State management: State gets out of sync
Async/timing: Race condition, callback hell
Data issue: Unexpected input format
Integration: API change, service down
Environment: Config, permissions, network
Resource: Memory leak, connection pool exhausted

Prioritize hypotheses:

Most likely causes first
Easiest to test first (when equal likelihood)
Most impactful if true

Test Hypothesis

Design experiment to prove/disprove:

Add logging to see values
Add breakpoints to pause execution
Modify input to isolate variable
Disable feature to rule out
Compare with working version

Keep notes:

Hypothesis: Database query timeout Test: Add query timing logs Result: Query completes in 50ms Conclusion: Not the database

Hypothesis: Network latency Test: Check network tab, add timing Result: API call takes 5 seconds Conclusion: Found the issue

Analyze Results

What did you learn?

Hypothesis confirmed or rejected?
New questions raised?
Unexpected findings?
Root cause identified?

Repeat or Conclude

If root cause found:

Document findings
Estimate impact
Plan fix

If not found:

Form new hypothesis
Repeat cycle

Debugging Strategies

Strategy 1: Add Logging

Most universally useful technique:

// Strategic console.log placement function processOrder(order) { console.log('processOrder START:', { orderId: order.id })

const items = order.items console.log('items:', items.length)

const validated = validate(items) console.log('validation result:', validated)

if (!validated.success) { console.log('validation failed:', validated.errors) throw new Error('Invalid order') }

const total = calculateTotal(items) console.log('total calculated:', total)

console.log('processOrder END') return total }

Logging guidelines:

Log function entry/exit
Log branching decisions
Log external calls (API, database)
Log unexpected values
Include context (IDs, user info)

Strategy 2: Use Debugger

Interactive debugging:

// Browser function buggyFunction(input) { debugger; // Execution pauses here const result = transform(input) debugger; // And here return result }

// Node.js node --inspect app.js

Then open chrome://inspect in Chrome

Debugger features:

Step over (next line)
Step into (into function call)
Step out (back to caller)
Watch expressions
Call stack inspection
Variable inspection

Strategy 3: Binary Search

Isolate the problem area:

// 100 lines of code, bug somewhere

// Comment out lines 50-100 // Bug still happens? It's in lines 1-50

// Comment out lines 25-50 // Bug disappears? It's in lines 25-50

// Comment out lines 37-50 // Bug still happens? It's in lines 25-37

// Continue until isolated to specific lines

Strategy 4: Rubber Duck Debugging

Explain the problem out loud:

"This function is supposed to calculate shipping cost"
"It takes the weight and destination"
"First it... wait, it's using price instead of weight!"
(Bug found)

Why this works: Forces you to examine assumptions.

Strategy 5: Compare Working vs Broken

What's different?

Version comparison:

Find which commit broke it

git bisect start git bisect bad HEAD git bisect good v1.0.0

Git checks out middle commit

npm test git bisect good/bad

Repeat until found

Environment comparison:

Works locally but not production?
Works for some users but not others?
Worked yesterday but not today?

What changed?

Strategy 6: Simplify

Reduce to minimal reproduction:

// Complex case with bug processUserOrderWithDiscountsAndShipping(user, cart, promo, address)

// Simplify inputs one at a time processUserOrderWithDiscountsAndShipping(user, [], null, null) // Still breaks? Not discount or address

processUserOrderWithDiscountsAndShipping(null, [], null, null) // Works now? It's the user object

// What about the user object causes it?

Strategy 7: Check Assumptions

Question everything:

// Assumption: API returns array const users = await api.getUsers() users.forEach(...) // Crashes

// Check assumption console.log(typeof users) // "undefined" console.log(users) // undefined

// Assumption was wrong!

Common wrong assumptions:

Function returns expected type
Variable is defined
Array is not empty
API will always respond
Async operation has completed
State is up to date

Debugging by Symptom

"Intermittent failure"

Likely causes:

Race condition (timing-dependent)
Data-dependent (certain inputs trigger it)
Resource leak (happens after N operations)
External service flakiness

Investigation:

Add extensive logging
Look for async operations
Check timing between operations
Look for shared state
Run many times to see pattern

"Works locally, fails in production"

Check differences:

Environment variables
Data (production has different/more data)
Network (CORS, SSL, proxies)
Dependencies (versions, OS)
Resources (memory, connections)

"Slow performance"

Don't guess - profile:

Frontend:

Chrome DevTools > Performance tab
Look for long tasks (> 50ms)
Check for layout thrashing
Look for memory leaks

Backend:

Add timing logs around operations
Check database query time (EXPLAIN ANALYZE)
Check external API call time
Profile with APM tool

"Memory leak"

Investigation:

// Take heap snapshot // Do operation that leaks // Take another heap snapshot // Compare - what increased?

Common causes:

Event listeners not removed
Closures holding references
Global variables accumulating
Intervals not cleared
Cache growing unbounded

"Crash/Exception"

Read the stack trace:

Error: Cannot read property 'map' of undefined at processUsers (app.js:42:15) at handleRequest (app.js:23:3) at Server.<anonymous> (server.js:12:5)

Stack trace tells you:

Line 42: Where it crashed
Line 23: Where it was called from
Line 12: Origin of the request

Then:

Go to line 42
Check what's undefined
Trace back why it's undefined

"It works sometimes"

Race condition?
Timing issue?
Data-dependent?
Check for async issues

Common Bug Patterns

Null/Undefined

// Bug function process(user) { return user.name.toUpperCase() // Crashes if user is null }

// Investigation console.log('user:', user) // undefined - why? // Trace back to where user comes from

Off-by-One

// Bug for (let i = 0; i <= array.length; i++) { // <= instead of < process(array[i]) // Crashes on last iteration }

// Investigation console.log('i:', i, 'length:', array.length) // Notice i === array.length causes array[i] === undefined

Async Timing

// Bug let data fetchData().then(result => { data = result }) console.log(data) // undefined - async not complete

// Investigation console.log('1. Before fetch') fetchData().then(result => { console.log('3. Got result') data = result }) console.log('2. After fetch call') // Output: 1, 2, 3 - async completes later

State Mutation

// Bug function addItem(cart, item) { cart.items.push(item) // Mutates input! return cart }

const originalCart = { items: [] } const newCart = addItem(originalCart, item) // originalCart was modified - unexpected!

// Investigation console.log('before:', originalCart) const newCart = addItem(originalCart, item) console.log('after:', originalCart) // Changed!

Scope Issues

// Bug for (var i = 0; i < 3; i++) { setTimeout(() => console.log(i), 100) } // Prints: 3, 3, 3 (expected 0, 1, 2)

// Investigation // var is function-scoped, i is shared // By time timeout fires, loop is done, i === 3

// Fix: Use let (block-scoped) or capture i

Investigation Report Format

Investigation: [Issue description]

Symptoms

[What's happening that's wrong?]

Evidence

Error message: [exact text]
When it happens: [conditions]
Frequency: [always/sometimes/rarely]
Affected users: [all/some/specific group]

Reproduction Steps

[Step 1]
[Step 2]
[Observe error]

Investigation Timeline

Hypothesis 1: [What I thought might be wrong]

Tested by: [What I did to test]
Result: [What I found]
Conclusion: [Ruled out / Confirmed]

Hypothesis 2: [Next theory]

Tested by: [What I did]
Result: [What I found]
Conclusion: [Ruled out / Confirmed]

Root Cause

[What's actually causing the issue]

Evidence:

[Log showing the problem]
[Stack trace pointing to source]
[Data showing the pattern]

Impact

Severity: [Critical/High/Medium/Low]
Scope: [How many users/scenarios affected]
Workaround: [Any temporary solutions]

Next Steps

[What should be done to fix]
[Any additional investigation needed]
[Related issues to check]

Debugging Tools

Browser Developer Tools

Console:

console.log()
Print values
console.table()
Display arrays/objects as table
console.trace()
Print stack trace
console.time() / console.timeEnd()
Measure duration

Debugger:

Set breakpoints
Step through code
Inspect variables
Watch expressions
Call stack

Network:

View all requests
See request/response headers and bodies
Measure timing
Replay requests

Performance:

Record profile
See function call tree
Identify bottlenecks
Check memory usage

Command Line Tools

Search for text in files

grep -r "error" logs/

Follow log file

tail -f logs/app.log

Search with context

grep -B 5 -A 5 "ERROR" logs/app.log

Check disk space

df -h

Check memory

free -m

Check running processes

ps aux | grep node

Database Debugging

-- PostgreSQL EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Show slow queries SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;

-- Check table size SELECT pg_size_pretty(pg_total_relation_size('users'));

-- Check indexes \d users

Debugging Checklist

Before Starting

Can reproduce the issue reliably?
Have error message or symptom description?
Know when it started happening?
Checked if recent changes related?
Checked logs for clues?

During Investigation

Formed clear hypothesis?
Testing hypothesis systematically?
Taking notes on findings?
Not making random changes hoping to fix?
Questioning assumptions?

After Finding Root Cause

Understand WHY it happens?
Can explain it to someone else?
Documented findings?
Estimated impact?
Identified proper fix?

Anti-Patterns

Random Code Changes

BAD: "Maybe if I change this... nope, try this... nope, try this..." GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens. Conclusion: X is not the cause."

Assuming Without Verifying

BAD: "The API must be returning valid data" GOOD: "Let me log the API response to see what it actually returns"

Stopping at Symptoms

BAD: "The page is blank. Fixed by adding a null check." GOOD: "The page is blank because user is null. User is null because authentication token expired. Root cause: token not being refreshed."

Debugging in Production

BAD: "Let me add console.log to production to see..." GOOD: "Let me reproduce locally and debug there, or use proper logging"

No Reproduction Steps

BAD: "It crashed once, let me guess why" GOOD: "Let me find reliable way to reproduce it first"

Examples

When the user says:

"Why is this page loading slowly?"
"Investigate this intermittent test failure"
"Figure out why users are seeing this error"
"Debug the memory leak in production"
"What's causing the database timeouts?"

Integration with Other Skills

Use proof-of-work skill to document evidence
Use test-driven-development skill to add regression test after fix
Use explain skill when explaining bug to others
Use boy-scout-rule skill while fixing (improve surrounding code)

Notes

Use TaskCreate to track investigation steps
Document findings even if not fixing immediately
Create minimal reproduction case
Consider using /fix once root cause is found
Add logging/metrics to prevent future issues

Remember

Reproduce first - If you can't reproduce, you can't debug
Gather evidence - Don't guess, look at data
Form hypothesis - What do you think is wrong?
Test systematically - Prove or disprove hypothesis
Find root cause - Not just symptoms
Document - Help future you and others

Debugging is detective work. Be methodical, not random.

debug

Safety Notice

Copy this and send it to your AI assistant to learn