Debugging Methodology
A systematic approach to finding and fixing bugs.
Core Principles
- Occam's Razor - Start with the simplest explanation
- Binary Search - Isolate the problem area systematically
- Preserve Evidence - Understand state before making changes
- Document Hypotheses - Track what was tried and what didn't work
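The binary-search principle can be sketched in Python. This is a minimal illustration, assuming an ordered history in which every change before the faulty one passes and every change from it onward fails (the same assumption `git bisect` relies on). The names `first_bad`, `is_bad`, and the commit labels are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(items: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first item for which is_bad() is true.

    Assumes all good items come before all bad ones and that the last
    item is known to be bad.
    """
    lo, hi = 0, len(items) - 1  # invariant: the first bad index is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid        # the first bad item is at mid or earlier
        else:
            lo = mid + 1    # the first bad item is after mid
    return lo


# Hypothetical history: the bug was introduced in commit "c5".
commits = [f"c{i}" for i in range(1, 9)]
checked = []


def is_bad(commit: str) -> bool:
    checked.append(commit)  # record each probe, in the spirit of documenting hypotheses
    return int(commit[1:]) >= 5


culprit = commits[first_bad(commits, is_bad)]
print(culprit, checked)  # c5 ['c4', 'c6', 'c5']
```

Eight candidates are narrowed down in three probes, and the `checked` list doubles as a record of what was tried.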
Debugging Workflow
- Understand → What is expected vs actual behavior?
- Reproduce → Can you trigger the bug reliably?
- Locate → Where in the code does it happen?
- Diagnose → Why does it happen? (root cause)
- Fix → Minimal change to resolve
- Verify → Confirm fix works, no regressions
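The Reproduce, Fix, and Verify steps pair naturally with a regression check. The sketch below assumes a hypothetical `last_n` helper whose original version mishandled `n == 0`; the function and its bug are invented for illustration.

```python
def last_n(items: list, n: int) -> list:
    """Return the last n items of a list.

    The hypothetical buggy version was `return items[-n:]`, which returns
    the whole list when n == 0 because items[-0:] is the same as items[0:].
    """
    if n <= 0:
        return []       # minimal fix: handle the boundary explicitly
    return items[-n:]


def test_zero_returns_empty():
    # Reproduces the original report: expected [], actual was [1, 2, 3].
    assert last_n([1, 2, 3], 0) == []


def test_existing_behaviour_unchanged():
    # Guards against regressions in the cases that already worked.
    assert last_n([1, 2, 3], 2) == [2, 3]
    assert last_n([1, 2, 3], 5) == [1, 2, 3]


test_zero_returns_empty()
test_existing_behaviour_unchanged()
print("regression checks passed")
```

Writing the failing check before the fix confirms the bug is reproducible, and keeping it afterwards guards against the bug returning.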
Common Bug Patterns
| Symptom | Likely Cause | Check First |
|---|---|---|
| TypeError/null | Missing null check | Input validation |
| Off-by-one | Loop bounds, array index | Boundary conditions |
| Race condition | Async timing | Await/promise handling |
| Import error | Path/module resolution | File paths, exports |
| Type mismatch | Wrong type passed | Function signatures |
| Flaky test | Timing, shared state | Test isolation |
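As one illustration of the "Flaky test" pattern, the sketch below shows shared state leaking between calls through a mutable default argument, a Python-specific instance of the problem. Both function names are hypothetical.

```python
def add_tag_shared(tag, tags=[]):
    # Bug: the default list is created once and shared by every call.
    tags.append(tag)
    return tags


def add_tag_isolated(tag, tags=None):
    # Fix: create a fresh list per call when none is supplied.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first = add_tag_shared("a")[:]    # copy, so the later mutation stays visible
second = add_tag_shared("b")[:]
print(first, second)              # ['a'] ['a', 'b']

print(add_tag_isolated("a"), add_tag_isolated("b"))  # ['a'] ['b']
```

A test that asserts on `add_tag_shared("b")` passes or fails depending on which tests ran before it, which is the usual signature of an isolation problem.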
System-Level Tools
Memory Analysis
Valgrind (C/C++/Rust)
valgrind --leak-check=full --show-leak-kinds=all ./program
valgrind --tool=massif ./program  # Heap profiling
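Python (third-party: pip install memory_profiler; reports line by line for functions decorated with @profile)
python -m memory_profiler script.py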
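When installing a third-party profiler is not an option, the standard library's `tracemalloc` module gives a rough view of where allocations come from. This is a swapped-in alternative, not a description of how `memory_profiler` works; `build_cache` is a hypothetical stand-in for the code under suspicion.

```python
import tracemalloc


def build_cache(n: int) -> dict:
    # Hypothetical workload: each value is a separate 100-byte object.
    return {i: bytes(100) for i in range(n)}


tracemalloc.start()
before = tracemalloc.take_snapshot()

cache = build_cache(10_000)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much their allocated memory grew between snapshots.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)

grown = sum(stat.size_diff for stat in top_stats)
print(f"net growth: {grown / 1024:.0f} KiB")
```

Comparing two snapshots rather than reading a single one keeps the focus on what the suspect code allocated, not on what was already in memory.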
Performance Profiling
Linux perf
perf record -g ./program
perf report
perf top  # Real-time CPU usage
Python
python -m cProfile -s cumtime script.py
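The same profiler can be driven from code when only part of a program is of interest. A minimal sketch using the standard library's `cProfile` and `pstats`; `slow_sum` is a hypothetical hot spot.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Hypothetical hot spot: deliberately inefficient.
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(20_000)
profiler.disable()

# Sort by cumulative time, matching `-s cumtime` on the command line.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumtime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Wrapping only the suspect call keeps interpreter start-up and import time out of the report.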
System Tracing (Traditional)
System calls (ptrace-based, high overhead)
strace -f -e trace=all -p PID
Library calls
ltrace -f -S ./program
Open files/sockets
lsof -p PID
Memory mapping
pmap -x PID
eBPF Tracing (Modern, Production-Safe)
eBPF is a modern, lower-overhead alternative to ptrace-based tracing with strace. Key advantages:
- Low overhead: Probes run inside the kernel, so tracing is usually safe for production use
- No recompilation: Works on running binaries
- Non-intrusive: Doesn't stop program execution
- Kernel-verified: Programs are checked for bounded execution before loading and are designed not to crash the system
BCC tools (install: apt install bpfcc-tools; on Debian/Ubuntu the commands carry a -bpfcc suffix, e.g. opensnoop-bpfcc)
Count and trace syscalls (like strace but with lower overhead)
sudo syscount -p PID    # Count syscalls
sudo opensnoop -p PID   # Trace file opens
sudo execsnoop          # Trace new processes
sudo tcpconnect         # Trace TCP connections
sudo funccount 'vfs_*'  # Count kernel function calls
bpftrace (install: apt install bpftrace)
One-liner tracing scripts
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'  # openat is what modern libc uses for open()
sudo bpftrace -e 'uprobe:/bin/bash:readline { printf("readline\n"); }'
Trace calls to a function in a Go (or other compiled) binary
sudo bpftrace -e 'uprobe:./myapp:main.handleRequest { printf("called\n"); }'
eBPF Tool Hierarchy:
| Level | Tool | Use Case |
|---|---|---|
| High | BCC tools | Pre-built tracing scripts |
| Medium | bpftrace | One-liner custom traces |
| Low | libbpf/gobpf | Custom eBPF programs |
When to use eBPF over strace:
- Production systems (strace can slow syscall-heavy processes by 10-100x)
- Long-running traces
- High-frequency syscalls
- When you can't afford to slow down the process
Network Debugging
Packet capture
tcpdump -i any port 8080
Connection status
ss -tuln
netstat -tuln
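When `ss` shows a listener but a client still cannot connect, a quick probe from the client's side helps separate "nothing is listening" from "something in between is blocking". A minimal sketch using the standard `socket` module; `port_is_open` is a hypothetical helper, and the demo listens on an ephemeral local port so that it is self-contained.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demo: listen on an ephemeral local port, then probe it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 lets the OS choose a free port
server.listen(1)
port = server.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", port)
server.close()
open_after_close = port_is_open("127.0.0.1", port)
print(open_while_listening, open_after_close)
```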
Language-Specific Debugging
Python
Quick debug (Python 3.7+ also offers the built-in breakpoint())
import pdb; pdb.set_trace()
Better: ipdb or pudb (third-party; install with pip)
import ipdb; ipdb.set_trace()
Print with context
print(f"{var=}") # Python 3.8+
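The `=` specifier also works with arbitrary expressions and format specs, which keeps ad hoc debug output self-describing. A small sketch with made-up values:

```python
user = {"id": 7, "name": "ada"}   # hypothetical data
ratio = 2 / 3

# Each field prints the expression text, then its value.
line = f"{user['id']=} {ratio=:.3f} {len(user)=}"
print(line)  # user['id']=7 ratio=0.667 len(user)=2
```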
JavaScript/TypeScript
// Browser/Node
debugger;
// Structured logging
console.log({ var1, var2, context: 'function_name' });
Rust
// Debug print
dbg!(&variable);
// Backtrace on panic (run from the shell)
RUST_BACKTRACE=1 cargo run
Debugging Questions
When stuck, ask:
- What changed recently that could cause this?
- Does it happen in all environments or just one?
- Is the bug in my code or a dependency?
- What assumptions am I making that might be wrong?
- Can I write a minimal reproduction?
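For the last question, a failing input can often be reduced mechanically. The sketch below is a simplified greedy reduction in the spirit of delta debugging, not the full ddmin algorithm. It assumes the failure can be checked automatically; `shrink` and `still_fails` are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def shrink(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop chunks of the input while the failure still reproduces."""
    assert still_fails(items), "start from an input that reproduces the bug"
    chunk = max(len(items) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if still_fails(candidate):
                items = candidate   # the chunk was irrelevant; keep it removed
            else:
                i += chunk          # the chunk is needed; move past it
        chunk //= 2
    return items


# Hypothetical bug: processing fails whenever both 3 and 8 are present.
def still_fails(values: List[int]) -> bool:
    return 3 in values and 8 in values


minimal = shrink(list(range(10)), still_fails)
print(minimal)  # [3, 8]
```

The reduced input points directly at the interaction that matters, which usually makes the root cause much easier to spot.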
Effective Debugging Practices
- Targeted changes: Form a hypothesis, change one thing at a time
- Use proper debuggers: Step through code with breakpoints when possible
- Find root causes: Trace issues to their origin, fix the source
- Reproduce first: Create a minimal reproduction before attempting a fix
- Verify the fix: Confirm the fix resolves the issue and passes tests