Flamegraphs
Purpose
Guide agents through the pipeline from profiler data to SVG flamegraph, and teach interpretation of flamegraphs to drive concrete optimisation decisions.
Triggers
- "How do I generate a flamegraph from perf data?"
- "How do I read a flamegraph?"
- "The flamegraph shows a wide frame: what does that mean?"
- "How do I generate a flamegraph from Callgrind?"
- "I want to compare two flamegraphs (before/after)"
Workflow
- Install FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
No install needed; scripts are in the repo
export PATH=$PATH:/path/to/FlameGraph
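Before running the workflows below, it can help to confirm that the scripts are actually reachable. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and the tool list is an assumption based on the perf workflow in this guide.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool; returns non-zero if any is absent.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example: the tools used by the perf workflow (adjust the list to your profiler).
check_tools perl stackcollapse-perf.pl flamegraph.pl || echo "fix PATH before continuing"
```

If anything is reported missing, re-check the directory added to PATH in the export line above.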
- perf → flamegraph (most common path)
Step 1: record
perf record -F 999 -g -o perf.data ./prog
Step 2: generate script output
perf script -i perf.data > out.perf
Step 3: collapse stacks
stackcollapse-perf.pl out.perf > out.folded
Step 4: generate SVG
flamegraph.pl out.folded > flamegraph.svg
Step 5: view
xdg-open flamegraph.svg # Linux
open flamegraph.svg # macOS
One-liner:
perf record -F 999 -g ./prog && perf script | stackcollapse-perf.pl | flamegraph.pl > fg.svg
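The collapse step produces "folded" stacks: one line per unique stack, with frames joined by semicolons and followed by a sample count. A quick sanity check before rendering is to total those counts. The sketch below assumes that format; `folded_total` is a hypothetical helper and the sample data is made up for illustration.

```shell
# Sum the sample counts in a folded-stack file (or stdin).
# Each folded line is "frame;frame;...;leaf COUNT". The count is taken from the
# last field because frame names may themselves contain spaces.
folded_total() {
    awk '{ total += $NF } END { print total + 0 }' "$@"
}

# Made-up folded stacks, for illustration only.
folded_total <<'EOF'
prog;main;parse 30
prog;main;compute;dot 60
prog;main;compute 10
EOF
# prints 100
```

A total of zero, or one far below what the sampling rate and run time suggest, usually means the recording or collapse step went wrong, and the SVG will not be worth reading.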
- Differential flamegraph (before/after)
Collect two profiles
perf record -g -o before.data ./prog_old
perf record -g -o after.data ./prog_new
Collapse
perf script -i before.data | stackcollapse-perf.pl > before.folded
perf script -i after.data | stackcollapse-perf.pl > after.folded
Diff (red = regressed, blue = improved)
difffolded.pl before.folded after.folded | flamegraph.pl > diff.svg
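To see what the diff step works with, the sketch below joins two folded files into one line per stack, carrying the before and after counts. It only illustrates the idea and is not a replacement for difffolded.pl; `folded_delta` is a hypothetical helper name.

```shell
# Print "STACK BEFORE AFTER" for every stack seen in either folded file.
# A stack missing from one file gets a count of 0 for that side.
folded_delta() {
    awk '
        {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)   # strip the trailing sample count
            if (FILENAME == ARGV[1]) before[stack] += count
            else                     after[stack]  += count
            seen[stack] = 1
        }
        END {
            for (s in seen) print s, before[s] + 0, after[s] + 0
        }
    ' "$1" "$2" | sort
}

# Usage (the two files must have different paths):
#   folded_delta before.folded after.folded
```

Stacks whose after count is larger are the ones that grew in the new profile; stacks with a 0 on one side exist in only one of the two profiles.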
- Callgrind → flamegraph
valgrind --tool=callgrind --callgrind-out-file=cg.out ./prog
stackcollapse-callgrind.pl cg.out | flamegraph.pl > fg.svg
- Other profiler inputs
Go pprof
go tool pprof -raw -output=prof.txt prog
stackcollapse-go.pl prof.txt | flamegraph.pl > fg.svg
DTrace
dtrace -x ustackframes=100 -n 'profile-99 /execname=="prog"/ { @[ustack()] = count(); } tick-10s { exit(0); }' \
  -o out.stacks
stackcollapse.pl out.stacks | flamegraph.pl > fg.svg
Java (async-profiler)
async-profiler -d 30 -f out.collapsed PID
flamegraph.pl out.collapsed > fg.svg
- Reading flamegraphs
A flamegraph is a call-stack visualisation:
- X axis: share of samples (time on CPU), not the passage of time. Frames are sorted alphabetically so that identical stacks merge; wider = more time
- Y axis: call stack depth; taller = deeper call chain
- Color: random (no significance), unless using differential mode
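The width of a frame can be reproduced from the folded data: it is the share of all samples whose stack starts with that frame's call path. The sketch below assumes folded input as produced by the stackcollapse scripts; `frame_share` is a hypothetical helper and the sample data is made up.

```shell
# Print the percentage of samples whose stack is the given call path
# or continues below it. This share is what sets the frame's width.
frame_share() {
    prefix=$1
    shift
    awk -v prefix="$prefix" '
        {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)   # strip the trailing sample count
            total += count
            # Match the exact stack, or any deeper stack under this call path.
            if (stack == prefix || index(stack, prefix ";") == 1) hits += count
        }
        END { if (total > 0) printf "%.1f\n", 100 * hits / total }
    ' "$@"
}

# Made-up folded stacks, for illustration only.
frame_share 'prog;main;compute' <<'EOF'
prog;main;parse 30
prog;main;compute;dot 60
prog;main;compute 10
EOF
# prints 70.0
```

Here `compute` would span 70% of the graph even though only 10 of the 100 samples were taken in `compute` itself; the rest belong to its callee `dot`.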
What to look for:
| Pattern | Meaning | Action |
|---|---|---|
| Wide frame near bottom | Function and its callees together account for a large share of samples | Follow its widest children upward to find where the time is spent |
| Wide frame with tall narrow towers | Calling many different callees | Hot dispatch; reduce call overhead |
| Very tall stack with wide base | Deep recursion | Check recursion depth; consider iterative approach |
| Plateau at the top | Leaf function with no callees | This leaf is the actual hotspot |
| Many narrow identical stacks | Many threads doing the same work | Consider parallelism or batching |
Identifying the actionable hotspot:
- Find the widest top frame (a frame with no children, or only narrow children, above it)
- That is where CPU time is actually spent
- Trace down to understand what called it and why
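The same search can be done on the folded data without opening the SVG: the last frame of each stack is the function that was on CPU when the sample was taken, so summing counts per last frame ranks the plateaus. A minimal sketch, assuming folded input; `top_leaves` is a hypothetical helper and the sample data is made up.

```shell
# List leaf functions by self samples, highest first.
top_leaves() {
    awk '
        {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)     # strip the trailing sample count
            n = split(stack, frames, ";")
            self[frames[n]] += count       # the last frame is the one on CPU
        }
        END { for (f in self) print self[f], f }
    ' "$@" | sort -k1,1nr -k2
}

# Made-up folded stacks, for illustration only.
top_leaves <<'EOF'
prog;main;parse 30
prog;main;compute;dot 60
prog;main;compute 10
EOF
# prints:
# 60 dot
# 30 parse
# 10 compute
```

The top entries correspond to the widest plateaus in the flamegraph; the stacks that end in them show which callers to trace down through.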
Differential flamegraph:
- Red frames: more time in new profile (regression)
- Blue frames: less time in new profile (improvement)
- Frames only in one profile appear solid colored
- flamegraph.pl options
flamegraph.pl --title "My App" \
  --subtitle "Release build, workload X" \
  --width 1600 \
  --height 16 \
  --minwidth 0.5 \
  --colors java \
  out.folded > fg.svg
| Option | Effect |
|---|---|
| --title | SVG title |
| --subtitle | Second title line |
| --width | Width in pixels |
| --height | Frame height in pixels |
| --minwidth | Omit frames narrower than N pixels (reduces clutter) |
| --colors | Palette: hot (default), mem, io, java, js, perl, red, green, blue |
| --inverted | Icicle chart (roots at top) |
| --reverse | Reverse stacks |
| --cp | Consistent palette (same frame = same color across SVGs) |
References
For tool installation, stackcollapse scripts, and palette options, see references/tools.md.
Related skills
- Use skills/profilers/linux-perf to collect perf data
- Use skills/profilers/valgrind to collect Callgrind data
- Use skills/compilers/clang for LLVM PGO from sampling profiles