rust-profiling

Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.

Install skill "rust-profiling" with this command: npx skills add mohitmishra786/low-level-dev-skills/mohitmishra786-low-level-dev-skills-rust-profiling

Triggers

  • "How do I generate a flamegraph for a Rust program?"

  • "My Rust binary is huge — how do I find what's causing it?"

  • "How do I write Criterion benchmarks?"

  • "How do I measure monomorphization bloat?"

  • "Rust performance is worse than expected — how do I profile it?"

  • "How do I use perf with Rust?"

Workflow

  1. Build for profiling

Release with debug symbols (needed for readable profiles)

Cargo.toml:

[profile.release-with-debug]
inherits = "release"
debug = true

cargo build --profile release-with-debug

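Or, for a one-off build, enable debug info on the release profile through an environment variable (the binary then lands in target/release)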

CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release

  2. Flamegraphs with cargo-flamegraph

Install

cargo install flamegraph

Linux: uses perf (requires perf_event_paranoid ≤ 1)

sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2

macOS: uses DTrace (requires sudo)

sudo cargo flamegraph --bin myapp -- arg1 arg2

Profile tests

cargo flamegraph --test mytest -- test_filter

Profile benchmarks

cargo flamegraph --bench mybench -- --bench

Output

Generates flamegraph.svg in current directory

Open in browser: firefox flamegraph.svg

Custom flamegraph options:

More samples

cargo flamegraph --freq 1000 --bin myapp

Discard stderr output while profiling

cargo flamegraph --bin myapp -- args 2>/dev/null

Using perf directly for more control

perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
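The manual pipeline above passes through an intermediate "folded" format: one line per unique stack, frames separated by `;`, then a space and a sample count. As a minimal sketch, assuming that format, the following totals samples per innermost frame, which gives a rough text-only view of where time is spent. The function name `top_leaf_frames` and the sample input are hypothetical.

```rust
use std::collections::HashMap;

/// Sums sample counts per leaf (innermost) frame from folded-stack text.
/// Each input line is assumed to look like `main;parse;read_line 42`.
fn top_leaf_frames(folded: &str) -> Vec<(String, u64)> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    for line in folded.lines() {
        // The count is the text after the last space; frame names may contain spaces.
        let Some((stack, count)) = line.rsplit_once(' ') else { continue };
        let Ok(count) = count.trim().parse::<u64>() else { continue };
        // The leaf is the last `;`-separated frame.
        let leaf = stack.rsplit(';').next().unwrap_or(stack);
        *totals.entry(leaf.to_string()).or_insert(0) += count;
    }
    let mut sorted: Vec<(String, u64)> = totals.into_iter().collect();
    // Highest sample count first; ties broken by name for stable output.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}
```

Feeding it the output of `perf script | stackcollapse-perf.pl` would list the functions with the most self time, which are the widest boxes along the top edge of the flamegraph.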

  3. Binary size analysis with cargo-bloat

Install

cargo install cargo-bloat

Show top functions by size

cargo bloat --release -n 20

Show per-crate size breakdown

cargo bloat --release --crates

Include only specific crate

cargo bloat --release --filter myapp

Compare before/after a change

cargo bloat --release --crates > before.txt

make changes

cargo bloat --release --crates > after.txt
diff before.txt after.txt

Typical output:

 File  .text     Size Crate Name
 2.4%   3.0%  47.0KiB std   <std macros>
 1.8%   2.3%  35.5KiB myapp myapp::heavy_module::process
 1.2%   1.5%  23.1KiB serde serde::de::...
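A plain `diff` of the two `--crates` reports shows changed lines but not the size difference. The sketch below computes per-crate deltas instead. It assumes each data row has the columns `File% .text% Size Crate` and that sizes use the `B`, `KiB`, or `MiB` suffixes seen in the sample output; the helper names are hypothetical and the real report layout may differ between cargo-bloat versions.

```rust
use std::collections::BTreeMap;

/// Parses a size such as `47.0KiB` into bytes. The `B`/`KiB`/`MiB` suffixes
/// are assumed from the sample output; anything else yields `None`.
fn parse_size(s: &str) -> Option<f64> {
    let (num, scale) = if let Some(n) = s.strip_suffix("MiB") {
        (n, 1024.0 * 1024.0)
    } else if let Some(n) = s.strip_suffix("KiB") {
        (n, 1024.0)
    } else if let Some(n) = s.strip_suffix('B') {
        (n, 1.0)
    } else {
        return None;
    };
    num.parse::<f64>().ok().map(|v| v * scale)
}

/// Reads rows assumed to be `File% .text% Size Crate` and maps crate -> bytes.
fn crate_sizes(report: &str) -> BTreeMap<String, f64> {
    report
        .lines()
        .filter_map(|line| {
            let cols: Vec<&str> = line.split_whitespace().collect();
            // Rows whose third column is not a size (such as the header) are skipped.
            let size = parse_size(cols.get(2)?)?;
            Some((cols.get(3)?.to_string(), size))
        })
        .collect()
}

/// Returns (crate, after - before) in bytes for every crate whose size changed.
fn size_deltas(before: &str, after: &str) -> Vec<(String, f64)> {
    let old = crate_sizes(before);
    crate_sizes(after)
        .into_iter()
        .filter_map(|(name, new)| {
            let delta = new - old.get(&name).copied().unwrap_or(0.0);
            (delta != 0.0).then_some((name, delta))
        })
        .collect()
}
```

Calling `size_deltas` with the contents of before.txt and after.txt returns only the crates that grew or shrank, sorted by crate name.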

  4. Monomorphization bloat with cargo-llvm-lines

Install

cargo install cargo-llvm-lines

Show LLVM IR line counts (proxy for monomorphization)

cargo llvm-lines --release | head -40

Filter to your crate only

cargo llvm-lines --release | grep 'myapp::'

Typical output:

 Lines  Copies  Function name
 85330       1  [LLVM passes]
  7761      92  core::fmt::write
  4672      11  myapp::process::<impl MyTrait for T>
  3201      47  <alloc::vec::Vec<T> as core::ops::Drop>::drop

A high Copies count means the function was instantiated for many concrete types. To reduce it, keep the generic function thin and move the body into a non-generic inner function:

// Before: generic, so the whole body is monomorphized for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    do_work(data.as_ref())
}

// After: thin generic wrapper + concrete inner function
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}
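The wrapper pattern above can be tried in isolation. This is a minimal self-contained sketch in which `do_work` is a hypothetical stand-in (it counts non-zero bytes), since the original does not define it. The optimizer may still inline `inner` into each instantiation, so re-check with cargo llvm-lines before and after.

```rust
/// Hypothetical stand-in for the real work: counts non-zero bytes.
fn do_work(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b != 0).count()
}

/// Thin generic wrapper: only the `as_ref()` call is duplicated per `T`.
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    // Concrete inner function: generated once, shared by every instantiation.
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}
```

Callers are unaffected: `process` still accepts `&str`, `String`, `Vec<u8>`, byte arrays, and any other type that implements `AsRef<[u8]>`.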

  5. Criterion microbenchmarks

Cargo.toml

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_bench"
harness = false

// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use myapp::process; // assumption: the function under test is exported by your library crate

fn bench_process(c: &mut Criterion) {
    // Simple benchmark
    c.bench_function("process 1000 items", |b| {
        let data: Vec<i32> = (0..1000).collect();
        // black_box keeps the optimizer from specializing on the input
        b.iter(|| process(black_box(&data)))
    });
}

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_sizes");
    // One benchmark per input size, reported under the same group
    for size in [100, 1000, 10000].iter() {
        let data: Vec<i32> = (0..*size).collect();
        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            &data,
            |b, data| b.iter(|| process(black_box(data))),
        );
    }
    group.finish();
}

criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);

Run all benchmarks

cargo bench

Run specific benchmark

cargo bench --bench my_bench

Run with filter

cargo bench -- process_sizes

Compare with baseline (save/load)

cargo bench -- --save-baseline before

make changes

cargo bench -- --baseline before

View HTML report

open target/criterion/report/index.html
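To see what `black_box` contributes in the benchmarks above, here is a minimal std-only sketch using `std::hint::black_box`. The `process` function is a hypothetical stand-in for the code under test, and the hand-rolled timing loop is only an illustration: it has none of Criterion's warm-up, sampling, or statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test: sums the squares of the items.
fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x) * i64::from(x)).sum()
}

/// Times `iters` calls of `process` and returns the total elapsed time.
/// `black_box` on the input hides its value from the optimizer, and
/// `black_box` on the result keeps the call from being discarded as unused.
fn time_process(data: &[i32], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(process(black_box(data)));
    }
    start.elapsed()
}
```

Without the two `black_box` calls, a release build could in principle hoist or remove the call entirely, and the loop would measure nothing.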

  6. perf with Rust (Linux)

Record

perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args # higher freq

Report

perf report # interactive TUI
perf report --stdio --no-call-graph | head -40 # text

Annotate specific function

perf annotate myapp::hot_function

stat (quick counters)

perf stat ./target/release/myapp args

Rust-specific perf tips:

  • Build with debug = "line-tables-only" (recent Cargo versions) for faster builds with line-level attribution; debug = 1 gives limited debug info, which also includes line tables

  • Use RUSTFLAGS="-C force-frame-pointers=yes" for better call graphs without DWARF unwinding

  • Disable ASLR for reproducible addresses: setarch $(uname -m) -R ./myapp
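Related to the call-graph tips above: release builds inline small functions, so a hot helper can disappear into its caller's frame. A common temporary workaround while profiling is to mark the suspect function `#[inline(never)]`. The sketch below uses hypothetical names (`checksum`, `checksum_all`); note that the attribute can change performance, so it is usually removed once the measurement is done.

```rust
/// Hypothetical hot helper. `#[inline(never)]` asks the compiler to keep it
/// as a separate function so it shows up as its own frame in perf output.
#[inline(never)]
fn checksum(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Caller whose time would otherwise absorb `checksum` once it is inlined.
/// `chunk_size` must be non-zero, because `chunks` panics on a size of 0.
fn checksum_all(data: &[u8], chunk_size: usize) -> u32 {
    data.chunks(chunk_size)
        .map(checksum)
        .fold(0u32, |acc, c| acc ^ c)
}
```

With the attribute in place, `perf report` and the flamegraph attribute samples to `checksum` separately from `checksum_all`.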

  7. heaptrack / DHAT for allocations

heaptrack (Linux)

heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50

DHAT via Valgrind

valgrind --tool=dhat ./target/debug/myapp args

Open the resulting dhat.out.<pid> file with dh_view.html
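When neither tool is available, a rough in-process count can be had by wrapping the system allocator. This is a different and much coarser technique than heaptrack or DHAT (no call stacks, no lifetimes), shown here only as a minimal sketch; `CountingAlloc` and `alloc_stats` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts allocation calls and bytes requested.
struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        // Delegate the actual allocation to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Route every heap allocation in the program through the counter.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns (allocation calls, bytes requested) observed so far.
fn alloc_stats() -> (usize, usize) {
    (
        ALLOC_CALLS.load(Ordering::Relaxed),
        ALLOC_BYTES.load(Ordering::Relaxed),
    )
}
```

Reading `alloc_stats()` before and after a suspect code path gives a quick estimate of how many allocations it performs, which can tell you whether a full heaptrack or DHAT run is worth the effort.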

For flamegraph setup and Criterion configuration, see references/cargo-flamegraph-setup.md.

Related skills

  • Use skills/rust/rustc-basics for build configuration (debug symbols, profiles)

  • Use skills/profilers/linux-perf for perf fundamentals

  • Use skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs

  • Use skills/profilers/valgrind for allocation profiling with massif/DHAT
