piano
Automated instrumentation-based profiling for Rust. Point it at your project, get back self-time, call counts, and allocation data per function.
$ piano profile
found 5 function(s) across 3 file(s)
built: target/piano/debug/my-project
... normal program output ...
Function Self Calls Allocs Alloc Bytes
----------------------------------------------------------------------------------
parse 341.21ms 12 840 62.5KB
resolve 141.13ms 47 329 24.1KB
parse_item 94.58ms 108 1620 126.3KB
Or step by step:
$ piano build
found 5 function(s) across 3 file(s)
built: target/piano/debug/my-project
$ piano run
... normal program output ...
$ piano report
Piano rewrites your source at the AST level to inject RAII timing guards and an allocation-tracking allocator, builds the instrumented binary, and flushes per-frame results to target/piano/runs/ on process exit. Your original source is never modified.
Install
cargo install piano
Requires Rust 1.88+.
Usage
Instrument and build
By default, piano build instruments all functions in your project:
$ piano build
found 5 function(s) across 3 file(s)
built: target/piano/debug/my-project
Narrow scope by name, file, or module:
$ piano build --fn parse # functions containing "parse"
$ piano build --fn "Parser::parse" # specific impl method
$ piano build --fn parse --exact # exact match only
$ piano build --file src/lexer.rs # all functions in a file
$ piano build --mod resolver # all functions in a module
$ piano build --fn parse --fn resolve # multiple patterns
See which functions Piano cannot instrument (const, unsafe, extern):
$ piano build --list-skipped
The instrumented binary is written to target/piano/debug/<name>.
Execute the instrumented binary
piano run finds and executes the most recently built instrumented binary:
$ piano run
$ piano run -- --input data.csv --verbose # pass arguments after --
It looks in target/piano/debug/ and picks the most recent executable. Piano exits with the binary's exit code.
One-step profiling
piano profile combines build, execute, and report into one command:
$ piano profile --fn parse -- --input data.csv
Per-frame profiling
Each instrumented run records per-frame timing and allocation data. The default piano report shows aggregates (above). Use --frames for a per-frame breakdown with spike detection:
$ piano report --frames
Frame Total parse resolve parse_item Allocs Bytes
-----------------------------------------------------------------------
1 48.12ms 28.11ms 2.98ms 0.87ms 93 21.3KB
2 49.01ms 28.50ms 3.01ms 0.91ms 95 21.8KB
3 47.88ms 27.94ms 2.95ms 0.85ms 91 20.9KB
4 102.33ms 71.22ms 3.10ms 1.05ms 210 48.7KB <<
5 48.55ms 28.30ms 3.00ms 0.89ms 94 21.5KB
5 frames | 1 spikes (>2x median)
Frames exceeding 2x the median total time are marked with <<.
Tag and compare runs
$ piano tag baseline
tagged 'baseline'
# ... make changes, rebuild, re-run ...
$ piano tag current
tagged 'current'
$ piano diff baseline current
Function baseline current Delta Allocs A.Delta
----------------------------------------------------------------------------------------------
parse 341.21ms 198.44ms -142.77ms 640 -200
parse_item 94.58ms 88.12ms -6.46ms 1240 -380
resolve 141.13ms 141.09ms -0.04ms 329 +0
piano tag with no arguments lists all saved tags. piano diff with no arguments compares the two most recent runs. Both piano report and piano diff accept file paths or tag names.
CPU time
Add --cpu-time to measure per-thread CPU time alongside wall-clock time (Linux + macOS, 64-bit only):
$ piano build --cpu-time
The report adds CPU columns so you can distinguish computation from I/O or sleeping.
Multi-threaded programs
Programs using rayon or std::thread::spawn work out of the box. Each thread writes its own timing data with a shared run_id. piano report consolidates all files from the same run automatically.
How it works
piano buildcopies your project to a staging directory- Adds
piano-runtimeas a dependency in the stagedCargo.toml - Parses Rust source with
syn, finds functions matching your patterns, and injectslet _guard = piano_runtime::enter("name")at the top of each - Sets
PianoAllocatoras the global allocator to track per-frame heap activity - Builds with
cargo build
Each guard records wall-clock time on construction and drop. Self-time is computed by subtracting children's time from total time. The allocator attributes heap operations to the currently executing instrumented function, and frame summaries are computed when top-level guards complete.
Two crates: piano (CLI, AST rewriting, build orchestration) and piano-runtime (zero-dependency timing and allocation runtime injected into user projects). The runtime has zero external dependencies to avoid version conflicts.
Limitations
- Wall-clock timing by default. Use
--cpu-timeto add per-thread CPU time (Linux + macOS only). - Functions shorter than the guard overhead (~12ns on x86-64, sub-nanosecond on Apple Silicon) will have noisy measurements.
- Async functions record wall time only when futures migrate across threads (self-time may overcount).
- Allocation tracking counts heap operations only (
alloc/dealloc). Stack allocations and memory-mapped regions are not tracked.
License
MIT