FluxBench
Rigorous, configurable, and composable benchmarking framework for Rust.
The problem
Most Rust benchmarking tools give you timings. That's it. When a benchmark panics, your entire suite dies. When you want to know "is version B actually faster than version A?", you're left eyeballing numbers. And when CI passes despite a 40% regression, you only find out about it from your users.
What FluxBench does differently
Start in 5 lines, grow without rewriting. A benchmark is just a function with #[bench]:
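A minimal sketch, assuming the macros are re-exported from a `fluxbench::prelude` module (check the crate docs for the exact import path):

```rust
use fluxbench::prelude::*; // assumed prelude path
use std::hint::black_box;

#[bench]
fn bench_sum() {
    // black_box keeps the compiler from optimizing the work away.
    black_box((0..1_000u64).sum::<u64>());
}
```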
Run cargo bench and you get bootstrap confidence intervals, percentile stats (p50–p999), outlier detection, and cycle-accurate timing — all automatic.
Then compose what you need: organize benchmarks into groups and filter them, fail CI on regressions, compute speedup ratios, and generate comparison tables, as in the sketch below.
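A sketch of how the attributes stack on a single benchmark. The grouping/tagging and `#[compare]` parameter names here are assumptions; `#[verify]` and `#[synthetic]` follow the forms shown in the feature table below.

```rust
#[bench(group = "hashing", tags = "hot-path")]                      // organize and filter
#[verify(expr = "bench_hash_p99 < 5000", severity = "critical")]    // fail CI if regression
#[synthetic(formula = "bench_hash_old / bench_hash", unit = "x")]   // compute speedup ratio
#[compare(baseline = "bench_hash_old")]                             // generate comparison tables
fn bench_hash() {
    // An ordinary benchmark body; a simple multiplicative fold stands in for real work.
    black_box(b"payload".iter().fold(0u64, |h, &b| h.wrapping_mul(31).wrapping_add(b as u64)));
}
```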
Each attribute is independent. Use one, use all, add them later — your benchmarks don't need to be restructured.
Benchmarks that crash don't take down the suite. Every benchmark runs in its own process. A panic, segfault, or timeout in one is reported as a failure for that benchmark — the rest keep running. Your CI finishes, you see what broke, you fix it.
Performance rules live next to the code they protect. Instead of a fragile shell script that parses output and compares numbers, you write #[verify(expr = "api_latency_p99 < 5000", severity = "critical")] and FluxBench enforces it on every run. Critical failures exit non-zero. Warnings get reported. Info is logged.
Multiple output formats. The same cargo bench run can produce terminal output for you, JSON for your pipeline, HTML for your team, CSV for a spreadsheet, or a GitHub Actions summary — just change --format.
Features
| Feature | Description |
|---|---|
| Crash isolation | Supervisor-worker architecture — panics never terminate the suite |
| Bootstrap statistics | BCa bootstrap CIs, RDTSC cycle counting, outlier detection, p50–p999 |
| Verification | #[verify(expr = "bench_a < bench_b", severity = "critical")] |
| Synthetic metrics | #[synthetic(formula = "bench_a / bench_b", unit = "x")] |
| Comparisons | #[compare(...)] — tables and series charts vs baseline |
| Output formats | Human, JSON, HTML, CSV, GitHub Actions summary |
| CI integration | Exit code 1 on critical failures, flux.toml severity levels |
| Allocation tracking | Per-iteration heap bytes and count |
| Async support | Tokio runtimes via #[bench(runtime = "multi_thread")] |
| Configuration | flux.toml with CLI override, macro > CLI > file > default |
Quick Start
1. Add Dependency
```toml
[dev-dependencies]
fluxbench = "<latest-version>"
```
2. Configure Bench Target
```toml
# Cargo.toml
[[bench]]
name = "my_benchmarks"
harness = false
```
3. Write Benchmarks
Create benches/my_benchmarks.rs:
The file brings the FluxBench macros into scope along with `black_box`, which keeps the compiler from optimizing the measured work away.
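A sketch of the full file, again assuming a `fluxbench::prelude` re-export:

```rust
// benches/my_benchmarks.rs
use fluxbench::prelude::*; // assumed prelude path
use std::hint::black_box;

fn fibonacci(n: u64) -> u64 {
    (0..n).fold((0u64, 1u64), |(a, b), _| (b, a + b)).0
}

#[bench]
fn bench_fibonacci_iter() {
    // black_box on input and output prevents constant folding.
    black_box(fibonacci(black_box(20)));
}
```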
Benchmarks can also live in examples/ (cargo run --example name --release). Both benches/ and examples/ are only compiled on demand and never included in your production binary.
4. Run Benchmarks
Run `cargo bench` to execute every benchmark, or pass FluxBench-specific options after the `--` separator, for example `cargo bench -- --format json`.
Defining Benchmarks
With Setup
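A sketch assuming a bencher handle (named `Bencher` here) with the `iter_with_setup` method listed in the crate's `feature_iteration` example; the exact handle type and signature may differ:

```rust
#[bench]
fn bench_sort_with_setup(b: &mut Bencher) {
    // The setup closure builds fresh, unsorted input for every batch,
    // so allocation and fill time stay out of the measurement.
    b.iter_with_setup(
        || (0..10_000u32).rev().collect::<Vec<_>>(),
        |mut v| {
            v.sort_unstable();
            black_box(v);
        },
    );
}
```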
Grouping Benchmarks
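A sketch; the `group` parameter name is an assumption based on the grouped output shown under Output Formats:

```rust
#[bench(group = "compute")]
fn bench_fibonacci_iter() {
    // fibonacci() as defined in the Quick Start sketch; every benchmark
    // with the same group is reported together and can be run on its own.
    black_box(fibonacci(black_box(20)));
}
```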
Tagging for Filtering
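A sketch; the `tags` parameter name is an assumption, while the `--tag` / `--skip-tag` filters below come straight from the CLI options:

```rust
#[bench(tags = "network")]
fn bench_parse_socket_addr() {
    use std::net::SocketAddr;
    black_box("127.0.0.1:8080".parse::<SocketAddr>().unwrap());
}
```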
Then run with: cargo bench -- --tag network or cargo bench -- --skip-tag network
Async Benchmarks
Mark an `async fn` with `#[bench(runtime = "...")]` and FluxBench drives it on a Tokio runtime, as sketched below.
Runtimes: "multi_thread" or "current_thread"
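A sketch; `runtime = "multi_thread"` comes from the feature table, and the body assumes tokio is available as a dev-dependency:

```rust
#[bench(runtime = "multi_thread")]
async fn bench_yield() {
    // A single await point; replace with your real async workload.
    tokio::task::yield_now().await;
}
```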
Performance Assertions
Verification Macros
Assert that benchmarks meet performance criteria with `#[verify]` rules. Each rule references benchmark metrics by name and carries a severity; the sketch below shows the attribute form.
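A sketch using the documented `expr` / `severity` form; attaching the rules directly to the benchmark they reference is an assumption, and the thresholds are illustrative:

```rust
#[verify(expr = "bench_api_call_p99 < 5000", severity = "critical")]    // hard latency budget
#[verify(expr = "bench_api_call_std_dev < 200", severity = "warning")]  // flag noisy runs
#[verify(expr = "bench_api_call_max < 10000", severity = "info")]       // logged only
#[bench]
fn bench_api_call() {
    // Stand-in workload; substitute the call you actually want to measure.
    black_box(format!("GET /v1/users/{}", black_box(42)));
}
```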
Severity Levels:
- critical: Fails the benchmark suite (exit code 1)
- warning: Reported but doesn't fail
- info: Informational only
Available Metrics (for a benchmark named bench_name):
| Suffix | Metric |
|---|---|
| (none) | Mean time (ns) |
| _median | Median time |
| _min / _max | Min / max time |
| _p50 _p90 _p95 _p99 _p999 | Percentiles |
| _std_dev | Standard deviation |
| _skewness / _kurtosis | Distribution shape |
| _ci_lower / _ci_upper | 95% confidence interval bounds |
| _throughput | Operations per second (if measured) |
Synthetic Metrics
Compute derived metrics from benchmark results with `#[synthetic]`, as sketched below.
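A sketch with two derived metrics; the `formula` / `unit` form matches the feature table, while attaching the attribute to the benchmark itself is an assumption:

```rust
#[synthetic(formula = "bench_naive / bench_optimized", unit = "x")]   // speedup ratio
#[synthetic(formula = "bench_naive - bench_optimized", unit = "ns")]  // absolute saving
#[bench]
fn bench_optimized() {
    black_box((0..1024u64).sum::<u64>());
}
```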
The formula supports:
- Arithmetic: `+`, `-`, `*`, `/`, `%`
- Comparison: `<`, `>`, `<=`, `>=`, `==`, `!=`
- Logical: `&&`, `||`
- Parentheses for grouping
Comparisons
Simple Comparison
Declare a baseline with `#[compare]`, as sketched below.
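A sketch; the `baseline` parameter name is an assumption:

```rust
#[compare(baseline = "bench_naive")]
#[bench]
fn bench_optimized() {
    black_box((0..1024u64).sum::<u64>());
}

#[bench]
fn bench_naive() {
    black_box((0..1024u64).fold(0u64, |acc, x| acc + x));
}
```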
Generates a table showing speedup vs baseline for each benchmark.
Series Comparison
Create multi-point comparisons for scaling studies:
Each point in the series is its own benchmark sharing one comparison group, as sketched below.
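A sketch; `group` matches the note below, while the per-point parameter name (`x`) is an assumption:

```rust
#[compare(group = "scaling", x = 1_000)]
#[bench]
fn bench_sum_1k() {
    black_box((0..1_000u64).sum::<u64>());
}

#[compare(group = "scaling", x = 10_000)]
#[bench]
fn bench_sum_10k() {
    black_box((0..10_000u64).sum::<u64>());
}
```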
Multiple #[compare] with the same group combine into one chart.
CLI Usage
Pass FluxBench options after cargo's `--` separator; everything before it goes to cargo itself.
Common Options
- List benchmarks without running them
- Run specific benchmarks by name regex
- Run only one group
- Filter by tag (`--tag`, `--skip-tag`)
- Control execution (warmup, measurement time, timeout)
- Choose the output format (`--format`)
- Compare against a saved baseline
- Dry run
Run cargo bench -- --help for the full option reference.
Configuration
FluxBench works out of the box with sensible defaults — no configuration file is needed. For workspace-wide customization, you can optionally create a flux.toml in your project root. FluxBench auto-discovers it by walking up from the current directory.
Settings are applied in this priority order: macro attribute > CLI flag > flux.toml > built-in default.
Runner
```toml
# Key names in this block are illustrative; check the FluxBench docs for exact spelling.
[runner]
warmup = "500ms"              # Warmup before measurement (default: "3s")
measurement = "1s"            # Measurement duration (default: "5s")
timeout = "30s"               # Per-benchmark timeout (default: "60s")
isolation = "process"         # "process", "in-process", or "thread" (default: "process")
bootstrap_iterations = 1000   # Bootstrap resamples for CIs (default: 10000)
confidence_level = 0.95       # Confidence level, 0.0–1.0 (default: 0.95)
# samples = 5                 # Fixed sample count — skips warmup, runs exactly N iterations
# min_iterations = 100        # Minimum iterations per sample (default: auto-tuned)
# max_iterations = 1000000    # Maximum iterations per sample (default: auto-tuned)
# jobs = 4                    # Parallel isolated workers (default: sequential)
```
Allocator
```toml
[allocator]
track = true                  # Track allocations during benchmarks (default: true)
fail_on_alloc = false         # Fail if any allocation occurs during measurement (default: false); key name illustrative
# max_bytes_per_iter = 1024   # Maximum bytes per iteration (default: unlimited)
```
Output
```toml
# Key names in this block are illustrative; check the FluxBench docs for exact spelling.
[output]
format = "human"              # "human", "json", "github", "html", "csv" (default: "human")
dir = "target/fluxbench"      # Output directory for reports and baselines (default: "target/fluxbench")
save_baseline = false         # Save a JSON baseline after each run (default: false)
# baseline_path = "baseline.json"   # Compare against a saved baseline (default: unset)
```
CI Integration
```toml
# Key names in this block are illustrative; check the FluxBench docs for exact spelling.
[ci]
regression_threshold = 5.0    # Fail CI if regression exceeds this percentage (default: 5.0)
annotations = true            # Emit ::warning and ::error annotations on PRs (default: false)
fail_on_critical = true       # Exit non-zero on critical verification failures (default: true)
```
Output Formats
Human (Default)
Console output with grouped results and statistics:
Group: compute
------------------------------------------------------------
✓ bench_fibonacci_iter
mean: 127.42 ns median: 127.00 ns stddev: 0.77 ns
min: 126.00 ns max: 147.00 ns samples: 60
p50: 127.00 ns p95: 129.00 ns p99: 136.38 ns
95% CI: [127.35, 129.12] ns
throughput: 7847831.87 ops/sec
cycles: mean 603 median 601 (4.72 GHz)
JSON
Machine-readable output with full metadata, suitable for pipelines and custom tooling.
CSV
Spreadsheet-compatible format with all metrics:
id,name,group,status,mean_ns,median_ns,std_dev_ns,min_ns,max_ns,p50_ns,p95_ns,p99_ns,samples,alloc_bytes,alloc_count,mean_cycles,median_cycles,cycles_per_ns
bench_fibonacci_iter,bench_fibonacci_iter,compute,passed,127.42,...
HTML
Self-contained interactive report with charts and tables.
GitHub Summary
Renders verification results in the GitHub Actions job summary, with ::warning and ::error annotations when enabled.
Advanced Usage
Allocation Tracking
FluxBench can track heap allocations per benchmark iteration. To enable this, install the
TrackingAllocator as the global allocator in your benchmark binary:
```rust
use fluxbench::TrackingAllocator; // assumed path; see the crate docs

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;
```
Results will include allocation metrics for each benchmark:
- alloc_bytes — total bytes allocated per iteration
- alloc_count — number of allocations per iteration
These appear in JSON, CSV, and human output automatically.
Note: `#[global_allocator]` must be declared in the binary crate (your `benches/*.rs` file), not in a library. Rust allows only one global allocator per binary. Without it, `track = true` in `flux.toml` will report zero allocations.
You can also query allocation counters manually:
```rust
reset_allocation_counter();
// ... run some code ...
let stats = current_allocation(); // exact return type not shown here
println!("{stats:?}");
```
In-Process Mode
For debugging, run benchmarks in the same process by switching the runner isolation from "process" to "in-process" (see Configuration).
Panics will crash immediately, so use this only for development.
Custom Bootstrap Configuration
Via flux.toml:
```toml
[runner]   # key names illustrative, matching the Runner section above
bootstrap_iterations = 100000
confidence_level = 0.99
```
Higher iterations = more precise intervals, slower reporting.
Examples
The examples/ crate contains runnable demos for each feature:
| Example | What it shows |
|---|---|
| feature_iteration | iter, iter_with_setup, iter_batched |
| feature_async | Async benchmarks with tokio runtimes |
| feature_params | Parameterized benchmarks with args |
| feature_verify | #[verify] performance assertions |
| feature_compare | #[compare] baseline tables and series |
| feature_allocations | Heap allocation tracking |
| feature_panic | Crash isolation (panicking benchmarks) |
| library_bench | Benchmarking a library crate |
| ci_regression | CI regression detection workflow |
License
Licensed under the Apache License, Version 2.0.
Contributing
Contributions welcome. Please ensure benchmarks remain crash-isolated and statistical integrity is maintained.