FluxBench
Benchmarking framework for Rust with crash isolation, statistical rigor, and CI integration.
Features
- Process-Isolated Benchmarks: Panicking benchmarks don't terminate the suite. Fail-late architecture with supervisor-worker IPC.
- Algebraic Verification: Performance assertions directly in code: `#[verify(expr = "bench_a < bench_b")]`
- Synthetic Metrics: Compute derived metrics from benchmark results: `#[synthetic(formula = "bench_a / bench_b")]`
- Multi-Way Comparisons: Generate comparison tables and series charts with `#[compare]`
- Bootstrap Confidence Intervals: BCa (bias-corrected and accelerated) resampling, not just percentiles
- Zero-Copy IPC: Efficient supervisor-worker communication using rkyv serialization (no parsing overhead)
- High-Precision Timing: Hardware cycle counting (RDTSC on x86_64, the equivalent counter on AArch64) with `std::time::Instant` fallback for wall-clock nanoseconds
- Flexible Execution: Process-isolated by default; in-process mode available for debugging
- Configuration: `flux.toml` file with CLI override support
- Multiple Output Formats: JSON, HTML, CSV, GitHub Actions summaries
- CI Integration: Exit code 1 on critical failures; severity levels for different assertion types
- Async Support: Benchmarks with tokio runtimes via `#[bench(runtime = "multi_thread")]`
Quick Start
1. Add Dependency
[dev-dependencies]
fluxbench = "<latest-version>"
2. Configure Bench Target
# Cargo.toml
[[bench]]
name = "my_benchmarks"
harness = false
3. Write Benchmarks
Create benches/my_benchmarks.rs:
use fluxbench::*;         // crate-root import; the exact path may differ
use std::hint::black_box;
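A minimal sketch of a benchmark function, continuing the imports above. The `#[bench]` attribute appears elsewhere in this README (with a `runtime` argument for async benchmarks); using it bare here, and the function body itself, are illustrative assumptions:

```rust
#[bench]
fn bench_fibonacci_iter() {
    // black_box keeps the optimizer from discarding the measured work.
    let mut a: u64 = 0;
    let mut b: u64 = 1;
    for _ in 0..30 {
        let next = a + b;
        a = b;
        b = next;
    }
    black_box(b);
}
```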
Benchmarks can also live in examples/ (cargo run --example name --release). Both benches/ and examples/ are only compiled on demand and never included in your production binary.
4. Run Benchmarks
Run cargo bench to execute the suite, or pass specific CLI options after --:
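For example (the filter value and output path are illustrative; the flags are listed under Full Option Reference below):

```sh
cargo bench

# name filter plus a JSON report written to a file
cargo bench -- --filter fibonacci --format json --output results.json
```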
Defining Benchmarks
Basic Benchmark
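A minimal sketch, assuming the Quick Start imports are in scope and that `#[bench]` can be used without arguments; the body is illustrative:

```rust
#[bench]
fn bench_sum_1k() {
    // Sum a small vector; black_box prevents the work from being optimized away.
    let v: Vec<u64> = (0..1_000).collect();
    black_box(v.iter().sum::<u64>());
}
```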
With Setup
Grouping Benchmarks
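A sketch assuming the group is selected with a `group` parameter on `#[bench]` (the parameter name is an assumption; the `--group` CLI filter and the "Group: compute" heading in the human output are documented elsewhere in this README):

```rust
// Both benchmarks are reported under the "compute" group.
#[bench(group = "compute")]
fn bench_fibonacci_iter() {
    black_box((0..30u64).fold((0u64, 1u64), |(a, b), _| (b, a + b)).0);
}

#[bench(group = "compute")]
fn bench_fibonacci_recursive() {
    fn fib(n: u64) -> u64 {
        if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
    }
    black_box(fib(20));
}
```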
Tagging for Filtering
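A sketch assuming tags are attached with a `tag` parameter on `#[bench]` (the parameter name is an assumption; the `--tag`/`--skip-tag` filters are documented below):

```rust
#[bench(tag = "network")]
fn bench_parse_http_header() {
    // Illustrative workload: count header line breaks in a canned request.
    let raw = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n";
    black_box(raw.iter().filter(|&&b| b == b'\r').count());
}
```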
Then run with: `cargo bench -- --tag network` or `cargo bench -- --skip-tag network`
Async Benchmarks
Async benchmarks are written as async fns and executed on a tokio runtime.
Runtimes: "multi_thread" or "current_thread"
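A sketch using the documented `#[bench(runtime = "multi_thread")]` form, assuming the Quick Start imports are in scope and tokio is available as a dev-dependency; the body is illustrative:

```rust
#[bench(runtime = "multi_thread")]
async fn bench_channel_round_trip() {
    // Send one value through a tokio mpsc channel and receive it back.
    let (tx, mut rx) = tokio::sync::mpsc::channel(16);
    tx.send(42u32).await.unwrap();
    black_box(rx.recv().await);
}
```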
Performance Assertions
Verification Macros
Assert that benchmarks meet performance criteria with the verify macro; a sketch follows below.
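A sketch of three assertions. The `expr` parameter, the metric names, and the severity levels come from this README; the `severity` parameter name, the import path, and the carrier-struct form are assumptions:

```rust
use fluxbench::verify; // import path assumed

// Hard requirement: the iterative version must beat the recursive one.
#[verify(expr = "bench_fibonacci_iter < bench_fibonacci_recursive", severity = "critical")]
struct IterBeatsRecursive;

// Soft requirement: p99 latency stays under 200 ns.
#[verify(expr = "bench_fibonacci_iter_p99 < 200", severity = "warning")]
struct P99Budget;

// Informational: relative spread stays small.
#[verify(expr = "bench_fibonacci_iter_std_dev / bench_fibonacci_iter < 0.05", severity = "info")]
struct LowVariance;
```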
Severity Levels:
- critical: Fails the benchmark suite (exit code 1)
- warning: Reported but doesn't fail
- info: Informational only
Available Metrics (for benchmark name bench_name):
- `bench_name` - Mean time (nanoseconds)
- `bench_name_median` - Median time
- `bench_name_min` - Minimum time
- `bench_name_max` - Maximum time
- `bench_name_p50` - 50th percentile (median)
- `bench_name_p90` - 90th percentile
- `bench_name_p95` - 95th percentile
- `bench_name_p99` - 99th percentile
- `bench_name_p999` - 99.9th percentile
- `bench_name_std_dev` - Standard deviation
- `bench_name_skewness` - Distribution skewness
- `bench_name_kurtosis` - Distribution kurtosis
- `bench_name_ci_lower` - 95% confidence interval lower bound
- `bench_name_ci_upper` - 95% confidence interval upper bound
- `bench_name_throughput` - Operations per second (if measured)
Synthetic Metrics
Compute derived metrics with the synthetic macro; a sketch follows below.
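A sketch of two derived metrics. The `formula` parameter and the metric names are documented in this README; the import path and the carrier-struct form are assumptions:

```rust
use fluxbench::synthetic; // import path assumed

// Speedup of the iterative implementation over the recursive one.
#[synthetic(formula = "bench_fibonacci_recursive / bench_fibonacci_iter")]
struct IterSpeedup;

// Relative spread of the iterative benchmark as a fraction of its mean.
#[synthetic(formula = "bench_fibonacci_iter_std_dev / bench_fibonacci_iter")]
struct IterRelativeSpread;
```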
The formula supports:
- Arithmetic: `+`, `-`, `*`, `/`, `%`
- Comparison: `<`, `>`, `<=`, `>=`, `==`, `!=`
- Logical: `&&`, `||`
- Parentheses for grouping
Comparisons
Simple Comparison
A #[compare] item generates a table showing speedup vs baseline for each benchmark; a sketch follows below.
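A sketch, assuming `#[compare]` is declared on a carrier item with a named baseline and a list of benchmarks to compare (the parameter names and the import path are assumptions):

```rust
use fluxbench::compare; // import path assumed

// Compare the iterative implementation against the recursive baseline.
#[compare(baseline = "bench_fibonacci_recursive", benches = "bench_fibonacci_iter")]
struct FibImplementations;
```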
Series Comparison
Create multi-point comparisons for scaling studies by declaring several #[compare] items; see the sketch below.
Multiple #[compare] with the same group combine into one chart.
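A sketch of two comparison points that share a group and are therefore rendered as one series chart. The `group` behavior is described above; the other parameter names and the benchmark names are assumptions:

```rust
// Each item defines one comparison point; the shared group value
// merges them into a single series chart.
#[compare(group = "fib_scaling", baseline = "bench_fib_10", benches = "bench_fib_20")]
struct FibScaling20;

#[compare(group = "fib_scaling", baseline = "bench_fib_10", benches = "bench_fib_30")]
struct FibScaling30;
```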
CLI Usage
Run benchmarks through cargo bench; FluxBench options are passed after --.
Common Options
# List benchmarks without running
cargo bench -- --dry-run
# Run specific benchmark by regex
cargo bench -- --filter "fib.*"
# Run only a group
cargo bench -- --group compute
# Filter by tag
cargo bench -- --tag network
# Control execution
cargo bench -- --warmup 1 --measurement 3 --one-shot
# Output formats
cargo bench -- --format json --output results.json
# Baseline comparison
cargo bench -- --baseline baseline.json --threshold 5
# Dry run
cargo bench -- --dry-run
Full Option Reference
- `--filter <PATTERN>` - Regex to match benchmark names
- `--group <GROUP>` - Run only benchmarks in this group
- `--tag <TAG>` - Include only benchmarks with this tag
- `--skip-tag <TAG>` - Exclude benchmarks with this tag
- `--warmup <SECONDS>` - Warmup duration before measurement (default: 3)
- `--measurement <SECONDS>` - Measurement duration (default: 5)
- `--min-iterations <N>` - Minimum iterations
- `--max-iterations <N>` - Maximum iterations
- `--isolated <BOOL>` - Run in separate processes (default: true)
- `--one-shot` - Fresh worker process per benchmark (default: reuse workers)
- `--worker-timeout <SECONDS>` - Worker process timeout (default: 60)
- `--threads <N>` / `-j <N>` - Threads for parallel statistics computation (default: 0 = all cores)
- `--format <FORMAT>` - Output format: json, html, csv, github-summary, human (default: human)
- `--output <FILE>` - Output file (default: stdout)
- `--baseline <FILE>` - Load baseline for comparison
- `--threshold <PCT>` - Regression threshold percentage
- `--verbose` / `-v` - Enable debug logging
- `--dry-run` - List benchmarks without executing
Configuration
FluxBench works out of the box with sensible defaults — no configuration file is needed. For workspace-wide customization, you can optionally create a flux.toml in your project root. FluxBench auto-discovers it by walking up from the current directory.
Settings are applied in this priority order: macro attribute > CLI flag > flux.toml > built-in default.
[runner] — Benchmark Execution
Control how benchmarks are measured:
[runner]
warmup = "500ms"              # Warmup before measurement (default: "3s")
measurement = "1s"            # Measurement duration (default: "5s")
timeout = "30s"               # Per-benchmark timeout (default: "60s")
mode = "process"              # "process", "in-process", or "thread" (default: "process")
bootstrap_iterations = 1000   # Bootstrap resamples for CIs (default: 10000)
confidence = 0.95             # Confidence level, 0.0–1.0 (default: 0.95)
# samples = 5                 # Fixed sample count — skips warmup, runs exactly N iterations
# min_iterations = 100        # Minimum iterations per sample (default: auto-tuned)
# max_iterations = 1000000    # Maximum iterations per sample (default: auto-tuned)
# jobs = 4                    # Parallel isolated workers (default: sequential)
[allocator] — Allocation Tracking
Monitor heap allocations during benchmarks:
[allocator]
track = true                # Track allocations during benchmarks (default: true)
fail_on_alloc = false       # Fail if any allocation occurs during measurement (default: false)
# max_bytes_per_iter = 1024 # Maximum bytes per iteration (default: unlimited)
[output] — Output & Baselines
Configure reporting and baseline persistence:
[output]
format = "human"               # "human", "json", "github", "html", "csv" (default: "human")
directory = "target/fluxbench" # Output directory for reports and baselines (default: "target/fluxbench")
save_baseline = false          # Save a JSON baseline after each run (default: false)
# baseline_path = "baseline.json" # Compare against a saved baseline (default: unset)
[ci] — CI Integration
Control how FluxBench behaves in CI environments:
[ci]
threshold = 5.0          # Fail CI if regression exceeds this percentage (default: 5.0)
annotations = true       # Emit ::warning and ::error annotations on PRs (default: false)
fail_on_critical = true  # Exit non-zero on critical verification failures (default: true)
Output Formats
Human (Default)
Console output with grouped results and statistics:
Group: compute
------------------------------------------------------------
✓ bench_fibonacci_iter
mean: 127.42 ns median: 127.00 ns stddev: 0.77 ns
min: 126.00 ns max: 147.00 ns samples: 60
p50: 127.00 ns p95: 129.00 ns p99: 136.38 ns
95% CI: [127.35, 129.12] ns
throughput: 7847831.87 ops/sec
cycles: mean 603 median 601 (4.72 GHz)
JSON
Machine-readable format with full metadata:
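A sketch of the shape, reusing the numbers from the human output above and the field names from the CSV columns below; the exact nesting and the full field set are assumptions:

```json
{
  "benchmarks": [
    {
      "id": "bench_fibonacci_iter",
      "group": "compute",
      "status": "passed",
      "mean_ns": 127.42,
      "median_ns": 127.0,
      "std_dev_ns": 0.77,
      "p95_ns": 129.0,
      "p99_ns": 136.38,
      "samples": 60,
      "alloc_bytes": 0,
      "alloc_count": 0
    }
  ]
}
```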
CSV
Spreadsheet-compatible format with all metrics:
id,name,group,status,mean_ns,median_ns,std_dev_ns,min_ns,max_ns,p50_ns,p95_ns,p99_ns,samples,alloc_bytes,alloc_count,mean_cycles,median_cycles,cycles_per_ns
bench_fibonacci_iter,bench_fibonacci_iter,compute,passed,127.42,...
HTML
Self-contained interactive report with charts and tables.
GitHub Summary
Renders verification results in the GitHub Actions workflow summary.
Crash Isolation
Panicking benchmarks don't terminate the suite:
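For example, a benchmark that panics (a sketch assuming the Quick Start imports; the function name is illustrative):

```rust
#[bench]
fn bench_always_panics() {
    // With process isolation the panic stays in the worker: this benchmark
    // is reported as failed while the rest of the suite keeps running.
    panic!("simulated crash inside a worker process");
}
```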
With --isolated=true (default), the panic occurs in a worker process and is reported as a failure for that benchmark, not the suite.
Advanced Usage
Allocation Tracking
FluxBench can track heap allocations per benchmark iteration. To enable this, install the
TrackingAllocator as the global allocator in your benchmark binary:
use fluxbench::*;                 // exact import paths may differ
use fluxbench::TrackingAllocator;

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;
Results will include allocation metrics for each benchmark:
- alloc_bytes — total bytes allocated per iteration
- alloc_count — number of allocations per iteration
These appear in JSON, CSV, and human output automatically.
Note: `#[global_allocator]` must be declared in the binary crate (your benches/*.rs file), not in a library. Rust allows only one global allocator per binary. Without it, `track = true` in flux.toml will report zero allocations.
You can also query allocation counters manually:
reset_allocation_counter();
// ... run some code ...
let (bytes, count) = current_allocation(); // return shape (bytes, count) is an assumption
println!("allocated {bytes} bytes across {count} allocations");
In-Process Mode
For debugging, run benchmarks in the same process:
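For example, using the documented `--isolated` flag:

```sh
cargo bench -- --isolated=false
```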
Panics will crash immediately, so use this only for development.
Custom Bootstrap Configuration
Via flux.toml:
[runner]
bootstrap_iterations = 100000
confidence = 0.99
Higher iterations = more precise intervals, slower reporting.
Project Structure
The fluxbench workspace consists of:
- fluxbench - Meta-crate, public API
- fluxbench-cli - Supervisor process and CLI
- fluxbench-core - Bencher, timer, worker, allocator
- fluxbench-ipc - Zero-copy IPC transport with rkyv
- fluxbench-stats - Bootstrap resampling and percentile computation
- fluxbench-logic - Verification, synthetic metrics, dependency graphs
- fluxbench-macros - Procedural macros for bench, verify, synthetic, compare
- fluxbench-report - JSON, HTML, CSV, GitHub output generation
Examples
See fluxbench/examples/benchmarks.rs for a comprehensive example:
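Following the examples/ pattern from Quick Start, it can be run with (the package flag assumes the example is registered in the fluxbench crate):

```sh
cargo run -p fluxbench --example benchmarks --release
```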
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Contributing
Contributions welcome. Please ensure benchmarks remain crash-isolated and statistical integrity is maintained.