FluxBench


Rigorous, configurable, and composable benchmarking framework for Rust.

The problem

Most Rust benchmarking tools give you timings. That's it. When a benchmark panics, your entire suite dies. When you want to know "is version B actually faster than version A?", you're left eyeballing numbers. And when CI passes despite a 40% regression, you only find out about it from your users.

What FluxBench does differently

Start in 5 lines, grow without rewriting. A benchmark is just a function with #[bench]:

#[bench]
fn my_benchmark(b: &mut Bencher) {
    b.iter(|| expensive_operation());
}

Run cargo bench and you get bootstrap confidence intervals, percentile stats (p50–p999), outlier detection, and cycle-accurate timing — all automatic.

Then compose what you need:

#[bench(group = "sorting", tags = "alloc")]      // organize and filter
#[verify(expr = "sort_new < sort_old")]          // fail CI if regression
#[synthetic(formula = "sort_old / sort_new")]    // compute speedup ratio
#[compare(baseline = "sort_old")]                // generate comparison tables

Each attribute is independent. Use one, use all, add them later — your benchmarks don't need to be restructured.

Benchmarks that crash don't take down the suite. Every benchmark runs in its own process. A panic, segfault, or timeout in one is reported as a failure for that benchmark — the rest keep running. Your CI finishes, you see what broke, you fix it.
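
A minimal sketch of what this looks like in practice, using two hypothetical benchmarks; the panicking one is reported as a failure while the other still produces results:

use fluxbench::prelude::*;

#[bench]
fn has_a_bug(b: &mut Bencher) {
    // The panic is contained in the worker process and reported as a failure.
    b.iter(|| panic!("boom"));
}

#[bench]
fn unaffected(b: &mut Bencher) {
    // Keeps running and reporting normally.
    b.iter(|| 2 + 2);
}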

Performance rules live next to the code they protect. Instead of a fragile shell script that parses output and compares numbers, you write #[verify(expr = "api_latency_p99 < 5000", severity = "critical")] and FluxBench enforces it on every run. Critical failures exit non-zero. Warnings get reported. Info is logged.
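
A sketch of the idea (api_latency is a hypothetical benchmark; the full #[verify] syntax is covered under Performance Assertions below):

use fluxbench::prelude::*;
use fluxbench::verify;
use std::hint::black_box;

#[bench(group = "api")]
fn api_latency(b: &mut Bencher) {
    b.iter(|| black_box("GET /health").len());
}

// Exits non-zero if api_latency's p99 reaches 5000 ns (5 µs).
#[verify(expr = "api_latency_p99 < 5000", severity = "critical")]
struct ApiLatencyP99Under5us;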

Multiple output formats. The same cargo bench run can produce terminal output for you, JSON for your pipeline, HTML for your team, CSV for a spreadsheet, or a GitHub Actions summary — just change --format.

Features

  • Crash isolation: Supervisor-worker architecture — panics never terminate the suite
  • Bootstrap statistics: BCa bootstrap CIs, RDTSC cycle counting, outlier detection, p50–p999
  • Verification: #[verify(expr = "bench_a < bench_b", severity = "critical")]
  • Synthetic metrics: #[synthetic(formula = "bench_a / bench_b", unit = "x")]
  • Comparisons: #[compare(...)] — tables and series charts vs baseline
  • Output formats: Human, JSON, HTML, CSV, GitHub Actions summary
  • CI integration: Exit code 1 on critical failures, flux.toml severity levels
  • Allocation tracking: Per-iteration heap bytes and count
  • Async support: Tokio runtimes via #[bench(runtime = "multi_thread")]
  • Configuration: flux.toml with CLI override; precedence macro > CLI > file > default

Quick Start

1. Add Dependency

[dev-dependencies]
fluxbench = "<latest-version>"

2. Configure Bench Target

# Cargo.toml
[[bench]]
name = "my_benchmarks"
harness = false

3. Write Benchmarks

Create benches/my_benchmarks.rs:

use fluxbench::prelude::*;
use std::hint::black_box;

#[bench]
fn addition(b: &mut Bencher) {
    b.iter(|| black_box(42) + black_box(17));
}

#[bench(group = "compute")]
fn fibonacci(b: &mut Bencher) {
    fn fib(n: u32) -> u64 {
        if n <= 1 { n as u64 } else { fib(n - 1) + fib(n - 2) }
    }
    b.iter(|| black_box(fib(20)));
}

fn main() {
    fluxbench::run().unwrap();
}

Benchmarks can also live in examples/ (cargo run --example name --release). Both benches/ and examples/ are only compiled on demand and never included in your production binary.

4. Run Benchmarks

cargo bench

Or with specific CLI options:

cargo bench -- --group compute --warmup 5 --measurement 10

Defining Benchmarks

With Setup

#[bench]
fn with_setup(b: &mut Bencher) {
    b.iter_with_setup(
        || vec![1, 2, 3, 4, 5],  // Setup
        |data| data.iter().sum::<i32>()  // Measured code
    );
}

Grouping Benchmarks

#[bench(group = "sorting")]
fn sort_small(b: &mut Bencher) {
    let data: Vec<i32> = (0..100).collect();
    b.iter(|| {
        let mut v = data.clone();
        v.sort();
        v
    });
}

#[bench(group = "sorting")]
fn sort_large(b: &mut Bencher) {
    let data: Vec<i32> = (0..100000).collect();
    b.iter(|| {
        let mut v = data.clone();
        v.sort();
        v
    });
}

Tagging for Filtering

#[bench(group = "io", tags = "network")]
fn http_request(b: &mut Bencher) {
    // ...
}

#[bench(group = "io", tags = "file")]
fn disk_write(b: &mut Bencher) {
    // ...
}

Then run with: cargo bench -- --tag network or cargo bench -- --skip-tag network

Async Benchmarks

#[bench(runtime = "multi_thread", worker_threads = 4, group = "async")]
async fn async_operation(b: &mut Bencher) {
    b.iter_async(|| async {
        tokio::time::sleep(std::time::Duration::from_millis(1)).await;
    });
}

Supported runtimes: "multi_thread" or "current_thread".
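
For single-threaded async code, the same shape works on the current-thread runtime (a sketch; async_single is a hypothetical benchmark):

#[bench(runtime = "current_thread", group = "async")]
async fn async_single(b: &mut Bencher) {
    b.iter_async(|| async {
        // Cheap await point; single-threaded, so no worker_threads setting is needed.
        tokio::task::yield_now().await;
    });
}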

Performance Assertions

Verification Macros

Assert that benchmarks meet performance criteria:

use fluxbench::verify;

#[verify(
    expr = "fibonacci < 50000",  // Less than 50us
    severity = "critical"
)]
struct FibUnder50us;

#[verify(
    expr = "fibonacci_iter < fibonacci_naive",
    severity = "warning"
)]
struct IterFasterThanNaive;

#[verify(
    expr = "fibonacci_naive_p99 < 1000000",  // p99 latency
    severity = "info"
)]
struct P99Check;

Severity Levels:

  • critical: Fails the benchmark suite (exit code 1)
  • warning: Reported but doesn't fail
  • info: Informational only
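
In CI this means a plain cargo bench step is enough to gate a merge; no output parsing is required (sketch):

# Exits with code 1 on any critical verification failure, so the job fails on its own.
cargo bench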

Available Metrics (for a benchmark named bench_name):

  • (no suffix): Mean time (ns)
  • _median: Median time
  • _min / _max: Min / max time
  • _p50, _p90, _p95, _p99, _p999: Percentiles
  • _std_dev: Standard deviation
  • _skewness / _kurtosis: Distribution shape
  • _ci_lower / _ci_upper: 95% confidence interval bounds
  • _throughput: Operations per second (if measured)
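
These suffixes can be used directly in #[verify] expressions; a sketch assuming a benchmark function named parse_json:

use fluxbench::verify;

// Mean must stay under 2 µs.
#[verify(expr = "parse_json < 2000", severity = "critical")]
struct ParseJsonMean;

// Tail-latency check on the 99th percentile.
#[verify(expr = "parse_json_p99 < 10000", severity = "warning")]
struct ParseJsonTail;

// Watch the upper bound of the 95% confidence interval.
#[verify(expr = "parse_json_ci_upper < 2500", severity = "info")]
struct ParseJsonCiUpper;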

Synthetic Metrics

Compute derived metrics:

use fluxbench::synthetic;

#[synthetic(
    id = "speedup",
    formula = "fibonacci_naive / fibonacci_iter",
    unit = "x"
)]
struct FibSpeedup;

#[verify(expr = "speedup > 100", severity = "critical")]
struct SpeedupSignificant;

The formula supports:

  • Arithmetic: +, -, *, /, %
  • Comparison: <, >, <=, >=, ==, !=
  • Logical: &&, ||
  • Parentheses for grouping
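
For example, parentheses and subtraction can be combined into a relative-overhead metric (a sketch reusing the fibonacci_naive and fibonacci_iter benchmarks referenced above):

use fluxbench::synthetic;

// (naive - iter) / iter: fractional overhead of the naive implementation.
#[synthetic(
    id = "naive_overhead",
    formula = "(fibonacci_naive - fibonacci_iter) / fibonacci_iter",
    unit = "x"
)]
struct NaiveOverhead;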

Comparisons

Simple Comparison

use fluxbench::compare;

#[compare(
    id = "string_ops",
    title = "String Operations",
    benchmarks = ["bench_string_concat", "bench_string_parse"],
    baseline = "bench_string_concat",
    metric = "mean"
)]
struct StringComparison;

Generates a table showing speedup vs baseline for each benchmark.

Series Comparison

Create multi-point comparisons for scaling studies:

#[bench(group = "scaling")]
fn vec_sum_100(b: &mut Bencher) {
    let data: Vec<i64> = (0..100).collect();
    b.iter(|| data.iter().sum::<i64>());
}

#[bench(group = "scaling")]
fn vec_sum_1000(b: &mut Bencher) {
    let data: Vec<i64> = (0..1000).collect();
    b.iter(|| data.iter().sum::<i64>());
}

#[compare(
    id = "scale_100",
    title = "Vector Sum Scaling",
    benchmarks = ["bench_vec_sum_100"],
    group = "vec_scaling",
    x = "100"
)]
struct Scale100;

#[compare(
    id = "scale_1000",
    title = "Vector Sum Scaling",
    benchmarks = ["bench_vec_sum_1000"],
    group = "vec_scaling",
    x = "1000"
)]
struct Scale1000;

Multiple #[compare] with the same group combine into one chart.

CLI Usage

Run benchmarks with options:

cargo bench -- [OPTIONS] [FILTER]

Common Options

# List benchmarks without running
cargo bench -- list

# Run specific benchmark by regex
cargo bench -- bench_fib

# Run only a group
cargo bench -- --group sorting

# Filter by tag
cargo bench -- --tag expensive
cargo bench -- --skip-tag network

# Control execution
cargo bench -- --warmup 10 --measurement 20    # Seconds
cargo bench -- --min-iterations 100
cargo bench -- --isolated=false                # In-process mode
cargo bench -- --worker-timeout 120            # Worker timeout in seconds

# Output formats
cargo bench -- --format json --output results.json
cargo bench -- --format html --output results.html
cargo bench -- --format csv --output results.csv
cargo bench -- --format github-summary         # GitHub Actions summary

# Baseline comparison
cargo bench -- --baseline previous_results.json

# Dry run
cargo bench -- --dry-run

Run cargo bench -- --help for the full option reference.

Configuration

FluxBench works out of the box with sensible defaults — no configuration file is needed. For workspace-wide customization, you can optionally create a flux.toml in your project root. FluxBench auto-discovers it by walking up from the current directory.

Settings are applied in this priority order: macro attribute > CLI flag > flux.toml > built-in default.
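
For example, if flux.toml pins the measurement window, a CLI flag still wins for a single invocation (a sketch using the Runner key and CLI flag documented in this README):

# flux.toml
[runner]
measurement_time = "5s"      # used by default

# CLI flag overrides flux.toml for this run only: measure for 10 seconds
cargo bench -- --measurement 10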

Runner

[runner]
warmup_time = "500ms"        # Warmup before measurement (default: "3s")
measurement_time = "1s"      # Measurement duration (default: "5s")
timeout = "30s"              # Per-benchmark timeout (default: "60s")
isolation = "process"        # "process", "in-process", or "thread" (default: "process")
bootstrap_iterations = 1000  # Bootstrap resamples for CIs (default: 10000)
confidence_level = 0.95      # Confidence level, 0.0–1.0 (default: 0.95)
# samples = 5                # Fixed sample count — skips warmup and collects exactly this many samples
# min_iterations = 100       # Minimum iterations per sample (default: auto-tuned)
# max_iterations = 1000000   # Maximum iterations per sample (default: auto-tuned)
# jobs = 4                   # Parallel isolated workers (default: sequential)

Allocator

[allocator]
track = true                 # Track allocations during benchmarks (default: true)
fail_on_allocation = false   # Fail if any allocation occurs during measurement (default: false)
# max_bytes_per_iter = 1024  # Maximum bytes per iteration (default: unlimited)

Output

[output]
format = "human"                    # "human", "json", "github", "html", "csv" (default: "human")
directory = "target/fluxbench"      # Output directory for reports and baselines (default: "target/fluxbench")
save_baseline = false               # Save a JSON baseline after each run (default: false)
# baseline_path = "baseline.json"   # Compare against a saved baseline (default: unset)

CI Integration

[ci]
regression_threshold = 5.0   # Fail CI if regression exceeds this percentage (default: 5.0)
github_annotations = true    # Emit ::warning and ::error annotations on PRs (default: false)
fail_on_critical = true      # Exit non-zero on critical verification failures (default: true)

Output Formats

Human (Default)

Console output with grouped results and statistics:

Group: compute
------------------------------------------------------------
  ✓ bench_fibonacci_iter
      mean: 127.42 ns  median: 127.00 ns  stddev: 0.77 ns
      min: 126.00 ns  max: 147.00 ns  samples: 60
      p50: 127.00 ns  p95: 129.00 ns  p99: 136.38 ns
      95% CI: [127.35, 129.12] ns
      throughput: 7847831.87 ops/sec
      cycles: mean 603  median 601  (4.72 GHz)

JSON

Machine-readable format with full metadata:

{
  "meta": {
    "version": "0.1.0",
    "timestamp": "2026-02-10T...",
    "git_commit": "abc123...",
    "system": { "os": "linux", "cpu": "...", "cpu_cores": 24 }
  },
  "results": [
    {
      "id": "bench_fibonacci_iter",
      "group": "compute",
      "status": "passed",
      "metrics": {
        "mean_ns": 127.42,
        "median_ns": 127.0,
        "std_dev_ns": 0.77,
        "p50_ns": 127.0,
        "p95_ns": 129.0,
        "p99_ns": 136.38
      }
    }
  ],
  "verifications": [...],
  "synthetics": [...]
}

CSV

Spreadsheet-compatible format with all metrics:

id,name,group,status,mean_ns,median_ns,std_dev_ns,min_ns,max_ns,p50_ns,p95_ns,p99_ns,samples,alloc_bytes,alloc_count,mean_cycles,median_cycles,cycles_per_ns
bench_fibonacci_iter,bench_fibonacci_iter,compute,passed,127.42,...

HTML

Self-contained interactive report with charts and tables.

GitHub Summary

Renders verification results as a job summary in GitHub Actions workflows:

cargo bench -- --format github-summary >> $GITHUB_STEP_SUMMARY

Advanced Usage

Allocation Tracking

FluxBench can track heap allocations per benchmark iteration. To enable this, install the TrackingAllocator as the global allocator in your benchmark binary:

use fluxbench::prelude::*;
use fluxbench::TrackingAllocator;

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

#[bench]
fn vec_allocation(b: &mut Bencher) {
    b.iter(|| vec![1, 2, 3, 4, 5]);
}

fn main() { fluxbench::run().unwrap(); }

Results will include allocation metrics for each benchmark:

  • alloc_bytes — total bytes allocated per iteration
  • alloc_count — number of allocations per iteration

These appear in JSON, CSV, and human output automatically.

Note: #[global_allocator] must be declared in the binary crate (your benches/*.rs file), not in a library. Rust allows only one global allocator per binary. Without it, track = true in flux.toml will report zero allocations.

You can also query allocation counters manually:

fluxbench::reset_allocation_counter();
// ... run some code ...
let (alloc_bytes, alloc_count) = fluxbench::current_allocation();
println!("Bytes: {}, Count: {}", alloc_bytes, alloc_count);

In-Process Mode

For debugging, run benchmarks in the same process:

cargo bench -- --isolated=false

In this mode a panic brings down the whole process immediately, so use it only for local debugging.

Custom Bootstrap Configuration

Via flux.toml:

[runner]
bootstrap_iterations = 100000
confidence_level = 0.99

More bootstrap iterations give more accurate and stable confidence intervals at the cost of slower report generation.

Examples

The fluxbench-examples crate (in examples/) contains runnable demos for each feature:

  • feature_iteration: iter, iter_with_setup, iter_batched
  • feature_async: Async benchmarks with tokio runtimes
  • feature_params: Parameterized benchmarks with args
  • feature_verify: #[verify] performance assertions
  • feature_compare: #[compare] baseline tables and series
  • feature_allocations: Heap allocation tracking
  • feature_panic: Crash isolation (panicking benchmarks)
  • library_bench: Benchmarking a library crate
  • ci_regression: CI regression detection workflow

cargo run -p fluxbench-examples --example feature_iteration --release
cargo run -p fluxbench-examples --example feature_verify --release -- --format json

License

Licensed under the Apache License, Version 2.0.

Contributing

Contributions welcome. Please ensure benchmarks remain crash-isolated and statistical integrity is maintained.