FluxBench

Benchmarking framework for Rust with crash isolation, statistical rigor, and CI integration.

Features

  • Process-Isolated Benchmarks: Panicking benchmarks don't terminate the suite. Fail-late architecture with supervisor-worker IPC.
  • Algebraic Verification: Performance assertions directly in code: #[verify(expr = "bench_a < bench_b")]
  • Synthetic Metrics: Compute derived metrics from benchmark results: #[synthetic(formula = "bench_a / bench_b")]
  • Multi-Way Comparisons: Generate comparison tables and series charts with #[compare]
  • Bootstrap Confidence Intervals: BCa (bias-corrected and accelerated) resampling, not just percentiles
  • Zero-Copy IPC: Efficient supervisor-worker communication using rkyv serialization (no parsing overhead)
  • High-Precision Timing: Cycle counting via RDTSC on x86_64 (with an equivalent hardware counter on AArch64), falling back to std::time::Instant for wall-clock nanoseconds
  • Flexible Execution: Process-isolated by default; in-process mode available for debugging
  • Configuration: flux.toml file with CLI override support
  • Multiple Output Formats: JSON, HTML, CSV, GitHub Actions summaries
  • CI Integration: Exit code 1 on critical failures; severity levels for different assertion types
  • Async Support: Benchmarks with tokio runtimes via #[bench(runtime = "multi_thread")]

Quick Start

1. Add Dependency

[dev-dependencies]
fluxbench = "<latest-version>"

2. Configure Bench Target

# Cargo.toml
[[bench]]
name = "my_benchmarks"
harness = false

3. Write Benchmarks

Create benches/my_benchmarks.rs:

use fluxbench::prelude::*;
use std::hint::black_box;

#[bench]
fn addition(b: &mut Bencher) {
    b.iter(|| black_box(42) + black_box(17));
}

#[bench(group = "compute")]
fn fibonacci(b: &mut Bencher) {
    fn fib(n: u32) -> u64 {
        if n <= 1 { n as u64 } else { fib(n - 1) + fib(n - 2) }
    }
    b.iter(|| black_box(fib(20)));
}

fn main() {
    fluxbench::run().unwrap();
}

Benchmarks can also live in examples/ (cargo run --example name --release). Both benches/ and examples/ are only compiled on demand and never included in your production binary.

4. Run Benchmarks

cargo bench

Or with specific CLI options:

cargo bench -- --group compute --warmup 5 --measurement 10

Defining Benchmarks

Basic Benchmark

#[bench]
fn my_benchmark(b: &mut Bencher) {
    b.iter(|| {
        // Code to benchmark
        expensive_operation()
    });
}

With Setup

#[bench]
fn with_setup(b: &mut Bencher) {
    b.iter_with_setup(
        || vec![1, 2, 3, 4, 5],  // Setup
        |data| data.iter().sum::<i32>()  // Measured code
    );
}

Grouping Benchmarks

#[bench(group = "sorting")]
fn sort_small(b: &mut Bencher) {
    let data: Vec<i32> = (0..100).collect();
    b.iter(|| {
        let mut v = data.clone();
        v.sort();
        v
    });
}

#[bench(group = "sorting")]
fn sort_large(b: &mut Bencher) {
    let data: Vec<i32> = (0..100000).collect();
    b.iter(|| {
        let mut v = data.clone();
        v.sort();
        v
    });
}
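
When the per-iteration clone should not be timed, iter_with_setup (shown above) combines naturally with grouping. A sketch assuming the same sorting group; the benchmark name and element count are illustrative:

#[bench(group = "sorting")]
fn sort_only(b: &mut Bencher) {
    let data: Vec<i32> = (0..1000).collect();
    b.iter_with_setup(
        || data.clone(),            // Setup: clone outside the measured region
        |mut v| { v.sort(); v }     // Measured: only the sort itself
    );
}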

Tagging for Filtering

#[bench(group = "io", tags = "network")]
fn http_request(b: &mut Bencher) {
    // ...
}

#[bench(group = "io", tags = "file")]
fn disk_write(b: &mut Bencher) {
    // ...
}

Then run with: cargo bench -- --tag network or cargo bench -- --skip-tag network

Async Benchmarks

#[bench(runtime = "multi_thread", worker_threads = 4, group = "async")]
async fn async_operation(b: &mut Bencher) {
    b.iter_async(|| async {
        tokio::time::sleep(std::time::Duration::from_millis(1)).await;
    });
}

Runtimes: "multi_thread" or "current_thread"
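
For single-threaded async workloads, the same attribute accepts the current-thread runtime. A minimal sketch (the awaited body is illustrative):

#[bench(runtime = "current_thread", group = "async")]
async fn async_yield(b: &mut Bencher) {
    b.iter_async(|| async {
        // Illustrative work: hand control back to the executor once
        tokio::task::yield_now().await;
    });
}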

Performance Assertions

Verification Macros

Assert that benchmarks meet performance criteria:

use fluxbench::verify;

#[verify(
    expr = "fibonacci < 50000",  // Less than 50us
    severity = "critical"
)]
struct FibUnder50us;

#[verify(
    expr = "fibonacci_iter < fibonacci_naive",
    severity = "warning"
)]
struct IterFasterThanNaive;

#[verify(
    expr = "fibonacci_naive_p99 < 1000000",  // p99 latency
    severity = "info"
)]
struct P99Check;

Severity Levels:

  • critical: Fails the benchmark suite (exit code 1)
  • warning: Reported but doesn't fail
  • info: Informational only

Available Metrics (for benchmark name bench_name):

  • bench_name - Mean time (nanoseconds)
  • bench_name_median - Median time
  • bench_name_min - Minimum time
  • bench_name_max - Maximum time
  • bench_name_p50 - 50th percentile (median)
  • bench_name_p90 - 90th percentile
  • bench_name_p95 - 95th percentile
  • bench_name_p99 - 99th percentile
  • bench_name_p999 - 99.9th percentile
  • bench_name_std_dev - Standard deviation
  • bench_name_skewness - Distribution skewness
  • bench_name_kurtosis - Distribution kurtosis
  • bench_name_ci_lower - 95% confidence interval lower bound
  • bench_name_ci_upper - 95% confidence interval upper bound
  • bench_name_throughput - Operations per second (if measured)
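
Each suffixed metric can be used anywhere a plain benchmark name can. A minimal sketch against the Quick Start fibonacci benchmark, with an illustrative bound:

#[verify(
    expr = "fibonacci_std_dev < 5000",  // standard deviation under 5 µs (illustrative bound)
    severity = "warning"
)]
struct FibLowJitter;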

Synthetic Metrics

Compute derived metrics:

use fluxbench::synthetic;

#[synthetic(
    id = "speedup",
    formula = "fibonacci_naive / fibonacci_iter",
    unit = "x"
)]
struct FibSpeedup;

#[verify(expr = "speedup > 100", severity = "critical")]
struct SpeedupSignificant;

The formula supports:

  • Arithmetic: +, -, *, /, %
  • Comparison: <, >, <=, >=, ==, !=
  • Logical: &&, ||
  • Parentheses for grouping
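
A sketch combining arithmetic and parentheses over the percentile metrics of the Quick Start fibonacci benchmark (the identifiers and threshold are illustrative):

#[synthetic(
    id = "fib_tail_ratio",
    formula = "(fibonacci_p99 - fibonacci_p50) / fibonacci_p50",
    unit = "x"
)]
struct FibTailRatio;

#[verify(expr = "fib_tail_ratio < 0.5", severity = "info")]
struct TailSpreadSmall;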

Comparisons

Simple Comparison

use fluxbench::compare;

#[compare(
    id = "string_ops",
    title = "String Operations",
    benchmarks = ["bench_string_concat", "bench_string_parse"],
    baseline = "bench_string_concat",
    metric = "mean"
)]
struct StringComparison;

Generates a table showing speedup vs baseline for each benchmark.

Series Comparison

Create multi-point comparisons for scaling studies:

#[bench(group = "scaling")]
fn vec_sum_100(b: &mut Bencher) {
    let data: Vec<i64> = (0..100).collect();
    b.iter(|| data.iter().sum::<i64>());
}

#[bench(group = "scaling")]
fn vec_sum_1000(b: &mut Bencher) {
    let data: Vec<i64> = (0..1000).collect();
    b.iter(|| data.iter().sum::<i64>());
}

#[compare(
    id = "scale_100",
    title = "Vector Sum Scaling",
    benchmarks = ["bench_vec_sum_100"],
    group = "vec_scaling",
    x = "100"
)]
struct Scale100;

#[compare(
    id = "scale_1000",
    title = "Vector Sum Scaling",
    benchmarks = ["bench_vec_sum_1000"],
    group = "vec_scaling",
    x = "1000"
)]
struct Scale1000;

Multiple #[compare] with the same group combine into one chart.

CLI Usage

Run benchmarks with options:

cargo bench -- [OPTIONS] [FILTER]

Common Options

# List benchmarks without running
cargo bench -- list

# Run specific benchmark by regex
cargo bench -- bench_fib

# Run only a group
cargo bench -- --group sorting

# Filter by tag
cargo bench -- --tag expensive
cargo bench -- --skip-tag network

# Control execution
cargo bench -- --warmup 10 --measurement 20    # Seconds
cargo bench -- --min-iterations 100
cargo bench -- --isolated=false                # In-process mode
cargo bench -- --worker-timeout 120            # Worker timeout in seconds

# Output formats
cargo bench -- --format json --output results.json
cargo bench -- --format html --output results.html
cargo bench -- --format csv --output results.csv
cargo bench -- --format github-summary         # GitHub Actions summary

# Baseline comparison
cargo bench -- --baseline previous_results.json

# Dry run
cargo bench -- --dry-run

Full Option Reference

  • --filter <PATTERN> - Regex to match benchmark names
  • --group <GROUP> - Run only benchmarks in this group
  • --tag <TAG> - Include only benchmarks with this tag
  • --skip-tag <TAG> - Exclude benchmarks with this tag
  • --warmup <SECONDS> - Warmup duration before measurement (default: 3)
  • --measurement <SECONDS> - Measurement duration (default: 5)
  • --min-iterations <N> - Minimum iterations
  • --max-iterations <N> - Maximum iterations
  • --isolated <BOOL> - Run in separate processes (default: true)
  • --one-shot - Fresh worker process per benchmark (default: reuse workers)
  • --worker-timeout <SECONDS> - Worker process timeout (default: 60)
  • --threads <N> / -j <N> - Threads for parallel statistics computation (default: 0 = all cores)
  • --format <FORMAT> - Output format: json, html, csv, github-summary, human (default: human)
  • --output <FILE> - Output file (default: stdout)
  • --baseline <FILE> - Load baseline for comparison
  • --threshold <PCT> - Regression threshold percentage
  • --verbose / -v - Enable debug logging
  • --dry-run - List benchmarks without executing

Configuration

FluxBench works out of the box with sensible defaults — no configuration file is needed. For workspace-wide customization, you can optionally create a flux.toml in your project root. FluxBench auto-discovers it by walking up from the current directory.

Settings are applied in this priority order: macro attribute > CLI flag > flux.toml > built-in default.

[runner] — Benchmark Execution

Control how benchmarks are measured:

[runner]
warmup_time = "500ms"        # Warmup before measurement (default: "3s")
measurement_time = "1s"      # Measurement duration (default: "5s")
timeout = "30s"              # Per-benchmark timeout (default: "60s")
isolation = "process"        # "process", "in-process", or "thread" (default: "process")
bootstrap_iterations = 1000  # Bootstrap resamples for CIs (default: 10000)
confidence_level = 0.95      # Confidence level, 0.0–1.0 (default: 0.95)
# samples = 5               # Fixed sample count — skips warmup, runs exactly N iterations
# min_iterations = 100       # Minimum iterations per sample (default: auto-tuned)
# max_iterations = 1000000   # Maximum iterations per sample (default: auto-tuned)
# jobs = 4                   # Parallel isolated workers (default: sequential)

[allocator] — Allocation Tracking

Monitor heap allocations during benchmarks:

[allocator]
track = true                 # Track allocations during benchmarks (default: true)
fail_on_allocation = false   # Fail if any allocation occurs during measurement (default: false)
# max_bytes_per_iter = 1024  # Maximum bytes per iteration (default: unlimited)

[output] — Output & Baselines

Configure reporting and baseline persistence:

[output]
format = "human"                    # "human", "json", "github", "html", "csv" (default: "human")
directory = "target/fluxbench"      # Output directory for reports and baselines (default: "target/fluxbench")
save_baseline = false               # Save a JSON baseline after each run (default: false)
# baseline_path = "baseline.json"   # Compare against a saved baseline (default: unset)

[ci] — CI Integration

Control how FluxBench behaves in CI environments:

[ci]
regression_threshold = 5.0   # Fail CI if regression exceeds this percentage (default: 5.0)
github_annotations = true    # Emit ::warning and ::error annotations on PRs (default: false)
fail_on_critical = true      # Exit non-zero on critical verification failures (default: true)

Output Formats

Human (Default)

Console output with grouped results and statistics:

Group: compute
------------------------------------------------------------
  ✓ bench_fibonacci_iter
      mean: 127.42 ns  median: 127.00 ns  stddev: 0.77 ns
      min: 126.00 ns  max: 147.00 ns  samples: 60
      p50: 127.00 ns  p95: 129.00 ns  p99: 136.38 ns
      95% CI: [127.35, 129.12] ns
      throughput: 7847831.87 ops/sec
      cycles: mean 603  median 601  (4.72 GHz)

JSON

Machine-readable format with full metadata:

{
  "meta": {
    "version": "0.1.0",
    "timestamp": "2026-02-10T...",
    "git_commit": "abc123...",
    "system": { "os": "linux", "cpu": "...", "cpu_cores": 24 }
  },
  "results": [
    {
      "id": "bench_fibonacci_iter",
      "group": "compute",
      "status": "passed",
      "metrics": {
        "mean_ns": 127.42,
        "median_ns": 127.0,
        "std_dev_ns": 0.77,
        "p50_ns": 127.0,
        "p95_ns": 129.0,
        "p99_ns": 136.38
      }
    }
  ],
  "verifications": [...],
  "synthetics": [...]
}

CSV

Spreadsheet-compatible format with all metrics:

id,name,group,status,mean_ns,median_ns,std_dev_ns,min_ns,max_ns,p50_ns,p95_ns,p99_ns,samples,alloc_bytes,alloc_count,mean_cycles,median_cycles,cycles_per_ns
bench_fibonacci_iter,bench_fibonacci_iter,compute,passed,127.42,...

HTML

Self-contained interactive report with charts and tables.

GitHub Summary

Renders verification results as a job summary in GitHub Actions workflows:

cargo bench -- --format github-summary >> $GITHUB_STEP_SUMMARY

Crash Isolation

Panicking benchmarks don't terminate the suite:

#[bench]
fn may_panic(b: &mut Bencher) {
    use std::sync::atomic::{AtomicU32, Ordering::SeqCst};

    static COUNTER: AtomicU32 = AtomicU32::new(0);
    b.iter(|| {
        let count = COUNTER.fetch_add(1, SeqCst);
        if count >= 5 {
            panic!("Intentional panic!");  // Isolated; suite continues
        }
    });
}

With --isolated=true (default), the panic occurs in a worker process and is reported as a failure for that benchmark, not the suite.

Advanced Usage

Allocation Tracking

FluxBench can track heap allocations per benchmark iteration. To enable this, install the TrackingAllocator as the global allocator in your benchmark binary:

use fluxbench::prelude::*;
use fluxbench::TrackingAllocator;

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

#[bench]
fn vec_allocation(b: &mut Bencher) {
    b.iter(|| vec![1, 2, 3, 4, 5]);
}

fn main() { fluxbench::run().unwrap(); }

Results will include allocation metrics for each benchmark:

  • alloc_bytes — total bytes allocated per iteration
  • alloc_count — number of allocations per iteration

These appear in JSON, CSV, and human output automatically.

Note: #[global_allocator] must be declared in the binary crate (your benches/*.rs file), not in a library. Rust allows only one global allocator per binary. Without it, track = true in flux.toml will report zero allocations.

You can also query allocation counters manually:

fluxbench::reset_allocation_counter();
// ... run some code ...
let (alloc_bytes, alloc_count) = fluxbench::current_allocation();
println!("Bytes: {}, Count: {}", alloc_bytes, alloc_count);

In-Process Mode

For debugging, run benchmarks in the same process:

cargo bench -- --isolated=false

Panics will crash immediately, so use this only for development.

Custom Bootstrap Configuration

Via flux.toml:

[runner]
bootstrap_iterations = 100000
confidence_level = 0.99

More bootstrap iterations give more precise confidence intervals at the cost of slower report generation.

Project Structure

The fluxbench workspace consists of:

  • fluxbench - Meta-crate, public API
  • fluxbench-cli - Supervisor process and CLI
  • fluxbench-core - Bencher, timer, worker, allocator
  • fluxbench-ipc - Zero-copy IPC transport with rkyv
  • fluxbench-stats - Bootstrap resampling and percentile computation
  • fluxbench-logic - Verification, synthetic metrics, dependency graphs
  • fluxbench-macros - Procedural macros for bench, verify, synthetic, compare
  • fluxbench-report - JSON, HTML, CSV, GitHub output generation

Examples

See fluxbench/examples/benchmarks.rs for a comprehensive example:

cargo run --example benchmarks -- list
cargo run --example benchmarks -- --group sorting
cargo run --example benchmarks -- --format json --output results.json

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Contributing

Contributions are welcome. Please ensure benchmarks remain crash-isolated and that statistical integrity is maintained.