Crate zenbench

Expand description

§zenbench

Interleaved microbenchmarking for Rust with paired statistics, CI regression testing, and hardware-adaptive measurement.

Documentation · Example HTML Report · Tutorial

  compress_64k  200 rounds × 67 calls
                       mean ±mad µs  95% CI vs base          iB/s
  ├─ sequential
  │  ├─ level_1        16.2 ±0.5µs  [15.8–16.6]µs          3.78G
  │  ├─ level_6        15.1 ±0.5µs  [-4.7%–-3.5%]          4.05G
  │  ╰─ level_9        15.0 ±0.5µs  [-5.5%–-4.2%]          4.06G
  ╰─ patterns
     ├─ sequential     15.1 ±0.5µs  [-5.8%–-4.4%]          4.03G
     ╰─ mixed         401.0 ±8.1µs  [+2370%–+2385%]         156M

  level_9       ██████████████████████████████████████████████ 4.06 GiB/s
  level_6       ██████████████████████████████████████████████ 4.05 GiB/s
  sequential    █████████████████████████████████████████████ 4.03 GiB/s
  level_1       ███████████████████████████████████████████ 3.78 GiB/s
  mixed         ██ 156 MiB/s

§Why zenbench

Existing harnesses run benchmarks sequentially. Benchmark A runs on a hot CPU; benchmark B runs on an even hotter CPU with degraded turbo boost. System load changes between runs corrupt results.

Zenbench interleaves: each round, all benchmarks run in shuffled order. Round N of A and round N of B execute under identical conditions. Paired statistics on the round-by-round differences detect real changes — not thermal drift.

§vs criterion and divan

Feature	criterion	divan	zenbench
Execution model
Interleaved round-robin	❌	❌	✅
Auto-convergence (stop when precise)	❌	❌	✅
Resource gating (detect other benchmarks)	❌	❌	✅
Statistics
Bootstrap confidence intervals	✅	❌	✅
Paired comparison test	Welch t	❌	Wilcoxon
Effect size metric	❌	❌	Cohen’s d
Drift detection (thermal/load)	❌	❌	Spearman r
Noise threshold (suppress trivial diffs)	✅ fixed 1%	❌	✅ configurable
Measurement
Hardware TSC timer (rdtsc/cntvct)	❌	✅ opt-in	✅ auto
Overhead compensation	slope regression	loop subtraction	loop subtraction
Stack alignment jitter	✅ alloca (unsafe)	❌	✅ safe trampoline
Deferred drop (exclude Drop from timing)	❌	✅ MaybeUninit	✅ Vec collect
Allocation profiling (GlobalAlloc)	❌	✅	✅
CI / Workflow
Save/load baselines	❌	❌	✅ `--baseline=`
Regression exit codes (0/1/2)	❌	❌	✅
Auto-update baseline on pass	❌	❌	✅ `--update-on-pass`
Multi-pass aggregation (in-process)	❌	❌	✅ `--best-of-passes=N`
Multi-process aggregation (cross-ASLR)	❌	❌	✅ `--best-of-processes=N`
Hardware fingerprint / testbed ID	❌	❌	✅
Cross-run variance inflation	❌	❌	✅ pooled t-test
Output
Terminal report	table	tree	tree (default) + table
Bar chart	❌	❌	✅ sorted, throughput
JSON / CSV / Markdown	✅ JSON	❌	✅ JSON + CSV + LLM + MD
HTML plots (violin/PDF/regression)	✅ plotters.rs	❌	❌
HTML report (self-contained, SVG)	❌	❌	✅ `--format=html`
Streaming per-group	❌	❌	✅
Adaptive column layout	❌	❌	✅ terminal-width aware
API
Async benchmarks	✅ to_async()	❌	✅ iter_async()
Thread contention testing	❌	✅ threads attr	✅ bench_contended()
Thread scaling analysis	❌	❌	✅ bench_scaling()
Drop-in criterion migration	—	❌	✅ zero code changes
Attribute macros	❌	✅ `#[divan::bench]`	❌
Platform
Linux x86_64 / aarch64	✅	✅	✅
Windows x86_64 / ARM64	✅	✅	✅
macOS ARM64 / Intel	✅	✅	✅

§Quick start

# Cargo.toml
[dev-dependencies]
zenbench = "0.1"

[[bench]]
name = "my_bench"
harness = false

use zenbench::prelude::*;

fn bench_sort(suite: &mut Suite) {
    suite.group("sort", |g| {
        g.throughput(Throughput::Elements(1000));

        g.bench("std_sort", |b| {
            b.with_input(|| (0..1000).rev().collect::<Vec<i32>>())
                .run(|mut v| { v.sort(); v })
        });

        g.bench("sort_unstable", |b| {
            b.with_input(|| (0..1000).rev().collect::<Vec<i32>>())
                .run(|mut v| { v.sort_unstable(); v })
        });
    });
}

zenbench::main!(bench_sort);

§CI regression testing

# After merging to main — save a baseline
cargo bench -- --save-baseline=main

# On PRs — check for regressions (exits 1 if > 5% slower)
cargo bench -- --baseline=main

# Auto-update baseline on clean runs
cargo bench -- --baseline=main --update-on-pass --max-regression=5

  Baseline comparison
  ───────────────────
  compress::level_1     16.2µs →   16.4µs    +1.2%    unchanged
  compress::level_6     15.1µs →   15.3µs    +1.3%    unchanged
  compress::level_9     15.0µs →   15.6µs    +4.0%    unchanged
  compress::mixed      401.0µs →  412.3µs    +2.8%    unchanged
  decompress::zenflate  91.5µs →   92.7µs    +1.3%    unchanged

  Summary: 0 regressions, 0 improvements, 5 unchanged

[zenbench] PASS: no regressions exceed 5% threshold

Full CI guide with GitHub Actions workflows: REGRESSION-TESTING.md

§Thread scaling

suite.group("scaling", |g| {
    g.throughput(Throughput::Elements(10_000));
    g.bench_scaling("work", |b, _tid| {
        b.iter(|| expensive_computation())
    });
});

  scaling  200 rounds × 77 calls
                    mean ±mad µs  95% CI vs base    items/s
  ├─ sqrt_1t        4.2 ±0.1µs  [4.2–4.3]µs       2.37G
  ├─ sqrt_2t        4.7 ±0.1µs  [+10.7%–+12.6%]   2.12G
  ├─ sqrt_4t        5.8 ±0.1µs  [+36.0%–+38.8%]   1.72G
  ├─ sqrt_8t        8.5 ±0.3µs  [+91.6%–+101%]    1.17G
  ╰─ sqrt_16t      14.2 ±0.3µs  [+232%–+245%]      703M

  sqrt_1t   ██████████████████████████████████████████████████ 2.37G
  sqrt_2t   █████████████████████████████████████████████ 2.12G
  sqrt_4t   ████████████████████████████████████ 1.72G
  sqrt_8t   █████████████████████████ 1.17G
  sqrt_16t  ███████████████ 703M

§Subgroups and organization

suite.group("dispatch", |g| {
    g.throughput(Throughput::Elements(100));
    g.throughput_unit("checks");

    g.subgroup("Generic (monomorphized)");
    g.bench("impl Stop (Stopper)", |b| b.iter(|| check_stopper()));
    g.bench("impl Stop (FnStop)", |b| b.iter(|| check_fn()));

    g.subgroup("Dynamic dispatch");
    g.bench("&dyn Stop", |b| b.iter(|| check_dyn()));
    g.bench("StopToken", |b| b.iter(|| check_token()));

    g.baseline("impl Stop (Stopper)");
    g.sort_by_speed();
});

  dispatch  200 rounds × 10K calls
                                mean ±mad ns  95% CI vs base     checks/s
  ├─ Generic (monomorphized)
  │  ├─ impl Stop (FnStop)      19.7 ±0.3ns  [-49.1%–-47.2%]      5.08G
  │  ╰─ impl Stop (Stopper)     38.5 ±0.5ns  [37.9–39.1]ns        2.60G
  ╰─ Dynamic dispatch
     ├─ StopToken                97.2 ±1.2ns  [+148%–+154%]        1.03G
     ╰─ &dyn Stop              112.5 ±3.1ns  [+176%–+193%]         889M

  impl Stop (FnStop)   ██████████████████████████████████████████████ 5.08G
  impl Stop (Stopper)  █████████████████████████████ 2.60G
  StopToken            ████████████ 1.03G
  &dyn Stop            ██████████ 889M

§Migrating from criterion

Add zenbench alongside criterion — migrate one file at a time:

[dev-dependencies]
criterion = "0.8"                                          # keep
zenbench = { version = "0.1", features = ["criterion-compat"] }  # add

Change one import per file — zero code changes to benchmark functions:

// Before:
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId, Throughput};

// After:
use zenbench::criterion_compat::*;
use zenbench::{criterion_group, criterion_main};

Closures can borrow local data — no move or Clone needed. Your existing criterion_group!, criterion_main!, bench_function, bench_with_input, BenchmarkId, Throughput, group.sample_size(), group.measurement_time(), and group.finish() all work unchanged.

Full upgrade ladder: MIGRATION.md

§Output formats

cargo bench                           # tree display (default, stderr)
cargo bench -- --style=table          # bordered tables with min column
cargo bench -- --format=json          # structured JSON (stdout)
cargo bench -- --format=csv           # spreadsheet-friendly (stdout)
cargo bench -- --format=llm           # key=value for AI tools (stdout)
cargo bench -- --format=md            # markdown tables (stdout)

§Multi-pass and multi-process

# In-process passes: resets calibration, warmup, heap addresses
cargo bench -- --best-of-passes=3
cargo bench -- --mean-of-passes=5

# Cross-OS-process: also resets ASLR, CPU freq, scheduler, page cache
cargo bench -- --best-of-processes=3
cargo bench -- --median-of-processes=5

# Composable: 3 processes × 2 passes = 6 total runs
cargo bench -- --best-of-processes=3 --best-of-passes=2

Policies: best (min mean — use on noisy hosts), mean (unbiased average), median (robust to outliers). See docs/multi_process.md for which noise sources each level attacks.

§API reference

use zenbench::prelude::*;

// Interleaved comparison group
suite.group("name", |g| {
    g.throughput(Throughput::Bytes(1024));
    g.subgroup("variant");
    g.bench("impl", |b| b.iter(|| work()));
    g.bench("with_setup", |b| {
        b.with_input(|| make_data()).run(|data| process(data))
    });
    g.bench("deferred_drop", |b| {
        b.iter_deferred_drop(|| Vec::<u8>::with_capacity(1024))
    });
});

// Single function shorthand
suite.bench_fn("fibonacci", || fib(20));

// Thread contention
g.bench_contended("mutex", 4, || Mutex::new(Map::new()), |b, m, tid| {
    b.iter(|| { m.lock().unwrap().insert(tid, 42); })
});

// Automatic thread scaling (probes 1..num_cpus)
g.bench_scaling("work", |b, _tid| b.iter(|| compute()));

§Configuration

group.config()
    .max_rounds(200)              // default 200
    .noise_threshold(0.02)        // ±2% significance gate
    .bootstrap_resamples(100_000) // CI precision (default 10K)
    .linear_sampling(true)        // slope regression for sub-100ns
    .cold_start(true)             // 1 iter + cache firewall
    .stack_jitter(true)           // random alignment (default on)
    .sort_by_speed(true);         // fastest first in report

§Platform support

Tested on all targets via GitHub Actions CI:

Platform	Timer	Notes
Linux x86_64	TSC (rdtsc)	Full support
Linux aarch64	Counter (cntvct_el0)	Full support
Windows x86_64	TSC (rdtsc)	Full support
Windows ARM64	Instant (~300ns)	No hardware counter in user mode
macOS ARM64	Counter (cntvct_el0)	Full support
macOS Intel	TSC (rdtsc)	Full support

§Image tech I maintain


State of the art codecs*	zenjpeg · zenpng · zenwebp · zengif · zenavif (rav1d-safe · zenrav1e · zenavif-parse · zenavif-serialize) · zenjxl (jxl-encoder · zenjxl-decoder) · zentiff · zenbitmaps · heic · zenraw · zenpdf · ultrahdr · mozjpeg-rs · webpx
Compression	zenflate · zenzop
Processing	zenresize · zenfilters · zenquant · zenblend
Metrics	zensim · fast-ssim2 · butteraugli · resamplescope-rs · codec-eval · codec-corpus
Pixel types & color	zenpixels · zenpixels-convert · linear-srgb · garb
Pipeline	zenpipe · zencodec · zencodecs · zenlayout · zennode
ImageResizer	ImageResizer (C#) — 24M+ NuGet downloads across all packages
Imageflow	Image optimization engine (Rust) — .NET · node · go — 9M+ NuGet downloads across all packages
Imageflow Server	The fast, safe image server (Rust+C#) — 552K+ NuGet downloads, deployed by Fortune 500s and major brands

_{* as of 2026}

§General Rust awesomeness

archmage · magetypes · enough · whereat · zenbench · cargo-copter

And other projects · GitHub @imazen · GitHub @lilith · lib.rs/~lilith · NuGet (over 30 million downloads / 87 packages)

§License

MIT OR Apache-2.0

Re-exports§

pub use platform::Testbed;

Modules§

baseline: Baseline persistence for CI regression testing.
calibration: Built-in calibration workloads for cross-machine normalization.
daemon: Fire-and-forget benchmark daemon.
exclusive: Cross-process exclusive benchmarking lock with heartbeat-based diagnostics.
mcp: MCP (Model Context Protocol) server for benchmark management.
platform
prelude: Prelude for convenient imports.
quickchart: QuickChart URL generation for embedding benchmark charts in READMEs.

Macros§

main

Structs§

AllocProfiler: Allocation profiler that wraps a GlobalAlloc to track heap usage.
AllocStats: Allocation statistics for a benchmark, averaged per iteration.
BenchGroup: A group of benchmarks to compare via interleaved execution.
Bencher: Controls the measurement of a single benchmark iteration.
BenchmarkResult: Result of a single benchmark within a group.
ComparisonResult: Result of a comparison group (multiple interleaved benchmarks).
GateConfig: Configuration for resource gating.
GroupConfig: Configuration for a benchmark group’s execution.
MeanCi: Bootstrap confidence interval for a single benchmark’s mean.
PairedAnalysis: Result of paired statistical analysis between two interleaved benchmarks.
RunId: Unique identifier for a benchmark run.
Suite: A complete benchmark suite containing comparison groups.
SuiteResult: Complete results of a benchmark suite run.
Summary: Streaming statistical summary using Welford’s online algorithm.

Enums§

Aggregation: How to combine N SuiteResults into one.
Throughput: Throughput declaration for a benchmark group.

Functions§

aggregate_results: Combine N SuiteResults into one under the caller-chosen policy.
black_box: Re-export black_box from std for convenience.
format_ns: Format nanoseconds as human-readable time.
run: Run a benchmark suite with default configuration.
run_and_save: Run a benchmark suite and save results to a JSON file.
run_gated: Run a benchmark suite with custom gate configuration.
run_passes: Run a benchmark suite in N sequential “passes” inside the current OS process and combine the per-pass results under the chosen Aggregation policy.
run_processes: Re-exec the current benchmark binary N times in separate OS processes and aggregate the results.

Crate zenbench

Crate zenbench Copy item path

§zenbench

§Why zenbench

§vs criterion and divan

§Quick start

§CI regression testing

§Thread scaling

§Subgroups and organization

§Migrating from criterion

§Output formats

§Multi-pass and multi-process

§API reference

§Configuration

§Platform support

§Image tech I maintain

§General Rust awesomeness

§License

Re-exports§

Modules§

Macros§

Structs§

Enums§

Functions§

Crate zenbench