
Crate zenbench


§zenbench

Interleaved microbenchmarking for Rust with paired statistics, CI regression testing, and hardware-adaptive measurement.


```text
  compress_64k  200 rounds × 67 calls
                       mean ±mad µs  95% CI vs base          iB/s
  ├─ sequential
  │  ├─ level_1        16.2 ±0.5µs  [15.8–16.6]µs          3.78G
  │  ├─ level_6        15.1 ±0.5µs  [-4.7%–-3.5%]          4.05G
  │  ╰─ level_9        15.0 ±0.5µs  [-5.5%–-4.2%]          4.06G
  ╰─ patterns
     ├─ sequential     15.1 ±0.5µs  [-5.8%–-4.4%]          4.03G
     ╰─ mixed         401.0 ±8.1µs  [+2370%–+2385%]         156M

  level_9       ██████████████████████████████████████████████ 4.06 GiB/s
  level_6       ██████████████████████████████████████████████ 4.05 GiB/s
  sequential    █████████████████████████████████████████████ 4.03 GiB/s
  level_1       ███████████████████████████████████████████ 3.78 GiB/s
  mixed         ██ 156 MiB/s
```
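The throughput column appears to be bytes per call divided by mean time per call, printed with binary prefixes (G for GiB/s, M for MiB/s). The arithmetic below is a quick check of that reading, assuming each `compress_64k` call processes a 65,536-byte input; the benchmark name suggests this, but the report does not state it. The helper names `gib_per_s` and `mib_per_s` are hypothetical.

```rust
/// Converts bytes processed per call and mean time per call into GiB/s (2^30 bytes).
fn gib_per_s(bytes_per_call: u64, mean_ns: f64) -> f64 {
    bytes_per_call as f64 / (mean_ns * 1e-9) / (1u64 << 30) as f64
}

/// The same quantity in MiB/s (2^20 bytes).
fn mib_per_s(bytes_per_call: u64, mean_ns: f64) -> f64 {
    bytes_per_call as f64 / (mean_ns * 1e-9) / (1u64 << 20) as f64
}

fn main() {
    // Assumption: each compress_64k call processes a 65,536-byte input.
    let input_bytes = 64 * 1024;
    // Means taken from the report: level_1 = 16.2 µs, mixed = 401.0 µs.
    println!("level_1: {:.2} GiB/s", gib_per_s(input_bytes, 16_200.0));
    println!("mixed:   {:.0} MiB/s", mib_per_s(input_bytes, 401_000.0));
}
```

This yields about 3.77 GiB/s and 156 MiB/s. The small gap from the reported 3.78G most likely comes from the displayed mean being rounded to 16.2 µs.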

§Why zenbench

Existing harnesses run benchmarks sequentially. Benchmark A runs on a hot CPU; benchmark B runs on an even hotter CPU with degraded turbo boost. System load changes between runs corrupt results.

Zenbench interleaves: each round, all benchmarks run in shuffled order, so round N of A and round N of B execute within the same round, under nearly the same conditions. Paired statistics on the round-by-round differences then detect real changes rather than thermal drift.
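The round-robin idea can be sketched with the standard library alone. This is not zenbench's implementation: it shuffles the run order every round with a hand-rolled xorshift PRNG (to avoid a `rand` dependency), times each closure with `Instant`, and summarizes the paired round-by-round differences with a plain median, whereas the comparison table lists a Wilcoxon test for zenbench. All names (`XorShift`, `run_interleaved`, `paired_diffs`) are hypothetical.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle of benchmark indices.
fn shuffle(order: &mut [usize], rng: &mut XorShift) {
    for i in (1..order.len()).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
}

/// Runs every benchmark once per round, in a freshly shuffled order each round.
/// Returns times[bench][round] in nanoseconds.
fn run_interleaved(benches: &mut [Box<dyn FnMut()>], rounds: usize, seed: u64) -> Vec<Vec<u128>> {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut times = vec![Vec::with_capacity(rounds); benches.len()];
    let mut order: Vec<usize> = (0..benches.len()).collect();
    for _ in 0..rounds {
        shuffle(&mut order, &mut rng);
        for &i in &order {
            let start = Instant::now();
            (benches[i])();
            times[i].push(start.elapsed().as_nanos());
        }
    }
    times
}

/// Paired differences: round N of `b` minus round N of `a`.
fn paired_diffs(a: &[u128], b: &[u128]) -> Vec<i128> {
    a.iter().zip(b).map(|(&x, &y)| y as i128 - x as i128).collect()
}

fn median(v: &mut [i128]) -> i128 {
    v.sort_unstable();
    v[v.len() / 2]
}

fn main() {
    let mut benches: Vec<Box<dyn FnMut()>> = Vec::new();
    benches.push(Box::new(|| { black_box((0..1_000u64).sum::<u64>()); }));
    benches.push(Box::new(|| { black_box((0..2_000u64).sum::<u64>()); }));
    let times = run_interleaved(&mut benches, 200, 42);
    let mut diffs = paired_diffs(&times[0], &times[1]);
    println!("median paired difference (B - A): {} ns", median(&mut diffs));
}
```

Because both benchmarks share every round, slow drift (heat, background load) shifts both times in a round together and largely cancels in the per-round difference.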

§vs criterion and divan

Featurecriteriondivanzenbench
Execution model
Interleaved round-robin
Auto-convergence (stop when precise)
Resource gating (detect other benchmarks)
Statistics
Bootstrap confidence intervals
Paired comparison testWelch tWilcoxon
Effect size metricCohen’s d
Drift detection (thermal/load)Spearman r
Noise threshold (suppress trivial diffs)✅ fixed 1%✅ configurable
Measurement
Hardware TSC timer (rdtsc/cntvct)✅ opt-in✅ auto
Overhead compensationslope regressionloop subtractionloop subtraction
Stack alignment jitter✅ alloca (unsafe)✅ safe trampoline
Deferred drop (exclude Drop from timing)✅ MaybeUninit✅ Vec collect
Allocation profiling (GlobalAlloc)
CI / Workflow
Save/load baselines--baseline=
Regression exit codes (0/1/2)
Auto-update baseline on pass--update-on-pass
Hardware fingerprint / testbed ID
Cross-run variance inflation✅ pooled t-test
Output
Terminal reporttabletreetree (default) + table
Bar chart✅ sorted, throughput
JSON / CSV / Markdown✅ JSON✅ JSON + CSV + LLM + MD
HTML plots (violin/PDF/regression)✅ plotters.rs
HTML report (self-contained, SVG)--format=html
Streaming per-group
Adaptive column layout✅ terminal-width aware
API
Async benchmarks✅ to_async()✅ iter_async()
Thread contention testing✅ threads attr✅ bench_contended()
Thread scaling analysis✅ bench_scaling()
Drop-in criterion migration✅ zero code changes
Attribute macros#[divan::bench]
Platform
Linux x86_64 / aarch64
Windows x86_64 / ARM64
macOS ARM64 / Intel
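The table credits zenbench with drift detection via Spearman r. One plausible reading, sketched here under that assumption: rank the per-round times and correlate the ranks with the round index. An r near +1 means the benchmark got steadily slower over the run (for example, as the CPU heated up), near -1 steadily faster, and near 0 no trend. The names `ranks`, `pearson`, and `drift_spearman` are hypothetical, and the cutoff zenbench uses to flag drift is not stated, so the sketch only computes r.

```rust
/// Average ranks (1-based); tied values share the mean of their positions.
fn ranks(values: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..values.len()).collect();
    idx.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    let mut r = vec![0.0; values.len()];
    let mut i = 0;
    while i < idx.len() {
        let mut j = i;
        while j + 1 < idx.len() && values[idx[j + 1]] == values[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            r[idx[k]] = avg;
        }
        i = j + 1;
    }
    r
}

/// Pearson correlation (NaN if either input is constant).
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    sxy / (sxx * syy).sqrt()
}

/// Spearman rank correlation between round index and per-round time.
fn drift_spearman(times: &[f64]) -> f64 {
    let index: Vec<f64> = (0..times.len()).map(|i| i as f64).collect();
    pearson(&ranks(&index), &ranks(times))
}

fn main() {
    // Made-up per-round times (µs) that creep upward, as a heating CPU might produce.
    let drifting = [16.1, 16.2, 16.2, 16.4, 16.5, 16.7, 16.9, 17.0];
    // Made-up per-round times with no trend.
    let steady = [16.2, 16.1, 16.3, 16.2, 16.1, 16.3, 16.2, 16.2];
    println!("drifting: r = {:.2}", drift_spearman(&drifting));
    println!("steady:   r = {:.2}", drift_spearman(&steady));
}
```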

§Quick start

```toml
# Cargo.toml
[dev-dependencies]
zenbench = "0.1"

[[bench]]
name = "my_bench"
harness = false
```

```rust
// benches/my_bench.rs (Cargo's default path for a [[bench]] named "my_bench")
use zenbench::prelude::*;

fn bench_sort(suite: &mut Suite) {
    suite.group("sort", |g| {
        g.throughput(Throughput::Elements(1000));

        g.bench("std_sort", |b| {
            b.with_input(|| (0..1000).rev().collect::<Vec<i32>>())
                .run(|mut v| { v.sort(); v })
        });

        g.bench("sort_unstable", |b| {
            b.with_input(|| (0..1000).rev().collect::<Vec<i32>>())
                .run(|mut v| { v.sort_unstable(); v })
        });
    });
}

zenbench::main!(bench_sort);
```
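In the quick start, `with_input` receives a setup closure and the routine hands `v` back. A plausible reading, consistent with the table's "Deferred drop (Vec collect)" row, is that input construction is kept out of the timing and returned values are dropped after the clock stops. The std-only sketch below illustrates that pattern; `time_with_input` is a hypothetical helper, not zenbench's `Bencher`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Mean time per call of `routine`, where inputs come from `setup`.
/// Input construction happens before the clock starts, and outputs are
/// collected into a Vec and dropped after it stops.
fn time_with_input<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    assert!(batch > 0);
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect(); // untimed
    let mut outputs: Vec<O> = Vec::with_capacity(batch); // preallocated: no growth while timing
    let start = Instant::now();
    for input in inputs {
        outputs.push(black_box(routine(black_box(input))));
    }
    let elapsed = start.elapsed();
    drop(outputs); // destructors run here, outside the timed region
    elapsed / batch as u32
}

fn main() {
    // Mirrors the std_sort benchmark in the quick start.
    let per_call = time_with_input(
        100,
        || (0..1000).rev().collect::<Vec<i32>>(),
        |mut v| {
            v.sort();
            v
        },
    );
    println!("std_sort: {per_call:?} per call");
}
```

Without the deferred drop, freeing each sorted `Vec` would be charged to `sort`, inflating results for cheap routines.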

§CI regression testing

```bash
# After merging to main — save a baseline
cargo bench -- --save-baseline=main

# On PRs — check for regressions (exits 1 if > 5% slower)
cargo bench -- --baseline=main

# Auto-update baseline on clean runs
cargo bench -- --baseline=main --update-on-pass --max-regression=5
```

```text
  Baseline comparison
  ───────────────────
  compress::level_1     16.2µs →   16.4µs    +1.2%    unchanged
  compress::level_6     15.1µs →   15.3µs    +1.3%    unchanged
  compress::level_9     15.0µs →   15.6µs    +4.0%    unchanged
  compress::mixed      401.0µs →  412.3µs    +2.8%    unchanged
  decompress::zenflate  91.5µs →   92.7µs    +1.3%    unchanged

  Summary: 0 regressions, 0 improvements, 5 unchanged

[zenbench] PASS: no regressions exceed 5% threshold
```
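The comparison report boils down to a percentage change per benchmark checked against `--max-regression`. The sketch below is a deliberately simplified point-estimate gate using the numbers from the example report: it ignores run-to-run noise (the table says zenbench inflates cross-run variance with a pooled t-test), the symmetric rule for improvements is an assumption, and it models only exit codes 0 and 1 because the meaning of code 2 is not described here. `classify` and `exit_code` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    Unchanged,
}

/// Percentage change of `current` relative to `baseline`.
fn pct_change(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}

/// Point-estimate gate: slower than `max_regression_pct` is a regression;
/// faster by the same margin is treated as an improvement (assumed rule).
fn classify(baseline: f64, current: f64, max_regression_pct: f64) -> Verdict {
    let pct = pct_change(baseline, current);
    if pct > max_regression_pct {
        Verdict::Regression
    } else if pct < -max_regression_pct {
        Verdict::Improvement
    } else {
        Verdict::Unchanged
    }
}

/// 0 when nothing regressed, 1 otherwise.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Regression) { 1 } else { 0 }
}

fn main() {
    // (benchmark, baseline µs, current µs) from the example comparison.
    let rows = [
        ("compress::level_1", 16.2, 16.4),
        ("compress::level_6", 15.1, 15.3),
        ("compress::level_9", 15.0, 15.6),
        ("compress::mixed", 401.0, 412.3),
        ("decompress::zenflate", 91.5, 92.7),
    ];
    let mut verdicts = Vec::new();
    for (name, base, cur) in rows {
        let v = classify(base, cur, 5.0);
        println!("{name:<22} {:+.1}%  {:?}", pct_change(base, cur), v);
        verdicts.push(v);
    }
    println!("exit code: {}", exit_code(&verdicts));
}
```

The printed percentages match the report's column, every row stays under the 5% threshold, and the exit code is 0.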

Full CI guide with GitHub Actions workflows: REGRESSION-TESTING.md

§Thread scaling

```rust
suite.group("scaling", |g| {
    g.throughput(Throughput::Elements(10_000));
    g.bench_scaling("work", |b, _tid| {
        b.iter(|| expensive_computation())
    });
});
```

```text
  scaling  200 rounds × 77 calls
                    mean ±mad µs  95% CI vs base    items/s
  ├─ sqrt_1t        4.2 ±0.1µs  [4.2–4.3]µs       2.37G
  ├─ sqrt_2t        4.7 ±0.1µs  [+10.7%–+12.6%]   2.12G
  ├─ sqrt_4t        5.8 ±0.1µs  [+36.0%–+38.8%]   1.72G
  ├─ sqrt_8t        8.5 ±0.3µs  [+91.6%–+101%]    1.17G
  ╰─ sqrt_16t      14.2 ±0.3µs  [+232%–+245%]      703M

  sqrt_1t   ██████████████████████████████████████████████████ 2.37G
  sqrt_2t   █████████████████████████████████████████████ 2.12G
  sqrt_4t   ████████████████████████████████████ 1.72G
  sqrt_8t   █████████████████████████ 1.17G
  sqrt_16t  ███████████████ 703M
```
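The API notes say `bench_scaling` probes 1..num_cpus, and the sample output shows 1, 2, 4, 8, and 16 threads, which suggests powers of two; that probing rule is an assumption here. The sketch runs the same CPU-bound closure on N threads at once with `std::thread::scope`, releases them together with a `Barrier`, and reports mean time per call. `probe_counts`, `mean_ns_per_call`, and the `sqrt_work` stand-in are hypothetical and are not the code behind the sample output.

```rust
use std::hint::black_box;
use std::sync::Barrier;
use std::thread;
use std::time::Instant;

/// Powers of two up to `max`, plus `max` itself when it is not a power of two.
fn probe_counts(max: usize) -> Vec<usize> {
    let max = max.max(1);
    let mut counts = Vec::new();
    let mut n = 1;
    while n <= max {
        counts.push(n);
        n *= 2;
    }
    if counts.last() != Some(&max) {
        counts.push(max);
    }
    counts
}

/// Runs `work` `iters` times on each of `threads` threads at once and returns
/// the mean nanoseconds per call, averaged over threads.
fn mean_ns_per_call(threads: usize, iters: u32, work: fn() -> u64) -> f64 {
    let barrier = Barrier::new(threads);
    let per_thread: Vec<f64> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                let barrier = &barrier;
                s.spawn(move || {
                    barrier.wait(); // start every thread together
                    let start = Instant::now();
                    for _ in 0..iters {
                        black_box(work());
                    }
                    start.elapsed().as_nanos() as f64 / iters as f64
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    per_thread.iter().sum::<f64>() / threads as f64
}

/// Stand-in CPU-bound workload (hypothetical).
fn sqrt_work() -> u64 {
    (1..=10_000u64).map(|i| (i as f64).sqrt() as u64).sum()
}

fn main() {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    for threads in probe_counts(cpus) {
        let ns = mean_ns_per_call(threads, 200, sqrt_work);
        println!("{threads:>3} threads: {:.2} µs per call", ns / 1_000.0);
    }
}
```

If per-call time stays flat as threads are added, the work scales; a rising curve like the one in the sample output points to a shared bottleneck such as memory bandwidth, frequency scaling, or SMT siblings.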

§Subgroups and organization

```rust
suite.group("dispatch", |g| {
    g.throughput(Throughput::Elements(100));
    g.throughput_unit("checks");

    g.subgroup("Generic (monomorphized)");
    g.bench("impl Stop (Stopper)", |b| b.iter(|| check_stopper()));
    g.bench("impl Stop (FnStop)", |b| b.iter(|| check_fn()));

    g.subgroup("Dynamic dispatch");
    g.bench("&dyn Stop", |b| b.iter(|| check_dyn()));
    g.bench("StopToken", |b| b.iter(|| check_token()));

    g.baseline("impl Stop (Stopper)");
    g.sort_by_speed();
});
```

```text
  dispatch  200 rounds × 10K calls
                                mean ±mad ns  95% CI vs base     checks/s
  ├─ Generic (monomorphized)
  │  ├─ impl Stop (FnStop)      19.7 ±0.3ns  [-49.1%–-47.2%]      5.08G
  │  ╰─ impl Stop (Stopper)     38.5 ±0.5ns  [37.9–39.1]ns        2.60G
  ╰─ Dynamic dispatch
     ├─ StopToken                97.2 ±1.2ns  [+148%–+154%]        1.03G
     ╰─ &dyn Stop              112.5 ±3.1ns  [+176%–+193%]         889M

  impl Stop (FnStop)   ██████████████████████████████████████████████ 5.08G
  impl Stop (Stopper)  █████████████████████████████ 2.60G
  StopToken            ████████████ 1.03G
  &dyn Stop            ██████████ 889M
```
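The "vs base" and "checks/s" columns can be related back to the mean column. The sketch computes the point-estimate percentage against the baseline and the throughput for 100 checks per call, and sorts fastest-first within each subgroup while keeping subgroups in declaration order, which is what the dispatch output appears to show; that reading of `sort_by_speed` is an assumption. Zenbench reports a paired confidence interval rather than this simple ratio of means. `Row`, `vs_base_pct`, `per_second`, and `sort_by_speed` here are hypothetical.

```rust
#[derive(Debug)]
struct Row {
    subgroup: &'static str,
    name: &'static str,
    mean_ns: f64,
}

/// Percentage difference of `mean` relative to `base`.
fn vs_base_pct(mean: f64, base: f64) -> f64 {
    (mean / base - 1.0) * 100.0
}

/// Units of work per second, given units per call and mean ns per call.
fn per_second(units_per_call: f64, mean_ns: f64) -> f64 {
    units_per_call / (mean_ns * 1e-9)
}

/// Fastest first within each subgroup; subgroups keep their first-seen order.
fn sort_by_speed(rows: &mut [Row]) {
    let mut groups: Vec<&'static str> = Vec::new();
    for r in rows.iter() {
        if !groups.contains(&r.subgroup) {
            groups.push(r.subgroup);
        }
    }
    let pos = |g: &str| groups.iter().position(|x| *x == g).unwrap();
    rows.sort_by(|a, b| {
        pos(a.subgroup)
            .cmp(&pos(b.subgroup))
            .then(a.mean_ns.total_cmp(&b.mean_ns))
    });
}

fn main() {
    // Means (ns) and subgroup layout from the dispatch example.
    let mut rows = vec![
        Row { subgroup: "Generic (monomorphized)", name: "impl Stop (Stopper)", mean_ns: 38.5 },
        Row { subgroup: "Generic (monomorphized)", name: "impl Stop (FnStop)", mean_ns: 19.7 },
        Row { subgroup: "Dynamic dispatch", name: "&dyn Stop", mean_ns: 112.5 },
        Row { subgroup: "Dynamic dispatch", name: "StopToken", mean_ns: 97.2 },
    ];
    let base = 38.5; // g.baseline("impl Stop (Stopper)")
    sort_by_speed(&mut rows);
    for r in &rows {
        println!(
            "{:<22} {:>7.1} ns  {:+7.1}%  {:.2e} checks/s",
            r.name, r.mean_ns, vs_base_pct(r.mean_ns, base), per_second(100.0, r.mean_ns)
        );
    }
}
```

The point estimates (-48.8%, +152.5%, +192.2%) fall inside the confidence intervals printed in the report.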

§Migrating from criterion

Add zenbench alongside criterion — migrate one file at a time:

```toml
[dev-dependencies]
criterion = "0.8"                                          # keep
zenbench = { version = "0.1", features = ["criterion-compat"] }  # add
```

Change the imports in each file; the benchmark functions themselves need no changes:

```rust
// Before:
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId, Throughput};

// After:
use zenbench::criterion_compat::*;
use zenbench::{criterion_group, criterion_main};
```

Closures can borrow local data — no move or Clone needed. Your existing criterion_group!, criterion_main!, bench_function, bench_with_input, BenchmarkId, Throughput, group.sample_size(), group.measurement_time(), and group.finish() all work unchanged.

Full upgrade ladder: MIGRATION.md

§Output formats

```bash
cargo bench                           # tree display (default, stderr)
cargo bench -- --style=table          # bordered tables with min column
cargo bench -- --format=json          # structured JSON (stdout)
cargo bench -- --format=csv           # spreadsheet-friendly (stdout)
cargo bench -- --format=llm           # key=value for AI tools (stdout)
cargo bench -- --format=md            # markdown tables (stdout)
```

§API reference

```rust
use zenbench::prelude::*;

// Interleaved comparison group
suite.group("name", |g| {
    g.throughput(Throughput::Bytes(1024));
    g.subgroup("variant");
    g.bench("impl", |b| b.iter(|| work()));
    g.bench("with_setup", |b| {
        b.with_input(|| make_data()).run(|data| process(data))
    });
    g.bench("deferred_drop", |b| {
        b.iter_deferred_drop(|| Vec::<u8>::with_capacity(1024))
    });
});

// Single function shorthand
suite.bench_fn("fibonacci", || fib(20));

// Thread contention
g.bench_contended("mutex", 4, || Mutex::new(Map::new()), |b, m, tid| {
    b.iter(|| { m.lock().unwrap().insert(tid, 42); })
});

// Automatic thread scaling (probes 1..num_cpus)
g.bench_scaling("work", |b, _tid| b.iter(|| compute()));
```
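The `bench_contended` example builds shared state once (a `Mutex` around a map) and hands every thread the same state plus its thread id. The std-only sketch below reproduces that workload shape so the cost of lock contention is visible; it uses `HashMap` in place of the example's `Map` and is not zenbench's implementation. `contended_inserts` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::sync::{Barrier, Mutex};
use std::thread;
use std::time::Instant;

/// `threads` threads share one `Mutex<HashMap>`; each inserts its own
/// thread id `iters` times. Returns each thread's mean ns per insert.
fn contended_inserts(threads: usize, iters: u32) -> Vec<f64> {
    // Shared state built once, like the setup closure in the API example.
    let map = Mutex::new(HashMap::<usize, u64>::new());
    let barrier = Barrier::new(threads);
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|tid| {
                let (map, barrier) = (&map, &barrier);
                s.spawn(move || {
                    barrier.wait(); // release all threads at once so they actually contend
                    let start = Instant::now();
                    for _ in 0..iters {
                        map.lock().unwrap().insert(tid, black_box(42));
                    }
                    start.elapsed().as_nanos() as f64 / iters as f64
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    for threads in [1, 2, 4] {
        let per_thread = contended_inserts(threads, 10_000);
        let mean = per_thread.iter().sum::<f64>() / per_thread.len() as f64;
        println!("{threads} threads: {mean:.1} ns per insert");
    }
}
```

Unlike the scaling case, every thread here serializes on one lock, so per-insert time is expected to grow with the thread count even on an otherwise idle machine.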

§Configuration

```rust
group.config()
    .max_rounds(200)              // default 200
    .noise_threshold(0.02)        // ±2% significance gate
    .bootstrap_resamples(100_000) // CI precision (default 10K)
    .linear_sampling(true)        // slope regression for sub-100ns
    .cold_start(true)             // 1 iter + cache firewall
    .stack_jitter(true)           // random alignment (default on)
    .sort_by_speed(true);         // fastest first in report
```
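Two of these knobs interact: `bootstrap_resamples` controls how many resamples back a confidence interval, and `noise_threshold` decides whether a change is worth reporting. The sketch shows a percentile bootstrap CI for a mean plus one possible gate. The gate rule (report a change only when the whole CI lies outside ±threshold) is our assumption, not zenbench's documented rule; `bootstrap_ci`, `gate`, and the xorshift PRNG are hypothetical helpers.

```rust
/// Tiny xorshift64 PRNG so the sketch has no dependencies.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Index in 0..n (modulo bias is negligible for small n).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

/// Percentile bootstrap 95% confidence interval for the mean of `samples`.
fn bootstrap_ci(samples: &[f64], resamples: usize, seed: u64) -> (f64, f64) {
    assert!(!samples.is_empty() && resamples > 0);
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let n = samples.len();
    let mut means: Vec<f64> = (0..resamples)
        .map(|_| (0..n).map(|_| samples[rng.below(n)]).sum::<f64>() / n as f64)
        .collect();
    means.sort_by(f64::total_cmp);
    let lo = means[(resamples as f64 * 0.025) as usize];
    let hi = means[((resamples as f64 * 0.975) as usize).min(resamples - 1)];
    (lo, hi)
}

#[derive(Debug, PartialEq)]
enum Change {
    Slower,
    Faster,
    WithinNoise,
}

/// Assumed rule: report a change only when the whole CI of the relative
/// difference lies outside ±threshold (threshold as a fraction, e.g. 0.02).
fn gate(ci: (f64, f64), threshold: f64) -> Change {
    let (lo, hi) = ci;
    if lo > threshold {
        Change::Slower
    } else if hi < -threshold {
        Change::Faster
    } else {
        Change::WithinNoise
    }
}

fn main() {
    // Made-up per-round relative differences (B vs A), as fractions.
    let diffs = [0.031, 0.028, 0.035, 0.030, 0.027, 0.033, 0.029, 0.032];
    let (lo, hi) = bootstrap_ci(&diffs, 10_000, 0x5eed);
    println!("mean {:+.2}%  95% CI [{:+.2}%, {:+.2}%]", mean(&diffs) * 100.0, lo * 100.0, hi * 100.0);
    println!("{:?}", gate((lo, hi), 0.02));
}
```

More resamples make the percentile endpoints less jumpy from run to run; they do not make the interval narrower, since its width is set by the data.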

§Platform support

Tested on all targets via GitHub Actions CI:

| Platform | Timer | Notes |
|---|---|---|
| Linux x86_64 | TSC (rdtsc) | Full support |
| Linux aarch64 | Counter (cntvct_el0) | Full support |
| Windows x86_64 | TSC (rdtsc) | Full support |
| Windows ARM64 | Instant (~300ns) | No hardware counter in user mode |
| macOS ARM64 | Counter (cntvct_el0) | Full support |
| macOS Intel | TSC (rdtsc) | Full support |
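The timer column names the raw counters involved. The sketch reads them directly: `core::arch::x86_64::_rdtsc` on x86_64, an `mrs` of `cntvct_el0` on aarch64 outside Windows, and an `Instant`-based fallback elsewhere (including Windows ARM64, following the table). It is an illustration, not zenbench's timer code, and `read_counter` is a hypothetical name. The values are raw ticks: turning them into nanoseconds requires the counter frequency, which has to be calibrated or queried separately.

```rust
use std::hint::black_box;

/// x86_64: the time-stamp counter via `rdtsc`.
#[cfg(target_arch = "x86_64")]
fn read_counter() -> u64 {
    // SAFETY: rdtsc only reads the time-stamp counter.
    unsafe { core::arch::x86_64::_rdtsc() }
}

/// aarch64 (except Windows): the virtual counter register `cntvct_el0`.
#[cfg(all(target_arch = "aarch64", not(target_os = "windows")))]
fn read_counter() -> u64 {
    let ticks: u64;
    // SAFETY: reading cntvct_el0 has no side effects.
    unsafe {
        core::arch::asm!("mrs {}, cntvct_el0", out(reg) ticks, options(nomem, nostack, preserves_flags));
    }
    ticks
}

/// Everywhere else: nanoseconds since first use, from `Instant`.
#[cfg(not(any(target_arch = "x86_64", all(target_arch = "aarch64", not(target_os = "windows")))))]
fn read_counter() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

fn main() {
    let start = read_counter();
    let sum: u64 = black_box((0..1_000_000u64).sum());
    let end = read_counter();
    // Raw ticks, not nanoseconds.
    println!("sum = {sum}, elapsed ticks = {}", end.wrapping_sub(start));
}
```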

§Image tech I maintain

| Area | Projects |
|---|---|
| State of the art codecs* | zenjpeg · zenpng · zenwebp · zengif · zenavif (rav1d-safe · zenrav1e · zenavif-parse · zenavif-serialize) · zenjxl (jxl-encoder · zenjxl-decoder) · zentiff · zenbitmaps · heic · zenraw · zenpdf · ultrahdr · mozjpeg-rs · webpx |
| Compression | zenflate · zenzop |
| Processing | zenresize · zenfilters · zenquant · zenblend |
| Metrics | zensim · fast-ssim2 · butteraugli · resamplescope-rs · codec-eval · codec-corpus |
| Pixel types & color | zenpixels · zenpixels-convert · linear-srgb · garb |
| Pipeline | zenpipe · zencodec · zencodecs · zenlayout · zennode |
| ImageResizer | ImageResizer (C#) — 24M+ NuGet downloads across all packages |
| Imageflow | Image optimization engine (Rust) — .NET · node · go — 9M+ NuGet downloads across all packages |
| Imageflow Server | The fast, safe image server (Rust+C#) — 552K+ NuGet downloads, deployed by Fortune 500s and major brands |

* as of 2026

§General Rust awesomeness

archmage · magetypes · enough · whereat · zenbench · cargo-copter

And other projects · GitHub @imazen · GitHub @lilith · lib.rs/~lilith · NuGet (over 30 million downloads / 87 packages)

§License

MIT OR Apache-2.0

§Re-exports

pub use platform::Testbed;

§Modules

baseline
Baseline persistence for CI regression testing.
calibration
Built-in calibration workloads for cross-machine normalization.
daemon
Fire-and-forget benchmark daemon.
mcp
MCP (Model Context Protocol) server for benchmark management.
platform
prelude
Prelude for convenient imports.

§Macros

main
Macro for defining benchmark binaries with cargo bench.

§Structs

AllocProfiler
Allocation profiler that wraps a GlobalAlloc to track heap usage.
AllocStats
Allocation statistics for a benchmark, averaged per iteration.
BenchGroup
A group of benchmarks to compare via interleaved execution.
Bencher
Controls the measurement of a single benchmark iteration.
BenchmarkResult
Result of a single benchmark (standalone or within a group).
ComparisonResult
Result of a comparison group (multiple interleaved benchmarks).
GateConfig
Configuration for resource gating.
GroupConfig
Configuration for a benchmark group’s execution.
MeanCi
Bootstrap confidence interval for a single benchmark’s mean.
PairedAnalysis
Result of paired statistical analysis between two interleaved benchmarks.
RunId
Unique identifier for a benchmark run.
Suite
A complete benchmark suite containing comparison groups and standalone benchmarks.
SuiteResult
Complete results of a benchmark suite run.
Summary
Streaming statistical summary using Welford’s online algorithm.
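`Summary` is described as a streaming summary built on Welford's online algorithm. A minimal version of that algorithm follows; the `Welford` type and its methods are hypothetical and are not zenbench's `Summary` API. It updates the mean and the running sum of squared deviations one sample at a time, in O(1) memory, which avoids the cancellation error of the naive sum-of-squares formula.

```rust
/// Streaming mean and variance using Welford's online algorithm.
#[derive(Debug, Default, Clone, Copy)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl Welford {
    fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean); // uses the updated mean
    }

    fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance (n - 1 denominator); zero for fewer than two samples.
    fn variance(&self) -> f64 {
        if self.n < 2 { 0.0 } else { self.m2 / (self.n - 1) as f64 }
    }

    fn std_dev(&self) -> f64 {
        self.variance().sqrt()
    }
}

fn main() {
    let mut w = Welford::default();
    for ns in [16.1, 16.4, 15.9, 16.2, 16.3] {
        w.push(ns);
    }
    println!("n = {}, mean = {:.2} ns, sd = {:.3} ns", w.n, w.mean(), w.std_dev());
}
```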

§Enums

Throughput
Throughput declaration for a benchmark group.

§Functions

black_box
Re-export black_box from std for convenience.
format_ns
Format nanoseconds as human-readable time.
run
Run a benchmark suite with default configuration.
run_and_save
Run a benchmark suite and save results to a JSON file.
run_gated
Run a benchmark suite with custom gate configuration.