zenbench

Interleaved microbenchmarking for Rust with paired statistics, CI regression testing, and hardware-adaptive measurement.
```text
compress_64k                     200 rounds × 67 calls
                   mean ±mad µs   95% CI vs base       iB/s
├─ sequential
│  ├─ level_1       16.2 ±0.5µs   [15.8–16.6]µs       3.78G
│  ├─ level_6       15.1 ±0.5µs   [-4.7%–-3.5%]       4.05G
│  ╰─ level_9       15.0 ±0.5µs   [-5.5%–-4.2%]       4.06G
╰─ patterns
   ├─ sequential    15.1 ±0.5µs   [-5.8%–-4.4%]       4.03G
   ╰─ mixed        401.0 ±8.1µs   [+2370%–+2385%]     156M

level_9     ██████████████████████████████████████████████ 4.06 GiB/s
level_6     ██████████████████████████████████████████████ 4.05 GiB/s
sequential  █████████████████████████████████████████████ 4.03 GiB/s
level_1     ███████████████████████████████████████████ 3.78 GiB/s
mixed       ██ 156 MiB/s
```
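The summary columns can be reproduced roughly by hand. The sketch below assumes that `±mad` means the median absolute deviation and that `compress_64k` processes 64 KiB (65,536 bytes) per call; both are inferred from the names, not stated. The helper names are hypothetical and are not part of zenbench. With a 16.2 µs mean, 65,536 bytes per call gives about 3.77 GiB/s, close to the reported 3.78G (the displayed mean is rounded).

```rust
/// Arithmetic mean of per-call times (microseconds).
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median of a slice (copies and sorts; assumes no NaN values).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}

/// Median absolute deviation from the median (assumed meaning of "±mad").
fn mad(xs: &[f64]) -> f64 {
    let m = median(xs);
    let deviations: Vec<f64> = xs.iter().map(|x| (x - m).abs()).collect();
    median(&deviations)
}

/// Throughput in GiB/s when `bytes` are processed in `micros` microseconds.
fn gib_per_sec(bytes: f64, micros: f64) -> f64 {
    bytes / (micros * 1e-6) / (1u64 << 30) as f64
}
```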
Why zenbench
Existing harnesses run benchmarks sequentially. Benchmark A runs on a hot CPU; benchmark B then runs on an even hotter CPU with reduced turbo boost. Changes in system load between the two runs skew the comparison.
Zenbench interleaves: each round, all benchmarks run in shuffled order. Round N of A and round N of B execute under identical conditions. Paired statistics on the round-by-round differences detect real changes — not thermal drift.
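The scheduling idea above can be sketched in plain Rust. This is an illustration under stated assumptions, not zenbench's implementation or API: `run_interleaved`, `paired_diffs`, and the small `XorShift64` generator are hypothetical names, each benchmark is modeled as a boxed closure, and timing uses `std::time::Instant`.

```rust
use std::time::{Duration, Instant};

/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle driven by the PRNG above.
fn shuffle<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Runs every benchmark once per round, in a freshly shuffled order each round.
/// `times[b][r]` is the duration of benchmark `b` in round `r`.
fn run_interleaved(benches: &mut [Box<dyn FnMut()>], rounds: usize, seed: u64) -> Vec<Vec<Duration>> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut order: Vec<usize> = (0..benches.len()).collect();
    let mut times = vec![Vec::with_capacity(rounds); benches.len()];
    for _ in 0..rounds {
        shuffle(&mut order, &mut rng);
        for &b in &order {
            let start = Instant::now();
            benches[b]();
            times[b].push(start.elapsed());
        }
    }
    times
}

/// Paired differences in nanoseconds: round r of `b` minus round r of `a`.
fn paired_diffs(a: &[Duration], b: &[Duration]) -> Vec<f64> {
    a.iter()
        .zip(b)
        .map(|(x, y)| y.as_nanos() as f64 - x.as_nanos() as f64)
        .collect()
}
```

Because `times[a][r]` and `times[b][r]` come from the same round, each difference largely cancels whatever slow drift affected that round; the paired test then operates on those differences rather than on two independent samples.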
vs criterion and divan
| Feature | criterion | divan | zenbench |
|---|---|---|---|
| Execution model | |||
| Interleaved round-robin | ❌ | ❌ | ✅ |
| Auto-convergence (stop when precise) | ❌ | ❌ | ✅ |
| Resource gating (detect other benchmarks) | ❌ | ❌ | ✅ |
| Statistics | |||
| Bootstrap confidence intervals | ✅ | ❌ | ✅ |
| Comparison test | Welch t (unpaired) | ❌ | Wilcoxon signed-rank (paired) |
| Effect size metric | ❌ | ❌ | Cohen's d |
| Drift detection (thermal/load) | ❌ | ❌ | Spearman r |
| Noise threshold (suppress trivial diffs) | ✅ configurable (default 1%) | ❌ | ✅ configurable |
| Measurement | |||
| Hardware TSC timer (rdtsc/cntvct) | ❌ | ✅ opt-in | ✅ auto |
| Overhead compensation | slope regression | loop subtraction | loop subtraction |
| Stack alignment jitter | ✅ alloca (unsafe) | ❌ | ✅ safe trampoline |
| Deferred drop (exclude Drop from timing) | ✅ iter_with_large_drop | ✅ MaybeUninit | ✅ Vec collect |
| Allocation profiling (GlobalAlloc) | ❌ | ✅ | ✅ |
| CI / Workflow | |||
| Save/load baselines | ✅ --save-baseline / --baseline | ❌ | ✅ --baseline= |
| Regression exit codes (0/1/2) | ❌ | ❌ | ✅ |
| Auto-update baseline on pass | ❌ | ❌ | ✅ --update-on-pass |
| Hardware fingerprint / testbed ID | ❌ | ❌ | ✅ |
| Cross-run variance inflation | ❌ | ❌ | ✅ pooled t-test |
| Output | |||
| Terminal report | table | tree | tree (default) + table |
| Bar chart | ❌ | ❌ | ✅ sorted, throughput |
| JSON / CSV / Markdown | ✅ JSON | ❌ | ✅ JSON + CSV + LLM + MD |
| HTML plots (violin/PDF/regression) | ✅ gnuplot or plotters | ❌ | ❌ |
| HTML report (self-contained, SVG) | ❌ | ❌ | ✅ --format=html |
| Streaming per-group | ❌ | ❌ | ✅ |
| Adaptive column layout | ❌ | ❌ | ✅ terminal-width aware |
| API | |||
| Async benchmarks | ✅ to_async() | ❌ | ✅ iter_async() |
| Thread contention testing | ❌ | ✅ threads attr | ✅ bench_contended() |
| Thread scaling analysis | ❌ | ❌ | ✅ bench_scaling() |
| Drop-in criterion migration | — | ❌ | ✅ import change only |
| Attribute macros | ❌ | ✅ #[divan::bench] | ❌ |
| Platform | |||
| Linux x86_64 / aarch64 | ✅ | ✅ | ✅ |
| Windows x86_64 / ARM64 | ✅ | ✅ | ✅ |
| macOS ARM64 / Intel | ✅ | ✅ | ✅ |
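Two of the statistics named in the table are short enough to sketch. The code below assumes the paired variant of Cohen's d (often written d_z: mean difference over the standard deviation of the differences) and treats drift detection as Spearman's rank correlation between round index and measured time. Which variants zenbench actually uses is not stated here, and all function names are hypothetical.

```rust
/// Paired effect size (Cohen's d_z): mean of the differences divided by
/// their sample standard deviation.
fn cohens_dz(diffs: &[f64]) -> f64 {
    let n = diffs.len() as f64;
    let mean = diffs.iter().sum::<f64>() / n;
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    mean / var.sqrt()
}

/// 1-based ranks; tied values share the average of their ranks.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| xs[a].partial_cmp(&xs[b]).unwrap());
    let mut r = vec![0.0; xs.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j over the run of values equal to xs[idx[i]].
        let mut j = i;
        while j + 1 < idx.len() && xs[idx[j + 1]] == xs[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0; // mean of ranks i+1 ..= j+1
        for &k in &idx[i..=j] {
            r[k] = avg;
        }
        i = j + 1;
    }
    r
}

/// Pearson correlation coefficient of two equal-length slices.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    sxy / (sxx * syy).sqrt()
}

/// Spearman's r between round number and time: values near +1 mean the
/// timings trend upward over the run (for example, thermal throttling).
fn spearman_drift(times: &[f64]) -> f64 {
    let rounds: Vec<f64> = (1..=times.len()).map(|i| i as f64).collect();
    pearson(&rounds, &ranks(times))
}
```

Spearman is computed here as the Pearson correlation of ranks, which handles ties correctly without a separate formula.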
Quick start
```toml
# Cargo.toml
[dev-dependencies]
zenbench = "0.1"

[[bench]]
name = "my_bench"
harness = false
```
use *;
main!;
CI regression testing
After merging to main, save a baseline. On PRs, check for regressions; the run exits with code 1 if any benchmark is more than 5% slower. To refresh the baseline automatically on clean runs, use `--update-on-pass`.
```text
Baseline comparison
───────────────────
compress::level_1          16.2µs →  16.4µs   +1.2%  unchanged
compress::level_6          15.1µs →  15.3µs   +1.3%  unchanged
compress::level_9          15.0µs →  15.6µs   +4.0%  unchanged
compress::mixed           401.0µs → 412.3µs   +2.8%  unchanged
decompress::zenflate       91.5µs →  92.7µs   +1.3%  unchanged
Summary: 0 regressions, 0 improvements, 5 unchanged
[zenbench] PASS: no regressions exceed 5% threshold
```
Full CI guide with GitHub Actions workflows: REGRESSION-TESTING.md
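A simplified version of the pass/fail logic can be sketched as follows. It compares only the baseline and current means against a percentage threshold; zenbench's real gate presumably also uses its confidence intervals, and the meaning of exit code 2 is not covered here, so this sketch returns only 0 or 1. `Verdict`, `classify`, and `exit_code` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    Unchanged,
}

/// Percent change from the baseline mean to the current mean.
fn percent_change(base: f64, current: f64) -> f64 {
    (current - base) / base * 100.0
}

/// Flags any change beyond ±threshold_pct (means only, no significance test).
fn classify(base: f64, current: f64, threshold_pct: f64) -> Verdict {
    let pct = percent_change(base, current);
    if pct > threshold_pct {
        Verdict::Regression
    } else if pct < -threshold_pct {
        Verdict::Improvement
    } else {
        Verdict::Unchanged
    }
}

/// CI exit code: 1 if any (baseline, current) pair regressed, otherwise 0.
fn exit_code(results: &[(f64, f64)], threshold_pct: f64) -> i32 {
    let regressed = results
        .iter()
        .any(|&(b, c)| classify(b, c, threshold_pct) == Verdict::Regression);
    if regressed { 1 } else { 0 }
}
```

Fed the five rows from the report above with a 5% threshold, every row classifies as unchanged (the largest change is +4.0%), so the exit code is 0.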
Thread scaling
suite.group;
```text
scaling                    200 rounds × 77 calls
               mean ±mad µs   95% CI vs base      items/s
├─ sqrt_1t      4.2 ±0.1µs    [4.2–4.3]µs         2.37G
├─ sqrt_2t      4.7 ±0.1µs    [+10.7%–+12.6%]     2.12G
├─ sqrt_4t      5.8 ±0.1µs    [+36.0%–+38.8%]     1.72G
├─ sqrt_8t      8.5 ±0.3µs    [+91.6%–+101%]      1.17G
╰─ sqrt_16t    14.2 ±0.3µs    [+232%–+245%]       703M

sqrt_1t   ██████████████████████████████████████████████████ 2.37G
sqrt_2t   █████████████████████████████████████████████ 2.12G
sqrt_4t   ████████████████████████████████████ 1.72G
sqrt_8t   █████████████████████████ 1.17G
sqrt_16t  ███████████████ 703M
```
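One way to run the same closure on N threads at once, so that they actually contend, is to release them together from a barrier. The sketch below uses only `std::thread::scope` and `std::sync::Barrier`; `run_contended` is a hypothetical helper and is not how zenbench's `bench_contended()` or `bench_scaling()` are implemented.

```rust
use std::sync::Barrier;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` on `threads` threads that start together behind a barrier.
/// Returns the slowest thread's elapsed time and every thread's result.
fn run_contended<F, R>(threads: usize, work: F) -> (Duration, Vec<R>)
where
    F: Fn(usize) -> R + Sync,
    R: Send,
{
    let barrier = Barrier::new(threads);
    let mut results = Vec::with_capacity(threads);
    let mut slowest = Duration::ZERO;
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let (barrier, work) = (&barrier, &work);
                s.spawn(move || {
                    barrier.wait(); // all threads leave the barrier together
                    let start = Instant::now();
                    let r = work(t);
                    (start.elapsed(), r)
                })
            })
            .collect();
        for h in handles {
            let (elapsed, r) = h.join().unwrap();
            slowest = slowest.max(elapsed);
            results.push(r);
        }
    });
    (slowest, results)
}
```

Repeating this for 1, 2, 4, ... threads and comparing per-call times gives a table shaped like the one above; if per-call time grows with the thread count, the workload is contending for shared resources.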
Subgroups and organization
suite.group;
```text
dispatch                                200 rounds × 10K calls
                           mean ±mad ns   95% CI vs base    checks/s
├─ Generic (monomorphized)
│  ├─ impl Stop (FnStop)      19.7 ±0.3ns   [-49.1%–-47.2%]   5.08G
│  ╰─ impl Stop (Stopper)     38.5 ±0.5ns   [37.9–39.1]ns     2.60G
╰─ Dynamic dispatch
   ├─ StopToken               97.2 ±1.2ns   [+148%–+154%]     1.03G
   ╰─ &dyn Stop              112.5 ±3.1ns   [+176%–+193%]     889M

impl Stop (FnStop)   ██████████████████████████████████████████████ 5.08G
impl Stop (Stopper)  █████████████████████████████ 2.60G
StopToken            ████████████ 1.03G
&dyn Stop            ██████████ 889M
```
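The two subgroups above compare static and dynamic dispatch. For readers unfamiliar with the distinction, here is a minimal sketch with a hypothetical `Stop` trait and `Flag` type; it stands in for the trait being benchmarked and does not reproduce its real definition.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical cancellation trait standing in for the `Stop` trait above.
trait Stop {
    fn should_stop(&self) -> bool;
}

/// Hypothetical implementor backed by an atomic flag.
struct Flag(AtomicBool);

impl Stop for Flag {
    fn should_stop(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Generic: monomorphized per concrete type, so the call can be inlined.
fn count_checks_generic<S: Stop>(s: &S, n: usize) -> usize {
    (0..n).filter(|_| !s.should_stop()).count()
}

/// Dynamic: every call goes through the trait object's vtable.
fn count_checks_dyn(s: &dyn Stop, n: usize) -> usize {
    (0..n).filter(|_| !s.should_stop()).count()
}
```

Grouping the two styles into separate subgroups keeps related variants adjacent in the tree while all four still run interleaved within the same rounds.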
Migrating from criterion
Add zenbench alongside criterion — migrate one file at a time:
```toml
[dev-dependencies]
criterion = "0.8"                                                 # keep
zenbench = { version = "0.1", features = ["criterion-compat"] }  # add
```
Change one import per file; the benchmark functions themselves need no changes:
// Before:
use ;
// After:
use *;
use ;
Closures can borrow local data, so no `move` or `Clone` is needed. Your existing `criterion_group!`, `criterion_main!`, `bench_function`, `bench_with_input`, `BenchmarkId`, `Throughput`, `group.sample_size()`, `group.measurement_time()`, and `group.finish()` all work unchanged.
Full upgrade ladder: MIGRATION.md
Output formats
API reference
use *;
// Interleaved comparison group
suite.group;
// Single function shorthand
suite.bench_fn;
// Thread contention
g.bench_contended;
// Automatic thread scaling (probes 1..num_cpus)
g.bench_scaling;
Configuration
group.config
.max_rounds // default 200
.noise_threshold // ±2% significance gate
.bootstrap_resamples // CI precision (default 10K)
.linear_sampling // slope regression for sub-100ns
.cold_start // 1 iter + cache firewall
.stack_jitter // random alignment (default on)
.sort_by_speed; // fastest first in report
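The `bootstrap_resamples` and `noise_threshold` settings interact: resampling produces the confidence interval, and the threshold decides whether that interval counts as a real change. The sketch below is a percentile bootstrap of the mean paired difference plus one plausible gate (the entire CI, as a percentage of baseline, must lie beyond ±threshold). The gate rule is an assumption, not zenbench's documented behavior, and `bootstrap_ci`, `exceeds_noise`, and `XorShift64` are hypothetical names.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Percentile bootstrap 95% CI for the mean of `diffs`.
fn bootstrap_ci(diffs: &[f64], resamples: usize, seed: u64) -> (f64, f64) {
    assert!(!diffs.is_empty() && resamples > 0);
    let mut rng = XorShift64(seed.max(1)); // state must be non-zero
    let n = diffs.len();
    let mut means = Vec::with_capacity(resamples);
    for _ in 0..resamples {
        // Draw n samples with replacement and record their mean.
        let mut sum = 0.0;
        for _ in 0..n {
            sum += diffs[(rng.next_u64() % n as u64) as usize];
        }
        means.push(sum / n as f64);
    }
    means.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let lo = means[(resamples as f64 * 0.025) as usize];
    let hi = means[((resamples as f64 * 0.975) as usize).min(resamples - 1)];
    (lo, hi)
}

/// Assumed gate: significant only if the whole CI (percent of baseline)
/// lies outside ±threshold_pct.
fn exceeds_noise(ci_pct: (f64, f64), threshold_pct: f64) -> bool {
    ci_pct.0 > threshold_pct || ci_pct.1 < -threshold_pct
}
```

Under this rule, the `level_6` interval from the first report, [-4.7%, -3.5%], clears a ±2% gate, while an interval straddling zero never does. More resamples make the interval endpoints more stable but do not narrow the true interval.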
Platform support
Tested on all targets via GitHub Actions CI:
| Platform | Timer | Notes |
|---|---|---|
| Linux x86_64 | TSC (rdtsc) | Full support |
| Linux aarch64 | Counter (cntvct_el0) | Full support |
| Windows x86_64 | TSC (rdtsc) | Full support |
| Windows ARM64 | Instant (~300ns) | No hardware counter in user mode |
| macOS ARM64 | Counter (cntvct_el0) | Full support |
| macOS Intel | TSC (rdtsc) | Full support |
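Reading the counters in the table from user space looks roughly like this. It is a sketch, not zenbench's timer code: `ticks` is a hypothetical name, the Instant fallback mirrors the Windows ARM64 row, and raw ticks are not nanoseconds, so converting them requires calibrating the counter frequency against a wall clock.

```rust
/// x86_64: read the time-stamp counter with RDTSC.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn ticks() -> u64 {
    unsafe { core::arch::x86_64::_rdtsc() }
}

/// aarch64 (Linux, macOS): read the virtual counter register CNTVCT_EL0.
#[cfg(all(target_arch = "aarch64", not(target_os = "windows")))]
fn ticks() -> u64 {
    let v: u64;
    unsafe { core::arch::asm!("mrs {}, cntvct_el0", out(reg) v) };
    v
}

/// Everything else (e.g. Windows ARM64): nanoseconds since first call via Instant.
#[cfg(not(any(
    target_arch = "x86_64",
    all(target_arch = "aarch64", not(target_os = "windows"))
)))]
fn ticks() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}
```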
Image tech I maintain
| Category | Projects |
|---|---|
| State of the art codecs* | zenjpeg · zenpng · zenwebp · zengif · zenavif (rav1d-safe · zenrav1e · zenavif-parse · zenavif-serialize) · zenjxl (jxl-encoder · zenjxl-decoder) · zentiff · zenbitmaps · heic · zenraw · zenpdf · ultrahdr · mozjpeg-rs · webpx |
| Compression | zenflate · zenzop |
| Processing | zenresize · zenfilters · zenquant · zenblend |
| Metrics | zensim · fast-ssim2 · butteraugli · resamplescope-rs · codec-eval · codec-corpus |
| Pixel types & color | zenpixels · zenpixels-convert · linear-srgb · garb |
| Pipeline | zenpipe · zencodec · zencodecs · zenlayout · zennode |
| ImageResizer | ImageResizer (C#) — 24M+ NuGet downloads across all packages |
| Imageflow | Image optimization engine (Rust) — .NET · node · go — 9M+ NuGet downloads across all packages |
| Imageflow Server | The fast, safe image server (Rust+C#) — 552K+ NuGet downloads, deployed by Fortune 500s and major brands |
* as of 2026
General Rust awesomeness
archmage · magetypes · enough · whereat · zenbench · cargo-copter
And other projects · GitHub @imazen · GitHub @lilith · lib.rs/~lilith · NuGet (over 30 million downloads / 87 packages)
License
MIT OR Apache-2.0