zer-bench 1.0.5

Benchmark harness for zer: throughput, accuracy, and competitor-library comparison
zer-bench-1.0.5 is not a library.

zer-bench

Benchmark harness for zer, measures throughput, accuracy, and head-to-head comparisons against competitor libraries on Dutch administrative datasets.

Install

cargo install zer-bench

For GPU-accelerated benchmarks pass a feature flag at install time:

cargo install zer-bench --features avx2     # x86-64 AVX2 SIMD
cargo install zer-bench --features cuda     # NVIDIA CUDA (requires CUDA Toolkit 13.1+)
cargo install zer-bench --features vulkan   # Vulkan 1.3 compute

Datasets and models

zer-bench resolves dataset and model paths from environment variables. Download the benchmark datasets from HuggingFace and point ZER_DATASET_DIR at the local copy:

hf download arsalan-anwari/dutch-law-enforcement-entity-resolution-dataset \
    --repo-type dataset --local-dir ~/datasets/zer
export ZER_DATASET_DIR=~/datasets/zer

Neural judge benchmarks (--judge) also need model files:

hf download arsalan-anwari/zjudge --local-dir ~/.cache/zer/models
# ZER_MODEL_DIR defaults to ~/.cache/zer/models; override if needed

Environment variables

Variable Default Description
ZER_DATASET_DIR <workspace>/data Root directory for benchmark datasets downloaded from HuggingFace. Dataset paths are resolved as $ZER_DATASET_DIR/benchmarks/<scenario>/.... When unset, falls back to <workspace>/data (repo clone layout).
ZER_MODEL_DIR ~/.cache/zer/models Directory containing neural judge ONNX model files. Mirrors the layout from arsalan-anwari/zjudge on HuggingFace.
ZER_EXTERNAL_BENCHMARKS_DIR <workspace>/benchmarks Root directory containing external library benchmark scripts. Scripts are resolved as $ZER_EXTERNAL_BENCHMARKS_DIR/<library>/<mode>/run.py (or run.R). Set this when running zer-bench library outside of a zer repository clone. Can also be passed as --external-benchmarks-dir.

Subcommands

Subcommand Description
throughput Raw compare/EM/score throughput on a single dataset
accuracy Precision, recall, F1, and PR-AUC against labelled ground truth
library Run a competitor library script and capture its summary CSV
library-all Run all configured competitor libraries for a given mode and dataset
compare Read multiple CSV summaries and print a side-by-side comparison table

Quick start

# List available scenarios
zer-bench accuracy --list-scenarios

# Accuracy; results written to bench_results/ by default
zer-bench accuracy --scenario brp/dedupe --out bench_results/

# Accuracy with neural judge (replace cuda with tensorrt / rocm / directml / openvino)
zer-bench accuracy --scenario brp/dedupe --judge-target cuda --out bench_results/

# Throughput, CPU; switch --target to avx2 / cuda / vulkan for GPU backends
zer-bench throughput --scenario brp/dedupe --out bench_results/
zer-bench throughput --scenario brp/dedupe --target cuda --out bench_results/

# Run an external library (Splink) on the same scenario
zer-bench library --library splink --scenario brp/dedupe --out bench_results/

# Same, but scripts live outside a zer repo clone
zer-bench library --library splink --scenario brp/dedupe \
    --external-benchmarks-dir /path/to/my/benchmarks --out bench_results/

# Side-by-side comparison table
zer-bench compare --results bench_results/

Feature flags

Compute backends

Flag Description
cpu Scalar CPU fallback (always available without this flag too)
avx2 x86_64 AVX2 SIMD (~4× throughput vs scalar CPU)
cuda NVIDIA CUDA (SM 8.6+), requires CUDA Toolkit 13.1+
vulkan Vulkan 1.3 compute, requires Vulkan 1.3 driver

Neural judge execution providers

Install the feature flag that matches the hardware you want to use, then pass --judge-target <value> at runtime to select the execution provider:

Feature flag --judge-target value Description
(none) cpu CPU ONNX Runtime (default when --judge-target is omitted)
judge_cuda cuda NVIDIA CUDA ORT execution provider
judge_tensorrt tensorrt TensorRT FP16 with engine caching, requires TensorRT 8.0+
judge_rocm rocm AMD ROCm ORT execution provider
judge_directml directml Windows DirectML ORT execution provider
judge_openvino openvino Intel OpenVINO ORT execution provider

Example:

# Install with TensorRT support
cargo install zer-bench --features judge_tensorrt

# Run accuracy with TensorRT judge (--judge-target enables the judge automatically)
zer-bench accuracy --scenario brp/dedupe --judge-target tensorrt

Diagnostics

Flag Description
progress Print pipeline stage progress during accuracy runs
perf-metrics Print per-phase timing metrics (blocking_ms, compare_ms, etc.)
collect-pairs Collect all scored pairs after judging for unbiased PR-AUC
nvtx Map tracing spans to Nsight Systems ranges (profiling only)

Output format

Every accuracy, throughput, and library run appends a CSV row to the --out directory:

library,dataset,mode,precision,recall,f1,pr_auc,elapsed_ms,peak_memory_mb
zer,brp_persons,deduplicate,0.984,0.982,0.983,0.991,3653,163

Use zer-bench compare to aggregate rows from multiple runs into a formatted side-by-side table.

License

Apache-2.0 · GitHub