zer-bench 1.1.0

zer-bench

Benchmark harness for zer. It measures throughput and accuracy, and runs head-to-head comparisons against competitor libraries on Dutch administrative datasets.

Install

cargo install zer-bench

For accelerated compute backends or judge providers, pass the matching feature flag(s) at install time:

cargo install zer-bench --features avx2            # x86-64 AVX2 SIMD
cargo install zer-bench --features cuda            # NVIDIA CUDA (requires CUDA Toolkit 13.1+)
cargo install zer-bench --features vulkan          # Vulkan 1.3 compute
cargo install zer-bench --features judge_tensorrt  # TensorRT judge

To install with every compute backend and judge provider enabled:

cargo install zer-bench --features \
    "cuda,avx2,vulkan,judge_cuda,judge_tensorrt,judge_rocm,judge_directml,judge_openvino"

Once installed, --target and --judge-target switch backends at runtime; no rebuild is needed.

Datasets and models

zer-bench resolves dataset and model paths from environment variables. Download the benchmark datasets from HuggingFace and point ZER_DATASET_DIR at the local copy:

hf download arsalan-anwari/dutch-law-enforcement-entity-resolution-dataset \
    --repo-type dataset --local-dir ~/datasets/zer
export ZER_DATASET_DIR=~/datasets/zer

Neural judge benchmarks (--judge-target) also need model files:

hf download arsalan-anwari/zjudge --local-dir ~/.cache/zer/models
# ZER_MODEL_DIR defaults to ~/.cache/zer/models; override if needed

External library benchmarks

Head-to-head comparisons with competitor libraries (--compare-libs) require the external benchmark scripts. Clone the repository and point ZER_EXTERNAL_BENCHMARKS_DIR at it:

git clone https://github.com/ZAL-Analytics/zer-external-libs-benchmarks \
    ~/benchmarks/zer-external-libs
export ZER_EXTERNAL_BENCHMARKS_DIR=~/benchmarks/zer-external-libs

Or pass the path directly at runtime:

zer-bench accuracy --scenario brp/dedupe --compare-libs splink \
    --external-benchmarks-dir ~/benchmarks/zer-external-libs

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| ZER_DATASET_DIR | <workspace>/data | Root directory for benchmark datasets. Paths resolve as $ZER_DATASET_DIR/benchmarks/<scenario>/.... Falls back to <workspace>/data when unset. |
| ZER_MODEL_DIR | ~/.cache/zer/models | Directory containing neural judge ONNX model files. Mirrors the layout from arsalan-anwari/zjudge on HuggingFace. |
| ZER_EXTERNAL_BENCHMARKS_DIR | <workspace>/benchmarks | Root directory for external library benchmark scripts. Scripts resolve as $ZER_EXTERNAL_BENCHMARKS_DIR/<library>/<mode>/run.py. Can also be passed as --external-benchmarks-dir. |
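
The resolution rules above can be sketched as small shell helpers, which is handy for checking where a run would look before starting it. This is only an illustration, not zer-bench's own logic: WORKSPACE stands in for <workspace>, the function names are invented for this example, and the mode path segment (dedupe below) is an assumption.

```sh
# Sketch of the documented path rules; not taken from zer-bench itself.
# WORKSPACE is a stand-in for <workspace> (assumed to be the current directory).
WORKSPACE="${WORKSPACE:-$PWD}"

# Dataset directory for a scenario such as brp/dedupe.
# Note: ${VAR:-default} also applies the default when VAR is set but empty.
dataset_path() {
    printf '%s/benchmarks/%s\n' "${ZER_DATASET_DIR:-$WORKSPACE/data}" "$1"
}

# Runner script for an external library and mode.
external_script_path() {
    printf '%s/%s/%s/run.py\n' \
        "${ZER_EXTERNAL_BENCHMARKS_DIR:-$WORKSPACE/benchmarks}" "$1" "$2"
}

# Directory holding the neural judge model files.
model_dir() {
    printf '%s\n' "${ZER_MODEL_DIR:-$HOME/.cache/zer/models}"
}

dataset_path brp/dedupe
external_script_path splink dedupe   # "dedupe" as <mode> is a guess
model_dir
```

Piping a result into test -e (for example test -e "$(external_script_path splink dedupe)") gives a quick check that the expected file exists before launching a long run.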

Subcommands

| Subcommand | Description |
| --- | --- |
| accuracy | Precision, recall, F1, and PR-AUC against labelled ground truth |
| throughput | Raw compare/EM/score throughput on a single dataset |
| compare | Read multiple CSV summaries and print a side-by-side comparison table |
| plot | Generate plots from benchmark JSON files via plot_results.py |

Competitor libraries run inline via --compare-libs on both accuracy and throughput; no separate subcommand is needed.

Quick start

List available scenarios

zer-bench accuracy  --list-scenarios
zer-bench throughput --list-scenarios

Accuracy

# Single scenario: datasets, mode, and ground truth are wired up automatically
zer-bench accuracy --scenario brp/dedupe --out bench_results/

# All 8 full-size scenarios back-to-back
zer-bench accuracy --scenario all --out bench_results/

# zer vs splink (runs both, prints inline comparison table)
zer-bench accuracy --scenario brp/dedupe --compare-libs splink --out bench_results/

# zer vs splink across all scenarios
zer-bench accuracy --scenario all --compare-libs splink --out bench_results/

Judge dual-pass

When --judge-target is supplied, zer-bench automatically runs zer without the judge, then zer with the judge, and prints a side-by-side comparison table. No extra flags are needed.

# CPU judge dual-pass (no extra feature flag needed)
zer-bench accuracy --scenario brp/dedupe --judge-target cpu

# TensorRT judge (requires --features judge_tensorrt at build time)
zer-bench accuracy --scenario brp/dedupe --judge-target tensorrt

# TensorRT judge vs splink baseline: 3 results per scenario (zer, zer+judge, splink)
zer-bench accuracy --scenario brp/dedupe --judge-target tensorrt --compare-libs splink

# All 8 scenarios × (zer + zer+judge + splink): 24 runs total, one table per scenario
zer-bench accuracy --scenario all --judge-target tensorrt --compare-libs splink

Throughput

# CPU throughput (dedupe scenarios only)
zer-bench throughput --scenario brp/dedupe --out bench_results/

# CUDA throughput (requires --features cuda)
zer-bench throughput --scenario brp/dedupe --target cuda --out bench_results/

# All dedupe scenarios back-to-back (brp/dedupe and kvk/dedupe)
zer-bench throughput --scenario all --target cuda --out bench_results/

# zer vs splink throughput
zer-bench throughput --scenario brp/dedupe --compare-libs splink --out bench_results/

# CUDA throughput + TensorRT judge dual-pass
zer-bench throughput --scenario brp/dedupe --target cuda --judge-target tensorrt

Comparing existing results

# Print a table from all summary CSVs in a directory
zer-bench compare --results bench_results/

# Filter by mode and dataset
zer-bench compare --results bench_results/ --mode dedupe --dataset brp_persons

Plotting

zer-bench plot --input bench_results/data/<run>/

Helper script

scripts/run_benchmark.sh is a thin driver that selects the correct Cargo features based on --target and --judge-target (backends must be compiled in), generates a timestamped output directory, then forwards everything to zer-bench:

# Equivalent to the direct zer-bench calls above
./scripts/run_benchmark.sh --scenario brp/dedupe
./scripts/run_benchmark.sh --scenario all --compare-libs splink
./scripts/run_benchmark.sh --type throughput --scenario brp/dedupe --target cuda
./scripts/run_benchmark.sh --scenario brp/dedupe --judge-target tensorrt --compare-libs splink
./scripts/run_benchmark.sh --list

Use the script during development. For a pre-built all-features binary, call zer-bench directly.
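
As a rough picture of the feature selection described above, the mapping from runtime targets to Cargo features could look like the function below. It follows the feature-flag tables in this README but is a hypothetical sketch, not the contents of scripts/run_benchmark.sh; the function name features_for is invented here.

```sh
# Hypothetical mapping from --target / --judge-target values to Cargo features.
# Based on the feature-flag tables in this README, not on run_benchmark.sh itself.
features_for() {
    target="$1"; judge="$2"; features=""
    case "$target" in
        avx2|cuda|vulkan) features="$target" ;;   # feature name equals target name
        cpu|auto|"") ;;                           # scalar fallback needs no feature
        *) echo "unknown --target: $target" >&2; return 1 ;;
    esac
    case "$judge" in
        cuda|tensorrt|rocm|directml|openvino)
            # Judge features carry a judge_ prefix; add a comma only if needed.
            features="${features:+$features,}judge_$judge" ;;
        cpu|"") ;;                                # CPU judge is always available
        *) echo "unknown --judge-target: $judge" >&2; return 1 ;;
    esac
    printf '%s\n' "$features"
}

features_for cuda tensorrt   # prints cuda,judge_tensorrt
features_for cpu ""          # prints an empty line: no extra features
```

A driver built this way would pass the result to Cargo's --features option and skip the option entirely when the result is empty.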

Feature flags

Compute backends

| Flag | --target value | Description |
| --- | --- | --- |
| (none) | cpu or auto | Scalar CPU fallback, always available |
| avx2 | avx2 | x86-64 AVX2 SIMD (~4× vs scalar CPU) |
| cuda | cuda | NVIDIA CUDA, requires CUDA Toolkit 13.1+ |
| vulkan | vulkan | Vulkan 1.3 compute |

Neural judge execution providers

| Feature flag | --judge-target value | Description |
| --- | --- | --- |
| (none) | cpu | CPU ONNX Runtime, always available |
| judge_cuda | cuda | NVIDIA CUDA ORT execution provider |
| judge_tensorrt | tensorrt | TensorRT FP16 with engine caching, requires TensorRT 8.0+ |
| judge_rocm | rocm | AMD ROCm ORT execution provider |
| judge_directml | directml | Windows DirectML ORT execution provider |
| judge_openvino | openvino | Intel OpenVINO ORT execution provider |

Diagnostics

| Flag | Description |
| --- | --- |
| progress | Print pipeline stage progress during accuracy runs |
| perf-metrics | Print per-phase timing metrics (blocking_ms, compare_ms, etc.) |
| collect-pairs | Collect all scored pairs after judging for unbiased PR-AUC (on by default) |
| nvtx | Map tracing spans to Nsight Systems ranges (profiling only) |

Available scenarios

Accuracy (zer-bench accuracy --list-scenarios)

| Scenario | Mode | Description |
| --- | --- | --- |
| brp/dedupe | deduplicate | BRP person deduplication |
| brp/link | link-only | BRP → external source linkage |
| brp/link_and_dedupe | link-and-dedupe | BRP simultaneous dedup + link |
| brp_kvk/link | link-only | BRP × KVK cross-schema linkage |
| brp_sis/link | link-only | BRP × SIS cross-schema linkage |
| brp_hks/link | link-only | BRP × HKS cross-schema linkage |
| brp_kvk_hks/link_and_dedupe | link-and-dedupe | BRP × KVK × HKS multi-source |
| kvk/dedupe | deduplicate | KVK business-register deduplication |

--scenario all runs all 8 full-size scenarios. --list-scenarios also lists the micro/smoke-test variants.
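
If each scenario should get its own output directory instead of one shared --scenario all run, a loop over the eight names listed above is one option. The sketch below only prints the commands unless RUN=1 is set; the per-scenario directory naming is an assumption of this example, not something zer-bench requires.

```sh
# The eight full-size accuracy scenarios from the list above.
SCENARIOS="brp/dedupe brp/link brp/link_and_dedupe brp_kvk/link brp_sis/link brp_hks/link brp_kvk_hks/link_and_dedupe kvk/dedupe"

# Print one accuracy command per scenario, or execute them when RUN=1.
# The output layout (<root>/<scenario with "/" replaced by "_">) is assumed.
run_each_scenario() {
    out_root="${1:-bench_results}"
    for s in $SCENARIOS; do
        out="$out_root/$(printf '%s' "$s" | tr '/' '_')"
        if [ "${RUN:-0}" = 1 ]; then
            zer-bench accuracy --scenario "$s" --out "$out"
        else
            echo "zer-bench accuracy --scenario $s --out $out"
        fi
    done
}

run_each_scenario   # dry run: prints eight commands
```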

Throughput (zer-bench throughput --list-scenarios)

Throughput only supports dedupe scenarios (brp/dedupe, kvk/dedupe). --scenario all runs both back-to-back.

Output format

Every accuracy and throughput run writes to --out:

| File | Description |
| --- | --- |
| <run_id>_summary.csv | Single-row CSV consumed by zer-bench compare |
| <run_id>_benchmark.json | Full metadata: metrics, timings, memory snapshots |
| <run_id>_scored_pairs.csv | (score, is_match) pairs for PR curve plotting (accuracy only) |

Use zer-bench compare --results <dir> to aggregate rows from multiple runs into a formatted table.
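
The scored-pairs file can also be inspected by hand. The awk sketch below computes precision, recall, and F1 at a chosen threshold. It assumes a layout this README does not confirm: one header row followed by score,is_match rows, with is_match written as 1/0 or true/false. The function name pr_at_threshold is made up for this example.

```sh
# Precision, recall, and F1 at a score threshold from a scored-pairs CSV.
# Assumed layout (not confirmed here): header row, then "score,is_match",
# where is_match is 1/0 or true/false.
pr_at_threshold() {
    awk -F, -v t="$2" '
        NR == 1 { next }                      # skip the assumed header row
        {
            pred  = ($1 + 0 >= t + 0)         # predicted match at this threshold
            truth = ($2 == "1" || tolower($2) == "true")
            if (pred && truth)       tp++
            else if (pred && !truth) fp++
            else if (!pred && truth) fn++
        }
        END {
            p = (tp + fp) ? tp / (tp + fp) : 0
            r = (tp + fn) ? tp / (tp + fn) : 0
            f = (p + r)   ? 2 * p * r / (p + r) : 0
            printf "precision=%.3f recall=%.3f f1=%.3f\n", p, r, f
        }
    ' "$1"
}
```

Call it as pr_at_threshold bench_results/<run_id>_scored_pairs.csv 0.5. Recall computed this way only covers true matches that appear in the file, so it may differ from the figure zer-bench reports against the full ground truth.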

License

Apache-2.0 · GitHub