# exaloglog

Space-efficient approximate distinct counting in pure Rust. Implements ExaLogLog (ELL) by Otmar Ertl (2024), achieving the same estimation error as HyperLogLog with 43% less memory.
## Why

| Algorithm | Bytes for ~2% RMSE @ n=10⁶ |
|---|---|
| HyperLogLog (6-bit, p=11) | 1792 |
| UltraLogLog (p=10) | 1056 |
| ExaLogLog (t=2, d=20, p=8) | 896 |
The MVP (memory-variance product) for HLL with 6-bit registers is 6.48,
versus 3.67 for ExaLogLog(t=2, d=20) and 3.78 for the 32-bit-aligned
ExaLogLogFast(t=2, d=24) — a 43% and 41% improvement, respectively.
## Two variants
This crate ships two configurations of the algorithm. Both implement the same public API and both expose the maximum-likelihood and martingale (HIP) estimators.
### ExaLogLog (default — packed d=20, MVP=3.67)

28-bit registers, packed two per 7 bytes. 43% smaller than HLL at the same RMSE. Use this unless you need lock-free concurrent updates.
### ExaLogLogFast (32-bit aligned, d=24, MVP=3.78)

32-bit registers, one register per `u32`. 41% smaller than HLL at the same RMSE. Slightly faster scalar inserts (~15-30%) and the variant that exposes lock-free concurrent ingest via `add_hash_atomic(&self, hash)`.
## Sparse mode

`ExaLogLog::new(p)` starts in sparse mode and stores 32-bit hash tokens until reaching m · 7/8 distinct elements (the break-even point), then promotes to dense. For low-cardinality sketches this means up to ~30× less memory and exact distinct counts. Skip it with `ExaLogLog::new_dense(p)` if you know n will be large.
## Reducing precision

Both variants support `reduce(new_p)`, returning a sketch at lower precision following Algorithm 6 of the paper (restricted to keeping d constant). The estimate from the reduced sketch matches a directly-built sketch at `new_p` to within a few percent on the property tests; an exact byte-for-byte parity test against the Java reference for downsize is on the v0.16 list. Useful when you committed to too high a p and need to compact existing sketches.
## Usage

```rust
use exaloglog::ExaLogLog;

let mut sketch = ExaLogLog::new(12); // m = 2^12 = 4096 registers
// or, sized for a target accuracy:
// let mut sketch = ExaLogLog::with_target_rmse(0.02); // ~2% RMSE

for i in 0..1_000_000u64 {
    sketch.add(&i);
}

let estimate = sketch.estimate();
println!("estimated distinct count: {estimate}");
```
`estimate()` defaults to the ML estimator (works after merges and from deserialized sketches). On a freshly built dense sketch, the martingale (HIP) estimator is also available via `estimate_martingale()` and has slightly lower variance.
For high-throughput ingest, hash with a fast function and call `add_hash(u64)` directly:

```rust
use xxhash_rust::xxh3::xxh3_64;

// `item` is assumed to be in scope as a byte slice.
sketch.add_hash(xxh3_64(item)); // ~1.6× faster than add(&T)
```
Or batch-insert and let the crate handle cache locality:

```rust
let hashes: Vec<u64> = inputs.iter().map(|x| xxh3_64(x.as_bytes())).collect();
sketch.add_hashes_sorted(&hashes); // big win at p ≥ 14
```
Empirical accuracy
Run cargo run --release --example accuracy to reproduce. At p = 12
(packed, 14336 bytes) over 200 trials at each n:
| n | ML RMSE% | HIP RMSE% | ML bias% |
|---|---|---|---|
| 100 | 0.35 | 0.35 | +0.02 |
| 1k | 0.36 | 0.35 | +0.03 |
| 10k | 0.42 | 0.39 | -0.02 |
| 100k | 0.44 | 0.41 | +0.01 |
| 1M | 0.55 | 0.50 | +0.05 |
Theory predicts RMSE ≈ √(MVP / total_bits) = √(3.67 / 114688) ≈ 0.566%. Empirical numbers track theory and the ML estimator is essentially unbiased.
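The theoretical figure falls straight out of the constants quoted above and can be checked with nothing but the standard library:

```rust
fn main() {
    let mvp = 3.67_f64;             // memory-variance product, ExaLogLog(t=2, d=20)
    let total_bits = 14336.0 * 8.0; // p = 12 packed sketch: 14336 bytes
    let rmse = (mvp / total_bits).sqrt();
    println!("predicted RMSE ≈ {:.3}%", rmse * 100.0); // ≈ 0.566%
    assert!((rmse - 0.00566).abs() < 1e-4);
}
```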
## Throughput

Single-threaded, scalar implementation, on a recent Linux x86_64 machine (AMD Ryzen 9 9950X, rustc 1.95.0 stable):

| variant | p=8 | p=12 | p=16 |
|---|---|---|---|
| ExaLogLog (packed) | 101 M ins/s | 46 M ins/s | 39 M ins/s |
| ExaLogLogFast (aligned) | 146 M ins/s | 53 M ins/s | 45 M ins/s |
Concurrent ingest via `add_hash_atomic` scales to >400 M ins/s at 8 threads (`cargo run --release --example concurrent_ingest`).
A SIMD-accelerated batch insert path is on the roadmap.
## Validation

In addition to ~50 unit and property tests, the crate ships a byte-for-byte parity test against Dynatrace's Java reference: register state after inserting splitmix64(0..n) is identical to the authoritative implementation across d ∈ {20, 24}, p ∈ {4, 8, 12}, and n ∈ {100, 1000, 10000}. See `tests/java_parity.rs` and `notes/java-parity.md`.
## Optional features

- `serde`: derives `Serialize`/`Deserialize` for both sketch types, going through the existing byte format. Works with bincode, JSON, MessagePack, CBOR, and any other serde data format.
- `rayon`: enables `merge_many_par` and `merge_many_par_fast`, which reduce a slice of sketches in parallel via Rayon's work-stealing thread pool. Useful for rolling up many tenant sketches when you have spare cores.
- `simd`: enables an x86_64 AVX2 / AVX-512 batch path for the hash → `(register index, update value)` computation used by `add_hashes_sorted`. Unsafe `core::arch` calls are scoped to a single module and gated by runtime feature detection. The rest of the crate is `#![deny(unsafe_code)]`. Effectively raises the MSRV to 1.89 (when AVX-512 intrinsics stabilized) when the feature is enabled.
## License
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
## Contributing
Issues and PRs welcome. By contributing you agree to license your contribution under the same dual license.