exaloglog

Space-efficient approximate distinct counting in pure Rust. Implements ExaLogLog (ELL) by Otmar Ertl (2024), achieving the same estimation error as HyperLogLog with 43% less memory.

Why

Algorithm	Bytes for ~2% RMSE @ n=10⁶
HyperLogLog (6-bit, `p=11`)	1792
UltraLogLog (`p=10`)	1056
ExaLogLog `(t=2, d=20, p=8)`	896

The MVP (memory-variance product) for HLL with 6-bit registers is 6.48, versus 3.67 for ExaLogLog(t=2, d=20) and 3.78 for the 32-bit-aligned ExaLogLogFast(t=2, d=24) — a 43% and 41% improvement, respectively.

Two variants

This crate ships two configurations of the algorithm. Both implement the same public API and both expose the maximum-likelihood and martingale (HIP) estimators.

`ExaLogLog` (default — packed `d=20`, MVP=3.67)

28-bit registers, packed two-per-7-bytes. 43% smaller than HLL at the same RMSE. Use this unless you need lock-free concurrent updates.

`ExaLogLogFast` (32-bit aligned, `d=24`, MVP=3.78)

32-bit registers, one register per u32. 41% smaller than HLL at the same RMSE. Slightly faster scalar inserts (~15-30%) and the variant that exposes lock-free concurrent ingest via add_hash_atomic(&self, hash).

Sparse mode

ExaLogLog::new(p) starts in sparse mode and stores 32-bit hash tokens until reaching m · 7/8 distinct elements (the break-even point), then promotes to dense. For low-cardinality sketches this means up to ~30× less memory and exact distinct counts. Skip it with ExaLogLog::new_dense(p) if you know n will be large.

Reducing precision

Both variants support reduce(new_p), returning a sketch at lower precision identical to one that was built directly at new_p. Useful when you committed to too high a p and need to compact existing sketches.

Usage

use exaloglog::ExaLogLog;

let mut sketch = ExaLogLog::new(12); // m = 2^12 = 4096 registers

for i in 0..1_000_000u64 {
    sketch.add(&i);
}

let estimate = sketch.estimate();
println!("estimated distinct: {estimate:.0}");

For deserialization or post-merge sketches, estimate() falls back to the ML estimator (which works from register state alone). On a freshly built sketch, estimate_martingale() is also available and has slightly lower variance.

Empirical accuracy

Run cargo run --release --example accuracy to reproduce. At p = 12 (packed, 14336 bytes) over 200 trials at each n:

n	ML RMSE%	HIP RMSE%	ML bias%
100	0.35	0.35	+0.02
1k	0.36	0.35	+0.03
10k	0.42	0.39	-0.02
100k	0.44	0.41	+0.01
1M	0.55	0.50	+0.05

Theory predicts RMSE ≈ √(MVP / total_bits) = √(3.67 / 114688) ≈ 0.566%. Empirical numbers track theory and the ML estimator is essentially unbiased.

Throughput

Single-threaded, scalar implementation, on a recent Linux x86_64 machine (AMD Ryzen 9 9950X, rustc 1.95.0 stable):

variant	`p=8`	`p=12`	`p=16`
`ExaLogLog` (packed)	101 M ins/s	46 M ins/s	39 M ins/s
`ExaLogLogFast` (aligned)	146 M ins/s	53 M ins/s	45 M ins/s

Concurrent ingest via add_hash_atomic scales to >400 M ins/s at 8 threads (cargo run --release --example concurrent_ingest).

A SIMD-accelerated batch insert path is on the roadmap.

Validation

In addition to ~50 unit and property tests, the crate ships a byte-for-byte parity test against Dynatrace's Java reference: register state after inserting splitmix64(0..n) is identical to the authoritative implementation across d ∈ {20, 24}, p ∈ {4, 8, 12}, and n ∈ {100, 1000, 10000}. See tests/java_parity.rs and notes/java-parity.md.

Optional features

serde: derives Serialize / Deserialize for both sketch types, going through the existing byte format. Works with bincode, JSON, MessagePack, CBOR, and any other serde data format.

License

Dual-licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.

Contributing

Issues and PRs welcome. By contributing you agree to license your contribution under the same dual license.

exaloglog 0.9.0