Expand description
HyperLogLog — probabilistic distinct-count (cardinality)
estimator.
m = 2^p register bank; each add(x) hashes x into 64 bits,
uses the top p bits as a register index and counts the
leading zeros of the remaining 64 − p bits. Per-register
state is max(zeros + 1) across every element routed there.
Cardinality is recovered by a harmonic mean of 2^(-register)
with an α_m bias correction, plus small-range linear
counting when many registers are still empty.
Memory: m bytes (one u8 per register). Typical configs:
p | m | memory | standard error |
|---|---|---|---|
| 10 | 1 024 | 1 KiB | 3.25 % |
| 12 | 4 096 | 4 KiB | 1.625 % |
| 14 | 16 384 | 16 KiB | 0.81 % |
| 16 | 65 536 | 64 KiB | 0.40 % |
Gated behind the std feature because the hash path relies on
std::hash::DefaultHasher (SipHash 1-3).
§References
- P. Flajolet, É. Fusy, O. Gandouet, F. Meunier,
“
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”,AofA2007. - S. Heule, M. Nunkesser, A. Hall, “
HyperLogLogin Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm”, EDBT 2013.
Structs§
- Hyper
LogLog - Probabilistic distinct-count sketch.
Constants§
- DEFAULT_
PRECISION - Default precision
p = 12— 4 096 registers, ≈ 1.625 % std error, ~4 KiB memory. - MAX_
PRECISION - Maximum precision bit count — 65 536 registers.
- MIN_
PRECISION - Minimum precision bit count — 16 registers.