Skip to main content

Module hyperloglog

Module hyperloglog 

Source
Expand description

HyperLogLog — probabilistic distinct-count (cardinality) estimator.

m = 2^p register bank; each add(x) hashes x into 64 bits, uses the top p bits as a register index and counts the leading zeros of the remaining 64 − p bits. Per-register state is max(zeros + 1) across every element routed there. Cardinality is recovered by a harmonic mean of 2^(-register) with an α_m bias correction, plus small-range linear counting when many registers are still empty.

Memory: m bytes (one u8 per register). Typical configs:

pmmemorystandard error
101 0241 KiB3.25 %
124 0964 KiB1.625 %
1416 38416 KiB0.81 %
1665 53664 KiB0.40 %

Gated behind the std feature because the hash path relies on std::hash::DefaultHasher (SipHash 1-3).

§References

  1. P. Flajolet, É. Fusy, O. Gandouet, F. Meunier, “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”, AofA 2007.
  2. S. Heule, M. Nunkesser, A. Hall, “HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm”, EDBT 2013.

Structs§

HyperLogLog
Probabilistic distinct-count sketch.

Constants§

DEFAULT_PRECISION
Default precision p = 12 — 4 096 registers, ≈ 1.625 % std error, ~4 KiB memory.
MAX_PRECISION
Maximum precision bit count — 65 536 registers.
MIN_PRECISION
Minimum precision bit count — 16 registers.