Expand description
§hyperloglockless
High-performance HyperLogLog with bias correction and full concurrency support. Used for accurate and space-efficient cardinality estimation.
§Overview
HyperLogLogs are space efficient data structures for the “count-distinct problem”, approximating the number of distinct elements in a multiset. Paper.
hyperloglockless offers a lockless concurrent HyperLogLog and a single threaded counterpart. They’re simpler, faster, and more accurate than other HyperLogLog implementations:
- 🧵 Concurrent:
AtomicHyperLogLog
is a drop-in replacement forRwLock<OtherHyperLogLog>
: all methods take&self
, so you can wrap it inArc
and update it concurrently without&mut
. - ⚡ Fast: Designed to be fast and simple in both single and multi-threaded scenarios.
- 🎯 Accurate: Empirically verified accuracy for trillions of elements; other implementations break down after millions.
- ✅ Tested: Rigorously tested with loom and benchmarked.
§Usage
[dependencies]
hyperloglockless = "0.3.0"
A HyperLogLog with precision p
uses 2^p
bytes of memory and has an error % of roughly 104 / sqrt(2^p)
.
use hyperloglockless::HyperLogLog;
let precision = hyperloglockless::precision_for_error(0.01); // 1% error
assert_eq!(precision, 14);
let mut hll = HyperLogLog::new(precision);
hll.insert(&'🦀');
hll.insert_all('a'..='z');
let count = hll.count(); // ~27
assert_eq!(hll.len(), 1 << precision); // 16384 bytes
Full concurrency support: AtomicHyperLogLog
is a drop-in replacement for RwLock<OtherHyperLogLog>
: all methods take &self
.
use hyperloglockless::AtomicHyperLogLog;
let hll = AtomicHyperLogLog::new(14);
hll.insert(&'🦀');
hll.insert_all('a'..='z');
§Single-Threaded Performance vs Others
Both hyperloglockless::HyperLogLog
and hyperloglockless::AtomicHyperLogLog
are extremely fast for both insert and count calls.
§Multi-Threaded Performance vs Others
hyperloglockless::AtomicHyperLogLog
does not require any locking and therefore avoids thread contention.
§Accuracy vs Others
hyperloglockless stays consistently accurate while other implementations break down after millions of items.
§Available Features
rand
- Enabled by default, this has theDefaultHasher
source its random state usingthread_rng()
instead of hardware sources. Getting entropy from a user-space source is considerably faster, but requires additional dependencies to achieve this. Disabling this feature by usingdefault-features = false
makesDefaultHasher
source its entropy usinggetrandom
, which will have a much simpler code footprint at the expense of speed.serde
- HyperLogLogs implementSerialize
andDeserialize
when possible.loom
-AtomicHyperLogLog
s use loom atomics, making it compatible with loom testing.
§License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
§Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Structs§
- Atomic
Hyper LogLog - HyperLogLog is a data structure for the “count-distinct problem”, approximating the number of distinct elements in a multiset.
AtomicHyperLogLog
is the thread-safe counterpart ofHyperLogLog
. - Hyper
LogLog - HyperLogLog is a data structure for the “count-distinct problem”, approximating the number of distinct elements in a multiset.
Enums§
Functions§
- error_
for_ precision - Returns the approximate error of
count
andraw_count
given the precision of aHyperLogLog
orAtomicHyperLogLog
. - precision_
for_ error - Returns the HyperLogLog precision that will have the error for calls to
count
andraw_count
.
Type Aliases§
- Default
Hasher - The default hasher for
crate::HyperLogLog
andcrate::AtomicHyperLogLog
.