hyperloglockless 0.3.1

High-performance HyperLogLog with bias correction and full concurrency support.
Documentation

hyperloglockless

Crates.io docs.rs License: MIT License: APACHE Downloads

High-performance HyperLogLog with bias correction and full concurrency support. Used for accurate and space-efficient cardinality estimation.

Overview

HyperLogLogs are space efficient data structures for the "count-distinct problem", approximating the number of distinct elements in a multiset. Paper.

hyperloglockless offers a lockless concurrent HyperLogLog and a single threaded counterpart. They're simpler, faster, and more accurate than other HyperLogLog implementations:

  • ๐Ÿงต Concurrent: AtomicHyperLogLog is a drop-in replacement for RwLock<OtherHyperLogLog>: all methods take &self, so you can wrap it in Arc and update it concurrently without &mut.
  • โšก Fast: Designed to be fast and simple in both single and multi-threaded scenarios.
  • ๐ŸŽฏ Accurate: Empirically verified accuracy for trillions of elements; other implementations break down after millions.
  • โœ… Tested: Rigorously tested with loom and benchmarked.

Usage

[dependencies]
hyperloglockless = "0.3.0"

A HyperLogLog with precision p uses 2^p bytes of memory and has an error % of roughly 104 / sqrt(2^p).

use hyperloglockless::HyperLogLog;

let precision = hyperloglockless::precision_for_error(0.01); // 1% error
assert_eq!(precision, 14);

let mut hll = HyperLogLog::new(precision);
hll.insert(&'๐Ÿฆ€');
hll.insert_all('a'..='z');

let count = hll.count(); // ~27
assert_eq!(hll.len(), 1 << precision); // 16384 bytes

Full concurrency support: AtomicHyperLogLog is a drop-in replacement for RwLock<OtherHyperLogLog>: all methods take &self.

use hyperloglockless::AtomicHyperLogLog;

let hll = AtomicHyperLogLog::new(14);
hll.insert(&'๐Ÿฆ€');
hll.insert_all('a'..='z');

Single-Threaded Performance vs Others

Both hyperloglockless::HyperLogLog and hyperloglockless::AtomicHyperLogLog are extremely fast for both insert and count calls.

fp-micro fp-micro

Multi-Threaded Performance vs Others

hyperloglockless::AtomicHyperLogLog does not require any locking and therefore avoids thread contention.

fp-micro

Accuracy vs Others

hyperloglockless stays consistently accurate while other implementations break down after millions of items.

fp-micro

Available Features

  • rand - Enabled by default, this has the DefaultHasher source its random state using thread_rng() instead of hardware sources. Getting entropy from a user-space source is considerably faster, but requires additional dependencies to achieve this. Disabling this feature by using default-features = false makes DefaultHasher source its entropy using getrandom, which will have a much simpler code footprint at the expense of speed.
  • serde - HyperLogLogs implement Serialize and Deserialize when possible.
  • loom - AtomicHyperLogLogs use loom atomics, making it compatible with loom testing.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.