pub struct HyperLogLog<T: ?Sized, S = DefaultHashBuilder> { /* private fields */ }alloc only.Expand description
Estimates the number of distinct items in a stream in fixed, tiny memory.
HyperLogLog counts unique items without storing them. It hashes each item,
uses the leading bits to pick one of 2^p registers, and records the longest
run of leading zeros seen in the remaining bits — a quantity that grows
logarithmically with the number of distinct items hitting that register.
Combining the registers with a bias-corrected harmonic mean yields a
cardinality estimate with a standard error of about 1.04 / sqrt(2^p).
Memory is 2^p bytes regardless of how many items are inserted, so a
precision-14 estimator counts billions of distinct items in 16 KiB with a
typical error under 1%.
The estimator is generic over the item type T and a
BuildHasher S, defaulting to the deterministic
DefaultHashBuilder.
§Examples
use bloom_lib::HyperLogLog;
let mut hll = HyperLogLog::new(14).unwrap();
for i in 0..100_000u32 {
hll.insert(&i);
}
// Within a couple of percent of the true cardinality of 100,000.
let estimate = hll.count();
assert!((97_000..=103_000).contains(&estimate));Implementations§
Source§impl<T: ?Sized> HyperLogLog<T, DefaultHashBuilder>
impl<T: ?Sized> HyperLogLog<T, DefaultHashBuilder>
Sourcepub fn new(precision: u8) -> Result<Self, Error>
pub fn new(precision: u8) -> Result<Self, Error>
Creates an estimator with the given precision, using the default
hasher.
The estimator uses 2^precision one-byte registers and has a standard
error of roughly 1.04 / sqrt(2^precision). Precision 14 (16 KiB, ~0.8%
error) is a common production choice.
§Parameters
precision: the log2 of the register count. Must be in4..=18.
§Errors
Returns Error::InvalidParameter if precision is outside 4..=18.
§Examples
use bloom_lib::HyperLogLog;
let hll = HyperLogLog::<&str>::new(12).unwrap();
assert_eq!(hll.precision(), 12);
assert!(hll.is_empty());Sourcepub fn with_error_rate(error: f64) -> Result<Self, Error>
pub fn with_error_rate(error: f64) -> Result<Self, Error>
Creates an estimator sized for a target relative error, using the
default hasher.
The precision is chosen as the smallest value whose standard error does
not exceed error, then clamped to the supported 4..=18 range.
§Parameters
error: the desired standard error, in(0.0, 1.0). For example0.01targets ~1% error.
§Errors
Returns Error::InvalidParameter if error is not a finite value in
(0.0, 1.0).
§Examples
use bloom_lib::HyperLogLog;
let hll = HyperLogLog::<&str>::with_error_rate(0.01).unwrap();
// ~1% error needs about 2^14 registers.
assert_eq!(hll.precision(), 14);Source§impl<T: ?Sized, S: BuildHasher> HyperLogLog<T, S>
impl<T: ?Sized, S: BuildHasher> HyperLogLog<T, S>
Sourcepub fn with_hasher(precision: u8, hasher: S) -> Result<Self, Error>
pub fn with_hasher(precision: u8, hasher: S) -> Result<Self, Error>
Creates an estimator with the given precision and a caller-supplied
hasher.
§Errors
Returns Error::InvalidParameter if precision is outside 4..=18.
§Examples
use std::collections::hash_map::RandomState;
use bloom_lib::HyperLogLog;
let hll: HyperLogLog<&str, RandomState> =
HyperLogLog::with_hasher(14, RandomState::new()).unwrap();Sourcepub fn insert(&mut self, item: &T)where
T: Hash,
pub fn insert(&mut self, item: &T)where
T: Hash,
Adds item to the estimator.
Inserting an item already counted has no effect on the estimate, so the operation is idempotent with respect to distinct values.
§Examples
use bloom_lib::HyperLogLog;
let mut hll = HyperLogLog::new(14).unwrap();
hll.insert("first");
hll.insert("first"); // no additional effect
hll.insert("second");
assert_eq!(hll.count(), 2);Sourcepub fn count(&self) -> u64
pub fn count(&self) -> u64
Returns the estimated number of distinct items inserted.
Uses the bias-corrected HyperLogLog estimator, falling back to linear counting at low cardinalities where the raw estimate is unreliable. The result is approximate, with the standard error implied by the precision.
§Examples
use bloom_lib::HyperLogLog;
let mut hll = HyperLogLog::new(14).unwrap();
assert_eq!(hll.count(), 0);
for i in 0..1_000u32 {
hll.insert(&i);
}
let estimate = hll.count();
assert!((960..=1_040).contains(&estimate));Sourcepub fn clear(&mut self)
pub fn clear(&mut self)
Resets the estimator to empty, retaining the allocation.
§Examples
use bloom_lib::HyperLogLog;
let mut hll = HyperLogLog::new(14).unwrap();
hll.insert("x");
hll.clear();
assert!(hll.is_empty());
assert_eq!(hll.count(), 0);Sourcepub fn merge(&mut self, other: &Self) -> Result<(), Error>
pub fn merge(&mut self, other: &Self) -> Result<(), Error>
Merges other into self by taking the register-wise maximum.
The result estimates the cardinality of the union of the two streams. Both estimators must share the same precision.
§Errors
Returns Error::IncompatibleParameters if the precisions differ.
§Examples
use bloom_lib::HyperLogLog;
let mut a = HyperLogLog::new(14).unwrap();
let mut b = HyperLogLog::new(14).unwrap();
for i in 0..1_000u32 {
a.insert(&i);
}
for i in 500..1_500u32 {
b.insert(&i);
}
a.merge(&b).unwrap();
// Union of [0,1000) and [500,1500) is [0,1500): ~1,500 distinct.
let estimate = a.count();
assert!((1_400..=1_600).contains(&estimate));Trait Implementations§
Source§impl<T: Clone + ?Sized, S: Clone> Clone for HyperLogLog<T, S>
impl<T: Clone + ?Sized, S: Clone> Clone for HyperLogLog<T, S>
Source§fn clone(&self) -> HyperLogLog<T, S>
fn clone(&self) -> HyperLogLog<T, S>
1.0.0 (const: unstable) · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read more