Skip to main content

HyperLogLog

Struct HyperLogLog 

Source
pub struct HyperLogLog<T: ?Sized, S = DefaultHashBuilder> { /* private fields */ }
Available on crate feature alloc only.
Expand description

Estimates the number of distinct items in a stream in fixed, tiny memory.

HyperLogLog counts unique items without storing them. It hashes each item, uses the leading bits to pick one of 2^p registers, and records the longest run of leading zeros seen in the remaining bits — a quantity that grows logarithmically with the number of distinct items hitting that register. Combining the registers with a bias-corrected harmonic mean yields a cardinality estimate with a standard error of about 1.04 / sqrt(2^p).

Memory is 2^p bytes regardless of how many items are inserted, so a precision-14 estimator counts billions of distinct items in 16 KiB with a typical error under 1%.

The estimator is generic over the item type T and a BuildHasher S, defaulting to the deterministic DefaultHashBuilder.

§Examples

use bloom_lib::HyperLogLog;

let mut hll = HyperLogLog::new(14).unwrap();
for i in 0..100_000u32 {
    hll.insert(&i);
}

// Within a couple of percent of the true cardinality of 100,000.
let estimate = hll.count();
assert!((97_000..=103_000).contains(&estimate));

Implementations§

Source§

impl<T: ?Sized> HyperLogLog<T, DefaultHashBuilder>

Source

pub fn new(precision: u8) -> Result<Self, Error>

Creates an estimator with the given precision, using the default hasher.

The estimator uses 2^precision one-byte registers and has a standard error of roughly 1.04 / sqrt(2^precision). Precision 14 (16 KiB, ~0.8% error) is a common production choice.

§Parameters
  • precision: the log2 of the register count. Must be in 4..=18.
§Errors

Returns Error::InvalidParameter if precision is outside 4..=18.

§Examples
use bloom_lib::HyperLogLog;

let hll = HyperLogLog::<&str>::new(12).unwrap();
assert_eq!(hll.precision(), 12);
assert!(hll.is_empty());
Source

pub fn with_error_rate(error: f64) -> Result<Self, Error>

Creates an estimator sized for a target relative error, using the default hasher.

The precision is chosen as the smallest value whose standard error does not exceed error, then clamped to the supported 4..=18 range.

§Parameters
  • error: the desired standard error, in (0.0, 1.0). For example 0.01 targets ~1% error.
§Errors

Returns Error::InvalidParameter if error is not a finite value in (0.0, 1.0).

§Examples
use bloom_lib::HyperLogLog;

let hll = HyperLogLog::<&str>::with_error_rate(0.01).unwrap();
// ~1% error needs about 2^14 registers.
assert_eq!(hll.precision(), 14);
Source§

impl<T: ?Sized, S: BuildHasher> HyperLogLog<T, S>

Source

pub fn with_hasher(precision: u8, hasher: S) -> Result<Self, Error>

Creates an estimator with the given precision and a caller-supplied hasher.

§Errors

Returns Error::InvalidParameter if precision is outside 4..=18.

§Examples
use std::collections::hash_map::RandomState;
use bloom_lib::HyperLogLog;

let hll: HyperLogLog<&str, RandomState> =
    HyperLogLog::with_hasher(14, RandomState::new()).unwrap();
Source

pub fn insert(&mut self, item: &T)
where T: Hash,

Adds item to the estimator.

Inserting an item already counted has no effect on the estimate, so the operation is idempotent with respect to distinct values.

§Examples
use bloom_lib::HyperLogLog;

let mut hll = HyperLogLog::new(14).unwrap();
hll.insert("first");
hll.insert("first"); // no additional effect
hll.insert("second");
assert_eq!(hll.count(), 2);
Source

pub fn count(&self) -> u64

Returns the estimated number of distinct items inserted.

Uses the bias-corrected HyperLogLog estimator, falling back to linear counting at low cardinalities where the raw estimate is unreliable. The result is approximate, with the standard error implied by the precision.

§Examples
use bloom_lib::HyperLogLog;

let mut hll = HyperLogLog::new(14).unwrap();
assert_eq!(hll.count(), 0);
for i in 0..1_000u32 {
    hll.insert(&i);
}
let estimate = hll.count();
assert!((960..=1_040).contains(&estimate));
Source

pub fn is_empty(&self) -> bool

Returns true if no items have been inserted.

Source

pub fn precision(&self) -> u8

The configured precision (log2 of the register count).

Source

pub fn clear(&mut self)

Resets the estimator to empty, retaining the allocation.

§Examples
use bloom_lib::HyperLogLog;

let mut hll = HyperLogLog::new(14).unwrap();
hll.insert("x");
hll.clear();
assert!(hll.is_empty());
assert_eq!(hll.count(), 0);
Source

pub fn merge(&mut self, other: &Self) -> Result<(), Error>

Merges other into self by taking the register-wise maximum.

The result estimates the cardinality of the union of the two streams. Both estimators must share the same precision.

§Errors

Returns Error::IncompatibleParameters if the precisions differ.

§Examples
use bloom_lib::HyperLogLog;

let mut a = HyperLogLog::new(14).unwrap();
let mut b = HyperLogLog::new(14).unwrap();
for i in 0..1_000u32 {
    a.insert(&i);
}
for i in 500..1_500u32 {
    b.insert(&i);
}

a.merge(&b).unwrap();
// Union of [0,1000) and [500,1500) is [0,1500): ~1,500 distinct.
let estimate = a.count();
assert!((1_400..=1_600).contains(&estimate));

Trait Implementations§

Source§

impl<T: Clone + ?Sized, S: Clone> Clone for HyperLogLog<T, S>

Source§

fn clone(&self) -> HyperLogLog<T, S>

Returns a duplicate of the value. Read more
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<T: Debug + ?Sized, S: Debug> Debug for HyperLogLog<T, S>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de, T: ?Sized, S> Deserialize<'de> for HyperLogLog<T, S>
where S: Default,

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl<T: ?Sized, S> Serialize for HyperLogLog<T, S>

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

§

impl<T, S> Freeze for HyperLogLog<T, S>
where S: Freeze, T: ?Sized,

§

impl<T, S> RefUnwindSafe for HyperLogLog<T, S>
where S: RefUnwindSafe, T: ?Sized,

§

impl<T, S> Send for HyperLogLog<T, S>
where S: Send, T: ?Sized,

§

impl<T, S> Sync for HyperLogLog<T, S>
where S: Sync, T: ?Sized,

§

impl<T, S> Unpin for HyperLogLog<T, S>
where S: Unpin, T: ?Sized,

§

impl<T, S> UnsafeUnpin for HyperLogLog<T, S>
where S: UnsafeUnpin, T: ?Sized,

§

impl<T, S> UnwindSafe for HyperLogLog<T, S>
where S: UnwindSafe, T: ?Sized,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,