Skip to main content

CountMinSketch

Struct CountMinSketch 

Source
pub struct CountMinSketch<T: ?Sized, S = DefaultHashBuilder> { /* private fields */ }
Available on crate feature alloc only.
Expand description

A sublinear-space frequency estimator.

A Count-Min Sketch counts how many times it has seen each item using a fixed grid of counters, far smaller than a real HashMap<T, u64> would be. Each item is hashed into one counter per row and those counters are incremented; the estimate for an item is the minimum across its rows. The estimate never undercounts and overcounts by a bounded amount with high probability — the width controls the error magnitude epsilon, the depth controls the confidence delta.

The sketch is generic over the item type T and a BuildHasher S, defaulting to the deterministic DefaultHashBuilder.

§Examples

use bloom_lib::CountMinSketch;

// ~0.1% error with 99.9% confidence.
let mut sketch = CountMinSketch::new(0.001, 0.001).unwrap();

sketch.increment("apple");
sketch.add("apple", 4);
sketch.increment("banana");

assert!(sketch.estimate("apple") >= 5); // never undercounts
assert_eq!(sketch.estimate("cherry"), 0);

Implementations§

Source§

impl<T: ?Sized> CountMinSketch<T, DefaultHashBuilder>

Source

pub fn new(epsilon: f64, delta: f64) -> Result<Self, Error>

Creates a sketch sized for an error factor epsilon at confidence 1 - delta, using the default hasher.

The estimate for any item never undercounts and, with probability at least 1 - delta, overcounts by at most epsilon * N, where N is the total of all counts added. Smaller epsilon widens the grid; smaller delta deepens it.

§Parameters
  • epsilon: the relative error factor. Must be in (0.0, 1.0). The width is ceil(e / epsilon).
  • delta: the failure probability. Must be in (0.0, 1.0). The depth is ceil(ln(1 / delta)).
§Errors

Returns Error::InvalidParameter if either argument is not a finite value in the open interval (0.0, 1.0).

§Examples
use bloom_lib::CountMinSketch;

let sketch = CountMinSketch::<&str>::new(0.01, 0.01).unwrap();
assert!(sketch.width() >= 271);
assert!(sketch.depth() >= 4);
Source

pub fn with_dimensions(width: usize, depth: usize) -> Result<Self, Error>

Creates a sketch with an explicit width and depth, using the default hasher.

§Parameters
  • width: counters per row. Must be non-zero.
  • depth: number of rows (independent hashes). Must be non-zero.
§Errors

Returns Error::InvalidParameter if either argument is zero.

§Examples
use bloom_lib::CountMinSketch;

let sketch = CountMinSketch::<u32>::with_dimensions(2_048, 5).unwrap();
assert_eq!(sketch.width(), 2_048);
assert_eq!(sketch.depth(), 5);
Source§

impl<T: ?Sized, S: BuildHasher> CountMinSketch<T, S>

Source

pub fn with_hasher(epsilon: f64, delta: f64, hasher: S) -> Result<Self, Error>

Creates a sketch from epsilon/delta with a caller-supplied hasher.

§Errors

Returns Error::InvalidParameter if either argument is not a finite value in (0.0, 1.0).

§Examples
use std::collections::hash_map::RandomState;
use bloom_lib::CountMinSketch;

let sketch: CountMinSketch<&str, RandomState> =
    CountMinSketch::with_hasher(0.01, 0.01, RandomState::new()).unwrap();
Source

pub fn with_dimensions_and_hasher( width: usize, depth: usize, hasher: S, ) -> Result<Self, Error>

Creates a sketch with an explicit geometry and a caller-supplied hasher.

§Errors

Returns Error::InvalidParameter if width or depth is zero.

Source

pub fn add(&mut self, item: &T, count: u64)
where T: Hash,

Records count additional occurrences of item.

Counters saturate at u64::MAX rather than overflowing, so an adversarial or runaway stream degrades gracefully instead of panicking or wrapping.

§Examples
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.01, 0.01).unwrap();
sketch.add("page-view", 250);
assert!(sketch.estimate("page-view") >= 250);
Source

pub fn increment(&mut self, item: &T)
where T: Hash,

Records a single occurrence of item. Shorthand for add(item, 1).

§Examples
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.01, 0.01).unwrap();
sketch.increment("hit");
sketch.increment("hit");
assert!(sketch.estimate("hit") >= 2);
Source

pub fn estimate(&self, item: &T) -> u64
where T: Hash,

Estimates the number of times item has been added.

The estimate is the minimum counter across all rows. It never undercounts the true total and overcounts only by the sketch’s error bound.

§Examples
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.001, 0.001).unwrap();
for _ in 0..100 {
    sketch.increment("frequent");
}
let estimate = sketch.estimate("frequent");
assert!((100..=110).contains(&estimate));
Source

pub fn total_count(&self) -> u64

The sum of every count added (saturating).

Unlike per-item estimates, this is exact up to saturation, because every add contributes to it directly.

§Examples
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.01, 0.01).unwrap();
sketch.add("a", 3);
sketch.add("b", 7);
assert_eq!(sketch.total_count(), 10);
Source

pub fn width(&self) -> usize

The number of counters per row.

Source

pub fn depth(&self) -> usize

The number of rows (independent hash functions).

Source

pub fn clear(&mut self)

Resets every counter to zero, retaining the allocation.

§Examples
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.01, 0.01).unwrap();
sketch.increment("x");
sketch.clear();
assert_eq!(sketch.estimate("x"), 0);
assert_eq!(sketch.total_count(), 0);
Source

pub fn merge(&mut self, other: &Self) -> Result<(), Error>

Merges other into self by summing counters cell by cell (saturating).

After the merge, the sketch estimates frequencies as if every item from both sketches had been added to one. Both sketches must share their geometry.

§Errors

Returns Error::IncompatibleParameters if the two sketches differ in width or depth.

§Examples
use bloom_lib::CountMinSketch;

let mut a = CountMinSketch::with_dimensions(512, 4).unwrap();
let mut b = CountMinSketch::with_dimensions(512, 4).unwrap();
a.add("shared", 2);
b.add("shared", 3);

a.merge(&b).unwrap();
assert!(a.estimate("shared") >= 5);

Trait Implementations§

Source§

impl<T: Clone + ?Sized, S: Clone> Clone for CountMinSketch<T, S>

Source§

fn clone(&self) -> CountMinSketch<T, S>

Returns a duplicate of the value. Read more
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<T: Debug + ?Sized, S: Debug> Debug for CountMinSketch<T, S>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de, T: ?Sized, S> Deserialize<'de> for CountMinSketch<T, S>
where S: Default,

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl<T: ?Sized, S> Serialize for CountMinSketch<T, S>

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

§

impl<T, S> Freeze for CountMinSketch<T, S>
where S: Freeze, T: ?Sized,

§

impl<T, S> RefUnwindSafe for CountMinSketch<T, S>
where S: RefUnwindSafe, T: ?Sized,

§

impl<T, S> Send for CountMinSketch<T, S>
where S: Send, T: ?Sized,

§

impl<T, S> Sync for CountMinSketch<T, S>
where S: Sync, T: ?Sized,

§

impl<T, S> Unpin for CountMinSketch<T, S>
where S: Unpin, T: ?Sized,

§

impl<T, S> UnsafeUnpin for CountMinSketch<T, S>
where S: UnsafeUnpin, T: ?Sized,

§

impl<T, S> UnwindSafe for CountMinSketch<T, S>
where S: UnwindSafe, T: ?Sized,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,