Skip to main content

TopK

Struct TopK 

Source
pub struct TopK<T, S = DefaultHashBuilder> { /* private fields */ }
Available on crate feature alloc only.
Expand description

Tracks the k most frequent items in a stream using bounded memory.

A Top-K tracker combines a CountMinSketch — which estimates how often any item has been seen — with a small list of the k highest-frequency items observed so far. Every insertion updates the sketch and, when the item’s estimated count is high enough, promotes it into the monitored list, evicting the current minimum. Memory is bounded by the sketch size plus k stored keys, regardless of how many distinct items flow through.

The estimate of which items are in the top set is approximate: under heavy churn a true heavy hitter can be missed if it never accumulates a high enough estimate while resident, but frequent, stable items are reported reliably.

Unlike the other structures, TopK stores the keys themselves, so the item type must be Eq + Hash and sized. A key is moved into the monitored list only when it is promoted, so non-promoted insertions store nothing.

§Examples

use bloom_lib::TopK;

let mut top = TopK::new(3, 0.001, 0.001).unwrap();

for _ in 0..100 { top.insert("apple"); }
for _ in 0..50  { top.insert("banana"); }
for _ in 0..10  { top.insert("cherry"); }
for _ in 0..1   { top.insert("date"); }

let ranked = top.top();
assert_eq!(ranked[0].0, &"apple");
assert_eq!(ranked[1].0, &"banana");
assert_eq!(ranked.len(), 3); // only k items are kept

Implementations§

Source§

impl<T> TopK<T, DefaultHashBuilder>
where T: Hash + Eq,

Source

pub fn new(k: usize, epsilon: f64, delta: f64) -> Result<Self, Error>

Creates a tracker for the top k items, with an underlying sketch sized for error epsilon at confidence 1 - delta, using the default hasher.

§Parameters
  • k: how many top items to keep. Must be non-zero.
  • epsilon: the sketch error factor, in (0.0, 1.0). Smaller is more accurate and uses more memory.
  • delta: the sketch failure probability, in (0.0, 1.0).
§Errors

Returns Error::InvalidParameter if k is zero, or if epsilon or delta is not a finite value in (0.0, 1.0).

§Examples
use bloom_lib::TopK;

let top = TopK::<&str>::new(10, 0.001, 0.001).unwrap();
assert_eq!(top.k(), 10);
assert!(top.is_empty());
Source§

impl<T, S> TopK<T, S>
where T: Hash + Eq, S: BuildHasher,

Source

pub fn with_hasher( k: usize, epsilon: f64, delta: f64, hasher: S, ) -> Result<Self, Error>

Creates a tracker with a caller-supplied hasher.

§Errors

Returns Error::InvalidParameter if k is zero, or if epsilon or delta is not a finite value in (0.0, 1.0).

§Examples
use std::collections::hash_map::RandomState;
use bloom_lib::TopK;

let top: TopK<&str, RandomState> =
    TopK::with_hasher(10, 0.001, 0.001, RandomState::new()).unwrap();
Source

pub fn insert(&mut self, item: T)

Records one occurrence of item, updating the sketch and the top set.

The key is moved into the monitored list only if it is promoted into the top k; a non-promoted insertion stores nothing beyond the sketch update.

§Examples
use bloom_lib::TopK;

let mut top = TopK::new(2, 0.01, 0.01).unwrap();
top.insert("x");
top.insert("x");
assert_eq!(top.estimate(&"x"), 2);
Source

pub fn estimate(&self, item: &T) -> u64

Returns the estimated frequency of item from the underlying sketch.

This works for any item, whether or not it is in the top set.

§Examples
use bloom_lib::TopK;

let mut top = TopK::new(5, 0.001, 0.001).unwrap();
for _ in 0..7 { top.insert("seven"); }
assert!(top.estimate(&"seven") >= 7);
assert_eq!(top.estimate(&"absent"), 0);
Source

pub fn top(&self) -> Vec<(&T, u64)>

Returns the monitored items paired with their estimated counts, ordered from most to least frequent.

The returned vector borrows the keys, so no key cloning occurs. Its length is at most k.

§Examples
use bloom_lib::TopK;

let mut top = TopK::new(2, 0.01, 0.01).unwrap();
for _ in 0..5 { top.insert("a"); }
for _ in 0..3 { top.insert("b"); }

let ranked = top.top();
assert_eq!(ranked[0].0, &"a");
assert_eq!(ranked[1].0, &"b");
Source

pub fn k(&self) -> usize

The configured number of top items to track.

Source

pub fn len(&self) -> usize

The number of items currently monitored (at most k).

Source

pub fn is_empty(&self) -> bool

Returns true if no items are monitored yet.

Source

pub fn clear(&mut self)

Clears the sketch and the monitored set.

§Examples
use bloom_lib::TopK;

let mut top = TopK::new(3, 0.01, 0.01).unwrap();
top.insert("x");
top.clear();
assert!(top.is_empty());
assert_eq!(top.estimate(&"x"), 0);

Trait Implementations§

Source§

impl<T: Clone, S: Clone> Clone for TopK<T, S>

Source§

fn clone(&self) -> TopK<T, S>

Returns a duplicate of the value. Read more
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<T: Debug, S: Debug> Debug for TopK<T, S>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de, T, S> Deserialize<'de> for TopK<T, S>
where CountMinSketch<T, S>: Deserialize<'de>, T: Deserialize<'de>,

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl<T, S> Serialize for TopK<T, S>

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

§

impl<T, S> Freeze for TopK<T, S>
where S: Freeze,

§

impl<T, S> RefUnwindSafe for TopK<T, S>

§

impl<T, S> Send for TopK<T, S>
where S: Send, T: Send,

§

impl<T, S> Sync for TopK<T, S>
where S: Sync, T: Sync,

§

impl<T, S> Unpin for TopK<T, S>
where S: Unpin, T: Unpin,

§

impl<T, S> UnsafeUnpin for TopK<T, S>
where S: UnsafeUnpin,

§

impl<T, S> UnwindSafe for TopK<T, S>
where S: UnwindSafe, T: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,