Struct selectors::bloom::CountingBloomFilter


pub struct CountingBloomFilter<S>
where
    S: BloomStorage,
{ /* private fields */ }


A counting Bloom filter with parameterized storage to handle counters of different sizes. For now we assume that having two hash functions is enough, but we may revisit that decision later.

The filter uses an array with 2**KeySize entries.

Assuming a well-distributed hash function, a Bloom filter with array size M containing N elements and using k hash functions has an expected false positive rate of exactly

$$ (1 - (1 - 1/M)^{kN})^k $$

because each array slot has a

$$ (1 - 1/M)^{kN} $$

chance of being 0, and the expected false positive rate is the probability that all k hash functions hit a nonzero slot.

Under reasonable assumptions (M large and kN large, both of which should hold if we’re worried about false positives), this becomes approximately

$$ (1 - \exp(-kN/M))^k $$
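To see how close the approximation is to the exact expression, both can be evaluated numerically. The following is a minimal sketch; the function names `exact_fp_rate` and `approx_fp_rate` are hypothetical and are not part of this crate.

```rust
/// Exact expected false positive rate: (1 - (1 - 1/M)^(kN))^k.
/// `m` is the array size, `n` the number of elements, `k` the number of
/// hash functions. (Hypothetical helper, not part of the crate.)
fn exact_fp_rate(m: f64, n: f64, k: f64) -> f64 {
    // Probability that a given slot is still 0 after k*N increments.
    let p_zero = (1.0 - 1.0 / m).powf(k * n);
    (1.0 - p_zero).powf(k)
}

/// Large-M approximation: (1 - exp(-kN/M))^k.
/// (Hypothetical helper, not part of the crate.)
fn approx_fp_rate(m: f64, n: f64, k: f64) -> f64 {
    (1.0 - (-k * n / m).exp()).powf(k)
}
```

For M = 4096, N = 200 and k = 2, both functions give a rate a little under 0.9%, and they differ only far beyond the digits that matter for sizing the filter.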

For our special case of k == 2, that’s $(1 - \exp(-2N/M))^2$, or, solving for the load N/M,

$$ N/M = -0.5 \ln(1 - \sqrt{r}) $$

where r is the false positive rate. This can be used to compute the desired KeySize for a given load N and false positive rate r.
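As an illustration of that computation, the sketch below inverts the k == 2 formula to find the smallest KeySize for a target rate. The names `load_for_rate` and `key_size_for` are hypothetical, and the code assumes 0 < r < 1 and n > 0.

```rust
/// Load N/M at which a k == 2 filter reaches false positive rate `r`,
/// i.e. N/M = -0.5 * ln(1 - sqrt(r)). Assumes 0 < r < 1.
/// (Hypothetical helper, not part of the crate.)
fn load_for_rate(r: f64) -> f64 {
    -0.5 * (1.0 - r.sqrt()).ln()
}

/// Smallest KeySize whose 2^KeySize slots keep the expected false
/// positive rate at or below `r` for `n` keys. Assumes n > 0.
/// (Hypothetical helper, not part of the crate.)
fn key_size_for(n: f64, r: f64) -> u32 {
    // Minimum number of slots M such that N/M does not exceed the load.
    let min_slots = n / load_for_rate(r);
    // Round up to the next power of two, expressed as an exponent.
    min_slots.log2().ceil() as u32
}
```

Under these assumptions, 200 keys at a 1% target need a KeySize of 12, while 100 keys at the same target fit in a KeySize of 11.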

If N/M is assumed small, then $1 - \exp(-2N/M)$ is approximately 2N/M, so the false positive rate can further be approximated as 4*N^2/M^2. So increasing KeySize by 1, which doubles M, reduces the false positive rate by about a factor of 4, and a false positive rate of 1% corresponds to about M/N == 20.

In practice, this means that for a few hundred keys, a KeySize of 12 gives false positive rates on the order of 0.25-4%.

Similarly, using a KeySize of 10 would lead to a 4% false positive rate for N == 100 and to quite bad false positive rates for larger N.
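The figures quoted above follow directly from the small-load approximation 4*N^2/M^2. A minimal sketch that reproduces them, using a hypothetical helper `small_load_fp_rate` that is not part of this crate:

```rust
/// Small-load approximation of the k == 2 false positive rate:
/// 4 * N^2 / M^2, with M = 2^key_size.
/// (Hypothetical helper, not part of the crate.)
fn small_load_fp_rate(n: f64, key_size: u32) -> f64 {
    let m = (1u64 << key_size) as f64;
    4.0 * n * n / (m * m)
}
```

With a KeySize of 12 this gives roughly 0.24% for N == 100 and 3.8% for N == 400; with a KeySize of 10 it gives roughly 3.8% for N == 100. Raising `key_size` by 1 divides the result by exactly 4.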


Implementations

impl<S> CountingBloomFilter<S> where S: BloomStorage,

pub fn new() -> Self

pub fn clear(&mut self)

pub fn is_zeroed(&self) -> bool

pub fn insert_hash(&mut self, hash: u32)

pub fn remove_hash(&mut self, hash: u32)

pub fn might_contain_hash(&self, hash: u32) -> bool
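To illustrate how these methods fit together, here is a self-contained sketch of a counting Bloom filter with two hash functions. It is not this crate's implementation: the type `SketchFilter`, the fixed `u8` counters (standing in for the `BloomStorage` parameter), the way two 12-bit slot indices are sliced out of the 32-bit hash, and the handling of saturated counters are all assumptions made for the example.

```rust
const KEY_SIZE: usize = 12; // assumed KeySize for this sketch
const ARRAY_SIZE: usize = 1 << KEY_SIZE;
const KEY_MASK: u32 = (1 << KEY_SIZE) - 1;

/// Illustrative counting Bloom filter (hypothetical, not the crate's type).
struct SketchFilter {
    counters: [u8; ARRAY_SIZE],
}

impl SketchFilter {
    fn new() -> Self {
        SketchFilter { counters: [0; ARRAY_SIZE] }
    }

    /// Assumption: the two "hash functions" are the low 12 bits and the
    /// next 12 bits of the caller-supplied hash.
    fn slots(hash: u32) -> [usize; 2] {
        [
            (hash & KEY_MASK) as usize,
            ((hash >> KEY_SIZE) & KEY_MASK) as usize,
        ]
    }

    fn clear(&mut self) {
        self.counters = [0; ARRAY_SIZE];
    }

    fn is_zeroed(&self) -> bool {
        self.counters.iter().all(|&c| c == 0)
    }

    fn insert_hash(&mut self, hash: u32) {
        for i in Self::slots(hash) {
            // Saturate instead of wrapping so a full counter never reads as 0.
            self.counters[i] = self.counters[i].saturating_add(1);
        }
    }

    fn remove_hash(&mut self, hash: u32) {
        for i in Self::slots(hash) {
            // A saturated counter has lost its true count, so leave it alone.
            if self.counters[i] != u8::MAX {
                self.counters[i] = self.counters[i].saturating_sub(1);
            }
        }
    }

    /// True if both slots are nonzero; may report false positives,
    /// never false negatives for hashes that are currently inserted.
    fn might_contain_hash(&self, hash: u32) -> bool {
        Self::slots(hash).iter().all(|&i| self.counters[i] != 0)
    }
}
```

Because each slot holds a count rather than a single bit, `remove_hash` can undo an earlier `insert_hash`, which a plain Bloom filter cannot do. A hash that shares only one of its two slots with an inserted hash is still reported as absent.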

Trait Implementations

impl<S> Clone for CountingBloomFilter<S> where S: BloomStorage + Clone,

fn clone(&self) -> CountingBloomFilter<S>

fn clone_from(&mut self, source: &Self)

impl<S> Debug for CountingBloomFilter<S> where S: BloomStorage,

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl<S> Default for CountingBloomFilter<S> where S: BloomStorage + Default,

fn default() -> CountingBloomFilter<S>

Auto Trait Implementations

impl<S> RefUnwindSafe for CountingBloomFilter<S> where S: RefUnwindSafe,

impl<S> Send for CountingBloomFilter<S> where S: Send,

impl<S> Sync for CountingBloomFilter<S> where S: Sync,

impl<S> Unpin for CountingBloomFilter<S> where S: Unpin,

impl<S> UnwindSafe for CountingBloomFilter<S> where S: UnwindSafe,

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>