Struct ofilter::Params

source ·
pub struct Params {
    pub nb_hash: usize,
    pub bit_len: usize,
    pub nb_items: usize,
    pub fp_rate: f64,
    pub predict: bool,
}
Expand description

Parameters used to build bloom filters.

This is a utility to set up various bloom filter parameters. There are typically 4 parameters:

  • n (nb_items): number of items in the filter
  • p (fp_rate): probability of false positives, fraction between 0 and 1
  • m (bit_len): number of bits in the filter
  • k (nb_hash): number of hash functions

All of them are linked, if one changes the size of the filter, obviously, it changes the number of items it can hold.

A few things to keep in mind:

  • for a pure Bloom filter, the only technical parameters that matter are nb_hash and bit_len. They fully define the filter.
  • from a functional point of view, most of the time you are interested in a given nb_items and fp_rate. From there the 2 others are derived.

You may partially fill this struct, anything which contains zero will be inferred either by deducing it from other defined params, or by replacing it with a default value. This is done by calling .adjust().

Also, there is a predict field which can be used to turn the filter into a predicatable filter. This is convenient for testing, and may be of interest in edge cases but most of the time a real random version is prefered. It avoids bias and also can protect against some DDOS attacks based on hash collision.

A few helpers are defined as well, for common use-cases.

§A bit of theory

This interactive Bloom filter params calulator proved very useful while developping this. Also it recaps the usual formulas, with:

  • n -> number of items in the filter
  • p -> probability of false positives, fraction between 0 and 1
  • m -> number of bits in the filter
  • k -> number of hash functions

We have:

  • n = ceil(m / (-k / log(1 - exp(log(p) / k))))
  • p = pow(1 - exp(-k / (m / n)), k)
  • m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
  • k = round((m / n) * log(2))

A command line equivalent would be this practical bloom-filter-calculator.rb gist

Note that there are corner cases, for example, the formula that gives bit_len (m) from nb_items (n) and fp_rate (p), corresponds to the optimal case, when nb_hash (k) has been chosen optimally. If not, it has to be revisited and adapted to the real value of nb_hash (k). In practice, unless you impose it, what this package does is enforce a nb_hash (k) of 2, which is generally optimal if you consider CPU time and not memory usage.

§Examples

Getting a filter for a given number of items, everything else default:

use ofilter::Params;

let params = Params::with_nb_items(10_000);

assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", &params));

Getting a filter for a given number of items, false positive rate, and enforcing number of hash:

use ofilter::Params;

let params = Params{
    nb_hash: 3,
    bit_len: 0,
    nb_items: 100_000,
    fp_rate: 0.1,
    predict: false,
}.adjust();

assert_eq!("{ nb_hash: 3, bit_len: 480833, nb_items: 100000, fp_rate: 0.100000, predict: false }", format!("{}", &params));

Fields§

§nb_hash: usize

Number of hash functions used by the filter.

Also referred to as k is most Bloom filter papers.

§bit_len: usize

Length of the bit vector used by the filter.

Also referred to as m is most Bloom filter papers.

§nb_items: usize

Number of items the Bloom filter is designed for.

Also referred to as n is most Bloom filter papers.

§fp_rate: f64

False positive rate of the Bloom filter.

Also referred to as p is most Bloom filter papers.

§predict: bool

If set to true, Bloom filter is predictable.

Use this for testing, when you want something that is 100% predictable and avoid flaky behavior. In production, it would be safer to rely on the random, statistical default behavior.

One reason is security, among other examples, using a predictable hash for a cache may expose you to some sort of DDOS attack.

If in doubt, leave it to false.

Implementations§

source§

impl Params

source

pub fn with_nb_items(nb_items: usize) -> Params

Parameters to store a given number of items.

All other values are using defaults, or are derive from this. This is the go-to function to create a reasonable filter that holds N items.

§Examples:
use ofilter::Params;

let p = Params::with_nb_items(10_000);
assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", p));
source

pub fn with_nb_items_and_fp_rate(nb_items: usize, fp_rate: f64) -> Params

Parameters to store a given number of items, enforcing false positive rate.

By defining nb_items and fp_rate, the filter is almost completely defined. The other missing parameter is the number of hash functions. There is an optimal number for this, but this function will enforce a default of 2, which most of the time yields excellent results, and avoids calculating too many hashes.

If you want to use the optimal number of hashes, create the Params struct with nb_items, fp_rate and nb_hash defined, then call .adjust().

§Examples:
use ofilter::Params;

let p = Params::with_nb_items_and_fp_rate(10_000, 0.001);
assert_eq!("{ nb_hash: 2, bit_len: 622402, nb_items: 10000, fp_rate: 0.001000, predict: false }", format!("{}", p));
source

pub fn with_bit_len(bit_len: usize) -> Params

Parameters to guess the number of items from bit buffer size.

If the only thing you care about, this will give you the best result for that amount of memory.

§Examples:
use ofilter::Params;

let p = Params::with_bit_len(8_192);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 432, fp_rate: 0.010019, predict: false }", format!("{}", p));
source

pub fn with_bit_len_and_nb_items(bit_len: usize, nb_items: usize) -> Params

Parameters to store a given number of items, enforcing bit buffer size

If the only thing you care about, this will give you the best result for that amount of memory.

§Examples:
use ofilter::Params;

let p = Params::with_bit_len_and_nb_items(8_192, 1_000);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 1000, fp_rate: 0.046925, predict: false }", format!("{}", p));
source

pub fn estimate_nb_items(nb_hash: usize, bit_len: usize, fp_rate: f64) -> usize

Estimate the number of items one can store.

This applies the formula n = ceil(m / (-k / log(1 - exp(log(p) / k)))).

§Examples:
use ofilter::Params;

assert_eq!(1994, Params::estimate_nb_items(5, 10_000, 0.1));
source

pub fn estimate_fp_rate(nb_hash: usize, bit_len: usize, nb_items: usize) -> f64

Estimate the false positive rate.

This applies the formula p = powf(1 - exp(-k / (m / n)), k)

§Examples:
use ofilter::Params;

assert_eq!("0.009431", format!("{:0.6}", Params::estimate_fp_rate(5, 10_000, 1_000)));
source

pub fn optimal_bit_len(nb_items: usize, fp_rate: f64) -> usize

Calculate the optimal number of bits.

This applies the formula m = ceil((n * log(p)) / log(1 / pow(2, log(2))))

Note that this is the optimal number of bits, but for the case where the number of hash has also been chosen optimally. You’ll notice the formula does not depend on fp_rate (p).

See guess_bit_len_for_fp_rate if you want to also take fp_rate in account.

§Examples:
use ofilter::Params;

assert_eq!(62353, Params::optimal_bit_len(10_000, 0.05));
source

pub fn optimal_nb_hash(bit_len: usize, nb_items: usize) -> usize

Calculate the optimal number of hash.

This applies the formula k = round((m / n) * log(2))

Note that this is the optimal number of hash, but it does not account for false positive rate, which you may want to impose. You’ll notice the formula does not depend on fp_rate (p).

See guess_bit_len_for_fp_rate if you want to also take fp_rate in account.

§Examples:
use ofilter::Params;

assert_eq!(13, Params::optimal_nb_hash(10_000, 500));
source

pub fn guess_bit_len_for_fp_rate( nb_hash: usize, nb_items: usize, fp_rate: f64 ) -> usize

Guess the number of bits for a given false positive rate.

There is a formula m = ceil((n * log(p)) / log(1 / pow(2, log(2)))) to get this result but it gives it for the optimal case. Sometimes you can prefer to, typically, lower the number of bits in the bits buffer to get something smaller in memory, and just give up a bit on the optimal fp_rate (p).

This function does a bisect search to find the correct value.

§Examples:
use ofilter::Params;

assert_eq!(480833, Params::guess_bit_len_for_fp_rate(3, 100_000, 0.1));
source

pub fn guess_nb_hash_for_fp_rate( bit_len: usize, nb_items: usize, fp_rate: f64 ) -> usize

Guess the number of hash for a given false positive rate.

There is a formula k = round((m / n) * log(2)) to get this result but it gives it for the optimal case. Sometimes you can prefer to, typically, lower the number of hash functions to get something faster, and just give up a bit on the optimal fp_rate (p).

This function starts with the optimal nb_hash as a seed, and then tries to lower the it as long as the false positive rate gets closer to the target fp_rate.

§Examples:
use ofilter::Params;

assert_eq!(6, Params::guess_nb_hash_for_fp_rate(100_000, 10_000, 0.0005));
source

pub fn adjust(self) -> Self

Adjust params so that they are consistent.

The 4 parameters in a Bloom filter are closely linked. More precisely, with:

  • n (nb_items) -> number of items in the filter
  • p (fp_rate) -> probability of false positives, fraction between 0 and 1
  • m (bit_len) -> number of bits in the filter
  • k (nb_hash) -> number of hash functions

We have:

  • n = ceil(m / (-k / log(1 - exp(log(p) / k))))
  • p = pow(1 - exp(-k / (m / n)), k)
  • m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
  • k = round((m / n) * log(2))

This function works as follows:

  • if any parameter is non-zero, tries to respect it
  • if any parameter is zero, and can be inferred from others, calculate it
  • for all other parameters, use defaults

Once everything has a value, it will ultimately re-calculate the fp_rate so that it matches the other 3, which are integers, so may not be changed linearily.

In practice, use it to feed the new_with_params constructors for filters.

§Examples:
use ofilter::Params;

let params = Params{
    nb_hash: 2,
    bit_len: 0,
    nb_items: 1_000,
    fp_rate: 0.1,
    predict: false,
};

assert_eq!("{ nb_hash: 2, bit_len: 0, nb_items: 1000, fp_rate: 0.100000, predict: false }",
    format!("{}", params)
);
assert_eq!("{ nb_hash: 2, bit_len: 5262, nb_items: 1000, fp_rate: 0.099980, predict: false }",
    format!("{}", params.adjust())
);

Trait Implementations§

source§

impl Clone for Params

source§

fn clone(&self) -> Params

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Params

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Default for Params

source§

fn default() -> Self

Returns the “default value” for a type. Read more
source§

impl<'de> Deserialize<'de> for Params

source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
source§

impl Display for Params

Pretty-print parameters

§Examples

use ofilter::Params;

let p = Params::with_nb_items(100);
assert_eq!("{ nb_hash: 2, bit_len: 1899, nb_items: 100, fp_rate: 0.009992, predict: false }", format!("{}", p));
source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq for Params

Compare two parameters sets.

The false positive rate is rounded to 6 digits.

§Examples

use ofilter::Params;

let p1 = Params::with_nb_items(100);
let mut p2 = Params::with_nb_items(100);
p2.fp_rate += 1e-8;
assert_eq!(p1, p2);
source§

fn eq(&self, other: &Self) -> bool

This method tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Serialize for Params

source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more
source§

impl Eq for Params

Auto Trait Implementations§

§

impl Freeze for Params

§

impl RefUnwindSafe for Params

§

impl Send for Params

§

impl Sync for Params

§

impl Unpin for Params

§

impl UnwindSafe for Params

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> Conv for T

source§

fn conv<T>(self) -> T
where Self: Into<T>,

Converts self into T using Into<T>. Read more
source§

impl<T> FmtForward for T

source§

fn fmt_binary(self) -> FmtBinary<Self>
where Self: Binary,

Causes self to use its Binary implementation when Debug-formatted.
source§

fn fmt_display(self) -> FmtDisplay<Self>
where Self: Display,

Causes self to use its Display implementation when Debug-formatted.
source§

fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where Self: LowerExp,

Causes self to use its LowerExp implementation when Debug-formatted.
source§

fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where Self: LowerHex,

Causes self to use its LowerHex implementation when Debug-formatted.
source§

fn fmt_octal(self) -> FmtOctal<Self>
where Self: Octal,

Causes self to use its Octal implementation when Debug-formatted.
source§

fn fmt_pointer(self) -> FmtPointer<Self>
where Self: Pointer,

Causes self to use its Pointer implementation when Debug-formatted.
source§

fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where Self: UpperExp,

Causes self to use its UpperExp implementation when Debug-formatted.
source§

fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where Self: UpperHex,

Causes self to use its UpperHex implementation when Debug-formatted.
source§

fn fmt_list(self) -> FmtList<Self>
where &'a Self: for<'a> IntoIterator,

Formats each item in a sequence. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> Pipe for T
where T: ?Sized,

source§

fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where Self: Sized,

Pipes by value. This is generally the method you want to use. Read more
source§

fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where R: 'a,

Borrows self and passes that borrow into the pipe function. Read more
source§

fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where R: 'a,

Mutably borrows self and passes that borrow into the pipe function. Read more
source§

fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
where Self: Borrow<B>, B: 'a + ?Sized, R: 'a,

Borrows self, then passes self.borrow() into the pipe function. Read more
source§

fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R ) -> R
where Self: BorrowMut<B>, B: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.borrow_mut() into the pipe function. Read more
source§

fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
where Self: AsRef<U>, U: 'a + ?Sized, R: 'a,

Borrows self, then passes self.as_ref() into the pipe function.
source§

fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
where Self: AsMut<U>, U: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.as_mut() into the pipe function.
source§

fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
where Self: Deref<Target = T>, T: 'a + ?Sized, R: 'a,

Borrows self, then passes self.deref() into the pipe function.
source§

fn pipe_deref_mut<'a, T, R>( &'a mut self, func: impl FnOnce(&'a mut T) -> R ) -> R
where Self: DerefMut<Target = T> + Deref, T: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.deref_mut() into the pipe function.
source§

impl<T> Tap for T

source§

fn tap(self, func: impl FnOnce(&Self)) -> Self

Immutable access to a value. Read more
source§

fn tap_mut(self, func: impl FnOnce(&mut Self)) -> Self

Mutable access to a value. Read more
source§

fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Immutable access to the Borrow<B> of a value. Read more
source§

fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Mutable access to the BorrowMut<B> of a value. Read more
source§

fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Immutable access to the AsRef<R> view of a value. Read more
source§

fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Mutable access to the AsMut<R> view of a value. Read more
source§

fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Immutable access to the Deref::Target of a value. Read more
source§

fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Mutable access to the Deref::Target of a value. Read more
source§

fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self

Calls .tap() only in debug builds, and is erased in release builds.
source§

fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self

Calls .tap_mut() only in debug builds, and is erased in release builds.
source§

fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Calls .tap_borrow() only in debug builds, and is erased in release builds.
source§

fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
source§

fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Calls .tap_ref() only in debug builds, and is erased in release builds.
source§

fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
source§

fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Calls .tap_deref() only in debug builds, and is erased in release builds.
source§

fn tap_deref_mut_dbg<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Calls .tap_deref_mut() only in debug builds, and is erased in release builds.
source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T> ToString for T
where T: Display + ?Sized,

source§

default fn to_string(&self) -> String

Converts the given value to a String. Read more
source§

impl<T> TryConv for T

source§

fn try_conv<T>(self) -> Result<T, Self::Error>
where Self: TryInto<T>,

Attempts to convert self into T using TryInto<T>. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,