pub struct Params {
pub nb_hash: usize,
pub bit_len: usize,
pub nb_items: usize,
pub fp_rate: f64,
pub predict: bool,
}Expand description
Parameters used to build bloom filters.
This is a utility to set up various bloom filter parameters. There are typically 4 parameters:
- n (
nb_items): number of items in the filter - p (
fp_rate): probability of false positives, fraction between 0 and 1 - m (
bit_len): number of bits in the filter - k (
nb_hash): number of hash functions
All of them are linked, if one changes the size of the filter, obviously, it changes the number of items it can hold.
A few things to keep in mind:
- for a pure Bloom filter, the only technical parameters that matter
are
nb_hashandbit_len. They fully define the filter. - from a functional point of view, most of the time you are interested
in a given
nb_itemsandfp_rate. From there the 2 others are derived.
You may partially fill this struct, anything which contains zero will
be inferred either by deducing it from other defined params, or by
replacing it with a default value. This is done by calling .adjust().
Also, there is a predict field which can be used to turn the
filter into a predicatable filter. This is convenient for testing,
and may be of interest in edge cases but most of the time a real random
version is prefered. It avoids bias and also can protect against
some DDOS attacks based on hash collision.
A few helpers are defined as well, for common use-cases.
§A bit of theory
This interactive Bloom filter params calulator proved very useful while developping this. Also it recaps the usual formulas, with:
- n -> number of items in the filter
- p -> probability of false positives, fraction between 0 and 1
- m -> number of bits in the filter
- k -> number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))p = pow(1 - exp(-k / (m / n)), k)m = ceil((n * log(p)) / log(1 / pow(2, log(2))))k = round((m / n) * log(2))
A command line equivalent would be this practical bloom-filter-calculator.rb gist
Note that there are corner cases, for example, the formula that gives
bit_len (m) from nb_items (n) and fp_rate (p), corresponds to
the optimal case, when nb_hash (k) has been chosen optimally. If not,
it has to be revisited and adapted to the real value of nb_hash (k).
In practice, unless you impose it, what this package does is enforce
a nb_hash (k) of 2, which is generally optimal if you consider CPU time
and not memory usage.
§Links
- Bloom Filter on Wikipedia
- All About Bloom Filters
- Buffered Bloom Filters on Solid State Storage
- Age-Partitioned Bloom Filters
- countBF: A General-purpose High Accuracy and Space Efficient Counting Bloom Filter
- Stable Learned Bloom Filters for Data Streams
- Building a Better Bloom Filter
§Examples
Getting a filter for a given number of items, everything else default:
use ofilter::Params;
let params = Params::with_nb_items(10_000);
assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", ¶ms));Getting a filter for a given number of items, false positive rate, and enforcing number of hash:
use ofilter::Params;
let params = Params{
nb_hash: 3,
bit_len: 0,
nb_items: 100_000,
fp_rate: 0.1,
predict: false,
}.adjust();
assert_eq!("{ nb_hash: 3, bit_len: 480833, nb_items: 100000, fp_rate: 0.100000, predict: false }", format!("{}", ¶ms));Fields§
§nb_hash: usizeNumber of hash functions used by the filter.
Also referred to as k is most Bloom filter papers.
bit_len: usizeLength of the bit vector used by the filter.
Also referred to as m is most Bloom filter papers.
nb_items: usizeNumber of items the Bloom filter is designed for.
Also referred to as n is most Bloom filter papers.
fp_rate: f64False positive rate of the Bloom filter.
Also referred to as p is most Bloom filter papers.
predict: boolIf set to true, Bloom filter is predictable.
Use this for testing, when you want something that is 100% predictable and avoid flaky behavior. In production, it would be safer to rely on the random, statistical default behavior.
One reason is security, among other examples, using a predictable hash for a cache may expose you to some sort of DDOS attack.
If in doubt, leave it to false.
Implementations§
Source§impl Params
impl Params
Sourcepub fn with_nb_items(nb_items: usize) -> Params
pub fn with_nb_items(nb_items: usize) -> Params
Parameters to store a given number of items.
All other values are using defaults, or are derive from this. This is the go-to function to create a reasonable filter that holds N items.
§Examples:
use ofilter::Params;
let p = Params::with_nb_items(10_000);
assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", p));Sourcepub fn with_nb_items_and_fp_rate(nb_items: usize, fp_rate: f64) -> Params
pub fn with_nb_items_and_fp_rate(nb_items: usize, fp_rate: f64) -> Params
Parameters to store a given number of items, enforcing false positive rate.
By defining nb_items and fp_rate, the filter is almost completely defined.
The other missing parameter is the number of hash functions. There is an
optimal number for this, but this function will enforce a default of 2,
which most of the time yields excellent results, and avoids calculating
too many hashes.
If you want to use the optimal number of hashes, create the Params struct
with nb_items, fp_rate and nb_hash defined, then call .adjust().
§Examples:
use ofilter::Params;
let p = Params::with_nb_items_and_fp_rate(10_000, 0.001);
assert_eq!("{ nb_hash: 2, bit_len: 622402, nb_items: 10000, fp_rate: 0.001000, predict: false }", format!("{}", p));Sourcepub fn with_bit_len(bit_len: usize) -> Params
pub fn with_bit_len(bit_len: usize) -> Params
Parameters to guess the number of items from bit buffer size.
If the only thing you care about, this will give you the best result for that amount of memory.
§Examples:
use ofilter::Params;
let p = Params::with_bit_len(8_192);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 432, fp_rate: 0.010019, predict: false }", format!("{}", p));Sourcepub fn with_bit_len_and_nb_items(bit_len: usize, nb_items: usize) -> Params
pub fn with_bit_len_and_nb_items(bit_len: usize, nb_items: usize) -> Params
Parameters to store a given number of items, enforcing bit buffer size
If the only thing you care about, this will give you the best result for that amount of memory.
§Examples:
use ofilter::Params;
let p = Params::with_bit_len_and_nb_items(8_192, 1_000);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 1000, fp_rate: 0.046925, predict: false }", format!("{}", p));Sourcepub fn estimate_nb_items(nb_hash: usize, bit_len: usize, fp_rate: f64) -> usize
pub fn estimate_nb_items(nb_hash: usize, bit_len: usize, fp_rate: f64) -> usize
Estimate the number of items one can store.
This applies the formula n = ceil(m / (-k / log(1 - exp(log(p) / k)))).
§Examples:
use ofilter::Params;
assert_eq!(1994, Params::estimate_nb_items(5, 10_000, 0.1));Sourcepub fn estimate_fp_rate(nb_hash: usize, bit_len: usize, nb_items: usize) -> f64
pub fn estimate_fp_rate(nb_hash: usize, bit_len: usize, nb_items: usize) -> f64
Estimate the false positive rate.
This applies the formula p = powf(1 - exp(-k / (m / n)), k)
§Examples:
use ofilter::Params;
assert_eq!("0.009431", format!("{:0.6}", Params::estimate_fp_rate(5, 10_000, 1_000)));Sourcepub fn optimal_bit_len(nb_items: usize, fp_rate: f64) -> usize
pub fn optimal_bit_len(nb_items: usize, fp_rate: f64) -> usize
Calculate the optimal number of bits.
This applies the formula m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
Note that this is the optimal number of bits, but for the case
where the number of hash has also been chosen optimally. You’ll notice
the formula does not depend on fp_rate (p).
See guess_bit_len_for_fp_rate if you want to also take fp_rate in account.
§Examples:
use ofilter::Params;
assert_eq!(62353, Params::optimal_bit_len(10_000, 0.05));Sourcepub fn optimal_nb_hash(bit_len: usize, nb_items: usize) -> usize
pub fn optimal_nb_hash(bit_len: usize, nb_items: usize) -> usize
Calculate the optimal number of hash.
This applies the formula k = round((m / n) * log(2))
Note that this is the optimal number of hash, but it does not
account for false positive rate, which you may want to impose.
You’ll notice the formula does not depend on fp_rate (p).
See guess_bit_len_for_fp_rate if you want to also take fp_rate in account.
§Examples:
use ofilter::Params;
assert_eq!(13, Params::optimal_nb_hash(10_000, 500));Sourcepub fn guess_bit_len_for_fp_rate(
nb_hash: usize,
nb_items: usize,
fp_rate: f64,
) -> usize
pub fn guess_bit_len_for_fp_rate( nb_hash: usize, nb_items: usize, fp_rate: f64, ) -> usize
Guess the number of bits for a given false positive rate.
There is a formula m = ceil((n * log(p)) / log(1 / pow(2, log(2)))) to get this
result but it gives it for the optimal case. Sometimes you can
prefer to, typically, lower the number of bits in the bits buffer to
get something smaller in memory, and just give up a bit on the optimal
fp_rate (p).
This function does a bisect search to find the correct value.
§Examples:
use ofilter::Params;
assert_eq!(480833, Params::guess_bit_len_for_fp_rate(3, 100_000, 0.1));Sourcepub fn guess_nb_hash_for_fp_rate(
bit_len: usize,
nb_items: usize,
fp_rate: f64,
) -> usize
pub fn guess_nb_hash_for_fp_rate( bit_len: usize, nb_items: usize, fp_rate: f64, ) -> usize
Guess the number of hash for a given false positive rate.
There is a formula k = round((m / n) * log(2)) to get this
result but it gives it for the optimal case. Sometimes you can
prefer to, typically, lower the number of hash functions to
get something faster, and just give up a bit on the optimal
fp_rate (p).
This function starts with the optimal nb_hash as a seed,
and then tries to lower the it as long as the false positive rate
gets closer to the target fp_rate.
§Examples:
use ofilter::Params;
assert_eq!(6, Params::guess_nb_hash_for_fp_rate(100_000, 10_000, 0.0005));Sourcepub fn adjust(self) -> Self
pub fn adjust(self) -> Self
Adjust params so that they are consistent.
The 4 parameters in a Bloom filter are closely linked. More precisely, with:
- n (
nb_items) -> number of items in the filter - p (
fp_rate) -> probability of false positives, fraction between 0 and 1 - m (
bit_len) -> number of bits in the filter - k (
nb_hash) -> number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))p = pow(1 - exp(-k / (m / n)), k)m = ceil((n * log(p)) / log(1 / pow(2, log(2))))k = round((m / n) * log(2))
This function works as follows:
- if any parameter is non-zero, tries to respect it
- if any parameter is zero, and can be inferred from others, calculate it
- for all other parameters, use defaults
Once everything has a value, it will ultimately re-calculate the fp_rate
so that it matches the other 3, which are integers, so may not be changed
linearily.
In practice, use it to feed the new_with_params constructors for filters.
§Examples:
use ofilter::Params;
let params = Params{
nb_hash: 2,
bit_len: 0,
nb_items: 1_000,
fp_rate: 0.1,
predict: false,
};
assert_eq!("{ nb_hash: 2, bit_len: 0, nb_items: 1000, fp_rate: 0.100000, predict: false }",
format!("{}", params)
);
assert_eq!("{ nb_hash: 2, bit_len: 5262, nb_items: 1000, fp_rate: 0.099980, predict: false }",
format!("{}", params.adjust())
);Trait Implementations§
Source§impl<'de> Deserialize<'de> for Params
impl<'de> Deserialize<'de> for Params
Source§fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>where
__D: Deserializer<'de>,
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>where
__D: Deserializer<'de>,
Source§impl Display for Params
Pretty-print parameters
impl Display for Params
Pretty-print parameters
§Examples
use ofilter::Params;
let p = Params::with_nb_items(100);
assert_eq!("{ nb_hash: 2, bit_len: 1899, nb_items: 100, fp_rate: 0.009992, predict: false }", format!("{}", p));Source§impl PartialEq for Params
Compare two parameters sets.
impl PartialEq for Params
Compare two parameters sets.
The false positive rate is rounded to 6 digits.
§Examples
use ofilter::Params;
let p1 = Params::with_nb_items(100);
let mut p2 = Params::with_nb_items(100);
p2.fp_rate += 1e-8;
assert_eq!(p1, p2);impl Eq for Params
Auto Trait Implementations§
impl Freeze for Params
impl RefUnwindSafe for Params
impl Send for Params
impl Sync for Params
impl Unpin for Params
impl UnwindSafe for Params
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<T> FmtForward for T
impl<T> FmtForward for T
Source§fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
self to use its Binary implementation when Debug-formatted.Source§fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
self to use its Display implementation when
Debug-formatted.Source§fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
self to use its LowerExp implementation when
Debug-formatted.Source§fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
self to use its LowerHex implementation when
Debug-formatted.Source§fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
self to use its Octal implementation when Debug-formatted.Source§fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
self to use its Pointer implementation when
Debug-formatted.Source§fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
self to use its UpperExp implementation when
Debug-formatted.Source§fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
self to use its UpperHex implementation when
Debug-formatted.Source§impl<T> Pipe for Twhere
T: ?Sized,
impl<T> Pipe for Twhere
T: ?Sized,
Source§fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
Source§fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
self and passes that borrow into the pipe function. Read moreSource§fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
self and passes that borrow into the pipe function. Read moreSource§fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
Source§fn pipe_borrow_mut<'a, B, R>(
&'a mut self,
func: impl FnOnce(&'a mut B) -> R,
) -> R
fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
Source§fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
self, then passes self.as_ref() into the pipe function.Source§fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
self, then passes self.as_mut() into the pipe
function.Source§fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
self, then passes self.deref() into the pipe function.Source§impl<T> Tap for T
impl<T> Tap for T
Source§fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Borrow<B> of a value. Read moreSource§fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
BorrowMut<B> of a value. Read moreSource§fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
AsRef<R> view of a value. Read moreSource§fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
AsMut<R> view of a value. Read moreSource§fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Deref::Target of a value. Read moreSource§fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Deref::Target of a value. Read moreSource§fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
.tap() only in debug builds, and is erased in release builds.Source§fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
.tap_mut() only in debug builds, and is erased in release
builds.Source§fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
.tap_borrow() only in debug builds, and is erased in release
builds.Source§fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
.tap_borrow_mut() only in debug builds, and is erased in release
builds.Source§fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
.tap_ref() only in debug builds, and is erased in release
builds.Source§fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
.tap_ref_mut() only in debug builds, and is erased in release
builds.Source§fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
.tap_deref() only in debug builds, and is erased in release
builds.