pub struct Params {
pub nb_hash: usize,
pub bit_len: usize,
pub nb_items: usize,
pub fp_rate: f64,
pub predict: bool,
}
Expand description
Parameters used to build bloom filters.
This is a utility to set up various bloom filter parameters. There are typically 4 parameters:
- n (
nb_items
): number of items in the filter - p (
fp_rate
): probability of false positives, fraction between 0 and 1 - m (
bit_len
): number of bits in the filter - k (
nb_hash
): number of hash functions
All of them are linked, if one changes the size of the filter, obviously, it changes the number of items it can hold.
A few things to keep in mind:
- for a pure Bloom filter, the only technical parameters that matter
are
nb_hash
andbit_len
. They fully define the filter. - from a functional point of view, most of the time you are interested
in a given
nb_items
andfp_rate
. From there the 2 others are derived.
You may partially fill this struct, anything which contains zero
will
be inferred either by deducing it from other defined params, or by
replacing it with a default value. This is done by calling .adjust()
.
Also, there is a predict
field which can be used to turn the
filter into a predicatable filter. This is convenient for testing,
and may be of interest in edge cases but most of the time a real random
version is prefered. It avoids bias and also can protect against
some DDOS attacks based on hash collision.
A few helpers are defined as well, for common use-cases.
§A bit of theory
This interactive Bloom filter params calulator proved very useful while developping this. Also it recaps the usual formulas, with:
- n -> number of items in the filter
- p -> probability of false positives, fraction between 0 and 1
- m -> number of bits in the filter
- k -> number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
A command line equivalent would be this practical bloom-filter-calculator.rb gist
Note that there are corner cases, for example, the formula that gives
bit_len
(m) from nb_items
(n) and fp_rate
(p), corresponds to
the optimal case, when nb_hash
(k) has been chosen optimally. If not,
it has to be revisited and adapted to the real value of nb_hash
(k).
In practice, unless you impose it, what this package does is enforce
a nb_hash
(k) of 2, which is generally optimal if you consider CPU time
and not memory usage.
§Links
- Bloom Filter on Wikipedia
- All About Bloom Filters
- Buffered Bloom Filters on Solid State Storage
- Age-Partitioned Bloom Filters
- countBF: A General-purpose High Accuracy and Space Efficient Counting Bloom Filter
- Stable Learned Bloom Filters for Data Streams
- Building a Better Bloom Filter
§Examples
Getting a filter for a given number of items, everything else default:
use ofilter::Params;
let params = Params::with_nb_items(10_000);
assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", ¶ms));
Getting a filter for a given number of items, false positive rate, and enforcing number of hash:
use ofilter::Params;
let params = Params{
nb_hash: 3,
bit_len: 0,
nb_items: 100_000,
fp_rate: 0.1,
predict: false,
}.adjust();
assert_eq!("{ nb_hash: 3, bit_len: 480833, nb_items: 100000, fp_rate: 0.100000, predict: false }", format!("{}", ¶ms));
Fields§
§nb_hash: usize
Number of hash functions used by the filter.
Also referred to as k
is most Bloom filter papers.
bit_len: usize
Length of the bit vector used by the filter.
Also referred to as m
is most Bloom filter papers.
nb_items: usize
Number of items the Bloom filter is designed for.
Also referred to as n
is most Bloom filter papers.
fp_rate: f64
False positive rate of the Bloom filter.
Also referred to as p
is most Bloom filter papers.
predict: bool
If set to true, Bloom filter is predictable.
Use this for testing, when you want something that is 100% predictable and avoid flaky behavior. In production, it would be safer to rely on the random, statistical default behavior.
One reason is security, among other examples, using a predictable hash for a cache may expose you to some sort of DDOS attack.
If in doubt, leave it to false.
Implementations§
source§impl Params
impl Params
sourcepub fn with_nb_items(nb_items: usize) -> Params
pub fn with_nb_items(nb_items: usize) -> Params
Parameters to store a given number of items.
All other values are using defaults, or are derive from this. This is the go-to function to create a reasonable filter that holds N items.
§Examples:
use ofilter::Params;
let p = Params::with_nb_items(10_000);
assert_eq!("{ nb_hash: 2, bit_len: 189825, nb_items: 10000, fp_rate: 0.010000, predict: false }", format!("{}", p));
sourcepub fn with_nb_items_and_fp_rate(nb_items: usize, fp_rate: f64) -> Params
pub fn with_nb_items_and_fp_rate(nb_items: usize, fp_rate: f64) -> Params
Parameters to store a given number of items, enforcing false positive rate.
By defining nb_items
and fp_rate
, the filter is almost completely defined.
The other missing parameter is the number of hash functions. There is an
optimal number for this, but this function will enforce a default of 2,
which most of the time yields excellent results, and avoids calculating
too many hashes.
If you want to use the optimal number of hashes, create the Params struct
with nb_items
, fp_rate
and nb_hash
defined, then call .adjust()
.
§Examples:
use ofilter::Params;
let p = Params::with_nb_items_and_fp_rate(10_000, 0.001);
assert_eq!("{ nb_hash: 2, bit_len: 622402, nb_items: 10000, fp_rate: 0.001000, predict: false }", format!("{}", p));
sourcepub fn with_bit_len(bit_len: usize) -> Params
pub fn with_bit_len(bit_len: usize) -> Params
Parameters to guess the number of items from bit buffer size.
If the only thing you care about, this will give you the best result for that amount of memory.
§Examples:
use ofilter::Params;
let p = Params::with_bit_len(8_192);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 432, fp_rate: 0.010019, predict: false }", format!("{}", p));
sourcepub fn with_bit_len_and_nb_items(bit_len: usize, nb_items: usize) -> Params
pub fn with_bit_len_and_nb_items(bit_len: usize, nb_items: usize) -> Params
Parameters to store a given number of items, enforcing bit buffer size
If the only thing you care about, this will give you the best result for that amount of memory.
§Examples:
use ofilter::Params;
let p = Params::with_bit_len_and_nb_items(8_192, 1_000);
assert_eq!("{ nb_hash: 2, bit_len: 8192, nb_items: 1000, fp_rate: 0.046925, predict: false }", format!("{}", p));
sourcepub fn estimate_nb_items(nb_hash: usize, bit_len: usize, fp_rate: f64) -> usize
pub fn estimate_nb_items(nb_hash: usize, bit_len: usize, fp_rate: f64) -> usize
Estimate the number of items one can store.
This applies the formula n = ceil(m / (-k / log(1 - exp(log(p) / k))))
.
§Examples:
use ofilter::Params;
assert_eq!(1994, Params::estimate_nb_items(5, 10_000, 0.1));
sourcepub fn estimate_fp_rate(nb_hash: usize, bit_len: usize, nb_items: usize) -> f64
pub fn estimate_fp_rate(nb_hash: usize, bit_len: usize, nb_items: usize) -> f64
Estimate the false positive rate.
This applies the formula p = powf(1 - exp(-k / (m / n)), k)
§Examples:
use ofilter::Params;
assert_eq!("0.009431", format!("{:0.6}", Params::estimate_fp_rate(5, 10_000, 1_000)));
sourcepub fn optimal_bit_len(nb_items: usize, fp_rate: f64) -> usize
pub fn optimal_bit_len(nb_items: usize, fp_rate: f64) -> usize
Calculate the optimal number of bits.
This applies the formula m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
Note that this is the optimal number of bits, but for the case
where the number of hash has also been chosen optimally. You’ll notice
the formula does not depend on fp_rate
(p).
See guess_bit_len_for_fp_rate
if you want to also take fp_rate
in account.
§Examples:
use ofilter::Params;
assert_eq!(62353, Params::optimal_bit_len(10_000, 0.05));
sourcepub fn optimal_nb_hash(bit_len: usize, nb_items: usize) -> usize
pub fn optimal_nb_hash(bit_len: usize, nb_items: usize) -> usize
Calculate the optimal number of hash.
This applies the formula k = round((m / n) * log(2))
Note that this is the optimal number of hash, but it does not
account for false positive rate, which you may want to impose.
You’ll notice the formula does not depend on fp_rate
(p).
See guess_bit_len_for_fp_rate
if you want to also take fp_rate
in account.
§Examples:
use ofilter::Params;
assert_eq!(13, Params::optimal_nb_hash(10_000, 500));
sourcepub fn guess_bit_len_for_fp_rate(
nb_hash: usize,
nb_items: usize,
fp_rate: f64
) -> usize
pub fn guess_bit_len_for_fp_rate( nb_hash: usize, nb_items: usize, fp_rate: f64 ) -> usize
Guess the number of bits for a given false positive rate.
There is a formula m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
to get this
result but it gives it for the optimal case. Sometimes you can
prefer to, typically, lower the number of bits in the bits buffer to
get something smaller in memory, and just give up a bit on the optimal
fp_rate
(p).
This function does a bisect search to find the correct value.
§Examples:
use ofilter::Params;
assert_eq!(480833, Params::guess_bit_len_for_fp_rate(3, 100_000, 0.1));
sourcepub fn guess_nb_hash_for_fp_rate(
bit_len: usize,
nb_items: usize,
fp_rate: f64
) -> usize
pub fn guess_nb_hash_for_fp_rate( bit_len: usize, nb_items: usize, fp_rate: f64 ) -> usize
Guess the number of hash for a given false positive rate.
There is a formula k = round((m / n) * log(2))
to get this
result but it gives it for the optimal case. Sometimes you can
prefer to, typically, lower the number of hash functions to
get something faster, and just give up a bit on the optimal
fp_rate
(p).
This function starts with the optimal nb_hash
as a seed,
and then tries to lower the it as long as the false positive rate
gets closer to the target fp_rate
.
§Examples:
use ofilter::Params;
assert_eq!(6, Params::guess_nb_hash_for_fp_rate(100_000, 10_000, 0.0005));
sourcepub fn adjust(self) -> Self
pub fn adjust(self) -> Self
Adjust params so that they are consistent.
The 4 parameters in a Bloom filter are closely linked. More precisely, with:
- n (
nb_items
) -> number of items in the filter - p (
fp_rate
) -> probability of false positives, fraction between 0 and 1 - m (
bit_len
) -> number of bits in the filter - k (
nb_hash
) -> number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
This function works as follows:
- if any parameter is non-zero, tries to respect it
- if any parameter is zero, and can be inferred from others, calculate it
- for all other parameters, use defaults
Once everything has a value, it will ultimately re-calculate the fp_rate
so that it matches the other 3, which are integers, so may not be changed
linearily.
In practice, use it to feed the new_with_params
constructors for filters.
§Examples:
use ofilter::Params;
let params = Params{
nb_hash: 2,
bit_len: 0,
nb_items: 1_000,
fp_rate: 0.1,
predict: false,
};
assert_eq!("{ nb_hash: 2, bit_len: 0, nb_items: 1000, fp_rate: 0.100000, predict: false }",
format!("{}", params)
);
assert_eq!("{ nb_hash: 2, bit_len: 5262, nb_items: 1000, fp_rate: 0.099980, predict: false }",
format!("{}", params.adjust())
);
Trait Implementations§
source§impl<'de> Deserialize<'de> for Params
impl<'de> Deserialize<'de> for Params
source§fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>where
__D: Deserializer<'de>,
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>where
__D: Deserializer<'de>,
source§impl Display for Params
impl Display for Params
Pretty-print parameters
§Examples
use ofilter::Params;
let p = Params::with_nb_items(100);
assert_eq!("{ nb_hash: 2, bit_len: 1899, nb_items: 100, fp_rate: 0.009992, predict: false }", format!("{}", p));
source§impl PartialEq for Params
impl PartialEq for Params
Compare two parameters sets.
The false positive rate is rounded to 6 digits.
§Examples
use ofilter::Params;
let p1 = Params::with_nb_items(100);
let mut p2 = Params::with_nb_items(100);
p2.fp_rate += 1e-8;
assert_eq!(p1, p2);
impl Eq for Params
Auto Trait Implementations§
impl Freeze for Params
impl RefUnwindSafe for Params
impl Send for Params
impl Sync for Params
impl Unpin for Params
impl UnwindSafe for Params
Blanket Implementations§
source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
source§impl<T> FmtForward for T
impl<T> FmtForward for T
source§fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
self
to use its Binary
implementation when Debug
-formatted.source§fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
self
to use its Display
implementation when
Debug
-formatted.source§fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
self
to use its LowerExp
implementation when
Debug
-formatted.source§fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
self
to use its LowerHex
implementation when
Debug
-formatted.source§fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
self
to use its Octal
implementation when Debug
-formatted.source§fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
self
to use its Pointer
implementation when
Debug
-formatted.source§fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
self
to use its UpperExp
implementation when
Debug
-formatted.source§fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
self
to use its UpperHex
implementation when
Debug
-formatted.source§impl<T> Pipe for Twhere
T: ?Sized,
impl<T> Pipe for Twhere
T: ?Sized,
source§fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
source§fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
self
and passes that borrow into the pipe function. Read moresource§fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
self
and passes that borrow into the pipe function. Read moresource§fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
source§fn pipe_borrow_mut<'a, B, R>(
&'a mut self,
func: impl FnOnce(&'a mut B) -> R
) -> R
fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R ) -> R
source§fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
self
, then passes self.as_ref()
into the pipe function.source§fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
self
, then passes self.as_mut()
into the pipe
function.source§fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
self
, then passes self.deref()
into the pipe function.source§impl<T> Tap for T
impl<T> Tap for T
source§fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Borrow<B>
of a value. Read moresource§fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
BorrowMut<B>
of a value. Read moresource§fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
AsRef<R>
view of a value. Read moresource§fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
AsMut<R>
view of a value. Read moresource§fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Deref::Target
of a value. Read moresource§fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Deref::Target
of a value. Read moresource§fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
.tap()
only in debug builds, and is erased in release builds.source§fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
.tap_mut()
only in debug builds, and is erased in release
builds.source§fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
.tap_borrow()
only in debug builds, and is erased in release
builds.source§fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
.tap_borrow_mut()
only in debug builds, and is erased in release
builds.source§fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
.tap_ref()
only in debug builds, and is erased in release
builds.source§fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
.tap_ref_mut()
only in debug builds, and is erased in release
builds.source§fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
.tap_deref()
only in debug builds, and is erased in release
builds.