FilterSize

Enum FilterSize 

Source
pub enum FilterSize {
    KeyBytes1 = 1,
    KeyBytes2 = 2,
    KeyBytes3 = 3,
    KeyBytes4 = 4,
    KeyBytes5 = 5,
}
Expand description

FilterSize bounds the allocated size and false-positive rate of a Bloom2 instance.

The false positive probability for a bloom filter increases as the number of entries increases. This relationship is demonstrated using 64bit hashes as keys for each possible filter configuration below - you should choose a filter size for your expected load level and hash size.

The value of FilterSize controls the k property of the filter: k = input_length_bytes / FilterSize.

Variants§

§

KeyBytes1 = 1

1 byte / 8 bits per key results in a bloom filter with a minimum memory usage of ~4 bytes and a maximum memory usage of 36 bytes.

The false positive probability using k=1 (a single byte key per entry) grows proportionally to the number of entries in the filter:

          +--+------------+------------+-----------+------------+------------+-----+
        1 +                                                *                   *   +
          |                                                                        |
          |                                   *                                    |
    P 0.8 +                                                                        +
    r     |                                                                        |
    o     |                                                                        |
    b 0.6 +                         *                                              +
    a     |                                                                        |
    b     |                                                                        |
    i 0.4 +                                                                        +
    l     |                  *                                                     |
    i     |                                                                        |
    t 0.2 +                                                                        +
    y     |                                                                        |
          |              *                                                         |
        0 +  ***** * *                                                             +
          +--+------------+------------+-----------+------------+------------+-----+
             0           50           100         150          200          250
                                      Number of Entries

The probability of false positives reaches 1-in-2 after 80 entries.

An empty sparse bloom filter would require 1x64 bit block map entries (8 bytes) to map 4 64 bit blocks, containing a total of 256 bits (memory saving: 75%)

§

KeyBytes2 = 2

2 bytes / 16 bits per key results in a bloom filter with a minimum memory usage of ~1024 bytes and a maximum memory usage of ~8KB when fully populated.

When using a 64bit hash (4x2 byte keys, k=4) the probability of a false positive is:

          +--+------------------------+-----------------------+--------------------+
        1 +                                                *                  *    +
          |                                  *                                     |
          |                                                                        |
    P 0.8 +                         *                                              +
    r     |                                                                        |
    o     |                                                                        |
    b 0.6 +                                                                        +
    a     |                  *                                                     |
    b     |                                                                        |
    i 0.4 +                                                                        +
    l     |              *                                                         |
    i     |                                                                        |
    t 0.2 +                                                                        +
    y     |          *                                                             |
          |        *                                                               |
        0 +  *****                                                                 +
          +--+------------------------+-----------------------+--------------------+
             0                      50000                   1e+05
                                      Number of Entries

The probability of false positives reaches 1-in-2 after 30118 entries.

An empty sparse bloom filter would require 16x64 bit block map entries (128 bytes) to map 1024 64 bit blocks, containing a total of 65536 bits (memory saving: 98.4375%)

§

KeyBytes3 = 3

3 bytes / 24 bits per key results in a bloom filter with a minimum memory usage of ~262KB bytes and a maximum memory usage of ~2MB when fully populated.

When using a 64bit hash (2x3 byte keys, k=2) the probability of a false positive is:

        1 +--+------------------+-------------------+------------------+-----------+
          |                                                                   *    |
          |                                                *                       |
      0.8 +                                                                        +
    P     |                                  *                                     |
    r     |                                                                        |
    o     |                                                                        |
    b 0.6 +                         *                                              +
    a     |                                                                        |
    b     |                                                                        |
    i 0.4 +                  *                                                     +
    l     |                                                                        |
    i     |              *                                                         |
    t 0.2 +                                                                        +
    y     |          *                                                             |
          |      * *                                                               |
        0 +  ****                                                                  +
          +--+------------------+-------------------+------------------+-----------+
             0                1e+07               2e+07              3e+07
                                      Number of Entries

The probability of false positives reaches 1-in-2 after 10300768 entries.

An empty sparse bloom filter would require 4096x64 bit block map entries (32768 bytes) to map 262144 64 bit blocks, containing a total of 16777216 bits (memory saving: 98.4375%)

§

KeyBytes4 = 4

4 bytes / 32 bits per key results in a bloom filter with a minimum memory usage of ~67MB and a maximum memory usage of ~603MB when fully populated.

When using a 64bit hash (2x4 byte keys, k=2) the probability of a false positive is:

        1 +--+--------------+--------------+--------------+--------------+---------+
          |                                                                   *    |
          |                                                *                       |
      0.8 +                                                                        +
    P     |                                  *                                     |
    r     |                                                                        |
    o     |                                                                        |
    b 0.6 +                         *                                              +
    a     |                                                                        |
    b     |                                                                        |
    i 0.4 +                  *                                                     +
    l     |                                                                        |
    i     |              *                                                         |
    t 0.2 +                                                                        +
    y     |          *                                                             |
          |      * *                                                               |
        0 +  ****                                                                  +
          +--+--------------+--------------+--------------+--------------+---------+
             0            2e+09          4e+09          6e+09          8e+09
                                      Number of Entries

The probability of false positives reaches 1-in-2 after 2636996484 entries.

An empty sparse bloom filter would require 1048576x64 bit block map entries (8388608 bytes) to map 67108864 64 bit blocks, containing a total of 4294967296 bits (memory saving: 98.4375%)

§

KeyBytes5 = 5

5 bytes / 32 bits per key results in a bloom filter with a minimum memory usage of ~17GB and a maximum memory usage of ~1117GB when fully populated.

If you actually need this get in touch - I have some ideas for reducing the memory footprint even further.

When using a 64bit hash (1x5 byte keys, k=1) the probability of a false positive is:

          +--+----------+---------+---------+----------+---------+---------+-------+
        1 +                                                *                  *    +
          |                                  *                                     |
          |                         *                                              |
    P 0.8 +                                                                        +
    r     |                  *                                                     |
    o     |              *                                                         |
    b 0.6 +                                                                        +
    a     |          *                                                             |
    b     |                                                                        |
    i 0.4 +        *                                                               +
    l     |                                                                        |
    i     |      *                                                                 |
    t 0.2 +     *                                                                  +
    y     |    *                                                                   |
          |   *                                                                    |
        0 +  **                                                                    +
          +--+----------+---------+---------+----------+---------+---------+-------+
             0        1e+12     2e+12     3e+12      4e+12     5e+12     6e+12
                                      Number of Entries

The probability of false positives reaches 1-in-2 after 762123384786 entries.

An empty sparse bloom filter would require 268435456x64 bit block map entries (2147483648 bytes) to map 17179869184 64 bit blocks, containing a total of 1099511627776 bits (memory saving: 98.4375%)

Trait Implementations§

Source§

impl Clone for FilterSize

Source§

fn clone(&self) -> FilterSize

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for FilterSize

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de> Deserialize<'de> for FilterSize

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl PartialEq for FilterSize

Source§

fn eq(&self, other: &FilterSize) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Serialize for FilterSize

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more
Source§

impl Copy for FilterSize

Source§

impl Eq for FilterSize

Source§

impl StructuralPartialEq for FilterSize

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,