[][src]Struct pdatastructs::filters::quotientfilter::QuotientFilter

pub struct QuotientFilter<T, B = BuildHasherDefault<DefaultHasher>> where
    T: Hash,
    B: BuildHasher + Clone + Eq
{ /* fields omitted */ }

A QuotientFilter is a set-like data structure, that keeps track of elements it has seen without the need to store them. Looking up values has a certain false positive rate, but a false negative rate of 0%.

Examples

use pdatastructs::filters::Filter;
use pdatastructs::filters::quotientfilter::QuotientFilter;

// set up filter
let bits_quotient = 16;
let bits_remainder = 5;
let mut filter = QuotientFilter::with_params(bits_quotient, bits_remainder);

// add some data
filter.insert(&"my super long string").unwrap();

// later
assert!(filter.query(&"my super long string"));
assert!(!filter.query(&"another super long string"));

Applications

  • when a lot of data should be added to the set and a moderate false positive rate is acceptable, was used for spell checking
  • as a pre-filter for more expensive lookups, e.g. in combination with a real set, map or database, so the final false positive rate is 0%

How It Works

Setup

There are 2^bits_quotient slots, initial empty. For every slot, we store bits_remainder as fingerprint information, a is_continuation bit, a is_occupied bit and a is_shifted bit. All bits are initially set to false.

bits_quotient  = 3
bits_remainder = 4

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
|  position       ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |     |     |     |     |     |     |
| is_continuation ||     |     |     |     |     |     |     |     |
| is_shifted      ||     |     |     |     |     |     |     |     |
| remainder       || 0x0 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+

Insertion

On insertion, elements are hashed to 64 bits. From these, bits_quotient are used as a quotient and bits_remainder are used as remainder, the remaining bits are dropped.

The quotient represents the canonical position in which the remainder should be inserted. If is is free, we use that position, set the is_occupied bit and are done.

x           = "foo"
h(x)        = 0x0123456789abcda5
h(x) & 0x7f = 0x25
remainder   = 0x5
quotient    = 0x2

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |     |     |     |     |     |
| is_continuation ||     |     |     |     |     |     |     |     |
| is_shifted      ||     |     |     |     |     |     |     |     |
| remainder       || 0x0 | 0x0 | 0x2 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+

If not, linear probing is applied. If an element with the same quotient is already in the filter, the so called "run" of it will be extended. For extensions, the is_continuation bit is set as well as the is_shifted bit because the stored remainder is not in its canonical position:

x           = "bar"
h(x)        = 0xad8caa00248af32e
h(x) & 0x7f = 0x2e
remainder   = 0xe
quotient    = 0x2

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |     |     |     |     |     |
| is_continuation ||     |     |     |   X |     |     |     |     |
| is_shifted      ||     |     |     |   X |     |     |     |     |
| remainder       || 0x0 | 0x0 | 0x2 | 0xe | 0x0 | 0x0 | 0x0 | 0x0 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| run             ||            [=========]                        |
+-----------------++-----------------------------------------------|

While doing so, the order of remainders within the run is preserved:

x           = "elephant"
h(x)        = 0x34235511eeadbc26
h(x) & 0x7f = 0x26
remainder   = 0x6
quotient    = 0x2

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |     |     |     |     |     |
| is_continuation ||     |     |     |   X |   X |     |     |     |
| is_shifted      ||     |     |     |   X |   X |     |     |     |
| remainder       || 0x0 | 0x0 | 0x2 | 0x6 | 0xe | 0x0 | 0x0 | 0x0 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| run             ||            [===============]                  |
+-----------------++-----------------------------------------------|

If a new quotient is inserted but the corresponding run cannot start at the canonical position, the entire run will be shifted. A sequence of runs is also called "cluster". Even though the run is shifted, the original position will still be marked as occupied:

x           = "banana"
h(x)        = 0xdfdfdfdfdfdfdf31
h(x) & 0x7f = 0x31
remainder   = 0x1
quotient    = 0x3

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |   X |     |     |     |     |
| is_continuation ||     |     |     |   X |   X |     |     |     |
| is_shifted      ||     |     |     |   X |   X |   X |     |     |
| remainder       || 0x0 | 0x0 | 0x2 | 0x6 | 0xe | 0x1 | 0x0 | 0x0 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| run             ||            [===============] [===]            |
| cluster         ||            [=====================]            |
+-----------------++-----------------------------------------------|

Remainders may duplicate over multiple runs:

x           = "apple"
h(x)        = 0x0000000000000072
h(x) & 0x7f = 0x72
remainder   = 0x2
quotient    = 0x7

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |   X |     |     |     |   X |
| is_continuation ||     |     |     |   X |   X |     |     |     |
| is_shifted      ||     |     |     |   X |   X |   X |     |     |
| remainder       || 0x0 | 0x0 | 0x2 | 0x6 | 0xe | 0x1 | 0x0 | 0x2 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| run             ||            [===============] [===]       [===]|
| cluster         ||            [=====================]       [===]|
+-----------------++-----------------------------------------------|

The entire array works like a ring-buffer and operations can over- and underflow:

x           = "last"
h(x)        = 0x11355343431323f3
h(x) & 0x7f = 0x73
remainder   = 0x3
quotient    = 0x7

+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| position        ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| is_occupied     ||     |     |   X |   X |     |     |     |   X |
| is_continuation ||   X |     |     |   X |   X |     |     |     |
| is_shifted      ||   X |     |     |   X |   X |   X |     |     |
| remainder       || 0x3 | 0x0 | 0x2 | 0x6 | 0xe | 0x1 | 0x0 | 0x2 |
+-----------------++-----+-----+-----+-----+-----+-----+-----+-----+
| run             ||====]       [===============] [===]       [====|
| cluster         ||====]       [=====================]       [====|
+-----------------++-----------------------------------------------|

Lookup

The lookup basically follows the insertion procedure.

See Also

  • std::collections::HashSet: has a false positive rate of 0%, but also needs to store all elements

References

Methods

impl<T> QuotientFilter<T> where
    T: Hash
[src]

Create new quotient filter with:

  • bits_quotient: number of bits used for a quotient, aka 2^bits_quotient slots will be allocated
  • bits_remainder: number of bits used for the remainder, so every slot will require bits_remainder + 3 bits of storage

and a default hasher.

impl<T, B> QuotientFilter<T, B> where
    T: Hash,
    B: BuildHasher + Clone + Eq
[src]

Create new quotient filter with:

  • bits_quotient: number of bits used for a quotient, aka 2^bits_quotient slots will be allocated
  • bits_remainder: number of bits used for the remainder, so every slot will require bits_remainder + 3 bits of storage
  • buildhasher: hash implementation

Number of bits used for addressing slots.

Number of bits stored as fingeprint information.

Trait Implementations

impl<T, B> Filter<T> for QuotientFilter<T, B> where
    T: Hash,
    B: BuildHasher + Clone + Eq
[src]

Error type that may occur during insertion.

impl<T: Clone, B: Clone> Clone for QuotientFilter<T, B> where
    T: Hash,
    B: BuildHasher + Clone + Eq
[src]

Performs copy-assignment from source. Read more

impl<T: Debug, B: Debug> Debug for QuotientFilter<T, B> where
    T: Hash,
    B: BuildHasher + Clone + Eq
[src]

Auto Trait Implementations

impl<T, B> Send for QuotientFilter<T, B> where
    B: Send,
    T: Send

impl<T, B> Sync for QuotientFilter<T, B> where
    B: Sync,
    T: Sync

Blanket Implementations

impl<T> From for T
[src]

impl<T, U> Into for T where
    U: From<T>, 
[src]

impl<T> ToOwned for T where
    T: Clone
[src]

impl<T, U> TryFrom for T where
    T: From<U>, 
[src]

🔬 This is a nightly-only experimental API. (try_from)

The type returned in the event of a conversion error.

impl<T> Borrow for T where
    T: ?Sized
[src]

impl<T> BorrowMut for T where
    T: ?Sized
[src]

impl<T, U> TryInto for T where
    U: TryFrom<T>, 
[src]

🔬 This is a nightly-only experimental API. (try_from)

The type returned in the event of a conversion error.

impl<T> Any for T where
    T: 'static + ?Sized
[src]