Expand description
This library implements Xor Filters and their derivatives. Xor filters are data structures for fast approximation of set membership using little memory. Probabilistic filters like xor filters are useful for quickly estimating of the existence of an entity to avoid using an expensive resource. For example, they can be used to reduce disk writes in a cache or identify malicious URLs.
Xor filters are faster and smaller than Bloom and Cuckoo filters. Xor filters incur a relative time penalty in construction, but are very fast in lookups; the expectation is that construction of a filter is amortized after many queries.
Xor filters operate only on sets of 64-bit (unsigned) integers. This library does not provide
methods for hashing arbitrary types to 64-bit integers. Xor filters are immutable,
serializable, and guarantee no false negatives. This library is no_std
and needs_allocator
.
Filters are implemented as described in the paper “Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters”, in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”, and in Daniel Lemire’s go and c implementations. All are useful references on the theory behind xor filters. For performance statistics, refer to individual filters’ documentation in this crate or the mentioned papers.
General considerations for all filters
It is highly recommended to use the BinaryFuse
family of xor-like filters. BinaryFuse
filters can fail to construct, but almost always only if they are constructed with duplicate
keys. If you need a filter that will unconditionally succeed in construction (up to duplicate
keys), use the Xor
family of filters.
For a given N
, a BinaryFuseN
and FuseN
filter are roughly equivalent in size, uniformity
of key distribution, and false-positive rate.
An XorN
filter is larger, less uniform, and has a higher false-positive
rate than both a BinaryFuseN
and FuseN
filter.
A BinaryFuseN
filter’s construction is faster, uses less memory, and is more likely to
succeed compared to a FuseN
filter’s construction.
The false-positive rate of a filter with fingerprint size N
is around 2^{-N}
; for more
numbers, see the documentation of each individual filter.
Assumed pre-conditions
- It is a pre-condition that all filters are constructed from a data structure containing no duplicate keys. You must perform any de-duplication needed yourself before constructing a filter.
FAQ
What’s the difference between “Fuse” and “Binary Fuse” filters?
Fuse filters use a fuse graph to reduce the space required to hold fingerprints. Binary Fuse filters further exploit fuse graphs in a novel manner described in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”. In particular, Binary Fuse filters use a binary-partitioned fuse graph, and are different enough from the “original” Fuse filters to deserve a unique name.
Structs
- A
BinaryFuse8
filter is an Xor-like filter with 8-bit fingerprints arranged in a binary-partitioned fuse graph.BinaryFuse8
s are similar toFuse8
s, but their construction is faster, uses less memory, and is more likely to succeed. - A
BinaryFuse16
filter is an Xor-like filter with 16-bit fingerprints arranged in a binary-partitioned fuse graph.BinaryFuse16
s are similar toFuse16
s, but their construction is faster, uses less memory, and is more likely to succeed. - A
BinaryFuse32
filter is an Xor-like filter with 32-bit fingerprints arranged in a binary-partitioned fuse graph.BinaryFuse32
s are similar toFuse32
s, but their construction is faster, uses less memory, and is more likely to succeed. - Fuse8DeprecatedXor filter using 8-bit fingerprints in a fuse graph. Requires less space than an
Xor8
. - Fuse16DeprecatedXor filter using 16-bit fingerprints in a fuse graph. Requires less space than an
Xor16
. - Fuse32DeprecatedXor filter using 32-bit fingerprints in a fuse graph. Requires less space than an
Xor32
. - Arbitrary key type proxy for xor filters.
- Xor filter using 8-bit fingerprints.
- Xor filter using 16-bit fingerprints.
- Xor filter using 32-bit fingerprints.
Traits
- Methods common to xor filters.