Expand description
This library implements Xor Filters and their derivatives. Xor filters are data structures for fast approximation of set membership using little memory. Probabilistic filters like xor filters are useful for quickly estimating of the existence of an entity to avoid using an expensive resource. For example, they can be used to reduce disk writes in a cache or identify malicious URLs.
Xor filters are faster and smaller than Bloom and Cuckoo filters. Xor filters incur a relative time penalty in construction, but are very fast in lookups; the expectation is that construction of a filter is amortized after many queries.
Xor filters operate only on sets of 64-bit (unsigned) integers. This library does not provide
methods for hashing arbitrary types to 64-bit integers. Xor filters are immutable,
serializable, and guarantee no false negatives. This library is no_std and needs_allocator.
Filters are implemented as described in the paper “Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters”, in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”, and in Daniel Lemire’s go and c implementations. All are useful references on the theory behind xor filters. For performance statistics, refer to individual filters’ documentation in this crate or the mentioned papers.
§General considerations for all filters
It is highly recommended to use the BinaryFuse family of xor-like filters. BinaryFuse
filters can fail to construct, but almost always only if they are constructed with duplicate
keys. If you need a filter that will unconditionally succeed in construction (up to duplicate
keys), use the Xor family of filters.
For a given N, a BinaryFuseN and FuseN filter are roughly equivalent in size, uniformity
of key distribution, and false-positive rate.
An XorN filter is larger, less uniform, and has a higher false-positive
rate than both a BinaryFuseN and FuseN filter.
A BinaryFuseN filter’s construction is faster, uses less memory, and is more likely to
succeed compared to a FuseN filter’s construction.
The false-positive rate of a filter with fingerprint size N is around 2^{-N}; for more
numbers, see the documentation of each individual filter.
§Assumed pre-conditions
- It is a pre-condition that all filters are constructed from a data structure containing no duplicate keys. You must perform any de-duplication needed yourself before constructing a filter.
§FAQ
§What’s the difference between “Fuse” and “Binary Fuse” filters?
Fuse filters use a fuse graph to reduce the space required to hold fingerprints. Binary Fuse filters further exploit fuse graphs in a novel manner described in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”. In particular, Binary Fuse filters use a binary-partitioned fuse graph, and are different enough from the “original” Fuse filters to deserve a unique name.
Structs§
- Binary
Fuse8 - A
BinaryFuse8filter is an Xor-like filter with 8-bit fingerprints arranged in a binary-partitioned fuse graph. - Binary
Fuse8 Ref - Like
BinaryFuse8except that it can be constructed 0-copy from external buffers. - Binary
Fuse16 - A
BinaryFuse16filter is an Xor-like filter with 16-bit fingerprints arranged in a binary-partitioned fuse graph. - Binary
Fuse32 - A
BinaryFuse32filter is an Xor-like filter with 32-bit fingerprints arranged in a binary-partitioned fuse graph. - Binary
Fuse16 Ref - Like
BinaryFuse16except that it can be constructed 0-copy from external buffers. - Binary
Fuse32 Ref - Like
BinaryFuse32except that it can be constructed 0-copy from external buffers. - Fuse8
Deprecated - Xor filter using 8-bit fingerprints in a fuse graph. Requires less space than an
Xor8. - Fuse16
Deprecated - Xor filter using 16-bit fingerprints in a fuse graph. Requires less space than an
Xor16. - Fuse32
Deprecated - Xor filter using 32-bit fingerprints in a fuse graph. Requires less space than an
Xor32. - Hash
Proxy - Arbitrary key type proxy for xor filters.
- Xor8
- Xor filter using 8-bit fingerprints.
- Xor16
- Xor filter using 16-bit fingerprints.
- Xor32
- Xor filter using 32-bit fingerprints.
Traits§
- DmaSerializable
- DMA serializable filters are ones who can be essentially directly accessed into/out of DMA buffers.
- Filter
- Methods common to xor filters.
- Filter
Ref - Equivalent to Filter except represents a reference to fingerprints stored elsewhere.