Crate xorf

source ·
Expand description

This library implements Xor Filters and their derivatives. Xor filters are data structures for fast approximation of set membership using little memory. Probabilistic filters like xor filters are useful for quickly estimating of the existence of an entity to avoid using an expensive resource. For example, they can be used to reduce disk writes in a cache or identify malicious URLs.

Xor filters are faster and smaller than Bloom and Cuckoo filters. Xor filters incur a relative time penalty in construction, but are very fast in lookups; the expectation is that construction of a filter is amortized after many queries.

Xor filters operate only on sets of 64-bit (unsigned) integers. This library does not provide methods for hashing arbitrary types to 64-bit integers. Xor filters are immutable, serializable, and guarantee no false negatives. This library is no_std and needs_allocator.

Filters are implemented as described in the paper “Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters”, in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”, and in Daniel Lemire’s go and c implementations. All are useful references on the theory behind xor filters. For performance statistics, refer to individual filters’ documentation in this crate or the mentioned papers.

General considerations for all filters

It is highly recommended to use the BinaryFuse family of xor-like filters. BinaryFuse filters can fail to construct, but almost always only if they are constructed with duplicate keys. If you need a filter that will unconditionally succeed in construction (up to duplicate keys), use the Xor family of filters.

For a given N, a BinaryFuseN and FuseN filter are roughly equivalent in size, uniformity of key distribution, and false-positive rate.

An XorN filter is larger, less uniform, and has a higher false-positive rate than both a BinaryFuseN and FuseN filter.

A BinaryFuseN filter’s construction is faster, uses less memory, and is more likely to succeed compared to a FuseN filter’s construction.

The false-positive rate of a filter with fingerprint size N is around 2^{-N}; for more numbers, see the documentation of each individual filter.

Assumed pre-conditions

  • It is a pre-condition that all filters are constructed from a data structure containing no duplicate keys. You must perform any de-duplication needed yourself before constructing a filter.

FAQ

What’s the difference between “Fuse” and “Binary Fuse” filters?

Fuse filters use a fuse graph to reduce the space required to hold fingerprints. Binary Fuse filters further exploit fuse graphs in a novel manner described in the upcoming “Binary Fuse Filters: Fast and Smaller Than Xor Filters”. In particular, Binary Fuse filters use a binary-partitioned fuse graph, and are different enough from the “original” Fuse filters to deserve a unique name.

Structs

  • A BinaryFuse8 filter is an Xor-like filter with 8-bit fingerprints arranged in a binary-partitioned fuse graph. BinaryFuse8s are similar to Fuse8s, but their construction is faster, uses less memory, and is more likely to succeed.
  • A BinaryFuse16 filter is an Xor-like filter with 16-bit fingerprints arranged in a binary-partitioned fuse graph. BinaryFuse16s are similar to Fuse16s, but their construction is faster, uses less memory, and is more likely to succeed.
  • A BinaryFuse32 filter is an Xor-like filter with 32-bit fingerprints arranged in a binary-partitioned fuse graph. BinaryFuse32s are similar to Fuse32s, but their construction is faster, uses less memory, and is more likely to succeed.
  • Fuse8Deprecated
    Xor filter using 8-bit fingerprints in a fuse graph. Requires less space than an Xor8.
  • Fuse16Deprecated
    Xor filter using 16-bit fingerprints in a fuse graph. Requires less space than an Xor16.
  • Fuse32Deprecated
    Xor filter using 32-bit fingerprints in a fuse graph. Requires less space than an Xor32.
  • Arbitrary key type proxy for xor filters.
  • Xor filter using 8-bit fingerprints.
  • Xor filter using 16-bit fingerprints.
  • Xor filter using 32-bit fingerprints.

Traits

  • Methods common to xor filters.