1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
//! This library implements Xor Filters -- data structures for fast approximation of set //! membership using little memory. Probabilistic filters like xor filters are useful for //! quickly estimating of the existence of an entity to avoid using an expensive resource. //! For example, they can be used to [reduce disk writes] in a cache or [identify malicious URLs]. //! //! Xor filters are faster and smaller than Bloom and Cuckoo filters. //! Xor filters incur a relative time penalty in construction, but are very fast in lookups; the //! expectation is that construction of a filter is amortized after many queries. //! //! Xor filters operate only on sets of 64-bit (unsigned) integers. This library does not provide //! methods for hashing arbitrary types to 64-bit integers. Xor filters are immutable, //! serializable, and guarantee no false negatives. This library is `no_std` and [`needs_allocator`]. //! //! Filters are implemented as described in the paper //! [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters] and in Daniel Lemire's [go] and //! [c] implementations. All are useful references on the theory behind xor filters. For //! performance statistics, refer to individual filters' documentation or the mentioned //! paper. //! //! ## General considerations for all filters //! //! For a given `N`, `XorN` filters are larger and less uniform than `FuseN` filters, but `XorN` are //! guaranteed to be constructable while `FuseN` filters require a large number of keys for //! construction to succeed. `N` refers to the bit size of a fingerprint (the //! result of a hash function) in the filter. Filters with larger fingerprints trade off //! lower false-positive rates for greater filter sizes. //! //! The false-positive rate of a filter with fingerprint size `N` is around `2^{-N}`; for more //! numbers, see the documentation of each individual filter. //! //! Below, we list important constraints to keep in mind regardless of the filter used: //! //! - It is a pre-condition that all filters are constructed from a data structure containing no //! duplicate keys. You must perform any de-duplication needed yourself before constructing a //! filter. //! //! [reduce disk writes]: https://en.wikipedia.org/wiki/Bloom_filter#Cache_filtering //! [identify malicious URLs]: https://en.wikipedia.org/wiki/Bloom_filter#Examples //! [`needs_allocator`]: https://doc.rust-lang.org/1.9.0/book/custom-allocators.html //! [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters]: https://arxiv.org/abs/1912.08258 //! [go]: https://github.com/FastFilter/xorfilter //! [c]: https://github.com/FastFilter/xor_singleheader #![no_std] #![cfg_attr(feature = "nightly", feature(allocator_internals), needs_allocator)] #![warn(missing_docs)] #![forbid(clippy::all, clippy::cargo, clippy::nursery)] #![allow( clippy::len_without_is_empty, clippy::useless_attribute, clippy::multiple_crate_versions, clippy::fallible_impl_from )] #[macro_use] extern crate alloc; mod murmur3; mod prelude; mod splitmix64; mod fuse16; mod fuse32; mod fuse8; mod hash_proxy; mod xor16; mod xor32; mod xor8; pub use fuse16::Fuse16; pub use fuse32::Fuse32; pub use fuse8::Fuse8; pub use hash_proxy::HashProxy; pub use xor16::Xor16; pub use xor32::Xor32; pub use xor8::Xor8; /// Methods common to xor filters. pub trait Filter<Type> { /// Returns `true` if the filter probably contains the specified key. /// /// There can never be a false negative, but there is a small possibility of false positives. /// Refer to individual filters' documentation for false positive rates. fn contains(&self, key: &Type) -> bool; /// Returns the number of fingerprints in the filter. fn len(&self) -> usize; }