b100m-filter 0.2.0

Bloom Filter for Rust
Documentation

b100m-filter

Crates.io docs.rs License: MIT License: APACHE

A very fast and accurate Bloom Filter implementation in Rust.

Usage

# Cargo.toml

[dependencies]

b100m-filter = "0.2.0"

use b100m_filter::BloomFilter;

let num_blocks = 4; // by default, each block is 512 bits
let values = vec!["42", "qwerty", "bloom"];

let filter = BloomFilter::builder(num_blocks).items(values.iter());
assert!(filter.contains("42"));
assert!(filter.contains("bloom"));

Implementation

b100m-filter is blazingly fast because it uses L1 cache friendly blocks and efficiently derives many index bits from only one hash per value. Compared to traditional implementations, b100m-filter is 5-13 times faster for a small number of contained items, and hundreds of times faster for more items. In all cases, b100m-filter maintains competitive false positive rates.

Runtime Performance

Runtime comparison to a popular traditonal bloom filter crate:

Sampled False Postive Rate: 0.000000%

BloomFilter (1000 items, 65536 bytes): get existing 1000
  time:   [72.823 µs 72.856 µs 72.893 µs]
+ change: [-92.273% -92.252% -92.210%] (p = 0.00 < 0.05)

BloomFilter (1000 items, 65536 bytes): get non-existing 1000
  time:   [14.631 µs 14.826 µs 15.051 µs]
+ change: [-80.346% -80.186% -79.991%] (p = 0.00 < 0.05)

BloomFilter (1000 items, 2097152 bytes): get existing 1000
  time:   [75.010 µs 75.439 µs 75.902 µs]
+ change: [-99.832% -99.832% -99.831%] (p = 0.00 < 0.05)

BloomFilter (1000 items, 2097152 bytes): get non-existing 1000
  time:   [14.532 µs 14.560 µs 14.591 µs]
+ change: [-99.470% -99.468% -99.467%] (p = 0.00 < 0.05)

As the memory size and set size increase, traditional bloom filters need to perform more hashes to keep false positive rates low. Bloom filter speed is directly proportional to number of hashes. However, b100m-filter's optimal number of hashes is bounded while keeping near zero rates even for many items:

bloom_perf

False Positive Performance

b100m-filter does not sacrifice accuracy. Below is a comparison false positive rate with a traditional bloom filter:

bloom_fp

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.