OFilter
OFilter is an opinionated Bloom filter implemented in Rust.
It implements:
- a basic Bloom filter
- a thread-safe Bloom filter
- a streaming Bloom filter
- a thread-safe streaming Bloom filter
The basic Bloom filter is inspired from the existing and stable bloomfilter crate but does not directly depend on it. The API is slightly different, and makes a few opinionated changes.
- it has a fancy parameters support so that you can define a filter by its number of items, false positive ratio, size in bits, number of hash, or any combination of them. It figures out the magic.
- it uses the bitlen crate internally and exports its internal bit vector in that format.
- it uses the siphasher crate as the internal hashing func. It is not possible to override this.
- it introduces that notion of "streaming filter" which can be useful when you do not work in batches and never know when items are going to stop coming in. As some point it is a naive implementation of an aging Bloom filter, though I prefer the term streaming.
- performance-wise, the default settings tend to optimize for less CPU usage but more memory footprint. This can be controlled in options but by default the filter can more memory than strictly required.
In practice, I am using this filter to implement a self-expiring KV store.

Status
For now this is a toy project, clearly NOT suitable for production use.
https://en.wikipedia.org/wiki/Bloom_filter
Usage
use Bloom;
let mut filter: = new;
assert!;
filter.set;
assert!;
A bit of theory
This interactive Bloom filter params calulator proved very useful while developping this.
Also it recaps the usual formulas:
With:
n: number of items in the filter
p: probability of false positives, fraction between 0 and 1
m: number of bits in the filter
k: number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))));
k = round((m / n) * log(2));
A command line equivalent would be this practical bloom-filter-calculator.rb gist
# Optimal bloom filter size and number of hashes
# Tips:
# 1. One byte per item in the input set gives about a 2% false positive rate.
# 2. The optimal number of hash functions is ~0.7x the number of bits per item.
# 3. The number of hashes dominates performance.
# Expected number of items in the collection
# n = (m * ln(2))/k;
n = 300_000
# Acceptable false-positive rate (0.01 = 1%)
# p = e^(-(m/n) * (ln(2)^2));
fpr = 0.01
# Optimal size (number of elements in the bit array)
# m = -((n*ln(p))/(ln(2)^2));
m = (n * Math.log(fpr).abs) / (Math.log(2) ** 2)
# Optimal number of hash functions
# k = (m/n) * ln(2);
k = (m / n) * Math.log(2)
puts
puts
Links
- crate on crates.io
- doc on docs.rs
- source on gitlab.com
- MenhirKV, project using this filter
- More about Bloom filters:
- https://en.wikipedia.org/wiki/Bloom_filter
- https://www.vldb.org/archives/workshop/2010/proceedings/files/vldb_2010_workshop/ADMS_2010/adms10-canim.pdf
- https://arxiv.org/pdf/2001.03147.pdf
- https://arxiv.org/pdf/2106.04364.pdf
- http://www.vldb.org/pvldb/vol13/p2355-liu.pdf
- https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf
- https://freecontent.manning.com/all-about-bloom-filters/
License
OFilter is licensed under the MIT license.