# OFilter
[OFilter](https://gitlab.com/liberecofr/ofilter) is an opinionated Bloom filter implemented in [Rust](https://www.rust-lang.org/).
It implements:
- [a basic Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.Bloom.html)
- [a thread-safe Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.Stream.html)
- [a streaming Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.Stream.html)
- [a thread-safe streaming Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.SyncStream.html)
The basic Bloom filter is inspired from the existing and stable
[bloomfilter crate](https://github.com/jedisct1/rust-bloom-filter) but does not directly depend on it.
The API is slightly different, and makes a few opinionated changes.
- it has a fancy [parameters support](https://docs.rs/ofilter/latest/ofilter/struct.Params.html) so
that you can define a filter by its number of items, false positive ratio, size in bits, number
of hash, or any combination of them. It figures out the magic.
- it uses the [bitlen crate](https://crates.io/crates/bitlen) internally and exports its internal
bit vector in that format.
- it uses the [siphasher crate](https://crates.io/crates/siphasher) as the internal
hashing func. It is not possible to override this.
- it introduces that notion of "streaming filter" which can be useful when you do not work
in batches and never know when items are going to stop coming in. As some point it is
a naive implementation of an aging Bloom filter, though I prefer the term streaming.
- performance-wise, the default settings tend to optimize for less CPU usage but more
memory footprint. This can be controlled in options but by default the filter can
more memory than strictly required.
In practice, I am using this filter to implement a self-expiring
[KV store](https://gitlab.com/liberecofr/menhirkv/).

# Status
For now this is a toy project, clearly *NOT* suitable for production use.
[](https://gitlab.com/liberecofr/ofilter/pipelines)
[](https://crates.io/crates/ofilter)
[](https://gitlab.com/liberecofr/ofilter/tree/main)
[](https://gitlab.com/liberecofr/ofilter/blob/main/LICENSE)
https://en.wikipedia.org/wiki/Bloom_filter
# Usage
```rust
use ofilter::Bloom;
let mut filter: Bloom<usize> = Filter::new(100);
assert!(!filter.check(&42));
filter.set(&42);
assert!(filter.check(&42));
```
# A bit of theory
This [interactive Bloom filter params calulator](https://hur.st/bloomfilter/?n=300000&p=0.01&m=&k=)
proved very useful while developping this.
Also it recaps the usual formulas:
```
With:
n: number of items in the filter
p: probability of false positives, fraction between 0 and 1
m: number of bits in the filter
k: number of hash functions
We have:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))));
k = round((m / n) * log(2));
```
A command line equivalent would be this practical
[bloom-filter-calculator.rb gist](https://gist.github.com/brandt/8f9ab3ceae37562a2841)
```ruby
# Optimal bloom filter size and number of hashes
# Tips:
# 1. One byte per item in the input set gives about a 2% false positive rate.
# 2. The optimal number of hash functions is ~0.7x the number of bits per item.
# 3. The number of hashes dominates performance.
# Expected number of items in the collection
# n = (m * ln(2))/k;
n = 300_000
# Acceptable false-positive rate (0.01 = 1%)
# p = e^(-(m/n) * (ln(2)^2));
fpr = 0.01
# Optimal size (number of elements in the bit array)
# m = -((n*ln(p))/(ln(2)^2));
m = (n * Math.log(fpr).abs) / (Math.log(2) ** 2)
# Optimal number of hash functions
# k = (m/n) * ln(2);
k = (m / n) * Math.log(2)
puts "Optimal bloom filter size: #{m.ceil} bits"
puts "Optimal number of hash functions: #{k.ceil}"
```
# Links
* [crate](https://crates.io/crates/ofilter) on crates.io
* [doc](https://docs.rs/ofilter/) on docs.rs
* [source](https://gitlab.com/liberecofr/ofilter/tree/main) on gitlab.com
* [MenhirKV](https://gitlab.com/liberecofr/menhirkv), project using this filter
* More about Bloom filters:
* https://en.wikipedia.org/wiki/Bloom_filter
* https://www.vldb.org/archives/workshop/2010/proceedings/files/vldb_2010_workshop/ADMS_2010/adms10-canim.pdf
* https://arxiv.org/pdf/2001.03147.pdf
* https://arxiv.org/pdf/2106.04364.pdf
* http://www.vldb.org/pvldb/vol13/p2355-liu.pdf
* https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf
* https://freecontent.manning.com/all-about-bloom-filters/
# License
OFilter is licensed under the [MIT](https://gitlab.com/liberecofr/ofilter/blob/main/LICENSE) license.