ofilter 0.4.3 - Docs.rs

# OFilter

[OFilter](https://gitlab.com/liberecofr/ofilter) is a fast thread-safe Bloom filter implemented in [Rust](https://www.rust-lang.org/).

It implements:

- [a basic Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.Bloom.html)
- [a thread-safe Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.SyncBloom.html)
- [a streaming Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.Stream.html)
- [a thread-safe streaming Bloom filter](https://docs.rs/ofilter/latest/ofilter/struct.SyncStream.html)
- [Bloom filter parameters guessing](https://docs.rs/ofilter/latest/ofilter/struct.Params.html)

The basic Bloom filter is inspired from the existing and stable
[bloomfilter crate](https://github.com/jedisct1/rust-bloom-filter) but does not directly depend on it.
The API is slightly different, and makes a few opinionated changes.

- it uses the [bitlen crate](https://crates.io/crates/bitlen) internally and exports its internal
  bit vector in that format.
- it uses the [siphasher crate](https://crates.io/crates/siphasher) as the internal
  hashing func. It is not possible to override this.
- it introduces that notion of "streaming filter" which can be useful when you do not work
  in batches and never know when items are going to stop coming in. As some point it is
  a naive implementation of an aging Bloom filter, though I prefer the term streaming.
- performance-wise, the default settings tend to optimize for less CPU usage but more
  memory footprint. This can be controlled in options but by default the filter can use
  more memory than strictly required.

In practice, I am using this filter to implement a self-expiring
[KV store](https://gitlab.com/liberecofr/menhirkv/).

Package has 2 optional features:

* `serde` to enable [Serde](https://serde.rs/) support for (de)serialization
* `rand` to enable [random](https://crates.io/crates/getrandom) seeds (enabled by default)

![OFilter icon](https://gitlab.com/liberecofr/ofilter/raw/main/ofilter.png)

# Status

While this is, to my knowledge, not used in "real" production, it is not
a very complex codebase and comes with a rather complete test suite.
Most of the bricks on which it builds are well-tested, widely used packages.
So it *should be OK* to use it. Again, *DISCLAIMER*, use at your own risks.

[![Build Status](https://gitlab.com/liberecofr/ofilter/badges/main/pipeline.svg)](https://gitlab.com/liberecofr/ofilter/pipelines)
[![Crates.io](https://img.shields.io/crates/v/ofilter.svg)](https://crates.io/crates/ofilter)
[![Gitlab](https://img.shields.io/gitlab/last-commit/liberecofr/ofilter)](https://gitlab.com/liberecofr/ofilter/tree/main)
[![License](https://img.shields.io/gitlab/license/liberecofr/ofilter)](https://gitlab.com/liberecofr/ofilter/blob/main/LICENSE)

# Usage

```rust
use ofilter::Bloom;

let mut filter: Bloom<usize> = Filter::new(100);
assert!(!filter.check(&42));
filter.set(&42);
assert!(filter.check(&42));
```

# Benchmarks

Taken from a random CI job:

```
running 7 tests
test tests::bench_extern_crate_bloom       ... bench:         287 ns/iter (+/- 37)
test tests::bench_extern_crate_bloomfilter ... bench:         232 ns/iter (+/- 7)
test tests::bench_ofilter_bloom            ... bench:          81 ns/iter (+/- 5)
test tests::bench_ofilter_stream           ... bench:         257 ns/iter (+/- 39)
test tests::bench_ofilter_sync_bloom       ... bench:         101 ns/iter (+/- 1)
test tests::bench_ofilter_sync_stream      ... bench:         280 ns/iter (+/- 14)
test tests::bench_standard_hashset         ... bench:         199 ns/iter (+/- 54)
test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured; 0 filtered out; finished in 16.47s
```

This is not the result of extensive, thorough benchmarking, just a random
snapshot at some point in development history.

TL;DR -> OFilter performs relatively well compared to others
such as [bloom](https://crates.io/crates/bloom) or [bloomfilter](https://crates.io/crates/bloomfilter).

The streaming version is slower but that is expected, as it uses two filters
under the hood, and performs extra checks to know when to swap buffers.

It is also interesting to note that using a standard HashSet is quite efficient
for small objects. The benchmark above uses isize entries. So if your set is composed
if small elements and is limited in absolute number, using
a simple set from the standard library may be good enough. Of course using
a Bloom filter has other advantates than raw CPU usage, most importantly it
ensures memory usage stays low and constant, which is a great advantage.
But keep in mind the problem you're trying to solve. Bench, measure, gather
numbers, use facts, not intuition.

To run the benchmarks:
```shell
cd bench
rustup default nightly
cargo bench
```

# Links

* [crate](https://crates.io/crates/ofilter) on crates.io
* [doc](https://docs.rs/ofilter/) on docs.rs
* [source](https://gitlab.com/liberecofr/ofilter/tree/main) on gitlab.com
* [MenhirKV](https://gitlab.com/liberecofr/menhirkv), project using this package
* [rust-bloom-filter](https://github.com/jedisct1/rust-bloom-filter), inspired this package

# License

OFilter is licensed under the [MIT](https://gitlab.com/liberecofr/ofilter/blob/main/LICENSE) license.