What it does
bloom-lib is a collection of probabilistic data structures: compact summaries
that answer questions about a data set — membership, cardinality, frequency,
similarity — with bounded, tunable error in a fraction of the memory an exact
structure would need. They are designed for streaming workloads: insertions are
allocation-free, state is serializable, and compatible structures can be merged.
The crate is no_std-friendly (it needs only alloc), carries no required
runtime dependency, and is generic over the hash function so you can trade the
fast deterministic default for an adversary-resistant one when inputs are
untrusted.
Available structures
| Structure | Answers | Deletion | Mergeable |
|---|---|---|---|
BloomFilter |
Is this item in the set? | — | ✅ |
CuckooFilter |
Is this item in the set? | ✅ | — |
CountMinSketch |
How often have I seen this item? | — | ✅ |
HyperLogLog |
How many distinct items are there? | — | ✅ |
MinHash |
How similar are two sets? | — | ✅ |
TopK |
What are the most frequent items? | — | — |
Every structure is generic over the item type and the hash function, supports
clear, and (with the serde feature) serializes its state.
Installation
[]
= "1.0"
# With serialization support:
= { = "1.0", = ["serde"] }
# On a heap-capable no_std target:
= { = "1.0", = false, = ["alloc"] }
Quick start
use BloomFilter;
// Size for 100,000 distinct items at a 0.1% false-positive rate.
let mut filter = new.unwrap;
filter.insert;
assert!; // never a false negative
assert!; // very likely absent
insert doubles as a deduplicating primitive: it returns true the first time
it sees an item and false when the item was probably already present.
use BloomFilter;
let mut seen = new.unwrap;
assert!; // newly added
assert!; // already present
API overview
For the complete reference with parameters and worked examples, see
docs/API.md.
BloomFilter— set membership with a tunable false-positive rate, merge support, and cardinality estimation.CuckooFilter— set membership that also supports deletion.CountMinSketch— per-item frequency estimation.HyperLogLog— distinct-count estimation in tiny, fixed memory.MinHash— Jaccard similarity estimation between sets.TopK— the most frequent items (heavy hitters) in a stream.Error— the crate-wide error type returned by fallible operations.- Hashing —
DefaultHasher,DefaultHashBuilder, and the rationale for deterministic hashing. - Prelude — convenient glob re-exports.
Hashing
Every structure is generic over core::hash::BuildHasher and defaults to the
deterministic DefaultHashBuilder. A fixed seed makes hashing reproducible:
filters built with the default hasher are always mergeable, and serialized state
round-trips byte-for-byte across runs and platforms. The trade-off is that a
fixed seed offers no defence against an adversary crafting colliding inputs. When
inputs are untrusted, supply a randomly-seeded hasher such as
std::collections::hash_map::RandomState:
use RandomState;
use BloomFilter;
let filter: =
with_hasher.unwrap;
The default hasher is a fast, non-cryptographic 64-bit hash with strong avalanche behaviour. From a single hashing pass the structures synthesise the many index positions they need using the Kirsch–Mitzenmacher double-hashing scheme, and map hashes into range with a division-free multiply-shift reduction.
Performance
The hot paths are allocation-free and use only integer arithmetic: one hashing
pass per item, then a handful of multiply-shift index computations and array
probes. Floating-point math runs only at construction, via
libm, so a structure's derived geometry is
bit-identical on every platform.
Latest local Criterion means (cargo bench --bench probabilistic, Windows
x86_64, Rust stable, steady-state on a pre-populated structure):
| Operation | ns/op |
|---|---|
BloomFilter::insert |
~6.9 |
BloomFilter::contains (hit) |
~5.5 |
CuckooFilter::contains (hit) |
~6.0 |
CountMinSketch::increment |
~5.8 |
CountMinSketch::estimate |
~6.3 |
HyperLogLog::insert |
~0.9 |
MinHash::insert (128 hashes) |
~24 |
TopK::insert (k = 100) |
~72 |
MinHash and TopK scale with their configured size (the signature length and
k, respectively); the others are effectively constant-time per operation. Run
cargo bench to reproduce these on your hardware.
Feature flags
| Feature | Default | Description |
|---|---|---|
std |
✅ | Enables every structure and the std::error::Error impl for Error. Implies alloc. |
alloc |
— | Enables every structure without std, for heap-capable no_std targets. |
serde |
— | Derives Serialize/Deserialize for every structure. Implies alloc. |
With no features enabled the crate still compiles and exposes VERSION and
Error, so it can sit in a no_std dependency tree without pulling in a heap.
Examples
Runnable examples live in examples/, one per structure:
| Example | Shows |
|---|---|
bloom_dedup |
Deduplicating a stream with a BloomFilter. |
membership_with_deletion |
Granting and revoking membership with a CuckooFilter. |
frequency |
Per-item frequency estimation with a CountMinSketch. |
cardinality |
Counting unique items with HyperLogLog. |
similarity |
Document similarity with MinHash. |
top_words |
Most frequent words with TopK. |
Testing
# Default features
# Every feature, including serde round-trips and property tests
# Heap-only no_std build
# Microbenchmarks
Cross-platform support
Tier-1 targets — Linux, macOS, and Windows (x86_64 and aarch64) — compile and
pass the full suite on both stable and the MSRV (Rust 1.75). CI runs formatting,
clippy with -D warnings, the test suite, and a documentation build with
RUSTDOCFLAGS=-D warnings across all three operating systems.
Standards
- REPS governs every decision. See REPS.md.
- MSRV: Rust 1.75.
- Edition: 2021.
- Cross-platform: Linux, macOS, Windows.
License
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.