# bloom-lib v0.5.0 — Implementation
**The structure set is complete.** v0.5.0 lands the five structures that joined
the flagship Bloom filter: a Cuckoo filter with deletion, a Count-Min Sketch, a
HyperLogLog estimator, a MinHash sketch, and a Top-K tracker. Each is fully
implemented, documented with runnable examples, exercised by unit and
property-based tests, and benchmarked. A Criterion harness and six end-to-end
examples round out the release. Hardening and the 1.0 audit follow.
## What is bloom-lib?
A collection of probabilistic data structures for Rust — compact summaries that
answer membership, cardinality, frequency, and similarity questions with
bounded, tunable error in a fraction of the memory an exact structure would
need. Allocation-free insertions, serializable state, mergeable structures, and
a pluggable hash function. `no_std`-friendly with no required runtime
dependency.
## What's new in 0.5.0
### `CuckooFilter` — membership with deletion
The companion to the Bloom filter for sets that change. It stores a 16-bit
fingerprint of each item in one of two candidate buckets (four slots each),
relocating residents on insertion — the "cuckoo" behaviour — and supports
`remove`. A single held-out **victim slot** preserves correctness at the load
limit, so no previously-inserted item is ever lost; when the table is genuinely
full, `insert` returns `Error::CapacityExceeded` instead of corrupting state.
```rust
use bloom_lib::CuckooFilter;
let mut filter = CuckooFilter::new(10_000).unwrap();
filter.insert("alice").unwrap();
assert!(filter.remove("alice"));
assert!(!filter.contains("alice"));
```
### `CountMinSketch` — frequency estimation
A grid of saturating counters that estimates how often each item has been seen.
The estimate is the minimum across rows, so it **never undercounts** and
overcounts only by the configured error bound. Width and depth are derived from
an `(epsilon, delta)` pair, and two sketches with matching geometry merge by
summing counters.
```rust
use bloom_lib::CountMinSketch;
let mut sketch = CountMinSketch::new(0.001, 0.001).unwrap();
sketch.add("page-view", 250);
assert!(sketch.estimate("page-view") >= 250);
```
### `HyperLogLog` — cardinality estimation
Counts distinct items in fixed memory — `2^precision` one-byte registers — with
a standard error of about `1.04 / sqrt(2^precision)`. It uses the bias-corrected
estimator with a linear-counting fallback at low cardinalities, so it is accurate
across the whole range. Estimators of equal precision merge by taking the
register-wise maximum, giving the cardinality of the union.
```rust
use bloom_lib::HyperLogLog;
let mut hll = HyperLogLog::new(14).unwrap();
for i in 0..100_000u32 { hll.insert(&i); }
assert!((97_000..=103_000).contains(&hll.count()));
```
### `MinHash` — set similarity
Estimates the Jaccard similarity of two sets by comparing fixed-length
signatures, in time and space proportional to the signature length rather than
the set size. Merging two sketches (slot-wise minimum) yields the sketch of the
union.
```rust
use bloom_lib::MinHash;
let mut a = MinHash::new(256).unwrap();
let mut b = MinHash::new(256).unwrap();
for w in ["the", "quick", "brown", "fox"] { a.insert(w); }
for w in ["the", "quick", "red", "fox"] { b.insert(w); }
assert!((0.45..=0.75).contains(&a.similarity(&b).unwrap())); // true J = 0.6
```
### `TopK` — heavy hitters
Tracks the `k` most frequent items over a Count-Min Sketch plus a small
monitored list, in memory bounded by the sketch size and `k`. It is the one
structure that stores its keys (`T: Eq + Hash`), moving a key into the list only
when it is promoted.
```rust
use bloom_lib::TopK;
let mut top = TopK::new(3, 0.001, 0.001).unwrap();
for _ in 0..100 { top.insert("apple"); }
for _ in 0..50 { top.insert("banana"); }
for _ in 0..10 { top.insert("cherry"); }
assert_eq!(top.top()[0].0, &"apple");
```
### Property tests, benchmarks, and examples
- **`tests/proptests.rs`** asserts the invariants that must hold for every input:
Bloom and Cuckoo filters never report a false negative, the Count-Min Sketch
never undercounts, HyperLogLog stays within a 5% envelope, and MinHash
similarity is bounded and symmetric.
- **`benches/probabilistic.rs`** benchmarks every hot path with Criterion.
- **Six examples**, one per structure: `bloom_dedup`, `membership_with_deletion`,
`frequency`, `cardinality`, `similarity`, and `top_words`.
## Performance
Latest local Criterion means (Windows x86_64, Rust stable, steady-state):
| `BloomFilter::insert` | ~6.9 |
| `BloomFilter::contains` (hit) | ~5.5 |
| `CuckooFilter::contains` (hit) | ~6.0 |
| `CountMinSketch::increment` | ~5.8 |
| `CountMinSketch::estimate` | ~6.3 |
| `HyperLogLog::insert` | ~0.9 |
| `MinHash::insert` (128 hashes) | ~24 |
| `TopK::insert` (k = 100) | ~72 |
## Breaking changes
None. v0.5.0 is purely additive over v0.2.0.
## Verification
Run on Windows x86_64 (Rust stable 1.95 and MSRV 1.75) and on Linux via WSL2
Ubuntu; the same commands run in CI across Linux, macOS, and Windows on stable
and 1.75:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo clippy --no-default-features --all-targets -- -D warnings
cargo test --all-features
cargo test --no-default-features --features alloc
cargo bench --bench probabilistic
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
```
All green. Counts at this tag (`--all-features`): 54 unit + 8 property + 4
integration + 60 doctests.
The committed `Cargo.lock` pins the `proptest` and `criterion` dev-dependency
trees to versions that still build on the MSRV; newer releases require a newer
compiler and are dev-only.
## What's next
- **v0.9.0 — Hardening + audit.** Feature freeze, the full pre-1.0 audit
(cleanliness, error hardening, API-stability review, documentation,
cross-platform CI), and saved benchmark baselines.
## Installation
```toml
[dependencies]
bloom-lib = "0.5"
# With serialization:
bloom-lib = { version = "0.5", features = ["serde"] }
```
MSRV: Rust 1.75.
## Documentation
- [README](https://github.com/jamesgober/bloom-lib/blob/main/README.md)
- [API Reference](https://github.com/jamesgober/bloom-lib/blob/main/docs/API.md)
- [CHANGELOG](https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md)
---
**Full diff:** [`v0.2.0...v0.5.0`](https://github.com/jamesgober/bloom-lib/compare/v0.2.0...v0.5.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md#050---2026-05-28).