bloom-lib 1.0.0

Probabilistic data structure library: Bloom filters, Cuckoo filters, Count-Min Sketch, HyperLogLog, MinHash, and Top-K. Tunable false-positive rates, serializable state, merge support, and streaming-safe updates.
Documentation
# bloom-lib v0.5.0 — Implementation

**The structure set is complete.** v0.5.0 lands the five structures that joined
the flagship Bloom filter: a Cuckoo filter with deletion, a Count-Min Sketch, a
HyperLogLog estimator, a MinHash sketch, and a Top-K tracker. Each is fully
implemented, documented with runnable examples, exercised by unit and
property-based tests, and benchmarked. A Criterion harness and six end-to-end
examples round out the release. Hardening and the 1.0 audit follow.

## What is bloom-lib?

A collection of probabilistic data structures for Rust — compact summaries that
answer membership, cardinality, frequency, and similarity questions with
bounded, tunable error in a fraction of the memory an exact structure would
need. Allocation-free insertions, serializable state, mergeable structures, and
a pluggable hash function. `no_std`-friendly with no required runtime
dependency.

## What's new in 0.5.0

### `CuckooFilter` — membership with deletion

The companion to the Bloom filter for sets that change. It stores a 16-bit
fingerprint of each item in one of two candidate buckets (four slots each),
relocating residents on insertion — the "cuckoo" behaviour — and supports
`remove`. A single held-out **victim slot** preserves correctness at the load
limit, so no previously-inserted item is ever lost; when the table is genuinely
full, `insert` returns `Error::CapacityExceeded` instead of corrupting state.

```rust
use bloom_lib::CuckooFilter;

let mut filter = CuckooFilter::new(10_000).unwrap();
filter.insert("alice").unwrap();
assert!(filter.remove("alice"));
assert!(!filter.contains("alice"));
```

### `CountMinSketch` — frequency estimation

A grid of saturating counters that estimates how often each item has been seen.
The estimate is the minimum across rows, so it **never undercounts** and
overcounts only by the configured error bound. Width and depth are derived from
an `(epsilon, delta)` pair, and two sketches with matching geometry merge by
summing counters.

```rust
use bloom_lib::CountMinSketch;

let mut sketch = CountMinSketch::new(0.001, 0.001).unwrap();
sketch.add("page-view", 250);
assert!(sketch.estimate("page-view") >= 250);
```

### `HyperLogLog` — cardinality estimation

Counts distinct items in fixed memory — `2^precision` one-byte registers — with
a standard error of about `1.04 / sqrt(2^precision)`. It uses the bias-corrected
estimator with a linear-counting fallback at low cardinalities, so it is accurate
across the whole range. Estimators of equal precision merge by taking the
register-wise maximum, giving the cardinality of the union.

```rust
use bloom_lib::HyperLogLog;

let mut hll = HyperLogLog::new(14).unwrap();
for i in 0..100_000u32 { hll.insert(&i); }
assert!((97_000..=103_000).contains(&hll.count()));
```

### `MinHash` — set similarity

Estimates the Jaccard similarity of two sets by comparing fixed-length
signatures, in time and space proportional to the signature length rather than
the set size. Merging two sketches (slot-wise minimum) yields the sketch of the
union.

```rust
use bloom_lib::MinHash;

let mut a = MinHash::new(256).unwrap();
let mut b = MinHash::new(256).unwrap();
for w in ["the", "quick", "brown", "fox"] { a.insert(w); }
for w in ["the", "quick", "red", "fox"] { b.insert(w); }
assert!((0.45..=0.75).contains(&a.similarity(&b).unwrap())); // true J = 0.6
```

### `TopK` — heavy hitters

Tracks the `k` most frequent items over a Count-Min Sketch plus a small
monitored list, in memory bounded by the sketch size and `k`. It is the one
structure that stores its keys (`T: Eq + Hash`), moving a key into the list only
when it is promoted.

```rust
use bloom_lib::TopK;

let mut top = TopK::new(3, 0.001, 0.001).unwrap();
for _ in 0..100 { top.insert("apple"); }
for _ in 0..50  { top.insert("banana"); }
for _ in 0..10  { top.insert("cherry"); }
assert_eq!(top.top()[0].0, &"apple");
```

### Property tests, benchmarks, and examples

- **`tests/proptests.rs`** asserts the invariants that must hold for every input:
  Bloom and Cuckoo filters never report a false negative, the Count-Min Sketch
  never undercounts, HyperLogLog stays within a 5% envelope, and MinHash
  similarity is bounded and symmetric.
- **`benches/probabilistic.rs`** benchmarks every hot path with Criterion.
- **Six examples**, one per structure: `bloom_dedup`, `membership_with_deletion`,
  `frequency`, `cardinality`, `similarity`, and `top_words`.

## Performance

Latest local Criterion means (Windows x86_64, Rust stable, steady-state):

| Operation | ns/op |
|-----------|------:|
| `BloomFilter::insert` | ~6.9 |
| `BloomFilter::contains` (hit) | ~5.5 |
| `CuckooFilter::contains` (hit) | ~6.0 |
| `CountMinSketch::increment` | ~5.8 |
| `CountMinSketch::estimate` | ~6.3 |
| `HyperLogLog::insert` | ~0.9 |
| `MinHash::insert` (128 hashes) | ~24 |
| `TopK::insert` (k = 100) | ~72 |

## Breaking changes

None. v0.5.0 is purely additive over v0.2.0.

## Verification

Run on Windows x86_64 (Rust stable 1.95 and MSRV 1.75) and on Linux via WSL2
Ubuntu; the same commands run in CI across Linux, macOS, and Windows on stable
and 1.75:

```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo clippy --no-default-features --all-targets -- -D warnings
cargo test --all-features
cargo test --no-default-features --features alloc
cargo bench --bench probabilistic
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
```

All green. Counts at this tag (`--all-features`): 54 unit + 8 property + 4
integration + 60 doctests.

The committed `Cargo.lock` pins the `proptest` and `criterion` dev-dependency
trees to versions that still build on the MSRV; newer releases require a newer
compiler and are dev-only.

## What's next

- **v0.9.0 — Hardening + audit.** Feature freeze, the full pre-1.0 audit
  (cleanliness, error hardening, API-stability review, documentation,
  cross-platform CI), and saved benchmark baselines.

## Installation

```toml
[dependencies]
bloom-lib = "0.5"

# With serialization:
bloom-lib = { version = "0.5", features = ["serde"] }
```

MSRV: Rust 1.75.

## Documentation

- [README]https://github.com/jamesgober/bloom-lib/blob/main/README.md
- [API Reference]https://github.com/jamesgober/bloom-lib/blob/main/docs/API.md
- [CHANGELOG]https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md

---

**Full diff:** [`v0.2.0...v0.5.0`](https://github.com/jamesgober/bloom-lib/compare/v0.2.0...v0.5.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md#050---2026-05-28).