bloom-lib 1.0.0

Probabilistic data structure library: Bloom filters, Cuckoo filters, Count-Min Sketch, HyperLogLog, MinHash, and Top-K. Tunable false-positive rates, serializable state, merge support, and streaming-safe updates.
# bloom-lib v0.2.0 — Foundation

**The first working release.** v0.2.0 establishes the public API surface that
1.0 will preserve: the crate-wide error type, the deterministic hashing layer,
and the flagship `BloomFilter` — fully implemented, documented, and tested
rather than stubbed. It also adds the `alloc` and `serde` feature flags and a
complete `docs/API.md`. The remaining structures (Cuckoo, Count-Min Sketch,
HyperLogLog, MinHash, Top-K) follow in v0.5.0.

## What is bloom-lib?

A collection of probabilistic data structures for Rust — compact summaries that
answer membership, cardinality, frequency, and similarity questions with
bounded, tunable error in a fraction of the memory an exact structure would
need. Built for streaming workloads: allocation-free insertions, serializable
state, mergeable structures, and a pluggable hash function. `no_std`-friendly
and free of any required runtime dependency.

## What's new in 0.2.0

### `BloomFilter<T, S>` — probabilistic set membership

The headline structure. A Bloom filter answers "have I seen this item?" using a
fraction of the memory a real set would need, trading exactness for one-sided
error: `contains` never reports a false negative, but may report a false
positive at a rate you choose at construction time.

```rust
use bloom_lib::BloomFilter;

// Size for 100,000 distinct items at a 0.1% false-positive rate.
let mut filter = BloomFilter::new(100_000, 0.001).unwrap();

filter.insert("session-token");
assert!(filter.contains("session-token"));
assert!(!filter.contains("never-seen"));
```
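
To make the one-sided error concrete, here is a minimal toy filter written against the standard library only. It is a sketch for illustration, not bloom-lib's implementation: the `ToyBloom` name, its fields, and the per-probe hashing strategy are all assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter for illustration only (hypothetical, not bloom-lib's code).
struct ToyBloom {
    bits: Vec<u64>, // m bits packed into 64-bit words
    m: u64,         // total number of bits
    k: u32,         // probes per item
}

impl ToyBloom {
    fn new(m: u64, k: u32) -> Self {
        let words = ((m + 63) / 64) as usize;
        ToyBloom { bits: vec![0; words], m, k }
    }

    /// Bit position for probe `i`: hash the probe number together with the item.
    fn index<T: Hash + ?Sized>(&self, item: &T, i: u32) -> u64 {
        let mut h = DefaultHasher::new(); // fixed keys, so results are repeatable
        i.hash(&mut h);
        item.hash(&mut h);
        h.finish() % self.m
    }

    /// Set all k bits for the item.
    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for i in 0..self.k {
            let idx = self.index(item, i);
            self.bits[(idx / 64) as usize] |= 1u64 << (idx % 64);
        }
    }

    /// True only if every one of the k bits is set.
    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        (0..self.k).all(|i| {
            let idx = self.index(item, i);
            self.bits[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
        })
    }
}
```

Because `insert` only ever sets bits and nothing clears them, an inserted item always finds its `k` bits set, so a false negative cannot occur. A false positive occurs when other items happen to have set all `k` bits of an item that was never inserted.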

The geometry (`m` bits, `k` hashes) is derived from the standard formulas
`m = -n·ln(p)/(ln2)²` and `k = (m/n)·ln2`, or you can specify it directly with
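`with_dimensions`. `insert` returns a novelty flag (`true` when the item was
definitely not present before), which turns it into a one-line stream
deduplicator; at the configured false-positive rate, a genuinely new item may
occasionally be reported as already seen. The filter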
also exposes `merge` (bitwise union of two same-geometry filters), `clear`,
`count_ones`, `estimated_len`, and `estimated_false_positive_rate`.
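
The sizing formulas above can be sketched as plain functions. The function names and the rounding choices (round `m` up, round `k` to nearest) are assumptions for illustration, and so are the two estimators: they use the textbook formulas `n ≈ -(m/k)·ln(1 - x/m)` and `(x/m)^k` for `x` set bits, which may or may not match what `estimated_len` and `estimated_false_positive_rate` compute internally.

```rust
/// Bit count m and hash count k for n items at target false-positive rate p.
/// m = -n*ln(p)/(ln 2)^2, k = (m/n)*ln 2
fn optimal_geometry(n: u64, p: f64) -> (u64, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil();
    let k = ((m / n as f64) * ln2).round().max(1.0);
    (m as u64, k as u32)
}

/// Estimated number of distinct items given `ones` set bits.
fn estimated_len(m: u64, k: u32, ones: u64) -> f64 {
    let fill = ones as f64 / m as f64;
    -(m as f64 / k as f64) * (1.0 - fill).ln()
}

/// Estimated current false-positive rate given `ones` set bits.
fn estimated_fpr(m: u64, k: u32, ones: u64) -> f64 {
    (ones as f64 / m as f64).powi(k as i32)
}
```

For the example above (100,000 items at 0.1%), this works out to roughly 1.44 million bits (about 175 KiB) and 10 hash functions. A filter sized this way reaches its target false-positive rate when about half of its bits are set.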

### Deterministic, pluggable hashing

Every structure is generic over `core::hash::BuildHasher` and defaults to the
new `DefaultHashBuilder`. The default hasher, `DefaultHasher`, is a fast
non-cryptographic 64-bit hash built on a 64×64-bit multiply-fold with strong
avalanche behaviour. It is seeded with a fixed constant on purpose: deterministic
hashing is what makes filters mergeable and serialized state portable across
machines. When inputs are adversarial, swap in a randomly-seeded hasher such as
`std::collections::hash_map::RandomState`:

```rust
use std::collections::hash_map::RandomState;
use bloom_lib::BloomFilter;

let filter: BloomFilter<&str, RandomState> =
    BloomFilter::with_hasher(1_000, 0.01, RandomState::new()).unwrap();
```
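
For readers unfamiliar with the term, a multiply-fold step widens a 64×64-bit product to 128 bits and XORs the two halves together. The sketch below shows that general technique only; `fold_mul`, `toy_hash`, and both constants are hypothetical and are not bloom-lib's `DefaultHasher`.

```rust
/// Folded multiply: compute the full 128-bit product, then XOR high and low halves.
fn fold_mul(a: u64, b: u64) -> u64 {
    let p = (a as u128) * (b as u128);
    (p as u64) ^ ((p >> 64) as u64)
}

// Arbitrary odd constants chosen for illustration (assumptions, not the crate's values).
const SEED: u64 = 0x9E37_79B9_7F4A_7C15;
const MULT: u64 = 0xD6E8_FEB8_6659_FD93;

/// Toy fixed-seed hash of a u64 key: the same key gives the same value on every machine.
fn toy_hash(key: u64) -> u64 {
    let mixed = fold_mul(key ^ SEED, MULT);
    fold_mul(mixed ^ MULT, SEED)
}
```

Because the seed is a compile-time constant, two processes hashing the same key set the same bits, which is the property that merging and portable serialization rely on.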

Internally the structures synthesise the `k` index hashes they need from a
single hashing pass using the Kirsch–Mitzenmacher double-hashing scheme, then map
each hash into range with a division-free multiply-shift reduction.
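
A minimal sketch of those two steps, assuming a single 64-bit hash as input. How the crate derives the second hash `h2` is not stated here, so the rotation used below is an assumption, as are the function names.

```rust
/// Multiply-shift reduction: maps a 64-bit hash into [0, m) as floor(hash * m / 2^64),
/// with no division or modulo.
fn reduce(hash: u64, m: u64) -> u64 {
    (((hash as u128) * (m as u128)) >> 64) as u64
}

/// Kirsch-Mitzenmacher double hashing: g_i = h1 + i * h2 (wrapping), for i in 0..k.
fn probe_indices(hash: u64, k: u32, m: u64) -> impl Iterator<Item = u64> {
    let h1 = hash;
    // Assumption: derive h2 by rotating h1; force it odd so the step is never zero.
    let h2 = hash.rotate_left(32) | 1;
    (0..k as u64).map(move |i| reduce(h1.wrapping_add(i.wrapping_mul(h2)), m))
}
```

The item is hashed once, and each additional probe costs only one multiply-add and one widening multiply, however large `k` is.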

### `Error` — one allocation-free error type

A single `#[non_exhaustive]` enum covers every fallible path:
`InvalidParameter { param, reason }` for rejected tuning values,
`IncompatibleParameters` for mismatched merges, and `CapacityExceeded` for the
Cuckoo filter that lands later. It carries only `&'static str` data, so it works
on `no_std` without an allocator, and implements `Display` plus (under `std`)
`std::error::Error`.
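
As a rough picture of the shape described above, the sketch below reconstructs the enum from the variant names in this section. The field types, the derives, the message wording, and the two helper functions (`check_rate`, `merge_words`) are assumptions for illustration, not the crate's actual definitions.

```rust
use core::fmt;

#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    /// A tuning value was rejected; both fields are static strings, so no allocation.
    InvalidParameter { param: &'static str, reason: &'static str },
    /// Two structures with different geometry cannot be merged.
    IncompatibleParameters,
    /// Reserved for the Cuckoo filter.
    CapacityExceeded,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidParameter { param, reason } => {
                write!(f, "invalid parameter `{param}`: {reason}")
            }
            Error::IncompatibleParameters => f.write_str("incompatible parameters"),
            Error::CapacityExceeded => f.write_str("capacity exceeded"),
        }
    }
}

// Described above as available under `std`; left ungated so the sketch compiles standalone.
impl std::error::Error for Error {}

/// Hypothetical validator: accepts rates strictly between 0 and 1 (rejects NaN too).
fn check_rate(p: f64) -> Result<f64, Error> {
    if p > 0.0 && p < 1.0 {
        Ok(p)
    } else {
        Err(Error::InvalidParameter {
            param: "false_positive_rate",
            reason: "must be strictly between 0 and 1",
        })
    }
}

/// Hypothetical merge: bitwise union of two equally sized bit arrays.
fn merge_words(dst: &mut [u64], src: &[u64]) -> Result<(), Error> {
    if dst.len() != src.len() {
        return Err(Error::IncompatibleParameters);
    }
    for (d, s) in dst.iter_mut().zip(src) {
        *d |= *s;
    }
    Ok(())
}
```

Because `#[non_exhaustive]` lets later releases add variants, downstream `match` expressions need a wildcard arm.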

### Feature flags: `alloc` and `serde`

- `alloc` enables the structures on heap-capable `no_std` targets. `std` now
  implies `alloc` and adds the `std::error::Error` impl.
- `serde` derives `Serialize`/`Deserialize` for every structure. The hasher is
  skipped and rebuilt with `Default` on deserialization, so the serialized form
  is just the bit array and geometry. With the deterministic default hasher, a
  filter serialized on one machine queries identically on another.

With no features enabled, the crate still compiles and exposes `VERSION` and `Error`.

### Documentation

A complete `docs/API.md` documents every public item with parameters and runnable
examples. The crate-level rustdoc, every public method, and a new `bloom_dedup`
example round out the coverage.

## Breaking changes

None relative to the v0.1.0 scaffold (which had no public API beyond `VERSION`).

## Verification

These checks were run locally on Windows x86_64 (Rust stable 1.95 and MSRV 1.75)
and on Linux via WSL2 Ubuntu. The same commands run in CI across Linux, macOS,
and Windows, on both stable and 1.75:

```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo clippy --no-default-features --all-targets -- -D warnings
cargo clippy --no-default-features --features alloc --all-targets -- -D warnings
cargo test --all-features
cargo test --no-default-features
cargo run --example bloom_dedup --release
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
```

All green. Counts at this tag:

- `--all-features`: 12 unit + 4 integration + 19 doctests.
- `--no-default-features`: 1 integration + 6 doctests.

## What's next

- **v0.5.0 — Implementation.** Cuckoo filter (with deletion), Count-Min Sketch,
  HyperLogLog, MinHash, and Top-K; property tests, a Criterion benchmark
  harness, and published performance numbers.

## Installation

```toml
[dependencies]
bloom-lib = "0.2"

# With serialization (use this line instead of the one above):
bloom-lib = { version = "0.2", features = ["serde"] }
```

MSRV: Rust 1.75.

## Documentation

- [README](https://github.com/jamesgober/bloom-lib/blob/main/README.md)
- [API Reference](https://github.com/jamesgober/bloom-lib/blob/main/docs/API.md)
- [CHANGELOG](https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md)

---

**Full diff:** [`v0.1.0...v0.2.0`](https://github.com/jamesgober/bloom-lib/compare/v0.1.0...v0.2.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/bloom-lib/blob/main/CHANGELOG.md#020---2026-05-28).