quickbloom
quickbloom is an industry-grade Bloom filter library for Rust featuring:
- π
ahashβ fastest non-cryptographic hasher for short keys - π§ Enhanced double hashing β
h(i) = h1 + iΒ·h2 + iΒ²reduces bit clustering at high fill ratios - ποΈ Blocked (cache-friendly) layout β all
khashes for one item stay inside a single 64-byte CPU cache line - βοΈ Lock-free
AtomicBloomFilterβ insert and query from any number of threads simultaneously with zero locking - π
ScalableBloomFilterβ grows automatically when saturated; false-positive rate never deteriorates - πΎ Automatic persistence β attach a file path and the filter saves itself on
Drop
Installation
[]
= "0.2.0"
Quick Start
use ;
// Mathematically optimal size for 1M items at 1% false-positive rate
let config = new;
let = config.parameters;
let mut filter = with_mode;
filter.insert;
assert!; // true β definitely present
assert!; // false β definitely absent
Types at a Glance
| Type | Best for | Concurrency | Grows? | Persistence |
|---|---|---|---|---|
BloomFilter |
Single-threaded, general use | No | No | β |
ScalableBloomFilter |
Unbounded / unknown data sets | No (wrap in ConcurrentBloomFilter) |
β | β |
ConcurrentBloomFilter |
Read-heavy multi-thread workloads | RwLock |
Depends on F |
Depends on F |
AtomicBloomFilter |
Write-heavy lock-free hot paths (BETA) | Atomic | No | Manual |
BloomFilter
The standard single-threaded Bloom filter. Supports two memory layouts:
Standard layout (default)
Hash positions are spread uniformly across the entire bit array. Simple and accurate.
use BloomFilter;
let mut f = new;
f.insert;
assert!;
Blocked layout (cache-friendly)
All k hashes for one item are confined to a single 64-byte block (CPU cache line). On large filters this often doubles throughput with a negligible increase in false-positive rate.
use ;
let mut f = with_mode;
f.insert;
Persistence
Attach a path to enable automatic saving when the filter is dropped:
use BloomFilter;
// β auto-saved here
// Load it back next time:
let f = load_or_new;
assert!;
You can also call .save() manually to flush at any point:
f.save.expect;
Sizing with BloomConfig
Instead of guessing raw bit counts, use BloomConfig to compute optimal parameters:
use BloomConfig;
let config = new; // 500K items, 0.5% FP rate
let = config.parameters;
println!;
ScalableBloomFilter
Automatically provisions new layers when the current one exceeds 50% fill, preserving the configured false-positive rate as items grow without bound.
use ;
let config = new;
let mut f = new;
for i in 0..5_000u64
println!; // > 1 once the first layer fills
assert!;
assert!;
Custom growth parameters
use ;
let config = new;
let f = with_parameters;
Persistence
use ;
let config = new;
// β auto-saved
let f = load_or_new;
assert!;
ConcurrentBloomFilter
A generic Arc<RwLock<F>> wrapper. Use it when you need concurrent access to a filter that also supports persistence or dynamic scaling.
use ;
use Arc;
let f = new;
// Share across threads
let f2 = f.clone;
spawn.join.unwrap;
assert!;
Tip: For write-heavy workloads at high concurrency, prefer
AtomicBloomFilterwhich has zero lock overhead.
AtomicBloomFilter (BETA)
[!WARNING]
AtomicBloomFilteris currently in Beta. While functionally complete and high-performance, the API and internal representation may evolve. Use with caution in critical production paths.
A fully lock-free Bloom filter backed by AtomicU8 bytes. Any number of threads can insert and contains simultaneously without blocking.
use ;
use Arc;
let config = new;
let filter = new;
let handles: =
.map
.collect;
for h in handles
assert!;
assert!;
Monitoring saturation
println!;
// When fill_ratio > 0.5, false-positive rate rises quickly.
// At that point, create a new AtomicBloomFilter with a larger size.
Benchmarking
Run the included benchmarks to compare the three approaches on your hardware:
The benchmark suite measures:
insert/standardvsinsert/blockedvsinsert/atomic_lock_freecontains/standardvscontains/blockedvscontains/atomic_lock_freeconcurrent/atomic_threadsat 1, 4, and 8 threads
Running Examples
Running Tests
Hashing Details
quickbloom uses ahash with seeded RandomState for the two base hashes, and enhanced double hashing for index derivation:
h(i) = h1 + i Γ h2 + iΒ²
The quadratic term iΒ² eliminates the bit-clustering that can appear in pure double hashing when fill ratios exceed ~30%, resulting in a lower real-world false-positive rate.
File Format
The binary persistence format is versioned (v3). Files saved by an older version of quickbloom will not load in v3 (a None is returned and a fresh filter is created). The format stores:
- A 2-byte header:
[version][type] - For each filter:
[mode][size][hashes][items][raw_byte_count][raw_bytesβ¦] - For scalable filters: config parameters, growth factors, then each layer as above.
Benchmarks
To ensure industry-grade performance, we benchmarked quickbloom v0.2.0 against the established fastbloom v0.9.0 crate.
Test Environment
- Hardware: Apple M1 Max
- Task: 1,000,000 operations on strings (
"user_12345") - Parameters: 1,000,000 bits, 7 hash functions
| Crate | Mode | Avg Latency (Insert) |
|---|---|---|
| quickbloom v0.2.0 | Blocked (Cache-Line) | 14.78 ns |
| quickbloom v0.2.0 | Standard | 16.17 ns |
fastbloom v0.9.0 |
Blocked (512-bit) | 18.56 ns |
fastbloom v0.9.0 |
Standard | 18.77 ns |
[!TIP] With the switch to
ahashand our optimized Blocked Layout,quickbloomnow provides top-tier throughput that is competitive with and often exceeds established high-performance implementations.
Authors
Rasesh Shetty
License
MIT β see LICENSE