🌸 Bloomz
Fast, flexible Bloom filter for Rust with pluggable hashers and parallel operations.
Features
- Fast: Optimized bit operations with efficient double hashing
- Flexible: Pluggable hash builders (SipHash, AHash, xxHash, etc.)
- Parallel: Batch operations with Rayon for multi-core performance
- Serializable: JSON and binary serialization with Serde
- Safe: No unsafe code, extensive testing
Quick Start
Add to your Cargo.toml:
[dependencies]
bloomz = "0.1"
# Enable optional features
bloomz = { version = "0.1", features = ["serde", "rayon"] }
Basic Usage
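use bloomz::BloomFilter;
// Create a filter for ~1000 items with 1% false positive rate
let mut filter = BloomFilter::new_for_capacity(1000, 0.01);
// Insert items (example values)
filter.insert(&"hello");
filter.insert(&"world");
// Check membership: inserted items are always reported as present
assert!(filter.contains(&"hello"));
assert!(filter.contains(&"world"));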
Parallel Operations (with rayon feature)
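use bloomz::BloomFilter;
use rayon::prelude::*;
use std::collections::hash_map::RandomState;
let rs = RandomState::new();
// Sizing arguments and item ranges below are illustrative
let mut filter = BloomFilter::with_hasher(10_000, 7, rs);
// Parallel batch insert
let items: Vec<u64> = (0..1_000).collect();
filter.insert_batch(&items);
// Parallel batch contains
let test_items: Vec<u64> = (0..100).collect();
let results = filter.contains_batch(&test_items);
// Check if all items are present
let all_present = filter.contains_all(&test_items);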
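To show why batch inserts parallelize well, here is a minimal sketch that uses only the standard library (scoped threads and atomics) rather than Rayon. It is not Bloomz's implementation: the names `bit_index`, `insert_parallel`, and `is_set` are hypothetical, and each item sets a single bit instead of k bits to keep the example short. The point is that setting a bit is an idempotent OR, so workers can share the bit array without a lock.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Map an item to one bit position in [0, m). A real filter probes k
/// positions; one is enough to show the threading pattern.
fn bit_index<T: Hash>(item: &T, m: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.hash(&mut hasher);
    hasher.finish() % m
}

/// Insert every item using up to `threads` scoped worker threads.
/// `words` must hold at least m / 64 (rounded up) entries.
fn insert_parallel<T: Hash + Sync>(words: &[AtomicU64], m: u64, items: &[T], threads: usize) {
    let threads = threads.max(1);
    let chunk_len = ((items.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        for part in items.chunks(chunk_len) {
            scope.spawn(move || {
                for item in part {
                    let idx = bit_index(item, m);
                    // fetch_or sets the bit atomically, so no lock is needed.
                    words[(idx / 64) as usize].fetch_or(1u64 << (idx % 64), Ordering::Relaxed);
                }
            });
        }
    });
}

/// Check whether the bit for `item` is set.
fn is_set<T: Hash>(words: &[AtomicU64], m: u64, item: &T) -> bool {
    let idx = bit_index(item, m);
    words[(idx / 64) as usize].load(Ordering::Relaxed) & (1u64 << (idx % 64)) != 0
}
```

Membership checks only read the bit array, so they can be split across threads in the same way without any synchronization beyond the atomic loads.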
Serialization (with serde feature)
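use bloomz::BloomFilter;
// Capacity, rate, and item are illustrative
let mut filter = BloomFilter::new_for_capacity(1000, 0.01);
filter.insert(&"test");
// JSON serialization
let json = serde_json::to_string(&filter)?;
let restored: BloomFilter = serde_json::from_str(&json)?;
// Binary serialization
let bytes = filter.to_bytes();
let restored = BloomFilter::from_bytes(&bytes).unwrap();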
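As a rough picture of what a binary format for a bit vector can look like, the sketch below writes each 64-bit word as 8 little-endian bytes and reads them back. The byte layout and the function names `words_to_bytes` and `bytes_to_words` are assumptions made for illustration; the actual Bloomz format may differ and presumably also stores parameters such as the number of hash functions.

```rust
/// Encode each 64-bit word as 8 little-endian bytes.
fn words_to_bytes(words: &[u64]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Decode bytes produced by `words_to_bytes`.
/// Returns None if the length is not a multiple of 8.
fn bytes_to_words(bytes: &[u8]) -> Option<Vec<u64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let words = bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect();
    Some(words)
}
```

Fixing the byte order makes the output portable between machines with different endianness.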
Custom Hash Builders
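use bloomz::BloomFilter;
use std::collections::hash_map::RandomState;
// Sizing arguments below are illustrative
// Default SipHash (secure)
let filter1 = BloomFilter::new(10_000, 7);
// Custom RandomState
let rs = RandomState::new();
let filter2 = BloomFilter::with_hasher(10_000, 7, rs);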
// Fast hashers (requires feature flags)
Performance
Bloomz uses several optimizations:
- Double Hashing: Generate k hash functions from just 2 base hashes
- Efficient Bit Operations: Word-aligned bit manipulation with u64
- Parallel Processing: Multi-threaded batch operations with Rayon
- Zero-Copy Serialization: Direct bit vector serialization
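The first two optimizations can be sketched in a few lines of standard-library Rust. This is an illustration of the general technique, not Bloomz's code: the helper names, the salt used to obtain the second hash, and the choice of `DefaultHasher` are all assumptions. Each of the k probes is computed as (h1 + i * h2) mod m, and bits live in a slice of u64 words.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive two base hashes for `item`. Salting the second hasher is an
/// illustrative choice, not necessarily what Bloomz does.
fn base_hashes<T: Hash>(item: &T) -> (u64, u64) {
    let mut first = DefaultHasher::new();
    item.hash(&mut first);
    let h1 = first.finish();

    let mut second = DefaultHasher::new();
    0x9E37_79B9_7F4A_7C15u64.hash(&mut second); // arbitrary salt
    item.hash(&mut second);
    let h2 = second.finish() | 1; // force odd so the stride is never zero

    (h1, h2)
}

/// Bit position for the i-th probe: (h1 + i * h2) mod m.
fn probe_index(h1: u64, h2: u64, i: u64, m: u64) -> u64 {
    h1.wrapping_add(i.wrapping_mul(h2)) % m
}

/// Set all k probe bits for `item`; `words` holds m bits in 64-bit words.
fn insert<T: Hash>(words: &mut [u64], m: u64, k: u64, item: &T) {
    let (h1, h2) = base_hashes(item);
    for i in 0..k {
        let idx = probe_index(h1, h2, i, m);
        words[(idx / 64) as usize] |= 1u64 << (idx % 64);
    }
}

/// True if all k probe bits are set (the item is probably present).
fn contains<T: Hash>(words: &[u64], m: u64, k: u64, item: &T) -> bool {
    let (h1, h2) = base_hashes(item);
    (0..k).all(|i| {
        let idx = probe_index(h1, h2, i, m);
        words[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
    })
}
```

Only two hash computations are needed per item regardless of k, which is where the speedup over k independent hash functions comes from.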
Benchmarks
Run benchmarks to compare hashers and parallel vs sequential operations:
# Compare different hash builders
# Compare parallel vs sequential operations
API Reference
Core Types
- BloomFilter<S>: Main Bloom filter with hasher type S
- BitSet: Underlying bit storage with optimized operations
Key Methods
Insertion
- insert(&item): Insert a single item
- insert_batch(items): Parallel batch insert (rayon feature)
Membership
- contains(&item): Check whether an item is probably in the set
- contains_batch(items): Parallel batch check (rayon feature)
- contains_all(items): Check whether all items are present (rayon feature)
Set Operations
- union_inplace(&other): Merge with another filter
- intersect_inplace(&other): Keep only common elements
- clear(): Remove all items
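For intuition, set operations on Bloom filters reduce to bitwise operations on the underlying words, provided both filters share the same size, number of hash functions, and hasher. The sketch below works on plain u64 slices; `union_words` and `intersect_words` are hypothetical helpers, not the Bloomz methods listed above.

```rust
/// Bitwise OR: the result matches a filter built from both item sets.
fn union_words(target: &mut [u64], other: &[u64]) {
    assert_eq!(target.len(), other.len(), "filters must have the same size");
    for (t, o) in target.iter_mut().zip(other) {
        *t |= *o;
    }
}

/// Bitwise AND: keeps only the bits set in both filters.
fn intersect_words(target: &mut [u64], other: &[u64]) {
    assert_eq!(target.len(), other.len(), "filters must have the same size");
    for (t, o) in target.iter_mut().zip(other) {
        *t &= *o;
    }
}
```

A union is exact in the sense that it equals the filter you would get by inserting every item from both sets. An intersection is only an approximation: it never loses a common item, but it can keep bits that no common item set, so its false positive rate may be higher than that of a filter built directly from the common items.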
Serialization
- to_bytes() / from_bytes(): Binary format
- Serde support for JSON and other formats
Mathematical Functions
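use bloomz::{math, BloomFilter};
// Calculate optimal parameters (argument values are illustrative:
// 1000 expected items, 1% target false positive rate)
let m = math::optimal_m(1000, 0.01);
let k = math::optimal_k(m, 1000);
let filter = BloomFilter::new(m, k);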
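The standard sizing formulas for a Bloom filter are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected items and a target false positive rate p. The sketch below implements them directly; the names `sketch_optimal_m` and `sketch_optimal_k` and the rounding choices are assumptions, and Bloomz's own math functions may round or order their arguments differently.

```rust
use std::f64::consts::LN_2;

/// Number of bits for `n` expected items at false positive rate `p`:
/// m = -n * ln(p) / (ln 2)^2, rounded up.
fn sketch_optimal_m(n: usize, p: f64) -> usize {
    (-(n as f64) * p.ln() / (LN_2 * LN_2)).ceil() as usize
}

/// Number of hash functions for `m` bits and `n` expected items:
/// k = (m / n) * ln 2, rounded to the nearest integer and at least 1.
fn sketch_optimal_k(m: usize, n: usize) -> usize {
    let k = (m as f64 / n as f64) * LN_2;
    (k.round() as usize).max(1)
}
```

For 1000 items at a 1% false positive rate this gives 9586 bits (about 1.2 KB) and 7 hash functions, or roughly 9.6 bits per item.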
Feature Flags
| Feature | Description | Dependencies |
|---|---|---|
| serde | JSON/binary serialization | serde, serde_json |
| rayon | Parallel batch operations | rayon |
| fast-ahash | AHash hasher support | ahash |
| fast-xxh3 | xxHash hasher support | xxhash-rust |
Examples
See src/main.rs for a complete web crawler URL filter demo:
# Basic demo
# With all features
Use Cases
- Web Crawlers: Avoid revisiting URLs
- Caching: Quick "not in cache" checks
- Databases: Reduce disk lookups
- Networking: Packet deduplication
- Analytics: Unique visitor tracking
Contributing
Contributions welcome! Please check:
- Run cargo test --all-features
- Run cargo bench --all-features
- Add tests for new features
- Update documentation
License
MIT License - see LICENSE file.
🌸 Bloomz: Where speed meets flexibility in Rust Bloom filters!