flashsieve
Storage-level pre-filtering for pattern matching.
flashsieve builds per-block byte histograms and 2-byte n-gram bloom filters,
then uses them to answer which blocks might contain matches. If a block cannot
contain a pattern, it is skipped entirely—saving CPU, I/O, and memory at scale.
Installation
[]
= "0.1"
Minimum supported Rust version: 1.85.
Quick Start
use ;
let mut block_a = vec!;
let mut block_b = vec!;
block_a.copy_from_slice;
block_b.copy_from_slice;
let patterns: = ;
let index = new
.block_size
.bloom_bits
.build_streaming?;
let byte_filter = from_patterns;
let ngram_filter = from_patterns;
let candidates = index.candidate_blocks;
assert_eq!;
assert_eq!;
# Ok::
What It Does
| Component | Purpose |
|---|---|
BlockIndexBuilder |
Constructs indexes from raw bytes or streaming blocks |
BlockIndex |
In-memory index with per-block histograms & bloom filters |
FileBloomIndex |
File-level bloom union for fast n-gram rejection before per-block scans |
MmapBlockIndex |
Zero-parse, zero-copy queries over serialized indexes |
IncrementalBuilder |
Append new blocks to existing indexes without rebuilding |
ByteFilter |
Reject blocks that don't contain all required bytes |
NgramFilter |
Reject blocks that don't contain all required 2-byte n-grams |
Bloom Filter Math
The theoretical false-positive rate for a Bloom filter with:
m= number of bitsn= number of distinct n-grams insertedk= number of hash functions (k = 3in this crate)
is approximately:
FPR ≈ (1 - e^(-kn/m))^k
For the optimal number of hash functions k = (m/n) × ln(2):
FPR ≈ 0.6185^(m/n)
For filters with ≥4096 bits, flashsieve allocates an exact 65,536-bit pair
table, giving zero false positives for all possible 2-byte n-gram queries.
Safety
This crate keeps unsafe usage narrowly scoped to hot-path bloom-filter word
loads where index validity is proven by construction. All serialized input is
validated (magic, version, checksum, bounds) before use—corrupt files return
typed errors rather than panicking.
License
Licensed under the MIT license (LICENSE or https://opensource.org/licenses/MIT).