1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
//! Storage-level pre-filtering for pattern matching.
//!
//! `flashsieve` builds per-block byte histograms and 2-byte n-gram bloom
//! filters, then uses them to answer which blocks might contain matches.
//!
//! # Overview
//!
//! This crate provides:
//!
//! - [`BlockIndexBuilder`] — construct indexes
//! - [`BlockIndex`] — query and serialize indexes
//! - [`FileBloomIndex`] — file-level bloom union for fast n-gram rejection before per-block scans
//! - [`IncrementalBuilder`] — append new blocks to serialized indexes without re-indexing prior data
//! - [`MmapBlockIndex`] — query serialized indexes in place
//! - [`ByteFilter`] — byte-level pre-filtering
//! - [`NgramFilter`] — n-gram bloom pre-filtering
//! - [`NgramBloom`] and [`BlockedNgramBloom`] — standalone bloom filters
//!
//! # Quick Start
//!
//! ```
//! use flashsieve::{BlockIndexBuilder, ByteFilter, NgramFilter};
//!
//! let mut block_a = vec![b'x'; 256];
//! let mut block_b = vec![b'y'; 256];
//! block_a[..6].copy_from_slice(b"secret");
//! block_b[..5].copy_from_slice(b"token");
//! let patterns: [&[u8]; 2] = [b"secret", b"token"];
//!
//! let index = BlockIndexBuilder::new()
//! .block_size(256)
//! .bloom_bits(512)
//! .build_streaming([block_a, block_b].into_iter())?;
//!
//! let byte_filter = ByteFilter::from_patterns(&patterns);
//! let ngram_filter = NgramFilter::from_patterns(&patterns);
//!
//! let candidates = index.candidate_blocks(&byte_filter, &ngram_filter);
//! assert_eq!(candidates.len(), 1);
//! assert_eq!(candidates[0].length, 512);
//! # Ok::<(), flashsieve::Error>(())
//! ```
//!
//! # False Positive Rate
//!
//! The bloom filter's false positive rate depends on:
//! - `m` = number of bits
//! - `n` = number of distinct n-grams inserted
//! - `k` = number of hash functions (`k=3` in this crate)
//!
//! The theoretical FPR is:
//!
//! ```text
//! FPR ≈ (1 - e^(-kn/m))^k
//! ```
//!
//! For optimal `k = (m/n) × ln(2)`:
//!
//! ```text
//! FPR ≈ 0.6185^(m/n)
//! ```
//!
//! # Serialization Format
//!
//! Indexes can be serialized to a portable binary format (version 2):
//!
//! ```text
//! [magic: 4 bytes = "FSBX"]
//! [version: u32 LE = 2]
//! [block_size: u64 LE]
//! [total_len: u64 LE]
//! [block_count: u64 LE]
//! for each block:
//! [histogram: 256 × u32 LE = 1024 bytes]
//! [bloom_num_bits: u64 LE]
//! [bloom_words: u64 LE]
//! [bloom_data: bloom_words × u64 LE]
//! [crc32: u32 LE]
//! ```
//!
//! See the [`index`] module for detailed specification.
//!
//! # Hash Functions
//!
//! The bloom filter uses a primary 64-bit mix for each 2-byte n-gram and a
//! derived second hash for double hashing (`k = 3` probes):
//!
//! 1. **wyhash-style mix**: `x = (u64(a) << 8) | u64(b)`, then
//! `x = x.wrapping_mul(0x9E37_79B9_7F4A_7C15)`, then `x ^ (x >> 32)`.
//!
//! 2. **Second hash**: `h2 = (h1 ^ (h1 >> 32)).max(1)` so double hashing never
//! uses a zero increment.
//!
//! # Safety
//!
//! This crate keeps `unsafe` usage narrowly scoped to hot-path bloom-filter
//! word loads where index validity is proven by construction.
/// Bloom filter primitives for block-level 2-byte n-gram summaries.
///
/// Provides [`NgramBloom`](crate::bloom::NgramBloom), a space-efficient
/// probabilistic data structure for testing membership of 2-byte sequences.
/// Builders for constructing [`BlockIndex`](crate::BlockIndex) values from raw block data.
/// Error types returned by crate operations.
/// File-level bloom wrapper over a [`BlockIndex`](crate::BlockIndex).
/// Query filters used to pre-filter candidate blocks.
/// Byte histogram summaries stored for each indexed block.
/// Incremental update operations for existing block indexes.
/// Incremental index updates via filesystem notifications.
/// Indexed block metadata and candidate range queries.
/// Zero-parse views over serialized block-index data.
/// Compressed transport format for peer-to-peer bloom index sharing.
/// Per-block bloom filters for 2-byte n-gram membership checks.
pub use ;
/// Configurable builder for constructing [`BlockIndex`] instances.
pub use BlockIndexBuilder;
/// Crate error and result types.
pub use ;
/// Hierarchical bloom wrapper: file-level union bloom for fast n-gram rejection.
pub use FileBloomIndex;
/// Byte-level and n-gram-level query filters.
pub use ;
/// Composite filters and logical operators for multi-pattern composition.
pub use ;
/// Per-block byte-frequency histogram.
pub use ByteHistogram;
/// Append blocks to serialized indexes without rebuilding from raw file data.
pub use IncrementalBuilder;
/// Indexed block summary and byte-range query results.
pub use ;
/// Zero-parse block-index view and per-block serialized summary references.
pub use ;