pub enum Prefilter {
None,
SingleByte {
byte: u8,
max_offset: usize,
},
TwoBytes {
byte1: u8,
byte2: u8,
max_offset: usize,
},
ThreeBytes {
byte1: u8,
byte2: u8,
byte3: u8,
max_offset: usize,
},
MultiBytes {
bytes: Vec<u8>,
max_offset: usize,
},
Literal {
needle: Vec<u8>,
},
LiteralWithOffset {
needle: Vec<u8>,
max_offset: usize,
},
TwoByteLiteral {
byte1: u8,
byte2: u8,
max_offset: usize,
},
FuzzyLiteral {
first_byte: u8,
alt_bytes: Vec<u8>,
max_edits: usize,
},
PigeonholePieces {
pieces: Vec<Vec<u8>>,
offsets: Vec<usize>,
max_edits: usize,
},
}Expand description
A prefilter that can quickly find candidate start positions.
Variants§
None
No prefiltering - try every position.
SingleByte
Search for a single byte.
Fields
TwoBytes
Search for two possible bytes (for case-insensitive or fuzzy).
Fields
ThreeBytes
Search for three possible bytes.
Fields
MultiBytes
Search for 4+ possible bytes (uses linear scan).
Fields
Literal
Search for an exact literal substring.
LiteralWithOffset
Search for a literal prefix with offset (for fuzzy matching).
Fields
TwoByteLiteral
Fast search for exactly 2-byte literal (uses memchr + check). Much faster than memmem for 2-byte needles.
Fields
FuzzyLiteral
Search for a literal with fuzzy tolerance.
Fields
PigeonholePieces
Pigeonhole-based prefilter: for k edits, at least one of (k+1) pieces must match exactly. Much more selective than single-byte prefiltering for longer patterns.
Implementations§
Source§impl Prefilter
impl Prefilter
Sourcepub fn fuzzy(literal: &str, max_edits: u8) -> Self
pub fn fuzzy(literal: &str, max_edits: u8) -> Self
Create a prefilter for a fuzzy literal.
Strategy: Even with fuzzy matching, certain bytes from the pattern must appear in any valid match. We search for bytes from the first positions that could possibly match.
Sourcepub fn fuzzy_rare(
literal: &str,
max_edits: u8,
text_sample: Option<&[u8]>,
) -> Self
pub fn fuzzy_rare( literal: &str, max_edits: u8, text_sample: Option<&[u8]>, ) -> Self
Create a prefilter for a fuzzy literal with rarity-based byte selection.
For patterns with small alphabets (like DNA), select the rarest bytes to minimize false positives. Falls back to streaming if all bytes are common.
Sourcepub fn pigeonhole(literal: &str, max_edits: u8) -> Self
pub fn pigeonhole(literal: &str, max_edits: u8) -> Self
Create a pigeonhole-based prefilter for fuzzy matching.
For a pattern of length m with at most k edits, we split the pattern into (k+1) non-overlapping pieces. By the pigeonhole principle, at least one piece must match exactly in any valid fuzzy match.
This is much more selective than single-byte prefiltering for longer patterns.
For example, "hello" with k=1 → pieces ["hel", "lo"], both 2-3 chars long.
Finding "hel" or "lo" is much rarer than finding ‘h’ or ‘e’.
Sourcepub fn case_insensitive(literal: &str) -> Self
pub fn case_insensitive(literal: &str) -> Self
Create a prefilter for case-insensitive matching.
Sourcepub fn multi_fuzzy(literals: &[(&str, u8)], case_insensitive: bool) -> Self
pub fn multi_fuzzy(literals: &[(&str, u8)], case_insensitive: bool) -> Self
Create a prefilter for multiple fuzzy literals.
Collects first bytes from all patterns and uses memchr to find candidates.
Each entry is (literal, max_edits).
Sourcepub fn find_candidates<'a>(&'a self, text: &'a [u8]) -> CandidateIter<'a> ⓘ
pub fn find_candidates<'a>(&'a self, text: &'a [u8]) -> CandidateIter<'a> ⓘ
Find candidate positions in the text. Returns an iterator over byte positions where a match might start.
Sourcepub fn max_offset(&self) -> usize
pub fn max_offset(&self) -> usize
Get the max_offset for this prefilter (how far back a match could start).
Sourcepub fn selectivity(&self) -> usize
pub fn selectivity(&self) -> usize
Check if this prefilter is “selective” enough to be effective. A prefilter searching for too many different bytes will hit too many positions. Returns the number of unique bytes being searched for (lower is better).
Sourcepub fn is_selective(&self) -> bool
pub fn is_selective(&self) -> bool
Check if this prefilter is likely to be effective for multi-pattern search. Returns false if the prefilter searches for too many different bytes.