pub struct BitapMatcher { /* private fields */ }Expand description
Bitap matcher for fuzzy string matching.
Implementations§
Source§impl BitapMatcher
impl BitapMatcher
Sourcepub fn new(
pattern: &str,
limits: EditLimits,
case_insensitive: bool,
) -> Option<Self>
pub fn new( pattern: &str, limits: EditLimits, case_insensitive: bool, ) -> Option<Self>
Create a new Bitap matcher.
Returns None if the pattern is too long (> 64 chars).
Sourcepub fn pattern_chars(&self) -> &[char]
pub fn pattern_chars(&self) -> &[char]
Returns the pattern as a slice of characters.
Sourcepub fn get_skip(&self, byte: u8) -> usize
pub fn get_skip(&self, byte: u8) -> usize
Get Boyer-Moore skip distance for a byte. Returns 0 if byte is in pattern, otherwise returns skip distance.
Sourcepub fn find_next_candidate(&self, text: &[u8], start: usize) -> usize
pub fn find_next_candidate(&self, text: &[u8], start: usize) -> usize
Find the next position worth checking using Boyer-Moore skipping.
Scans from start looking for a byte that’s in the pattern.
Returns the position of the first pattern-relevant byte, or text.len() if none.
Sourcepub fn find_all(&self, text: &str, threshold: f32) -> Vec<DamLevMatch>
pub fn find_all(&self, text: &str, threshold: f32) -> Vec<DamLevMatch>
Find all matches in text using Bitap algorithm with k errors.
Sourcepub fn find_best_non_overlapping(
&self,
text: &str,
threshold: f32,
require_first_char: bool,
) -> Vec<DamLevMatch>
pub fn find_best_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>
Find all non-overlapping matches, preferring best (highest similarity) matches.
This method finds all overlapping candidates, sorts by similarity, then greedily selects non-overlapping matches starting from highest similarity. This ensures we prefer “Lorem” (sim=1.0) over “ore” (sim=0.6).
Matches must be at least pattern_len - max_edits characters long to be
considered valid. This prevents overly short fuzzy matches.
When require_first_char is true, matches must start with the same first
character as the pattern (case-insensitive). This filters out spurious
matches like “bore” when searching for “Lorem”.
Sourcepub fn find_all_non_overlapping(
&self,
text: &str,
threshold: f32,
require_first_char: bool,
) -> Vec<DamLevMatch>
pub fn find_all_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>
Fast find of non-overlapping matches optimized for iteration (greedy leftmost).
This is faster than find_all() followed by filtering because:
- It skips ahead after each match (no overlapping work)
- It only verifies the most likely start position per match
- For exact matches (d=0), it trusts Bitap without DP
When require_first_char is true, matches must start with the same first
character as the pattern (case-sensitive unless case_insensitive mode).
Note: This uses greedy-leftmost strategy. For best-match selection
(preferring higher similarity), use find_best_non_overlapping instead.
Sourcepub fn find_first_non_overlapping(
&self,
text: &str,
threshold: f32,
) -> Option<DamLevMatch>
pub fn find_first_non_overlapping( &self, text: &str, threshold: f32, ) -> Option<DamLevMatch>
Find the first match using the same algorithm as find_all_non_overlapping.
Returns as soon as a match is found, avoiding scanning the rest of the text.
Sourcepub fn find_n_non_overlapping(
&self,
text: &str,
threshold: f32,
require_first_char: bool,
n: usize,
) -> Vec<DamLevMatch>
pub fn find_n_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, n: usize, ) -> Vec<DamLevMatch>
Find up to n non-overlapping matches.
Stops searching after finding n matches for efficiency.
Sourcepub fn find_first(&self, text: &str, threshold: f32) -> Option<DamLevMatch>
pub fn find_first(&self, text: &str, threshold: f32) -> Option<DamLevMatch>
Find the first match in the text.
This delegates to find_all_non_overlapping and returns the first result.
While not optimal for all cases, this ensures correct behavior for edge cases
like transpositions and complex fuzzy matches.
Sourcepub fn find_first_with_candidates(
&self,
text: &str,
threshold: f32,
candidates: &FxHashSet<usize>,
) -> Option<DamLevMatch>
pub fn find_first_with_candidates( &self, text: &str, threshold: f32, candidates: &FxHashSet<usize>, ) -> Option<DamLevMatch>
Find first match starting from candidate positions only.
Sourcepub fn find_at_byte_position(
&self,
text: &[u8],
start_pos: usize,
threshold: f32,
) -> Option<DamLevMatch>
pub fn find_at_byte_position( &self, text: &[u8], start_pos: usize, threshold: f32, ) -> Option<DamLevMatch>
Ultra-fast search starting from a specific byte position.
This method is optimized for the greedy-first hot path:
- No allocations (uses stack arrays for small k)
- Direct byte iteration
- Early termination on first match
- SIMD acceleration when available (
AVX2onx86_64)
Sourcepub fn find_at_positions_parallel(
&self,
text: &[u8],
positions: &[usize],
threshold: f32,
) -> Option<(usize, DamLevMatch)>
pub fn find_at_positions_parallel( &self, text: &[u8], positions: &[usize], threshold: f32, ) -> Option<(usize, DamLevMatch)>
Search multiple positions in parallel using AVX2.
Sourcepub fn find_first_streaming(
&self,
text: &[u8],
threshold: f32,
) -> Option<DamLevMatch>
pub fn find_first_streaming( &self, text: &[u8], threshold: f32, ) -> Option<DamLevMatch>
Streaming search: scan entire text in one pass, return first match.
This is O(n * k) where n = text length, k = max edits.
Much faster for long texts than repeated find_at_byte_position calls.