Struct BitapMatcher

Source

pub struct BitapMatcher { /* private fields */ }

Expand description

Bitap matcher for fuzzy string matching.

Implementations§

Source §

impl BitapMatcher

Source

pub fn new( pattern: &str, limits: EditLimits, case_insensitive: bool, ) -> Option<Self>

Create a new Bitap matcher.

Returns None if the pattern is too long (> 64 chars).

Source

pub fn pattern(&self) -> &str

Returns the original pattern string.

Source

pub fn pattern_chars(&self) -> &[char]

Returns the pattern as a slice of characters.

Source

pub fn get_skip(&self, byte: u8) -> usize

Get Boyer-Moore skip distance for a byte. Returns 0 if byte is in pattern, otherwise returns skip distance.

Source

pub fn find_next_candidate(&self, text: &[u8], start: usize) -> usize

Find the next position worth checking using Boyer-Moore skipping. Scans from start looking for a byte that’s in the pattern. Returns the position of the first pattern-relevant byte, or text.len() if none.

Source

pub fn find_all(&self, text: &str, threshold: f32) -> Vec<DamLevMatch>

Find all matches in text using Bitap algorithm with k errors.

Source

pub fn find_best_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>

Find all non-overlapping matches, preferring best (highest similarity) matches.

This method finds all overlapping candidates, sorts by similarity, then greedily selects non-overlapping matches starting from highest similarity. This ensures we prefer “Lorem” (sim=1.0) over “ore” (sim=0.6).

Matches must be at least pattern_len - max_edits characters long to be considered valid. This prevents overly short fuzzy matches.

When require_first_char is true, matches must start with the same first character as the pattern (case-insensitive). This filters out spurious matches like “bore” when searching for “Lorem”.

Source

pub fn find_all_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>

Fast find of non-overlapping matches optimized for iteration (greedy leftmost).

This is faster than find_all() followed by filtering because:

It skips ahead after each match (no overlapping work)
It only verifies the most likely start position per match
For exact matches (d=0), it trusts Bitap without DP

When require_first_char is true, matches must start with the same first character as the pattern (case-sensitive unless case_insensitive mode).

Note: This uses greedy-leftmost strategy. For best-match selection (preferring higher similarity), use find_best_non_overlapping instead.

Source

pub fn find_first_non_overlapping( &self, text: &str, threshold: f32, ) -> Option<DamLevMatch>

Find the first match using the same algorithm as find_all_non_overlapping. Returns as soon as a match is found, avoiding scanning the rest of the text.

Source

pub fn find_n_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, n: usize, ) -> Vec<DamLevMatch>

Find up to n non-overlapping matches. Stops searching after finding n matches for efficiency.

Source

pub fn find_first(&self, text: &str, threshold: f32) -> Option<DamLevMatch>

Find the first match in the text.

This delegates to find_all_non_overlapping and returns the first result. While not optimal for all cases, this ensures correct behavior for edge cases like transpositions and complex fuzzy matches.

Source

pub fn find_first_with_candidates( &self, text: &str, threshold: f32, candidates: &FxHashSet<usize>, ) -> Option<DamLevMatch>

Find first match starting from candidate positions only.

Source

pub fn find_at_byte_position( &self, text: &[u8], start_pos: usize, threshold: f32, ) -> Option<DamLevMatch>

Ultra-fast search starting from a specific byte position.

This method is optimized for the greedy-first hot path:

No allocations (uses stack arrays for small k)
Direct byte iteration
Early termination on first match
SIMD acceleration when available (AVX2 on x86_64)

Source

pub fn find_at_positions_parallel( &self, text: &[u8], positions: &[usize], threshold: f32, ) -> Option<(usize, DamLevMatch)>

Search multiple positions in parallel using AVX2.

Source

pub fn find_first_streaming( &self, text: &[u8], threshold: f32, ) -> Option<DamLevMatch>

Streaming search: scan entire text in one pass, return first match. This is O(n * k) where n = text length, k = max edits. Much faster for long texts than repeated find_at_byte_position calls.