Skip to main content

BitapMatcher

Struct BitapMatcher 

Source
pub struct BitapMatcher { /* private fields */ }
Expand description

Bitap matcher for fuzzy string matching.

Implementations§

Source§

impl BitapMatcher

Source

pub fn new( pattern: &str, limits: EditLimits, case_insensitive: bool, ) -> Option<Self>

Create a new Bitap matcher.

Returns None if the pattern is too long (> 64 chars).

Source

pub fn pattern(&self) -> &str

Returns the original pattern string.

Source

pub fn pattern_chars(&self) -> &[char]

Returns the pattern as a slice of characters.

Source

pub fn get_skip(&self, byte: u8) -> usize

Get Boyer-Moore skip distance for a byte. Returns 0 if byte is in pattern, otherwise returns skip distance.

Source

pub fn find_next_candidate(&self, text: &[u8], start: usize) -> usize

Find the next position worth checking using Boyer-Moore skipping. Scans from start looking for a byte that’s in the pattern. Returns the position of the first pattern-relevant byte, or text.len() if none.

Source

pub fn find_all(&self, text: &str, threshold: f32) -> Vec<DamLevMatch>

Find all matches in text using Bitap algorithm with k errors.

Source

pub fn find_best_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>

Find all non-overlapping matches, preferring best (highest similarity) matches.

This method finds all overlapping candidates, sorts by similarity, then greedily selects non-overlapping matches starting from highest similarity. This ensures we prefer “Lorem” (sim=1.0) over “ore” (sim=0.6).

Matches must be at least pattern_len - max_edits characters long to be considered valid. This prevents overly short fuzzy matches.

When require_first_char is true, matches must start with the same first character as the pattern (case-insensitive). This filters out spurious matches like “bore” when searching for “Lorem”.

Source

pub fn find_all_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, ) -> Vec<DamLevMatch>

Fast find of non-overlapping matches optimized for iteration (greedy leftmost).

This is faster than find_all() followed by filtering because:

  1. It skips ahead after each match (no overlapping work)
  2. It only verifies the most likely start position per match
  3. For exact matches (d=0), it trusts Bitap without DP

When require_first_char is true, matches must start with the same first character as the pattern (case-sensitive unless case_insensitive mode).

Note: This uses greedy-leftmost strategy. For best-match selection (preferring higher similarity), use find_best_non_overlapping instead.

Source

pub fn find_first_non_overlapping( &self, text: &str, threshold: f32, ) -> Option<DamLevMatch>

Find the first match using the same algorithm as find_all_non_overlapping. Returns as soon as a match is found, avoiding scanning the rest of the text.

Source

pub fn find_n_non_overlapping( &self, text: &str, threshold: f32, require_first_char: bool, n: usize, ) -> Vec<DamLevMatch>

Find up to n non-overlapping matches. Stops searching after finding n matches for efficiency.

Source

pub fn find_first(&self, text: &str, threshold: f32) -> Option<DamLevMatch>

Find the first match in the text.

This delegates to find_all_non_overlapping and returns the first result. While not optimal for all cases, this ensures correct behavior for edge cases like transpositions and complex fuzzy matches.

Source

pub fn find_first_with_candidates( &self, text: &str, threshold: f32, candidates: &FxHashSet<usize>, ) -> Option<DamLevMatch>

Find first match starting from candidate positions only.

Source

pub fn find_at_byte_position( &self, text: &[u8], start_pos: usize, threshold: f32, ) -> Option<DamLevMatch>

Ultra-fast search starting from a specific byte position.

This method is optimized for the greedy-first hot path:

  • No allocations (uses stack arrays for small k)
  • Direct byte iteration
  • Early termination on first match
  • SIMD acceleration when available (AVX2 on x86_64)
Source

pub fn find_at_positions_parallel( &self, text: &[u8], positions: &[usize], threshold: f32, ) -> Option<(usize, DamLevMatch)>

Search multiple positions in parallel using AVX2.

Source

pub fn find_first_streaming( &self, text: &[u8], threshold: f32, ) -> Option<DamLevMatch>

Streaming search: scan entire text in one pass, return first match. This is O(n * k) where n = text length, k = max edits. Much faster for long texts than repeated find_at_byte_position calls.

Trait Implementations§

Source§

impl Debug for BitapMatcher

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.