Struct nucleo_matcher::Matcher
pub struct Matcher {
pub config: Config,
/* private fields */
}
A matcher engine that can execute (fuzzy) matches.
A matcher contains heap-allocated scratch memory that is reused during matching. This scratch memory allows the matcher to guarantee that it will never allocate during matching (with the exception of pushing to the indices vector if there isn’t enough capacity). However, this scratch memory is fairly large (around 135KB), so creating a matcher is expensive.
The .._match functions do not compute the indices of the matched characters. They should be used as a prefilter, to filter and rank all matches. The .._indices functions also compute the indices of the matched characters but are slower than the corresponding .._match variants. They should be used when rendering the best N matches. Note that the indices argument is never cleared. This allows running multiple different matches on the same haystack and merging the indices by sorting and deduplicating the vector.
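The merge step described above can be sketched with the standard library alone. merge_indices is a hypothetical helper (not part of nucleo_matcher), and the index values in the usage below are made up to stand in for two .._indices calls that appended to the same vector.

```rust
/// Hypothetical helper: turns indices accumulated from several `.._indices`
/// calls on the same haystack into a sorted list without duplicates.
fn merge_indices(indices: &mut Vec<u32>) {
    // Each call only appends, so the vector may be unsorted and may
    // contain the same haystack position more than once.
    indices.sort_unstable();
    // `dedup` removes consecutive duplicates, which sorting has made adjacent.
    indices.dedup();
}
```

For example, if one call pushed 0, 1, 2, 16, 17, 18 and a second call pushed 1, 2, 3, merging yields 0, 1, 2, 3, 16, 17, 18.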
The needle argument of each function must always be normalized by the caller (Unicode normalization and case folding). Otherwise, the matcher may fail to produce a match. The pattern module provides utilities to preprocess needles and should usually be preferred over invoking the matcher directly. Additionally, it’s recommended to perform separate matches for each word in the needle. Consider the following example:
If "foo bar" is used as the needle, it matches both "foo test baaar" and "foo hello-world bar". The user will likely expect "foo test baaar" to rank lower, because "baaar" contains a 2-character gap that receives a penalty. However, "foo test baaar" actually receives the higher score: when "foo bar" is matched as a single query, "test" and "hello-world" are considered gaps too, and since "hello-world" is a much longer gap than "test", it cancels out the extra penalty for "baaar". If both words are matched individually, the interspersed words do not receive a penalty and "foo hello-world bar" ranks higher.
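The needle preparation and per-word matching described above can be sketched without any dependencies. This is a simplified stand-in, not the pattern module: prepare_words and score_all_words are hypothetical helpers, char::to_lowercase only approximates Unicode case folding (and no Unicode normalization is applied), and summing the per-word scores is an assumption made for illustration.

```rust
/// Simplified needle preparation (hypothetical helper): lowercases the needle
/// and splits it on whitespace into one `Vec<char>` (UTF-32) needle per word.
/// `char::to_lowercase` only approximates Unicode case folding, and no
/// Unicode normalization is applied here.
fn prepare_words(needle: &str) -> Vec<Vec<char>> {
    needle
        .split_whitespace()
        .map(|word| word.chars().flat_map(char::to_lowercase).collect::<Vec<char>>())
        .collect()
}

/// Matches each word separately and sums the per-word scores; returns `None`
/// as soon as one word fails to match. `score_word` stands in for a real
/// matcher call (such as a fuzzy match) against one fixed haystack.
fn score_all_words<F>(words: &[Vec<char>], mut score_word: F) -> Option<u32>
where
    F: FnMut(&[char]) -> Option<u16>,
{
    let mut total = 0u32;
    for word in words {
        // `?` turns the first failed word into an overall non-match.
        total += u32::from(score_word(word.as_slice())?);
    }
    Some(total)
}
```

Because each word is scored on its own, text between "foo" and "bar" in the haystack is never counted as a gap of a single combined match.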
In general, nucleo is a substring matching tool (except for the prefix/postfix matching modes) with no penalty assigned to matches that start later in the haystack, which is what enables matching words individually as shown above. If patterns vary widely in length and the syntax described above is not used, it may be preferable to favor matches closer to the start of the haystack. To accommodate that use case, the prefer_prefix option can be set to true.
Matching is limited to 2^32 - 1 codepoints; if the haystack is longer than that, the matcher will panic. The caller must decide whether to filter out long haystacks or truncate them.
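Both caller-side strategies can be sketched with the standard library. The helpers within_limit and truncate_haystack are hypothetical, and the haystack is modeled as a slice of codepoints (&[char]); which strategy fits depends on the application.

```rust
/// Longest haystack the matcher accepts, in codepoints: 2^32 - 1.
const MAX_HAYSTACK_CODEPOINTS: usize = u32::MAX as usize;

/// Strategy 1 (hypothetical helper): skip haystacks that exceed the limit.
fn within_limit(haystack: &[char]) -> bool {
    haystack.len() <= MAX_HAYSTACK_CODEPOINTS
}

/// Strategy 2 (hypothetical helper): keep only the first `limit` codepoints.
/// Any match that would lie beyond the cut-off is lost.
fn truncate_haystack(haystack: &[char], limit: usize) -> &[char] {
    &haystack[..haystack.len().min(limit)]
}
```

Filtering keeps every remaining result exact, while truncating keeps every haystack in the candidate list at the cost of missing late matches.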
Fields
config: Config
Implementations
impl Matcher
pub fn new(config: Config) -> Self
Creates a new matcher instance. Note that this eagerly allocates a fairly large chunk of heap memory (around 135KB currently, but subject to change), so matchers should be reused if called often (for example, in a loop).
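The reuse advice can be sketched as a ranking loop that receives an already-constructed matcher by mutable reference, so the expensive allocation happens once, outside the loop. To keep the sketch dependency-free, the matcher is modeled as a generic closure (an assumption; with nucleo_matcher it would be a Matcher created with Matcher::new), and rank is a hypothetical helper. The same reuse idea is applied to the buffer that holds each haystack's codepoints.

```rust
/// Ranks `haystacks` against one `needle`, reusing a single matcher and a
/// single codepoint buffer for every item instead of creating them per item.
/// `M` is a hypothetical stand-in for a matcher: any closure that scores a
/// (haystack, needle) pair of `char` slices, or returns `None` on no match.
fn rank<M>(matcher: &mut M, haystacks: &[&str], needle: &[char]) -> Vec<(usize, u16)>
where
    M: FnMut(&[char], &[char]) -> Option<u16>,
{
    // Allocated once; `clear` keeps its capacity for the next haystack.
    let mut buf: Vec<char> = Vec::new();
    let mut ranked = Vec::new();
    for (i, haystack) in haystacks.iter().enumerate() {
        buf.clear();
        buf.extend(haystack.chars());
        if let Some(score) = matcher(buf.as_slice(), needle) {
            ranked.push((i, score));
        }
    }
    // Highest score first; `sort_by` is stable, so ties keep input order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}
```

Creating the matcher inside the loop instead would repeat the large allocation for every haystack.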
pub fn fuzzy_match(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>) -> Option<u16>
Finds the fuzzy match with the highest score in the haystack.
This function has O(mn) time complexity for short inputs (n being the haystack length and m the needle length). To avoid slowdowns, it automatically falls back to greedy matching for large needles and haystacks.
See the matcher documentation for more details.
pub fn fuzzy_indices(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Finds the fuzzy match with the highest score in the haystack and computes its indices.
This function has O(mn) time complexity for short inputs. To avoid slowdowns, it automatically falls back to greedy matching (fuzzy_match_greedy) for large needles and haystacks.
See the matcher documentation for more details.
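To illustrate why an O(mn) search can return tighter indices than a single greedy pass, here is a toy dynamic program. Its scoring rule is an assumption chosen for brevity (a shorter span between the first and last matched character is better, with ties going to the earliest match); nucleo's real score, which among other things penalizes gaps, is more involved. toy_optimal_indices is a hypothetical name, and like the real functions it appends to indices without clearing them.

```rust
/// Toy "optimal" fuzzy matcher (hypothetical). Finds the match of `needle`
/// (as a subsequence) whose span in `haystack` is shortest, using an
/// O(n*m) dynamic program, then appends that match's indices to `indices`.
fn toy_optimal_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    let (n, m) = (haystack.len(), needle.len());
    if m == 0 {
        return true; // empty needle: trivially matches, nothing to push
    }
    if m > n {
        return false;
    }
    // row[i] = latest start s such that needle[..=j] is a subsequence of
    // haystack[s..=i], or None if there is none. Initialize for j = 0.
    let mut row: Vec<Option<usize>> = vec![None; n];
    for i in 0..n {
        row[i] = if haystack[i] == needle[0] {
            Some(i)
        } else if i > 0 {
            row[i - 1]
        } else {
            None
        };
    }
    for j in 1..m {
        let mut next = vec![None; n];
        for i in 1..n {
            // Either needle[j] is matched at i, or the match lies before i.
            next[i] = if haystack[i] == needle[j] { row[i - 1] } else { next[i - 1] };
        }
        row = next;
    }
    // Among matches ending on the last needle char, keep the shortest span.
    let mut best: Option<(usize, usize)> = None; // (start, end)
    for i in 0..n {
        if haystack[i] != needle[m - 1] {
            continue;
        }
        if let Some(s) = row[i] {
            if best.map_or(true, |(bs, be)| i - s < be - bs) {
                best = Some((s, i));
            }
        }
    }
    let Some((_, end)) = best else { return false };
    // Walk backwards from `end`, matching needle chars right to left; this
    // reproduces the latest start found by the dynamic program.
    let base = indices.len();
    indices.resize(base + m, 0);
    let (mut j, mut k) = (m, end);
    loop {
        if haystack[k] == needle[j - 1] {
            j -= 1;
            indices[base + j] = k as u32;
            if j == 0 {
                break;
            }
        }
        k -= 1;
    }
    true
}
```

On the haystack "a_xab" with needle "ab", a left-to-right greedy scan would pick positions 0 and 4, while this sketch returns the contiguous match at 3 and 4.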
pub fn fuzzy_match_greedy(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>) -> Option<u16>
Greedily finds a fuzzy match in the haystack.
This function has O(n) time complexity but may produce unintuitive (non-optimal) indices and scores. Usually fuzzy_match should be preferred.
See the matcher documentation for more details.
pub fn fuzzy_indices_greedy(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Greedily finds a fuzzy match in the haystack and computes its indices.
This function has O(n) time complexity but may produce unintuitive (non-optimal) indices and scores. Usually fuzzy_indices should be preferred.
See the matcher documentation for more details.
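A single left-to-right pass is enough to find some subsequence match, which is where the O(n) bound comes from, and it is also why the indices can be non-optimal: the first occurrence of each character is taken even when a tighter match exists later. The sketch below is a simplified stand-in that computes no score; toy_greedy_indices is hypothetical, and undoing its own pushes on failure is a choice made here, not documented nucleo behavior.

```rust
/// Toy greedy fuzzy matcher (hypothetical): scans `haystack` once, left to
/// right, taking the first occurrence of each needle character in order.
/// Appends matched positions to `indices` (which is not cleared) and returns
/// whether every needle character was found.
fn toy_greedy_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    let start_len = indices.len();
    let mut next = 0; // position in `needle` of the next character to find
    for (i, &c) in haystack.iter().enumerate() {
        if next == needle.len() {
            break; // whole needle matched; stop early
        }
        if c == needle[next] {
            indices.push(i as u32);
            next += 1;
        }
    }
    if next == needle.len() {
        true
    } else {
        // Undo this call's partial pushes so a failed match leaves no trace.
        indices.truncate(start_len);
        false
    }
}
```

For the haystack "a_xab" and needle "ab", this returns positions 0 and 4 even though "ab" also occurs contiguously at 3 and 4.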
pub fn substring_match(&mut self, haystack: Utf32Str<'_>, needle_: Utf32Str<'_>) -> Option<u16>
Finds the substring match with the highest score in the haystack.
This function has O(nm) time complexity. However, many cases can be significantly accelerated using prefilters, so it’s usually very fast in practice.
See the matcher documentation for more details.
pub fn substring_indices(&mut self, haystack: Utf32Str<'_>, needle_: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Finds the substring match with the highest score in the haystack and computes its indices.
This function has O(nm) time complexity. However, many cases can be significantly accelerated using prefilters, so it’s usually fast in practice.
See the matcher documentation for more details.
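The O(nm) bound and the role of a prefilter can be seen in a naive sketch: compare the needle against every window of the haystack, but first run a cheap linear check that rejects many haystacks outright. The prefilter used here (does the haystack contain the needle's first character at all?) is an assumption for illustration, not nucleo's actual prefilter, and the sketch returns the first occurrence rather than the highest-scoring one. toy_substring_indices is hypothetical.

```rust
/// Toy substring matcher (hypothetical): finds the first occurrence of
/// `needle` as a contiguous run in `haystack` and appends its positions
/// to `indices` (which is not cleared).
fn toy_substring_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    // In this sketch an empty needle matches trivially with no indices.
    let Some(&first) = needle.first() else { return true };
    // Toy prefilter: one linear pass that can skip the quadratic scan.
    if !haystack.contains(&first) {
        return false;
    }
    // Compare the needle against every window: O(n*m) in the worst case.
    match haystack.windows(needle.len()).position(|w| w == needle) {
        Some(start) => {
            indices.extend(start as u32..(start + needle.len()) as u32);
            true
        }
        None => false,
    }
}
```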
pub fn exact_match(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>) -> Option<u16>
Checks whether needle and haystack match exactly.
This function has O(n) time complexity.
See the matcher documentation for more details.
pub fn exact_indices(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Checks whether needle and haystack match exactly and computes the indices of the match.
This function has O(n) time complexity.
See the matcher documentation for more details.
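At its core, an exact match is a single O(n) slice comparison, and on success every haystack position is part of the match. The sketch below is a simplified stand-in: toy_exact_indices is hypothetical, it computes no score, and nucleo's own handling of details such as surrounding whitespace is not modeled.

```rust
/// Toy exact matcher (hypothetical): succeeds only if `haystack` and
/// `needle` are identical, then appends every haystack position to
/// `indices` (which is not cleared).
fn toy_exact_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    // Slice equality checks the lengths first, then compares element-wise.
    if haystack != needle {
        return false;
    }
    indices.extend(0..haystack.len() as u32);
    true
}
```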
pub fn prefix_match(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>) -> Option<u16>
Checks whether needle is a prefix of the haystack.
This function has O(n) time complexity.
See the matcher documentation for more details.
pub fn prefix_indices(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Checks whether needle is a prefix of the haystack and computes the indices of the match.
This function has O(n) time complexity.
See the matcher documentation for more details.
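A prefix check only has to compare the first needle.len() codepoints, and on success the matched indices are simply 0 up to the needle length. This is a simplified, hypothetical sketch (toy_prefix_indices computes no score, and nucleo's handling of details such as leading whitespace is not modeled).

```rust
/// Toy prefix matcher (hypothetical): if `needle` is a prefix of `haystack`,
/// appends positions 0..needle.len() to `indices` (which is not cleared).
fn toy_prefix_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    // Compares at most needle.len() codepoints, so it stays within O(n).
    if !haystack.starts_with(needle) {
        return false;
    }
    indices.extend(0..needle.len() as u32);
    true
}
```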
pub fn postfix_match(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>) -> Option<u16>
Checks whether needle is a postfix of the haystack.
This function has O(n) time complexity.
See the matcher documentation for more details.
pub fn postfix_indices(&mut self, haystack: Utf32Str<'_>, needle: Utf32Str<'_>, indices: &mut Vec<u32>) -> Option<u16>
Checks whether needle is a postfix of the haystack and computes the indices of the match.
This function has O(n) time complexity.
See the matcher documentation for more details.
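The postfix (suffix) case mirrors the prefix case: compare the last needle.len() codepoints and, on success, report the final positions of the haystack. As before, toy_postfix_indices is a hypothetical, score-free sketch that does not model nucleo's handling of details such as trailing whitespace.

```rust
/// Toy postfix matcher (hypothetical): if `needle` is a suffix of `haystack`,
/// appends the positions of that suffix to `indices` (which is not cleared).
fn toy_postfix_indices(haystack: &[char], needle: &[char], indices: &mut Vec<u32>) -> bool {
    if !haystack.ends_with(needle) {
        return false;
    }
    // `ends_with` guarantees needle.len() <= haystack.len(), so no underflow.
    let start = haystack.len() - needle.len();
    indices.extend(start as u32..haystack.len() as u32);
    true
}
```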