RapidFuzz is a general-purpose string matching library with implementations for Rust, C++ and Python.
Key Features
- Diverse String Metrics: Offers a variety of string metrics to suit different use cases. These range from the Levenshtein distance for edit-based comparisons to the Jaro-Winkler similarity, which counts characters that match at nearby positions and gives extra weight to a common prefix.
- Optimized for Speed: Each implementation is carefully tuned for performance, which makes the library suitable for analyzing large datasets.
- Easy to Use: The API is designed to be simple to use while still leaving the implementation room for optimization.
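To make the contrast between edit-based Levenshtein and Jaro-Winkler more concrete, here is a from-scratch sketch of the textbook Jaro-Winkler algorithm in plain Rust (no crate required). It is not RapidFuzz's implementation or API: the names `jaro` and `jaro_winkler` are hypothetical, and the sketch assumes Winkler's usual parameters (prefix weight 0.1, a prefix of at most 4 characters, and a boost applied only when the Jaro similarity exceeds 0.7).

```rust
/// Jaro similarity in [0.0, 1.0] (textbook definition; hypothetical helper).
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Two equal characters only count as a match if their positions differ by at most `window`.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for (i, &ca) in a.iter().enumerate() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && b[j] == ca {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Matched characters, in the order they appear in each string.
    let a_seq: Vec<char> = a.iter().zip(&a_matched).filter(|&(_, &m)| m).map(|(&c, _)| c).collect();
    let b_seq: Vec<char> = b.iter().zip(&b_matched).filter(|&(_, &m)| m).map(|(&c, _)| c).collect();
    // Each swapped pair produces two mismatched positions, i.e. one transposition.
    let transpositions = a_seq.iter().zip(&b_seq).filter(|&(x, y)| x != y).count() / 2;
    let m = matches as f64;
    let t = transpositions as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler similarity: boosts the Jaro score of strings that share a prefix.
/// Assumed parameters: prefix weight 0.1, prefix of at most 4 chars, boost only above 0.7.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let sim = jaro(a, b);
    if sim <= 0.7 {
        return sim;
    }
    let prefix = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).take(4).count();
    sim + prefix as f64 * 0.1 * (1.0 - sim)
}
```

With these parameters, `jaro_winkler("MARTHA", "MARHTA")` is roughly 0.961: the swapped `TH`/`HT` pulls the Jaro score down to about 0.944, and the shared prefix `MAR` lifts it back up. A Levenshtein distance, by contrast, would simply count the swap as two substitutions.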
Installation
The installation is as simple as:
$ cargo add rapidfuzz
Usage
The following examples show the usage with the Levenshtein distance. Other metrics can be found in the fuzz and distance modules.
```rust
use rapidfuzz::distance::levenshtein;

// Perform a simple comparison using the Levenshtein distance
assert_eq!(
    3,
    levenshtein::distance("kitten".chars(), "sitting".chars())
);

// If you know the input strings are ASCII-only, it is usually faster to operate on bytes
assert_eq!(
    3,
    levenshtein::distance("kitten".bytes(), "sitting".bytes())
);

// You can provide a score_cutoff to filter out strings whose distance is worse than
// the score_cutoff; such comparisons return None (here the distance 3 exceeds the cutoff 2)
assert_eq!(
    None,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_cutoff(2)
    )
);

// You can provide a score_hint to tell the implementation about the expected score.
// This can be used to select a more performant implementation internally, but might cause
// a slowdown in cases where the distance is actually worse than the score_hint.
assert_eq!(
    3,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_hint(2)
    )
);

// When comparing a single string to multiple strings, you can use the
// provided `BatchComparator`. It caches part of the calculation,
// which can provide significant speedups.
let scorer = levenshtein::BatchComparator::new("kitten".chars());
assert_eq!(3, scorer.distance("sitting".chars()));
assert_eq!(0, scorer.distance("kitten".chars()));
```
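The score_cutoff and batch-comparison ideas can be illustrated with a from-scratch sketch in plain Rust. This is not RapidFuzz's internal code: `levenshtein_dp` and `CachedLevenshtein` are hypothetical names. The DP version stops early once every entry of a DP row exceeds the cutoff, which is safe because row minima never decrease. The cached version assumes one standard caching strategy, building the pattern's per-character bit masks once and reusing them with the Myers/Hyyrö bit-parallel algorithm; to keep it short, it only accepts patterns of at most 64 characters.

```rust
use std::collections::HashMap;

/// Row-by-row DP Levenshtein distance with an optional cutoff (hypothetical helper).
/// Returns `None` as soon as the distance is known to exceed `cutoff`.
fn levenshtein_dp(a: &[char], b: &[char], cutoff: Option<usize>) -> Option<usize> {
    let max = cutoff.unwrap_or(usize::MAX);
    // `prev[j]` is the distance between the prefix of `a` processed so far and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            curr[j + 1] = substitute.min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        // Row minima never decrease, so a row entirely above the cutoff
        // means the final distance is above it too.
        if curr.iter().all(|&d| d > max) {
            return None;
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let dist = prev[b.len()];
    (dist <= max).then_some(dist)
}

/// Hypothetical batch comparator: the pattern's bit masks are computed once in `new`
/// and reused by every call to `distance`.
struct CachedLevenshtein {
    len: usize,
    masks: HashMap<char, u64>, // bit i is set if the pattern has this char at position i
}

impl CachedLevenshtein {
    fn new(pattern: &str) -> Option<Self> {
        let chars: Vec<char> = pattern.chars().collect();
        if chars.len() > 64 {
            return None; // a single u64 word only covers 64 pattern characters
        }
        let mut masks = HashMap::new();
        for (i, &c) in chars.iter().enumerate() {
            *masks.entry(c).or_insert(0u64) |= 1u64 << i;
        }
        Some(Self { len: chars.len(), masks })
    }

    /// Myers/Hyyrö bit-parallel Levenshtein distance against the cached pattern.
    fn distance(&self, text: &str) -> usize {
        if self.len == 0 {
            return text.chars().count();
        }
        let last = 1u64 << (self.len - 1);
        // vp/vn encode +1/-1 vertical differences of the current DP column.
        let (mut vp, mut vn) = (!0u64, 0u64);
        let mut dist = self.len;
        for c in text.chars() {
            let pm = self.masks.get(&c).copied().unwrap_or(0);
            let d0 = (((pm & vp).wrapping_add(vp)) ^ vp) | pm | vn;
            let mut hp = vn | !(d0 | vp);
            let mut hn = d0 & vp;
            // The last row's horizontal difference updates the running distance.
            if hp & last != 0 {
                dist += 1;
            } else if hn & last != 0 {
                dist -= 1;
            }
            hp = (hp << 1) | 1; // the top DP row grows by one per text character
            hn <<= 1;
            vp = hn | !(d0 | hp);
            vn = hp & d0;
        }
        dist
    }
}
```

With a cutoff of 2, `levenshtein_dp` returns `None` for "kitten" and "sitting", the same outcome as the `distance_with_args` call with `score_cutoff(2)`. The bit-parallel loop does a constant number of word operations per text character, which is why reusing the precomputed masks across many comparisons can pay off.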
Modules
Enums
- Hash value in the range i64::MIN to u64::MAX
Traits
- Trait used to map element types to unique hash values
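The Enums and Traits entries describe a hash value spanning i64::MIN to u64::MAX and a trait that maps element types to unique hash values. The item names are not shown here, so the following sketch uses hypothetical names (`ElemHash`, `HashableElem`, `same_elem`, `distinct_elems`) and only illustrates why such a type needs two variants: no single 64-bit integer covers both ranges, and a plain `as` cast would make `-1i64` and `u64::MAX` collide.

```rust
use std::collections::HashSet;

/// Hypothetical hash type covering i64::MIN..=u64::MAX.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ElemHash {
    Signed(i64),   // only used for negative values
    Unsigned(u64), // used for every value >= 0
}

impl ElemHash {
    /// Normalizing non-negative i64 values to `Unsigned` keeps each number's hash unique.
    fn from_i64(v: i64) -> Self {
        if v < 0 {
            ElemHash::Signed(v)
        } else {
            ElemHash::Unsigned(v as u64)
        }
    }
}

/// Hypothetical trait mapping an element type to a unique hash value.
trait HashableElem {
    fn hash_elem(&self) -> ElemHash;
}

impl HashableElem for char {
    fn hash_elem(&self) -> ElemHash {
        ElemHash::Unsigned(*self as u64) // the Unicode scalar value
    }
}

impl HashableElem for u8 {
    fn hash_elem(&self) -> ElemHash {
        ElemHash::Unsigned(u64::from(*self))
    }
}

impl HashableElem for i64 {
    fn hash_elem(&self) -> ElemHash {
        ElemHash::from_i64(*self)
    }
}

impl HashableElem for u64 {
    fn hash_elem(&self) -> ElemHash {
        ElemHash::Unsigned(*self)
    }
}

/// Elements of different types compare equal exactly when their hashes are equal.
fn same_elem<A: HashableElem, B: HashableElem>(a: &A, b: &B) -> bool {
    a.hash_elem() == b.hash_elem()
}

/// Example consumer: count distinct elements by their hash values.
fn distinct_elems<T: HashableElem>(items: impl IntoIterator<Item = T>) -> usize {
    items.into_iter().map(|x| x.hash_elem()).collect::<HashSet<_>>().len()
}
```

In this sketch, `'a'` and `b'a'` produce the same hash, so a char sequence and a byte sequence could be compared element by element; the documentation above does not say whether the crate's own trait behaves the same way.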