Crate rapidfuzz


RapidFuzz is a general purpose string matching library with implementations for Rust, C++ and Python.

Key Features

  • Diverse String Metrics: Offers a variety of string metrics to suit different use cases. These range from the Levenshtein distance for edit-based comparisons to the Jaro-Winkler similarity, which weighs matching characters, transpositions and a shared prefix.
  • Optimized for Speed: Every implementation is written with performance in mind, making the library suitable for analyzing large datasets.
  • Easy to Use: The API is simple to use while still leaving the implementation room for optimization.
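For readers unfamiliar with Jaro-Winkler, the following is a minimal, self-contained sketch of the textbook definition. It is not the crate's implementation, and the function names `jaro` and `jaro_winkler` are hypothetical. Two characters count as matching when they are equal and their positions differ by at most half the longer length minus one. The Jaro score combines the fraction of matched characters in each string with the number of transpositions, and the Winkler variant boosts the score for a common prefix of up to four characters (scaled by 0.1 and, following a common convention assumed here, only applied when the Jaro score exceeds 0.7).

```rust
/// Jaro similarity in [0, 1] (textbook definition; hypothetical helper).
fn jaro(s1: &str, s2: &str) -> f64 {
    let a: Vec<char> = s1.chars().collect();
    let b: Vec<char> = s2.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Equal characters only "match" if their positions differ by at most `window`.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for (i, &ca) in a.iter().enumerate() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && b[j] == ca {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk the matched characters of both strings in order; each position where
    // they differ counts as half a transposition.
    let a_matches = a.iter().zip(&a_matched).filter(|&(_, &m)| m).map(|(c, _)| c);
    let b_matches = b.iter().zip(&b_matched).filter(|&(_, &m)| m).map(|(c, _)| c);
    let out_of_order = a_matches.zip(b_matches).filter(|&(x, y)| x != y).count();
    let m = matches as f64;
    let t = out_of_order as f64 / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler similarity: Jaro plus a bonus for a shared prefix (max 4 chars).
fn jaro_winkler(s1: &str, s2: &str) -> f64 {
    let sim = jaro(s1, s2);
    // Assumed convention: only boost pairs that are already fairly similar.
    if sim <= 0.7 {
        return sim;
    }
    let prefix = s1
        .chars()
        .zip(s2.chars())
        .take_while(|(x, y)| x == y)
        .take(4)
        .count();
    sim + prefix as f64 * 0.1 * (1.0 - sim)
}
```

With this definition, "MARTHA" and "MARHTA" score about 0.961 and "DIXON" and "DICKSONX" about 0.813. Unlike an edit distance, the result is a similarity in [0, 1] where higher means more alike.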

Installation

The installation is as simple as:

$ cargo add rapidfuzz

Usage

The following examples show how to use the Levenshtein distance. Other metrics can be found in the fuzz and distance modules.

use rapidfuzz::distance::levenshtein;

// Perform a simple comparison using the Levenshtein distance
assert_eq!(
    3,
    levenshtein::distance("kitten".chars(), "sitting".chars())
);

// If you are sure the input strings are ASCII-only, it is usually faster to operate on bytes
// (for non-ASCII text, bytes() compares UTF-8 code units rather than characters)
assert_eq!(
    3,
    levenshtein::distance("kitten".bytes(), "sitting".bytes())
);

// You can provide a score_cutoff to filter out strings whose distance is worse (larger)
// than the score_cutoff; in that case None is returned
assert_eq!(
    None,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_cutoff(2)
    )
);

// You can provide a score_hint to tell the implementation the expected score.
// This can be used to select a faster implementation internally, but might cause
// a slowdown when the actual distance is worse than the score_hint.
assert_eq!(
    3,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_hint(2)
    )
);

// When comparing a single string to multiple strings, you can use the
// provided `BatchComparator`. It caches part of the calculation,
// which can provide significant speedups.
let scorer = levenshtein::BatchComparator::new("kitten".chars());
assert_eq!(3, scorer.distance("sitting".chars()));
assert_eq!(0, scorer.distance("kitten".chars()));
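The examples above only call the library. To make the semantics concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein distance, generic over any two iterators with comparable items so that it accepts both chars() and bytes(). The function name `levenshtein` and its `Option<usize>` cutoff parameter are assumptions chosen to mirror the score_cutoff example; they are not the crate's API, and the crate's own implementation is very likely more optimized.

```rust
/// Hypothetical sketch: classic DP Levenshtein distance over any two sequences
/// of comparable elements (e.g. `str::chars()` or `str::bytes()`).
/// Returns `None` when the distance exceeds `score_cutoff`.
fn levenshtein<T, I1, I2>(s1: I1, s2: I2, score_cutoff: Option<usize>) -> Option<usize>
where
    T: PartialEq,
    I1: IntoIterator<Item = T>,
    I2: IntoIterator<Item = T>,
{
    let s1: Vec<T> = s1.into_iter().collect();
    let s2: Vec<T> = s2.into_iter().collect();
    // prev[j] holds the distance between the processed prefix of s1 and s2[..j].
    let mut prev: Vec<usize> = (0..=s2.len()).collect();
    let mut curr = vec![0; s2.len() + 1];
    for (i, a) in s1.iter().enumerate() {
        curr[0] = i + 1; // delete all i + 1 elements of s1
        for (j, b) in s2.iter().enumerate() {
            let substitution = prev[j] + usize::from(a != b);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let dist = prev[s2.len()];
    match score_cutoff {
        Some(cutoff) if dist > cutoff => None,
        _ => Some(dist),
    }
}
```

The byte-level result differs for non-ASCII text: "café" and "cafe" are one edit apart as chars() but two apart as bytes(), because é occupies two bytes in UTF-8. This sketch also applies the cutoff only after filling the whole matrix; an optimized implementation can stop as soon as every cell in a row exceeds the cutoff, since the final distance can never be smaller than a row's minimum.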

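One way a batch comparator can profit from a fixed first string is to preprocess it once. The sketch below uses Hyyrö's bit-parallel formulation of the Levenshtein distance: it builds, once per pattern, a bitmask per character marking where that character occurs, and then processes each compared string one character at a time with a handful of 64-bit word operations. The type `CachedLevenshtein` is hypothetical, it only supports patterns of at most 64 characters, and whether the crate's `BatchComparator` uses exactly this scheme is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical batch comparator for a fixed pattern of at most 64 chars.
struct CachedLevenshtein {
    len: usize,
    // Bit i of pm[&c] is set iff the pattern has character c at position i.
    pm: HashMap<char, u64>,
}

impl CachedLevenshtein {
    fn new(s1: &str) -> Self {
        let len = s1.chars().count();
        assert!(len <= 64, "this sketch only supports patterns of up to 64 chars");
        let mut pm: HashMap<char, u64> = HashMap::new();
        for (i, c) in s1.chars().enumerate() {
            *pm.entry(c).or_insert(0) |= 1u64 << i;
        }
        CachedLevenshtein { len, pm }
    }

    /// Levenshtein distance between the cached pattern and `s2`.
    fn distance(&self, s2: &str) -> usize {
        if self.len == 0 {
            return s2.chars().count();
        }
        // Bit i set in vp (vn): the DP cell in row i + 1 of the current column
        // is one more (one less) than the cell above it.
        let mut vp: u64 = !0;
        let mut vn: u64 = 0;
        let mut dist = self.len; // bottom cell of column 0
        let last = 1u64 << (self.len - 1);
        for c in s2.chars() {
            let pm_j = self.pm.get(&c).copied().unwrap_or(0);
            let d0 = (((pm_j & vp).wrapping_add(vp)) ^ vp) | pm_j | vn;
            let mut hp = vn | !(d0 | vp);
            let mut hn = d0 & vp;
            // Update the bottom cell using the horizontal delta of the last row.
            if hp & last != 0 {
                dist += 1;
            }
            if hn & last != 0 {
                dist -= 1;
            }
            hp = (hp << 1) | 1;
            hn <<= 1;
            vp = hn | !(d0 | hp);
            vn = hp & d0;
        }
        dist
    }
}
```

Building the bitmask table once and reusing it for every comparison is the kind of cached work that makes comparing one string against many cheaper. Longer patterns would need a blocked variant that spreads the bit vectors over several words.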
Modules

Enums

  • Hash value in the range i64::MIN to u64::MAX

Traits

  • Trait used to map between element types and unique hash values
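The two items above are described only by their one-line summaries. As a hedged illustration of why a hash value needs the range i64::MIN to u64::MAX, the sketch below uses hypothetical names (`HashValue`, `ToHashValue`, `elements_equal`), not the crate's. An enum with a signed variant for negative numbers and an unsigned variant for everything else represents every i64 and every u64 value exactly once, and a trait maps each element type into it, so that sequences of different element types (for example char and u8) can be compared element by element.

```rust
/// Hypothetical stand-in for a hash value covering i64::MIN..=u64::MAX.
/// Negative numbers use `Signed`; every non-negative number uses `Unsigned`,
/// so each integer in the range has exactly one representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HashValue {
    Signed(i64),
    Unsigned(u64),
}

/// Hypothetical stand-in for a trait mapping element types to hash values.
trait ToHashValue {
    fn hash_value(&self) -> HashValue;
}

impl ToHashValue for char {
    fn hash_value(&self) -> HashValue {
        // Use the Unicode scalar value, so 'a' and b'a' hash identically.
        HashValue::Unsigned(u64::from(u32::from(*self)))
    }
}

impl ToHashValue for u8 {
    fn hash_value(&self) -> HashValue {
        HashValue::Unsigned(u64::from(*self))
    }
}

impl ToHashValue for u64 {
    fn hash_value(&self) -> HashValue {
        HashValue::Unsigned(*self)
    }
}

impl ToHashValue for i64 {
    fn hash_value(&self) -> HashValue {
        if *self < 0 {
            HashValue::Signed(*self)
        } else {
            HashValue::Unsigned(*self as u64) // lossless: value is non-negative
        }
    }
}

/// Elements of different types are considered equal iff their hash values are.
fn elements_equal<A: ToHashValue, B: ToHashValue>(a: &A, b: &B) -> bool {
    a.hash_value() == b.hash_value()
}
```

Without the separate signed variant, casting -1i64 to u64 would collide with u64::MAX; with it, the two stay distinct while 5i64 and 5u64 still compare equal.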