Compute distances between string types (and others).
This crate provides implementations of a variety of distance and equality metrics. Note that every implementation returns the distance between two elements (e.g. str), i.e. their degree of dissimilarity. Metrics that are designed to measure similarity (e.g. the Jaccard index) therefore also return the distance, which is the complement of the similarity score.
§Usage
§The str_distance::str_distance* convenience functions
str_distance and str_distance_normalized take the two string inputs whose
distance is determined using the passed DistanceMetric.
str_distance_normalized evaluates the normalized distance between two
strings. A value of ‘0.0’ corresponds to the “zero distance”, meaning both
strings are considered equal by means of the metric, whereas a value of ‘1.0’
corresponds to the maximum distance that can exist between the strings.
Calling the str_distance::str_distance* functions is just a convenience for
DistanceMetric.str_distance*("", "").
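To make the meaning of the normalized value concrete, here is a minimal, crate-independent sketch. It assumes the raw distance is divided by the largest distance possible for the two inputs, which for Levenshtein is the length of the longer string. The helpers `normalize` and `levenshtein_upper_bound` are hypothetical names for illustration and are not part of this crate; the crate's metrics may normalize differently.

```rust
/// Hypothetical helper (not part of this crate): scales a raw distance
/// into the range 0.0..=1.0 by dividing by the largest possible distance.
fn normalize(distance: usize, max_distance: usize) -> f64 {
    if max_distance == 0 {
        // Both inputs are empty, so they are equal: the "zero distance".
        0.0
    } else {
        distance as f64 / max_distance as f64
    }
}

/// The Levenshtein distance can never exceed the length of the longer
/// input, counted here in Unicode scalar values.
fn levenshtein_upper_bound(a: &str, b: &str) -> usize {
    a.chars().count().max(b.chars().count())
}
```

Under this assumption, "abc" and "def" (raw distance 3, upper bound 3) normalize to 1.0, which agrees with the normalized example in this documentation.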
§Example
Levenshtein metrics accept an optional maximum distance; once that limit is exceeded, the calculation of the exact distance is aborted early.
Distance
use str_distance::*;
// calculate the exact distance
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));
// short circuit if distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
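How such an early abort can work is sketched below without using the crate. This is one common technique and not necessarily what this crate does internally: the smallest value in a dynamic-programming row is a lower bound on the final distance, so the computation can stop as soon as that value exceeds the limit. The `Bounded` enum and the `levenshtein_bounded` function are hypothetical names chosen for this illustration.

```rust
/// Hypothetical result type for this sketch (the crate has its own `DistanceValue`).
#[derive(Debug, PartialEq)]
enum Bounded {
    Exact(usize),
    Exceeded(usize),
}

/// Levenshtein distance that gives up once the distance must exceed `max`.
fn levenshtein_bounded(a: &str, b: &str, max: usize) -> Bounded {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // The difference in length is a lower bound on the distance.
    if a.len().max(b.len()) - a.len().min(b.len()) > max {
        return Bounded::Exceeded(max);
    }
    // prev[j]: distance between the processed prefix of `a` and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        // The final distance cannot be smaller than the minimum of this row.
        if cur.iter().copied().min().unwrap_or(0) > max {
            return Bounded::Exceeded(max);
        }
        prev = cur;
    }
    let distance = prev[b.len()];
    if distance > max {
        Bounded::Exceeded(max)
    } else {
        Bounded::Exact(distance)
    }
}
```

With a limit of 10, "kitten" and "sitting" still yield the exact distance 3, while a limit of 2 reports that the limit was exceeded.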
Normalized Distance
use str_distance::*;
assert_eq!(str_distance_normalized("" , "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
§The DistanceMetric trait
use str_distance::{DistanceMetric, SorensenDice};
// QGram metrics require the length of the fragments (q-grams) to use for comparison.
// For `SorensenDice` default is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
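The value 0.75 above can be reproduced by hand with a crate-independent sketch. It assumes q-grams are counted as a multiset of overlapping character windows and that the distance is one minus the Sørensen-Dice coefficient; the functions `qgram_counts` and `sorensen_dice_distance` are hypothetical and may differ from the crate's implementation in edge cases such as inputs shorter than q.

```rust
use std::collections::HashMap;

/// Counts the overlapping character windows of length `q` in `s`.
fn qgram_counts(s: &str, q: usize) -> HashMap<String, usize> {
    let chars: Vec<char> = s.chars().collect();
    let mut counts = HashMap::new();
    if q == 0 {
        // `windows(0)` would panic, so treat q = 0 as "no q-grams".
        return counts;
    }
    for window in chars.windows(q) {
        *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
    }
    counts
}

/// Distance = 1 - 2 * |shared q-grams| / (|q-grams of a| + |q-grams of b|).
fn sorensen_dice_distance(a: &str, b: &str, q: usize) -> f64 {
    let ca = qgram_counts(a, q);
    let cb = qgram_counts(b, q);
    let total = ca.values().sum::<usize>() + cb.values().sum::<usize>();
    if total == 0 {
        // Assumption of this sketch: inputs without any q-grams count as equal.
        return 0.0;
    }
    let shared: usize = ca
        .iter()
        .map(|(gram, n)| (*n).min(cb.get(gram).copied().unwrap_or(0)))
        .sum();
    1.0 - (2.0 * shared as f64) / total as f64
}
```

For "nacht" and "night" with q = 2, each string has four bigrams and only "ht" is shared, so the similarity is 2 * 1 / 8 = 0.25 and the distance is 0.75.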
DistanceMetric was designed for str types, but is not limited to them.
Calculating the distance is possible for all data types that are comparable and
are passed as IntoIterator, e.g. as a Vec or slice.
use str_distance::{DistanceMetric, Levenshtein, DistanceValue};
assert_eq!(*Levenshtein::default().distance(&[1, 2, 3], &[1, 2, 3, 4, 5, 6]), 3);
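Why comparable items are all that is needed can be seen in the following standalone sketch, which is not the crate's code. The edit-distance recurrence only ever asks whether two items are equal, so a `PartialEq` bound is sufficient. The function `levenshtein_generic` is a hypothetical name, and it takes slices rather than `IntoIterator` to keep the example short.

```rust
/// Levenshtein distance over slices of any type that supports `==`.
fn levenshtein_generic<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    // prev[j]: distance between the processed prefix of `a` and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, item_a) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, item_b) in b.iter().enumerate() {
            // Equality is the only operation required of the item type.
            let substitution = prev[j] + usize::from(item_a != item_b);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Called with `&[1, 2, 3]` and `&[1, 2, 3, 4, 5, 6]`, the sketch returns 3, because three items have to be inserted.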
Re-exports§
pub use jaro::Jaro;
pub use jaro::JaroWinkler;
pub use levenshtein::DamerauLevenshtein;
pub use levenshtein::Levenshtein;
pub use modifiers::Winkler;
pub use modifiers::WinklerConfig;
pub use qgram::Cosine;
pub use qgram::Jaccard;
pub use qgram::Overlap;
pub use qgram::QGram;
pub use qgram::SorensenDice;
pub use ratcliff::RatcliffObershelp;
pub use token::TokenSet;
pub use token::TokenSort;
Modules§
Enums§
Traits§
- DistanceElement - Convenience trait to use a distance on a type directly.
- DistanceMetric
Functions§
- str_distance - Evaluates the distance between two strings based on the provided crate::DistanceMetric.
- str_distance_normalized - Evaluates the normalized distance between two strings based on the provided crate::DistanceMetric, so that it always returns an f64 between 0 and 1. A value of ‘0.0’ corresponds to the “zero distance”, meaning both strings are considered equal by means of the metric, whereas a value of ‘1.0’ corresponds to the maximum distance that can exist between the strings.