Crate str_distance
Compute distances between string types (and others)
This crate provides implementations for a variety of distance or equality metrics. When using metrics that are a measure of similarity, the following should be noted: all implementations always return the distance between two elements (e.g. str), i.e. their degree of dissimilarity. Metrics that are designed to measure similarity (e.g. the Jaccard index) therefore return the distance that is complementary to the similarity score.
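The relationship between a similarity score and its complementary distance can be illustrated with a minimal, self-contained sketch. It computes the Jaccard index over the character sets of two strings and returns one minus that value. The function names `jaccard_similarity` and `jaccard_distance` are hypothetical and this is not the crate's implementation, which works on q-grams.

```rust
use std::collections::HashSet;

/// Jaccard similarity of the character sets of `a` and `b`, in `[0.0, 1.0]`.
fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let set_a: HashSet<char> = a.chars().collect();
    let set_b: HashSet<char> = b.chars().collect();
    let union = set_a.union(&set_b).count();
    if union == 0 {
        // Two empty inputs are treated as identical (an assumption of this sketch).
        return 1.0;
    }
    let intersection = set_a.intersection(&set_b).count();
    intersection as f64 / union as f64
}

/// The distance is the complement of the similarity.
fn jaccard_distance(a: &str, b: &str) -> f64 {
    1.0 - jaccard_similarity(a, b)
}
```

Identical inputs give a similarity of 1.0 and hence a distance of 0.0, while inputs with no character in common give a distance of 1.0.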
Usage
The str_distance::str_distance* convenience functions.
str_distance and str_distance_normalized take the two string inputs for which the distance is determined using the passed DistanceMetric. str_distance_normalized evaluates the normalized distance between two strings. A value of 0.0 corresponds to the "zero distance", meaning both strings are considered equal by means of the metric, whereas a value of 1.0 corresponds to the maximum distance that can exist between the strings.
Calling the str_distance::str_distance* functions is just a convenience for DistanceMetric.str_distance*("", "").
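The pattern described here, free functions that only forward to a method on the metric, and a normalized value scaled between 0.0 and 1.0, might look roughly like the sketch below. Everything in it is hypothetical: `SketchMetric`, `PositionalMismatch`, `sketch_distance` and `sketch_distance_normalized` are illustrative stand-ins, not the crate's `DistanceMetric` trait or its functions, and the toy metric is chosen only because its maximum distance is easy to state.

```rust
/// Hypothetical stand-in for a metric trait; not the crate's `DistanceMetric`.
trait SketchMetric {
    /// Absolute distance between two strings.
    fn distance(&self, a: &str, b: &str) -> usize;

    /// Largest distance this metric can report for the two inputs.
    fn max_distance(&self, a: &str, b: &str) -> usize;

    /// Distance scaled into `[0.0, 1.0]`.
    fn normalized(&self, a: &str, b: &str) -> f64 {
        match self.max_distance(a, b) {
            0 => 0.0, // two empty inputs: nothing can differ
            max => self.distance(a, b) as f64 / max as f64,
        }
    }
}

/// Toy metric: counts positions whose characters differ, including the
/// unmatched tail of the longer input.
struct PositionalMismatch;

impl SketchMetric for PositionalMismatch {
    fn distance(&self, a: &str, b: &str) -> usize {
        let (len_a, len_b) = (a.chars().count(), b.chars().count());
        let mismatches = a.chars().zip(b.chars()).filter(|(x, y)| x != y).count();
        mismatches + len_a.max(len_b) - len_a.min(len_b)
    }

    fn max_distance(&self, a: &str, b: &str) -> usize {
        a.chars().count().max(b.chars().count())
    }
}

/// Free functions that only forward to the trait methods.
fn sketch_distance<M: SketchMetric>(a: &str, b: &str, metric: M) -> usize {
    metric.distance(a, b)
}

fn sketch_distance_normalized<M: SketchMetric>(a: &str, b: &str, metric: M) -> f64 {
    metric.normalized(a, b)
}
```

With this sketch, `sketch_distance("nacht", "nicht", PositionalMismatch)` and `PositionalMismatch.distance("nacht", "nicht")` are the same call in two spellings.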
Example
Levenshtein metrics allow a maximum distance to be defined, beyond which the calculation of the exact distance is aborted early.
Distance
    use str_distance::*;

    // calculate the exact distance
    assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

    // short circuit if distance exceeds 10
    let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
    let s2 = "The quick brown fox jumped over the angry dog.";
    assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
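One way such an early abort can work is sketched below, under the assumption of a standard row-by-row dynamic-programming Levenshtein. The `Bounded` enum and `bounded_levenshtein` function are hypothetical names for illustration. The crate's internal strategy is not shown on this page and may differ.

```rust
/// Outcome of a bounded computation; a hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum Bounded {
    Exact(usize),
    Exceeded(usize),
}

fn bounded_levenshtein(a: &str, b: &str, max: usize) -> Bounded {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The length difference is a lower bound on the distance.
    if a.len().abs_diff(b.len()) > max {
        return Bounded::Exceeded(max);
    }

    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        curr.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        // Row minima never decrease from one row to the next, so once a whole
        // row is above `max` the final distance must be above `max` as well.
        if curr.iter().all(|&d| d > max) {
            return Bounded::Exceeded(max);
        }
        prev = curr;
    }

    match prev[b.len()] {
        d if d > max => Bounded::Exceeded(max),
        d => Bounded::Exact(d),
    }
}
```

The two checks are cheap relative to the full table: the length test rejects very different inputs before any row is computed, and the per-row test stops as soon as no path can stay within the limit.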
Normalized Distance
    use str_distance::*;

    assert_eq!(str_distance_normalized("", "", Levenshtein::default()), 0.0);
    assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
    assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
The DistanceMetric trait
    use str_distance::{DistanceMetric, SorensenDice};

    // QGram metrics require the length of the underlying fragments to use for comparison.
    // For `SorensenDice` the default is 2.
    assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
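To see where a value such as 0.75 can come from, the following sketch computes a Sørensen-Dice distance over sets of overlapping character fragments of length `q`. It is an illustration only: `qgrams` and `sorensen_dice_distance` are hypothetical helper names, and the use of sets rather than fragment counts is an assumption that may not match the crate's implementation.

```rust
use std::collections::HashSet;

/// All overlapping character windows of length `q`
/// (an empty set if the input is shorter than `q`).
fn qgrams(s: &str, q: usize) -> HashSet<Vec<char>> {
    assert!(q > 0, "fragment length must be at least 1");
    let chars: Vec<char> = s.chars().collect();
    chars.windows(q).map(|w| w.to_vec()).collect()
}

/// Sørensen-Dice distance over q-gram sets: 1 - 2|A ∩ B| / (|A| + |B|).
fn sorensen_dice_distance(a: &str, b: &str, q: usize) -> f64 {
    let (grams_a, grams_b) = (qgrams(a, q), qgrams(b, q));
    let total = grams_a.len() + grams_b.len();
    if total == 0 {
        // Neither input yields a fragment; treated as identical in this sketch.
        return 0.0;
    }
    let shared = grams_a.intersection(&grams_b).count();
    1.0 - (2 * shared) as f64 / total as f64
}
```

For "nacht" and "night" with `q = 2`, the fragments are {na, ac, ch, ht} and {ni, ig, gh, ht}. They share only "ht", so the similarity is 2 · 1 / (4 + 4) = 0.25 and the distance is 0.75.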
DistanceMetric was designed for str types, but is not limited to them. Calculating the distance is possible for all data types that are comparable and are passed as IntoIterator, e.g. as a Vec or slice:
    use str_distance::{DistanceMetric, Levenshtein, DistanceValue};

    assert_eq!(*Levenshtein::default().distance(&[1, 2, 3], &[1, 2, 3, 4, 5, 6]), 3);
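A distance function that accepts any comparable sequence could be written along the following lines. This is a minimal sketch assuming only `IntoIterator` and `PartialEq` bounds. The function name `levenshtein_generic` is hypothetical and the crate's actual trait bounds are not shown on this page.

```rust
/// Levenshtein distance between any two sequences whose items can be compared.
fn levenshtein_generic<A, B>(a: A, b: B) -> usize
where
    A: IntoIterator,
    B: IntoIterator,
    A::Item: PartialEq<B::Item>,
{
    // The second sequence is buffered because every row revisits all of it.
    let b: Vec<B::Item> = b.into_iter().collect();

    // prev[j] = distance between the items of `a` seen so far and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, item_a) in a.into_iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        curr.push(i + 1);
        for (j, item_b) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(item_a != *item_b);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            let best = substitution.min(deletion).min(insertion);
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}
```

The same function then works for integer slices, `Vec`s of string slices, or `chars()` iterators, since each of them implements `IntoIterator` with comparable items.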
Re-exports
pub use jaro::Jaro;
pub use jaro::JaroWinkler;
pub use levenshtein::DamerauLevenshtein;
pub use levenshtein::Levenshtein;
pub use modifiers::Winkler;
pub use modifiers::WinklerConfig;
pub use qgram::Cosine;
pub use qgram::Jaccard;
pub use qgram::Overlap;
pub use qgram::QGram;
pub use qgram::SorensenDice;
pub use ratcliff::RatcliffObershelp;
pub use token::TokenSet;
pub use token::TokenSort;
Modules
jaro
levenshtein
modifiers
qgram
ratcliff
token
Enums
DistanceValue
Traits
DistanceElement | Convenience trait to use a distance on a type directly.
DistanceMetric
Functions
str_distance | Evaluates the distance between two strings based on the provided distance metric.
str_distance_normalized | Evaluates the normalized distance between two strings based on the provided distance metric.