str-distance 0.1.0

Distance metrics to evaluate distances between strings.

A crate to evaluate distances between strings (and others).

Heavily inspired by the Julia [StringDistances] package.

## Distance Metrics

- [Jaro Distance]
- [Levenshtein Distance]
- [Damerau-Levenshtein Distance] 
- [RatcliffObershelp Distance]

- Q-gram distances compare the set of all slices of length `q` in each str, where `q > 0`
	- QGram Distance `Qgram::new(usize)`
	- [Cosine Distance] `Cosine::new(usize)`
	- [Jaccard Distance] `Jaccard::new(usize)`
	- [Sorensen-Dice Distance] `SorensenDice::new(usize)`
	- [Overlap Distance] `Overlap::new(usize)`
- The crate includes distance "modifiers" that can be applied to any distance.
	- [Winkler] diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score, but this crate defines it for any string distance.
	- [TokenSort] adjusts for differences in word order by reordering words alphabetically.
	- [TokenSet] adjusts for differences in word order and word count by comparing the intersection of two strings with each string.
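
To illustrate what two of these modifiers do, here is a minimal, self-contained sketch. It is not the crate's implementation: `token_sort` and `winkler_adjust` are hypothetical helper names, and the scaling factor `0.1` and the prefix cap of `4` are the conventional Jaro-Winkler values, assumed here rather than taken from the crate.

```rust
/// Reorder the whitespace-separated words of `s` alphabetically,
/// so that two strings differing only in word order become identical.
/// Hypothetical helper, not part of the crate's API.
fn token_sort(s: &str) -> String {
    let mut words: Vec<&str> = s.split_whitespace().collect();
    words.sort_unstable();
    words.join(" ")
}

/// Winkler-style adjustment of a normalized distance `d` in [0, 1]:
/// the longer the common prefix of `a` and `b`, the more `d` shrinks.
/// Hypothetical helper; constants are assumptions (see above).
fn winkler_adjust(a: &str, b: &str, d: f64) -> f64 {
    const SCALING: f64 = 0.1; // assumed scaling factor
    const MAX_PREFIX: usize = 4; // assumed cap on the prefix length
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(MAX_PREFIX)
        .take_while(|(x, y)| x == y)
        .count();
    d * (1.0 - prefix as f64 * SCALING)
}
```

A token-sorted comparison would then apply any metric to `token_sort(a)` and `token_sort(b)` instead of the raw inputs.
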
## Usage

### The `str_distance::str_distance*` convenience functions.

`str_distance` and `str_distance_normalized` take the two input strings and determine their distance using the given `DistanceMetric`.
`str_distance_normalized` evaluates the normalized distance between two strings. A value of `0.0` is the "zero distance": both strings are considered equal by the metric. A value of `1.0` is the maximum distance that can exist between the strings.

Calling one of the `str_distance::str_distance*` functions is just a convenience for `DistanceMetric.str_distance*("", "")`.

#### Example

The Levenshtein metrics accept an optional maximum distance. Once that bound is exceeded, the calculation of the exact distance is aborted early.


use str_distance::*;

// calculate the exact distance 
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

// short circuit if distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
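
The early abort can be sketched with a plain row-by-row Levenshtein computation. This is an illustrative sketch only, not the crate's code: `Bounded` is a hypothetical stand-in for `DistanceValue`, and `bounded_levenshtein` is a hypothetical function name.

```rust
/// Outcome of a bounded computation.
/// Hypothetical stand-in for the crate's `DistanceValue`.
#[derive(Debug, PartialEq)]
enum Bounded {
    Exact(usize),
    Exceeded(usize),
}

/// Levenshtein distance that gives up once the result must exceed `max`.
fn bounded_levenshtein(a: &str, b: &str, max: usize) -> Bounded {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // The length difference is a lower bound on the distance.
    if a.len().abs_diff(b.len()) > max {
        return Bounded::Exceeded(max);
    }
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        // Row minima never decrease, so if every cell of this row is above
        // `max`, the final distance must be above `max` as well.
        if cur.iter().all(|&d| d > max) {
            return Bounded::Exceeded(max);
        }
        prev = cur;
    }
    match prev[b.len()] {
        d if d > max => Bounded::Exceeded(max),
        d => Bounded::Exact(d),
    }
}
```

In this sketch the two long sentences from the example are rejected before any row is computed, because their lengths already differ by more than 10.
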

**Normalized Distance**

use str_distance::*;
assert_eq!(str_distance_normalized("" , "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
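
One common way to obtain such a normalized value for Levenshtein is to divide the raw distance by the length of the longer input. The sketch below assumes that convention; the crate may normalize differently, and both function names are hypothetical.

```rust
/// Plain Levenshtein distance over two char slices (illustrative helper).
fn levenshtein(a: &[char], b: &[char]) -> usize {
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Distance scaled into [0, 1] by the length of the longer input
/// (assumed normalization; two empty strings count as equal).
fn normalized_levenshtein(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 0.0;
    }
    levenshtein(&a, &b) as f64 / longest as f64
}
```

Under this assumption, the three inputs from the example above give `0.0`, `0.0` and `1.0` respectively.
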

### The `DistanceMetric` trait

use str_distance::{DistanceMetric, SorensenDice};
// Q-gram metrics require the fragment length `q` to use for comparison.
// The default for `SorensenDice` is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
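
To make the q-gram idea concrete, the following sketch builds the set of all length-`q` slices of each string and applies the Sørensen-Dice formula `1 - 2|A ∩ B| / (|A| + |B|)`. It is an illustration under assumptions, not the crate's implementation: the helper names are hypothetical, q-grams are treated as a set (duplicates ignored), and two inputs without any q-gram are treated as equal.

```rust
use std::collections::HashSet;

/// Set of all q-grams (runs of `q` consecutive chars) of `s`.
/// Hypothetical helper, not part of the crate's API.
fn qgram_set(s: &str, q: usize) -> HashSet<Vec<char>> {
    assert!(q > 0, "q must be greater than zero");
    let chars: Vec<char> = s.chars().collect();
    // `windows` yields nothing when the string is shorter than `q`.
    chars.windows(q).map(|w| w.to_vec()).collect()
}

/// Sørensen-Dice distance on q-gram sets.
fn sorensen_dice_distance(a: &str, b: &str, q: usize) -> f64 {
    let (sa, sb) = (qgram_set(a, q), qgram_set(b, q));
    let total = sa.len() + sb.len();
    if total == 0 {
        // Assumed convention: no q-grams on either side means "equal".
        return 0.0;
    }
    let shared = sa.intersection(&sb).count();
    1.0 - 2.0 * shared as f64 / total as f64
}
```

With `q = 2`, "nacht" and "night" each have four q-grams and share only `ht`, so the sketch yields `1 - 2/8 = 0.75`, the same value as in the example above.
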


`DistanceMetric` was designed for `str` types but is not limited to them. Distances can be calculated for any data type whose items are comparable and that can be passed as an `IntoIterator`, e.g. a `Vec`.

use str_distance::{DistanceMetric, Levenshtein, DistanceValue};

assert_eq!(*Levenshtein::default().distance(&[1, 2, 3], &[1, 2, 3, 4, 5, 6]), 3);
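
How a distance can be generic over its input is sketched below with a standalone function that accepts any two `IntoIterator`s over comparable items. The function name and signature are assumptions made for illustration and do not mirror the crate's `DistanceMetric` trait.

```rust
/// Levenshtein distance over any two sequences of comparable items.
/// Hypothetical free function, not the crate's trait method.
fn levenshtein<A, B, T>(a: A, b: B) -> usize
where
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = T>,
    T: PartialEq,
{
    let a: Vec<T> = a.into_iter().collect();
    let b: Vec<T> = b.into_iter().collect();
    // prev[j] = distance between the first i items of `a` and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, item_a) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, item_b) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(item_a != item_b);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

The same function then works for `vec![1, 2, 3]` against `vec![1, 2, 3, 4, 5, 6]` as well as for `"kitten".chars()` against `"sitting".chars()`.
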

## Documentation

Full docs available at [](

## References

- [StringDistances]
- [The stringdist Package for Approximate String Matching] Mark P.J. van der Loo
- [fuzzywuzzy]

## License

Licensed under either of these:

 * Apache License, Version 2.0, ([LICENSE-APACHE]LICENSE-APACHE or