Eddie
Fast and well-tested implementations of edit distance/string similarity metrics:
- Levenshtein,
- Damerau-Levenshtein,
- Hamming,
- Jaro,
- Jaro-Winkler.
Usage
Cargo.toml:
[]
= "0.1"
Levenshtein:
use Levenshtein;
let lev = new;
let dist = lev.distance;
assert_eq!;
Damerau-Levenshtein:
use DamerauLevenshtein;
let damlev = new;
let dist = damlev.distance;
assert_eq!;
Hamming:
use Hamming;
let hamming = new;
let dist = hamming.distance;
assert_eq!;
Jaro:
use Jaro;
let jaro = new;
let sim = jaro.similarity;
assert!;
Jaro-Winkler:
use JaroWinkler;
let jarwin = new;
let sim = jarwin.similarity;
assert!;
Complementary metrics
The main metric methods are complemented with inverted and/or relative versions. The naming convention across the crate is following:
distance
— a number of edits required to transform one string to the other;rel_dist
— a distance between two strings, relative to string length (inversion of similarity);similarity
— similarity between two strings (inversion of relative distance).
Performance
Below are the tables with performance measurements for typical word lengths, including alternative implementations from crates.io that have Unicode support. Produced on Intel Core i5-4278U 2,6 GHz. All measurements are in microseconds.
Levenshtein
len=3 | len=6 | len=9 | len=12 | len=15 | |
---|---|---|---|---|---|
eddie 0.2 | 0.05 | 0.12 | 0.18 | 0.35 | 0.57 |
edit_distance 2.1 | 0.15 | 0.19 | 0.26 | 0.49 | 0.60 |
strsim 0.9 | 0.13 | 0.23 | 0.27 | 0.43 | 0.64 |
distance 0.4 | 0.89 | 1.45 | 2.12 | 2.96 | 4.10 |
Damerau-Levenshtein
len=3 | len=6 | len=9 | len=12 | len=15 | |
---|---|---|---|---|---|
eddie 0.2 | 0.39 | 0.95 | 1.69 | 2.50 | 4.07 |
strsim 0.9 | 1.03 | 2.07 | 3.73 | 5.16 | 7.05 |
distance 0.4 | 1.74 | 3.40 | 5.17 | 7.16 | 10.66 |
Hamming
len=3 | len=6 | len=9 | len=12 | len=15 | |
---|---|---|---|---|---|
eddie 0.2 | 0.011 | 0.015 | 0.022 | 0.026 | 0.031 |
strsim 0.9 | 0.012 | 0.020 | 0.031 | 0.041 | 0.048 |
distance 0.4 | 0.016 | 0.031 | 0.035 | 0.045 | 0.062 |
Jaro
len=3 | len=6 | len=9 | len=12 | len=15 | |
---|---|---|---|---|---|
eddie 0.2 | 0.05 | 0.11 | 0.14 | 0.15 | 0.18 |
strsim 0.9 | 0.11 | 0.19 | 0.24 | 0.31 | 0.41 |
Jaro-Winkler
len=3 | len=6 | len=9 | len=12 | len=15 | |
---|---|---|---|---|---|
eddie 0.2 | 0.06 | 0.10 | 0.14 | 0.16 | 0.19 |
strsim 0.9 | 0.12 | 0.18 | 0.22 | 0.30 | 0.39 |
natural 0.3 | 0.32 | 0.87 | 0.99 | 1.56 | 1.64 |