Crate group_similar
This crate enables grouping values based on string similarity via Jaro-Winkler distance and complete-linkage clustering.
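To make the similarity measure concrete, here is a minimal sketch of the textbook Jaro-Winkler similarity. It is not taken from this crate's source: the function names `jaro` and `jaro_winkler` are illustrative, and the prefix scaling factor of 0.1 with a prefix cap of 4 characters are the conventional defaults, assumed here. The score lies in [0, 1], with 1 meaning identical strings; the corresponding distance is 1 minus the score.

```rust
/// Jaro similarity in [0, 1]; 1.0 means the strings are identical.
/// Illustrative sketch, not this crate's implementation.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters only count as matching within this many positions of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Count matched characters that appear in a different order in `b`.
    let mut k = 0;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if a[i] != b[k] {
                out_of_order += 1;
            }
            k += 1;
        }
    }
    let m = matches as f64;
    let t = out_of_order as f64 / 2.0; // transpositions
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler similarity: Jaro, boosted for a shared prefix of up to 4 characters.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(4)
        .take_while(|(x, y)| x == y)
        .count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}
```

The prefix boost is why names that share a leading token, such as a merchant name followed by a store number, tend to score close to 1.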
Example: Identify likely repeated merchants based on merchant name
    use group_similar::{Config, Named, Threshold, group_similar};

    #[derive(Eq, PartialEq, std::hash::Hash, Debug)]
    struct Merchant {
        id: usize,
        name: String
    }

    impl Named for Merchant {
        fn name(&self) -> &str {
            &self.name
        }
    }

    let merchants = vec![
        Merchant { id: 1, name: "McDonalds 105109".to_string() },
        Merchant { id: 2, name: "McDonalds 105110".to_string() },
        Merchant { id: 3, name: "Target ID1244".to_string() },
        Merchant { id: 4, name: "Target ID125".to_string() },
        Merchant { id: 5, name: "Amazon.com TID120159120".to_string() },
        Merchant { id: 6, name: "Target".to_string() },
        Merchant { id: 7, name: "Target.com".to_string() },
    ];

    let config = Config::jaro_winkler(Threshold::default());

    let results = group_similar(&merchants, &config);

    assert_eq!(results.get(&merchants[0]), Some(&vec![&merchants[1]]));
    assert_eq!(results.get(&merchants[2]), Some(&vec![&merchants[3], &merchants[5], &merchants[6]]));
    assert_eq!(results.get(&merchants[4]), Some(&vec![]));
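The grouping step named in the crate description, complete-linkage clustering, can be sketched as a naive agglomerative loop. This is an illustration under stated assumptions, not the crate's implementation: the function name `complete_linkage`, its signature, and the use of a plain distance cutoff are all hypothetical, and distances are assumed to be non-negative.

```rust
/// Naive agglomerative clustering with complete linkage.
/// Returns clusters as lists of indices into `items`.
/// Hypothetical helper for illustration; assumes `dist` is non-negative.
fn complete_linkage<T>(
    items: &[T],
    dist: impl Fn(&T, &T) -> f64,
    threshold: f64,
) -> Vec<Vec<usize>> {
    // Start with every item in its own cluster.
    let mut clusters: Vec<Vec<usize>> = (0..items.len()).map(|i| vec![i]).collect();
    loop {
        // Find the pair of clusters with the smallest complete-linkage distance.
        let mut best: Option<(usize, usize, f64)> = None;
        for a in 0..clusters.len() {
            for b in (a + 1)..clusters.len() {
                // Complete linkage: the distance between two clusters is the
                // largest distance between any pair of their members.
                let mut d = 0.0_f64;
                for &i in &clusters[a] {
                    for &j in &clusters[b] {
                        d = d.max(dist(&items[i], &items[j]));
                    }
                }
                if best.map_or(true, |(_, _, bd)| d < bd) {
                    best = Some((a, b, d));
                }
            }
        }
        match best {
            // Merge the closest pair while it is still within the cutoff.
            Some((a, b, d)) if d <= threshold => {
                let merged = clusters.remove(b); // b > a, so index a stays valid
                clusters[a].extend(merged);
            }
            // Nothing left to merge, or the closest pair is too far apart.
            _ => return clusters,
        }
    }
}
```

Because the cluster distance is the maximum over all member pairs, an item joins a group only if it is close to every member of that group, which avoids the long chains that single linkage can produce. With the values `[0.0, 1.0, 2.0, 10.0, 11.0]`, absolute difference as the distance, and a cutoff of 1.5, the item `2.0` stays on its own even though it is within 1.0 of `1.0`, because it is 2.0 away from `0.0`.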
Structs
Config
Threshold
Traits
Named
Functions
group_similar | Group records based on a particular configuration |