//! # Jaccard Similarity tools (module not reexported)
use std::collections::HashSet;
use std::hash::Hash;
/// Calculate the Jaccard index on two [`HashSet`]s.
///
/// Returns the mathematical Jaccard index, i.e. `|A ∩ B| / |A ∪ B|`.
///
/// This is usually called through [`jaccard`], which is the recommended entry
/// point unless your data is already in a `HashSet`.
///
/// # Example
///
/// ```
/// use std::collections::HashSet;
/// use stringmetrics::jaccard_set;
///
/// let crew1 = HashSet::from(["Einar", "Olaf", "Harald"]);
/// let crew2 = HashSet::from(["Olaf", "Harald", "Birger"]);
///
/// assert_eq!(jaccard_set(&crew1, &crew2), 0.5);
///
/// ```
///
/// [`HashSet`]: std::collections::HashSet
/// [`jaccard`]: crate::algorithms::jaccard
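// A minimal sketch of the function documented above. The original body is not
// shown in this excerpt, so the signature, the `f64` return type, and the
// handling of two empty sets (this sketch returns 0.0) are assumptions, not
// the crate's confirmed API. Paths are fully qualified so the sketch does not
// depend on the module's imports.
```rust
pub fn jaccard_set<T>(
    a: &std::collections::HashSet<T>,
    b: &std::collections::HashSet<T>,
) -> f64
where
    T: std::hash::Hash + Eq,
{
    // |A ∩ B|: elements present in both sets.
    let intersection = a.intersection(b).count();
    // |A ∪ B| = |A| + |B| - |A ∩ B| (inclusion-exclusion).
    let union = a.len() + b.len() - intersection;
    // Assumption of this sketch: avoid 0/0 (NaN) when both sets are empty.
    if union == 0 {
        return 0.0;
    }
    intersection as f64 / union as f64
}
```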
/// Calculate the Jaccard index on two iterators using [`jaccard_set`]
///
/// Returns the mathematical Jaccard index, i.e. `|A ∩ B| / |A ∪ B|`. Iterators
/// can point to anything hashable. Often this is combined with an iterator
/// adapter such as [`std::str::Split`] and/or [`core::slice::Windows`] to
/// generate n-grams for text similarity. See [this Wikipedia
/// page](https://en.wikipedia.org/wiki/N-gram) for a description of n-grams.
///
/// Note: If the data you are interested in is already in a `HashSet`, use
/// [`jaccard_set`] to skip the collection step.
///
/// # Example
///
/// ```
/// use stringmetrics::jaccard;
///
/// let crew1 = ["Einar", "Olaf", "Harald"];
/// let crew2 = ["Olaf", "Harald", "Birger"];
///
/// assert_eq!(jaccard(crew1.iter(), crew2.iter()), 0.5);
///
/// ```
///
/// Example using 2-grams. See
/// [this excellent reference](https://www.cs.utah.edu/~jeffp/teaching/cs5140-S15/cs5140/L4-Jaccard+nGram.pdf)
/// for an in-depth explanation of the Jaccard index for k-grams/n-grams.
///
/// ```
/// use stringmetrics::jaccard;
///
/// let a = [["to", "be"], ["be", "or"], ["or", "not"]];
/// let b = [["who", "wants"], ["wants", "to"], ["to", "be"]];
///
/// assert_eq!(jaccard(a.iter(), b.iter()), 0.2);
///
/// ```
///
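// A minimal sketch of the function documented above. The original body is not
// shown in this excerpt, so the signature and the `f64` return type are
// assumptions. The docs say the real function delegates to `jaccard_set`; this
// sketch inlines the set arithmetic only to stay self-contained. Passing
// `words.windows(2)` as an argument should produce 2-grams on the fly instead
// of writing them out by hand.
```rust
pub fn jaccard<A, B, T>(a: A, b: B) -> f64
where
    A: Iterator<Item = T>,
    B: Iterator<Item = T>,
    T: std::hash::Hash + Eq,
{
    // Collect both iterators into sets; duplicates collapse here.
    let set_a: std::collections::HashSet<T> = a.collect();
    let set_b: std::collections::HashSet<T> = b.collect();
    // |A ∩ B| and |A ∪ B| = |A| + |B| - |A ∩ B|.
    let intersection = set_a.intersection(&set_b).count();
    let union = set_a.len() + set_b.len() - intersection;
    // Assumption of this sketch: return 0.0 rather than NaN for two empty inputs.
    if union == 0 {
        return 0.0;
    }
    intersection as f64 / union as f64
}
```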