//! Estimating document similarity with MinHash.
//!
//! MinHash summarises each set as a fixed-length signature and estimates the
//! Jaccard similarity of two sets (the size of their intersection divided by
//! the size of their union) by comparing signatures, without ever intersecting
//! the sets directly. This example splits a few short documents into word sets
//! and estimates how alike they are.
//!
//! Run it with:
//!
//! ```text
//! cargo run --example similarity --release
//! ```

use MinHash;

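// DOC_A and DOC_B share 6 of their 10 distinct words, so their exact Jaccard
// similarity is 0.6.
const DOC_A: &str = "the quick brown fox jumps over the lazy dog";
const DOC_B: &str = "the quick brown cat jumps over the sleepy dog";
// DOC_C shares no words with either, so its exact similarity to both is 0.
const DOC_C: &str = "completely unrelated text about distributed systems";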
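
// For inputs this small, the exact Jaccard similarity that MinHash
// approximates can be computed directly, which gives a reference value to
// compare an estimate against. What follows is a minimal sketch using only the
// standard library. The helper names `word_set` and `exact_jaccard` are
// illustrative assumptions; they are not part of the MinHash API imported
// above.

/// Splits a document on whitespace into its set of distinct words.
fn word_set(doc: &str) -> std::collections::HashSet<&str> {
    doc.split_whitespace().collect()
}

/// Returns |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical (1.0),
/// which is a convention chosen here to avoid dividing by zero.
fn exact_jaccard<'a>(
    a: &std::collections::HashSet<&'a str>,
    b: &std::collections::HashSet<&'a str>,
) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}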