Crate datasketch_minhash_lsh[][src]

Expand description

minhash-lsh

Build Status

This crate reimplements the MinHash and MinHash LSH approaches from the Python package datasketch in Rust. It’s only a partial reimplementation, use it at your own risk.

Example MinHash

 use datasketch_minhash_lsh::MinHash;

 let mut m1 = <MinHash>::new(4, Some(1));
 let mut m2 = <MinHash>::new(4, Some(1));
 assert_eq!(m1.jaccard(&m2).unwrap(), 1.0);

 m2.update(&12);
 assert_eq!(m1.jaccard(&m2).unwrap(), 0.0);

 m1.update(&13);
 assert!(m1.jaccard(&m2).unwrap() < 1.0);

 m1.update(&12);
 let distance = m1.jaccard(&m2).unwrap();
 assert!(distance < 1.0 && distance > 0.0);

Example MinHashLsh

 use datasketch_minhash_lsh::{MinHashLsh, MinHash};

 let mut lsh = <MinHashLsh<&str>>::new(16, None, Some(0.5)).unwrap();
 let mut m1 = <MinHash>::new(16, Some(0));
 m1.update(&"a");

 let mut m2 = <MinHash>::new(16, Some(0));
 m2.update(&"b");

 lsh.insert("a", &m1).unwrap();
 lsh.insert("b", &m2).unwrap();

 let result = lsh.query(&m1).unwrap();
 assert!(result.contains(&"a"));

 let result = lsh.query(&m2).unwrap();
 assert!(result.contains(&"b"));
 assert!(result.len() <= 2);

Structs

A part of a HashValue used in MinHashLsh

A min-hash value generated by MinHash

The LSH params for the number of bands and the band size

The MinHash struct

The MinHashLsh struct

The weights configuring whether to prefer false positives or false negatives