[−][src]Crate lsh_rs
lsh-rs (Locality Sensitive Hashing)
Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.
For more information on the subject see:
Hashing implementations
- Signed Random Projections (Cosine similarity)
- L2 distance
- Maximum Inner Product (Dot products)
Getting started
use lsh_rs::LSH; // 2 rows w/ dimension 3. let p = &[vec![1., 1.5, 2.], vec![2., 1.1, -0.3]]; // Do one time expensive preprocessing. let n_projections = 9; let n_hash_tables = 30; let dim = 3; let mut lsh = LSH::new(n_projections, n_hash_tables, dim).srp(); lsh.store_vecs(p); // Query in sublinear time. let query = &[1.1, 1.2, 1.2]; lsh.query_bucket(query);
Signed Random Projections
LSH for maximum cosine similarity search.
let mut lsh = LSH::new(n_projections, n_hash_tables, dim).srp();
L2
LSH for minimal L2 distance search.
// hyper parameter r in https://arxiv.org/pdf/1411.3787.pdf (eq. 8) let bucket_width = 2.2; let mut lsh = LSH::new(n_projections, n_hash_tables, dim).l2(bucket_width);
Maximum Inner Product (MIPS)
LSH for maximum inner product search.
let bucket_width = 2.2; // l2(x) < U < 1.0 let U = 0.83; // number of concatenations let m = 3; let mut lsh = LSH::new(n_projections, n_hash_tables, dim).mips(r, U, m);
Seed
Random projections are used to generate the hash functions. The default seeding of randomness is taken from the system. If you want to have reproducable outcomes, you can set a manual seed.
let mut lsh = LSH::new(n_projections, n_hash_tables, dim).seed(12).srp();
Unique indexes
Instead of storing data points as vectors. Storing L
copies of the data points (one in every
hash table). You can choose to only store unique indexes of the data points. The index ids are
assigned in chronological order. This will drastically decrease the required memory.
let mut lsh = LSH::new(n_projections, n_hash_tables, dim).only_index().srp();
BLAS support
Utilizing BLAS will heavily increase
performance. To make use of BLAS, install lsh-rs
w/ "blas"
feature and reinstall ndarray
w/ "blas"
support.
Cargo.toml:
lsh-rs = {version ="x.x"}, features=["blas"]}
ndarray = {version = "0.13", features=["blas"]}
# Or any other blas backend.
blas-src = { version = "0.5", defeault-features = false, features = ["openblas"]}
Modules
utils |
Structs
L2 | L2 Hasher family. Read more. |
LSH | Wrapper for LSH functionality. |
MIPS | Maximum Inner Product Search. Read more. |
MemoryTable | In memory storage of hashed vectors/ indexes. |
SignRandomProjections | Also called SimHash. A family of hashers for the cosine similarity. |
Traits
VecHash |