[−][src]Crate lsh_rs
lsh-rs (Locality Sensitive Hashing)
Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.
For more information on the subject see:
- Introduction on LSH
- Section 2. describes the hash families used in this crate
- LSH and neural networks
Implementations
- Base LSH
- Signed Random Projections (Cosine similarity)
- L2 distance
- Maximum Inner Product (Dot products)
- Multi Probe LSH
- Step wise probing
- SRP
- L2
- MIPS
- Query directed probing
- L2
- Step wise probing
Getting started
use lsh_rs::LshMem; // 2 rows w/ dimension 3. let p = &[vec![1., 1.5, 2.], vec![2., 1.1, -0.3]]; // Do one time expensive preprocessing. let n_projections = 9; let n_hash_tables = 30; let dim = 3; let mut lsh = LshMem::new(n_projections, n_hash_tables, dim).srp(); lsh.store_vecs(p); // Query in sublinear time. let query = &[1.1, 1.2, 1.2]; lsh.query_bucket(query);
Signed Random Projections
LSH for maximum cosine similarity search.
use lsh_rs::LshSql; let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).srp();
L2
LSH for minimal L2 distance search.
// hyper parameter r in https://arxiv.org/pdf/1411.3787.pdf (eq. 8) use lsh_rs::LshSql; let bucket_width = 2.2; let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).l2(bucket_width);
Maximum Inner Product (MIPS)
LSH for maximum inner product search.
use lsh_rs::LshSql; let bucket_width = 2.2; // l2(x) < U < 1.0 let U = 0.83; // number of concatenations let m = 3; let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).mips(r, U, m);
Seed
Random projections are used to generate the hash functions. The default seeding of randomness is taken from the system. If you want to have reproducable outcomes, you can set a manual seed.
use lsh_rs::LshSql; let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).seed(12).srp();
Unique indexes
Instead of storing data points as vectors. Storing L
copies of the data points (one in every
hash table). You can choose to only store unique indexes of the data points. The index ids are
assigned in chronological order. This will drastically decrease the required memory.
use lsh_rs::LshSql; let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).only_index().srp();
Builder pattern methods
The following methods can be used to change internal state during object initialization:
BLAS support
Utilizing BLAS will heavily increase
performance. To make use of BLAS, install lsh-rs
w/ "blas"
feature and reinstall ndarray
w/ "blas"
support.
Cargo.toml:
lsh-rs = {version ="x.x"}, features=["blas"]}
ndarray = {version = "0.13", features=["blas"]}
# Or any other blas backend.
blas-src = { version = "0.6", defeault-features = false, features = ["openblas"]}
Backends
The LSH struct is exposed with multiple backends that store the hashes.
Modules
dist | |
stats | |
utils |
Structs
L2 | L2 Hasher family. Read more. |
LSH | Wrapper for LSH functionality. Can be initialized following the Builder pattern. |
MIPS | Maximum Inner Product Search. Read more. |
MemoryTable | In memory backend for LSH. |
SignRandomProjections | Also called SimHash. A family of hashers for the cosine similarity. |
SqlTable | Sqlite backend for LSH. |
SqlTableMem | In memory Sqlite backend for LSH. |
Enums
Error |
Traits
HashTables | Hashtable consisting of |
VecHash |
Type Definitions
DataPoint | |
DataPointSlice | |
FloatSize | |
Hash | |
HashPrimitive | |
LshMem | |
LshSql | |
LshSqlMem | |
Result |