lsh_rs2/
lib.rs

1//! # lsh-rs (Locality Sensitive Hashing)
2//!
3//! Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.
4//!
5//! For more information on the subject see:
6//! * [Introduction on LSH](http://people.csail.mit.edu/gregory/annbook/introduction.pdf)
7//! * [Section 2. describes the hash families used in this crate](https://arxiv.org/pdf/1411.3787.pdf)
8//! * [LSH and neural networks](https://www.ritchievink.com/blog/2020/04/07/sparse-neural-networks-and-hash-tables-with-locality-sensitive-hashing/)
9//!
10//! ###
11//!
12//! ## Implementations
13//!
14//! * **Base LSH**
15//!     - Signed Random Projections *(Cosine similarity)*
16//!     - L2 distance
17//!     - MIPS *(Dot products/ Maximum Inner Product Search)*
18//!     - MinHash *(Jaccard Similarity)*
19//! * **Multi Probe LSH**
20//!     - **Step wise probing**
21//!         - SRP (only bit shifts)
22//!     - **Query directed probing**
23//!         - L2
24//!         - MIPS
25//! * Generic numeric types
26//!
27//! ## Features
28//! * "blas"
29//! * "sqlite"
30//!
31//! ## Getting started
32//!
33//! ```rust
34//! use lsh_rs::prelude::*;
35//! // 2 rows w/ dimension 3.
36//! let p = &[vec![1., 1.5, 2.],
37//!         vec![2., 1.1, -0.3]];
38//!
39//! // Do one time expensive preprocessing.
40//! let n_projections = 9;
41//! let n_hash_tables = 30;
42//! let dim = 10;
43//! let dim = 3;
44//! let mut lsh = LshMem::new(n_projections, n_hash_tables, dim)
45//!     .srp()
46//!     .unwrap();
47//! lsh.store_vecs(p);
48//!
49//! // Query in sublinear time.
50//! let query = &[1.1, 1.2, 1.2];
51//! lsh.query_bucket(query);
52//! ```
53//!
54//! ## Signed Random Projections
55//! LSH for maximum cosine similarity search.
56//! ```rust
57//! # use lsh_rs::prelude::*;
58//! # let n_projections = 9;
59//! # let n_hash_tables = 30;
60//! # let dim = 10;
61//! let mut lsh = LshMem::<_, f32>::new(n_projections, n_hash_tables, dim)
62//!     .srp()
63//!     .unwrap();
64//! ```
65//!
66//! ## L2
67//! LSH for minimal L2 distance search.
68//!
69//! ```
70//! // hyper parameter r in https://arxiv.org/pdf/1411.3787.pdf (eq. 8)
71//! # use lsh_rs::prelude::*;
72//! # let bucket_width = 2.2;
73//! # let n_projections = 9;
74//! # let n_hash_tables = 10;
75//! # let dim = 10;
76//! let mut lsh = LshMem::<_, f32>::new(n_projections, n_hash_tables, dim)
77//!     .l2(bucket_width)
78//!     .unwrap();
79//! ```
80//!
81//! ## Jaccard Index
82//! LSH for the Jaccard Index
83//! ```rust
84//! # use lsh_rs::prelude::*;
85//! # let n_projections = 14;
86//! // length of the shingles vector
87//! let dim = 2500;
88//! # let n_hash_tables = 10;
89//! let mut lsh = LshSqlMem::<_, u16>::new(n_projections, n_hash_tables, dim)
90//!     .minhash()
91//!     .unwrap();
92//! ```
93//!
94//! ## Maximum Inner Product (MIPS)
95//! LSH for maximum inner product search.
96//! ```rust
97//! # use lsh_rs::prelude::*;
98//! let bucket_width = 2.2;
99//! // l2(x) < U < 1.0
100//! let U = 0.83;
101//! let r = 4.;
102//! // number of concatenations
103//! let m = 3;
104//! let n_projections = 15;
105//! let n_hash_tables = 10;
106//! let dim = 10;
107//! let mut lsh = LshMem::<_, f32>::new(n_projections, n_hash_tables, dim)
108//!     .mips(r, U, m)
109//!     .unwrap();
110//! ```
111//!
112//! ## Seed
113//! Random projections are used to generate the hash functions. The default seeding of randomness
114//! is taken from the system. If you want to have reproducable outcomes, you can set a manual seed.
115//!
116//! ```rust
117//! # use lsh_rs::prelude::*;
118//! # let n_projections = 9;
119//! # let n_hash_tables = 10;
120//! # let dim = 10;
121//! let mut lsh = LshMem::<_, f32>::new(n_projections, n_hash_tables, dim)
122//!     .seed(12)
123//!     .srp()
124//!     .unwrap();
125//! ```
126//!
127//! ## Unique indexes
128//! Instead of storing data points as vectors. Storing `L` copies of the data points (one in every
129//! hash table). You can choose to only store unique indexes of the data points. The index ids are
130//! assigned in chronological order. This will drastically decrease the required memory.
131//! ```rust
132//! # use lsh_rs::prelude::*;
133//! # let n_projections = 9;
134//! # let n_hash_tables = 10;
135//! # let dim = 10;
136//! let mut lsh = LshMem::<_, f32>::new(n_projections, n_hash_tables, dim)
137//!     .only_index()
138//!     .srp()
139//!     .unwrap();
140//! ```
141//!
142//! ## Builder pattern methods
143//! The following methods can be used to change internal state during object initialization:
144//! * [only_index](struct.LSH.html#method.only_index)
145//! * [seed](struct.LSH.html#method.seed)
146//! * [set_database_file](struct.LSH.html#method.set_database_file)
147//! * [multi_probe](struct.LSH.html#method.multi_probe)
148//! * [increase_storage](struct.LSH.html#method.increase_storage)
149//! * [fit (only for MIPS)](struct.MIPS.html#method.fit)
150//!
151//! ## Backends
152//! The [LSH struct](struct.LSH.html) is exposed with multiple backends that store the hashes.
153//! * in memory (fastest / can save state with serialization) [LshMem](type.LshMem.html)
154//! * SQLite (slower due to disk io, but automatic state preservation between sessions) [LshSql](type.LshSql.html)
155//! * in memory SQLite (can backup to SQLite when processing is done) [LshSqlMem](type.LshSqlMem.html)
156//!
157//! ## Hash primitives
158//! The hashers in this crate will produces hashes of type `Vec<T>`. Where `T` should be one of `i8`,
159//! `i16`, `i32` or `i64`. This concrete primitive value can be set by choosing on of the utillity types
160//! in the following sub-modules:
161//! * [hi8](prelude/hi8/index.html)
162//! * [hi16](prelude/hi16/index.html)
163//! * [hi32](prelude/hi32/index.html)
164//! * [hi64](prelude/hi64/index.html)
165//!
166//! Using smaller primitives for the hash values, will result in less space requirements and greater
167//! performance. However this may lead to panics if the hash value doesn't fit the chosen primitive
168//! due to buffer overflow.
169//!
170//! *Note: the hash primitive cannot be set for every Hash family that has implemented
171//! [VecHash](trait.VecHash.html). For instance, [SignRandomProjections](struct.SignRandomProjections.html)
172//! will allways use `i8` as hash primitive.*
173//!
174//! ```rust
175//! # use lsh_rs::prelude::*;
176//! # let n_projections = 9;
177//! # let n_hash_tables = 10;
178//! # let dim = 10;
179//! // use i8 hash values:
180//! let lsh_i8 = hi8::LshMem::<_, u8>::new(n_projections, n_hash_tables, dim)
181//!     .minhash()
182//!     .unwrap();
183//! // use i64 hash values:
184//! let lhs_i8 = hi64::LshMem::<_, u8>::new(n_projections, n_hash_tables, dim)
185//!     .minhash()
186//!     .unwrap();
187//! ```
188//!
189//! ## BLAS support
190//! Utilizing [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) will heavily increase
191//! performance. To make use of BLAS, install `lsh-rs` with `"blas"` feature and reinstall `ndarray` with `"blas"` support.
192//!  <br>
193//!  <br>
194//! **Cargo.toml:**
195//! ```toml
196//! lsh-rs = {version ="x.x"}, features=["blas"]}
197//! ndarray = {version = "0.13", features=["blas"]}
198//! # Or any other blas backend.
199//! blas-src = { version = "0.6", defeault-features = false, features = ["openblas"]}
200//! ```
201//!
202//! ## Need your own hashers?
203//! The LSH struct can easily be extended with your own hashers. Your own hasher structs need
204//! to implement [VecHash<N, K>](trait.VecHash.html). `N` and `K` are generic types of the input
205//! and output numbers respectively.
206//!
207//! ## Need you own backend?
208//! If you need another backend, you can extend you backend with the [HashTables<N, K>](trait.HashTables.html) trait.
209#![allow(dead_code, non_snake_case)]
210#[cfg(feature = "blas")]
211extern crate blas_src;
212extern crate ndarray;
213mod hash;
214mod lsh {
215    pub mod lsh;
216    mod test;
217}
218pub mod dist;
219mod multi_probe;
220mod table {
221    pub mod general;
222    pub mod mem;
223    pub mod sqlite;
224    pub mod sqlite_mem;
225}
226mod constants;
227mod error;
228
229#[cfg(feature = "workspace")]
230pub mod utils;
231#[cfg(not(feature = "workspace"))]
232mod utils;
233pub use hash::VecHash;
234pub use multi_probe::{QueryDirectedProbe, StepWiseProbe};
235pub use table::{general::HashTables, mem::MemoryTable};
236#[cfg(feature = "sqlite")]
237pub use table::{sqlite::SqlTable, sqlite_mem::SqlTableMem};
238pub mod data;
239pub mod prelude;
240pub mod stats;