1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
//! # lsh-rs (Locality Sensitive Hashing)
//!
//! Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.
//!
//! For more information on the subject see:
//! * [Introduction on LSH](http://people.csail.mit.edu/gregory/annbook/introduction.pdf)
//! * [Section 2. describes the hash families used in this crate](https://arxiv.org/pdf/1411.3787.pdf)
//! * [LSH and neural networks](https://www.ritchievink.com/blog/2020/04/07/sparse-neural-networks-and-hash-tables-with-locality-sensitive-hashing/)
//!
//! ## Implementations
//!
//! * **Base LSH**
//!     - Signed Random Projections (Cosine similarity)
//!     - L2 distance
//!     - Maximum Inner Product (Dot products)
//! * **Multi Probe LSH**
//!     - **Step wise probing**
//!         - SRP
//!         - L2
//!         - MIPS
//!     - **Query directed probing**
//!         - L2
//!
//! ## Getting started
//!
//! ```rust
//! use lsh_rs::LshMem;
//! // 2 rows w/ dimension 3.
//! let p = &[vec![1., 1.5, 2.],
//!         vec![2., 1.1, -0.3]];
//!
//! // Do one time expensive preprocessing.
//! let n_projections = 9;
//! let n_hash_tables = 30;
//! let dim = 3;
//! let mut lsh = LshMem::new(n_projections, n_hash_tables, dim).srp();
//! lsh.store_vecs(p);
//!
//! // Query in sublinear time.
//! let query = &[1.1, 1.2, 1.2];
//! lsh.query_bucket(query);
//! ```
//!
//! ## Signed Random Projections
//! LSH for maximum cosine similarity search.
//! ```rust
//! use lsh_rs::LshSql;
//! let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).srp();
//! ```
//!
//! ## L2
//! LSH for minimal L2 distance search.
//!
//! ```
//! // hyper parameter r in https://arxiv.org/pdf/1411.3787.pdf (eq. 8)
//! use lsh_rs::LshSql;
//! let bucket_width = 2.2;
//! let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).l2(bucket_width);
//! ```
//!
//! ## Maximum Inner Product (MIPS)
//! LSH for maximum inner product search.
//! ```rust
//! use lsh_rs::LshSql;
//! let bucket_width = 2.2;
//! // l2(x) < U < 1.0
//! let U = 0.83;
//! // number of concatenations
//! let m = 3;
//! let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).mips(r, U, m);
//! ```
//!
//! ## Seed
//! Random projections are used to generate the hash functions. The default seeding of randomness
//! is taken from the system. If you want to have reproducable outcomes, you can set a manual seed.
//!
//! ```rust
//! use lsh_rs::LshSql;
//! let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).seed(12).srp();
//! ```
//!
//! ## Unique indexes
//! Instead of storing data points as vectors. Storing `L` copies of the data points (one in every
//! hash table). You can choose to only store unique indexes of the data points. The index ids are
//! assigned in chronological order. This will drastically decrease the required memory.
//! ```rust
//! use lsh_rs::LshSql;
//! let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).only_index().srp();
//! ```
//!
//! ## Builder pattern methods
//! The following methods can be used to change internal state during object initialization:
//! * [only_index](struct.LSH.html#method.only_index)
//! * [seed](struct.LSH.html#method.seed)
//! * [set_database_file](struct.LSH.html#method.set_database_file)
//! * [multi_probe](struct.LSH.html#method.multi_probe)
//! * [increase_storage](struct.LSH.html#method.increase_storage)
//!
//! ## BLAS support
//! Utilizing [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) will heavily increase
//! performance. To make use of BLAS, install `lsh-rs` w/ `"blas"` feature and reinstall `ndarray` w/ `"blas"` support.
//!  <br>
//!  <br>
//! **Cargo.toml:**
//! ```toml
//! lsh-rs = {version ="x.x"}, features=["blas"]}
//! ndarray = {version = "0.13", features=["blas"]}
//! # Or any other blas backend.
//! blas-src = { version = "0.6", defeault-features = false, features = ["openblas"]}
//! ```
//! ## Backends
//! The [LSH struct](struct.LSH.html) is exposed with multiple backends that store the hashes.
//! * in memory (fastest / can save state with serialization) [LshMem](type.LshMem.html)
//! * SQLite (slower due to disk io, but automatic state preservation between sessions) [LshSql](type.LshSql.html)
//! * in memory SQLite (can backup to SQLite when processing is done) [LshSqlMem](type.LshSqlMem.html)
#![allow(dead_code, non_snake_case)]
#[cfg(feature = "blas")]
extern crate blas_src;
extern crate crossbeam;
extern crate ndarray;
mod hash;
mod lsh {
    pub mod lsh;
    mod test;
}
pub mod dist;
mod multi_probe;
mod table {
    pub mod general;
    pub mod mem;
    pub mod sqlite;
    pub mod sqlite_mem;
}
mod constants;
mod error;
pub mod utils;
pub use crate::lsh::lsh::{LshMem, LshSql, LshSqlMem, LSH};
pub use hash::{Hash, HashPrimitive, SignRandomProjections, VecHash, L2, MIPS};
pub use table::{general::HashTables, mem::MemoryTable, sqlite::SqlTable, sqlite_mem::SqlTableMem};
pub mod stats;

pub type FloatSize = f32;
pub type DataPoint = Vec<f32>;
pub type DataPointSlice = [f32];
pub use error::Error;
pub type Result<T> = std::result::Result<T, Error>;