[][src]Crate lsh_rs

lsh-rs (Locality Sensitive Hashing)

Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.

For more information on the subject see:

Implementations

  • Base LSH
    • Signed Random Projections (Cosine similarity)
    • L2 distance
    • Maximum Inner Product (Dot products)
  • Multi Probe LSH
    • Step wise probing
      • SRP
      • L2
      • MIPS
    • Query directed probing
      • L2

Getting started

use lsh_rs::LshMem;
// 2 rows w/ dimension 3.
let p = &[vec![1., 1.5, 2.],
        vec![2., 1.1, -0.3]];

// Do one time expensive preprocessing.
let n_projections = 9;
let n_hash_tables = 30;
let dim = 3;
let mut lsh = LshMem::new(n_projections, n_hash_tables, dim).srp();
lsh.store_vecs(p);

// Query in sublinear time.
let query = &[1.1, 1.2, 1.2];
lsh.query_bucket(query);

Signed Random Projections

LSH for maximum cosine similarity search.

use lsh_rs::LshSql;
let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).srp();

L2

LSH for minimal L2 distance search.

// hyper parameter r in https://arxiv.org/pdf/1411.3787.pdf (eq. 8)
use lsh_rs::LshSql;
let bucket_width = 2.2;
let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).l2(bucket_width);

Maximum Inner Product (MIPS)

LSH for maximum inner product search.

use lsh_rs::LshSql;
let bucket_width = 2.2;
// l2(x) < U < 1.0
let U = 0.83;
// number of concatenations
let m = 3;
let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).mips(r, U, m);

Seed

Random projections are used to generate the hash functions. The default seeding of randomness is taken from the system. If you want to have reproducable outcomes, you can set a manual seed.

use lsh_rs::LshSql;
let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).seed(12).srp();

Unique indexes

Instead of storing data points as vectors. Storing L copies of the data points (one in every hash table). You can choose to only store unique indexes of the data points. The index ids are assigned in chronological order. This will drastically decrease the required memory.

use lsh_rs::LshSql;
let mut lsh = LshSql::new(n_projections, n_hash_tables, dim).only_index().srp();

Builder pattern methods

The following methods can be used to change internal state during object initialization:

BLAS support

Utilizing BLAS will heavily increase performance. To make use of BLAS, install lsh-rs w/ "blas" feature and reinstall ndarray w/ "blas" support.

Cargo.toml:

lsh-rs = {version ="x.x"}, features=["blas"]}
ndarray = {version = "0.13", features=["blas"]}
# Or any other blas backend.
blas-src = { version = "0.6", defeault-features = false, features = ["openblas"]}

Backends

The LSH struct is exposed with multiple backends that store the hashes.

  • in memory (fastest / can save state with serialization) LshMem
  • SQLite (slower due to disk io, but automatic state preservation between sessions) LshSql
  • in memory SQLite (can backup to SQLite when processing is done) LshSqlMem

Modules

dist
stats
utils

Structs

L2

L2 Hasher family. Read more.

LSH

Wrapper for LSH functionality. Can be initialized following the Builder pattern.

MIPS

Maximum Inner Product Search. Read more.

MemoryTable

In memory backend for LSH.

SignRandomProjections

Also called SimHash. A family of hashers for the cosine similarity.

SqlTable

Sqlite backend for LSH.

SqlTableMem

In memory Sqlite backend for LSH.

Enums

Error

Traits

HashTables

Hashtable consisting of L Hash tables.

VecHash

Type Definitions

DataPoint
DataPointSlice
FloatSize
Hash
HashPrimitive
LshMem
LshSql
LshSqlMem
Result