Expand description
Locality Sensitive Hashing (LSH) for Approximate Nearest Neighbor Search
LSH is an ANN algorithm that uses hash functions to map similar vectors to the same buckets with high probability. This implementation uses random projection LSH for cosine similarity.
§Features
- Random Projection LSH: Hash functions based on random hyperplanes
- Multi-table Hashing: Multiple hash tables for better recall
- Multi-probe Search: Query nearby buckets to improve accuracy
- Configurable Parameters: num_tables, num_bits, num_probes
§Algorithm
- Generate random projection vectors (hyperplanes)
- For each vector, compute hash by checking which side of hyperplanes it’s on
- Store vectors in hash buckets
- At query time, hash the query and retrieve candidates from matching buckets
- Optionally probe nearby buckets (flip hash bits) for better recall
- Rank candidates by actual similarity
§Example
use oxify_vector::lsh::{LshIndex, LshConfig};
use std::collections::HashMap;
// Create embeddings
let mut embeddings = HashMap::new();
for i in 0..1000 {
let vec = vec![i as f32 * 0.01, (i * 2) as f32 * 0.01, (i * 3) as f32 * 0.01];
embeddings.insert(format!("doc{}", i), vec);
}
// Build LSH index
let config = LshConfig::default();
let mut index = LshIndex::new(config);
index.build(&embeddings)?;
// Search
let query = vec![0.5, 1.0, 1.5];
let results = index.search(&query, 10)?;