Module lsh

Module lsh 

Source
Expand description

Locality Sensitive Hashing (LSH) for Approximate Nearest Neighbor Search

LSH is an ANN algorithm that uses hash functions to map similar vectors to the same buckets with high probability. This implementation uses random projection LSH for cosine similarity.

§Features

  • Random Projection LSH: Hash functions based on random hyperplanes
  • Multi-table Hashing: Multiple hash tables for better recall
  • Multi-probe Search: Query nearby buckets to improve accuracy
  • Configurable Parameters: num_tables, num_bits, num_probes

§Algorithm

  1. Generate random projection vectors (hyperplanes)
  2. For each vector, compute hash by checking which side of hyperplanes it’s on
  3. Store vectors in hash buckets
  4. At query time, hash the query and retrieve candidates from matching buckets
  5. Optionally probe nearby buckets (flip hash bits) for better recall
  6. Rank candidates by actual similarity

§Example

use oxify_vector::lsh::{LshIndex, LshConfig};
use std::collections::HashMap;

// Create embeddings
let mut embeddings = HashMap::new();
for i in 0..1000 {
    let vec = vec![i as f32 * 0.01, (i * 2) as f32 * 0.01, (i * 3) as f32 * 0.01];
    embeddings.insert(format!("doc{}", i), vec);
}

// Build LSH index
let config = LshConfig::default();
let mut index = LshIndex::new(config);
index.build(&embeddings)?;

// Search
let query = vec![0.5, 1.0, 1.5];
let results = index.search(&query, 10)?;

Structs§

LshConfig
LSH configuration
LshIndex
LSH Index for approximate nearest neighbor search
LshStats
LSH index statistics