vector-index 0.1.0

Generic HNSW vector index with pluggable distance metrics.
Documentation

vector-index

Crates.io Documentation License

Approximate nearest neighbor search in Rust, with pluggable distance metrics.

What it is

An HNSW (Hierarchical Navigable Small World) index that's generic over the distance metric. Plug in L2, cosine, or any custom distance function — the index doesn't care.

Quickstart

Add to Cargo.toml:

[dependencies]
vector-index = "0.1"

Use it:

use vector_index::{HnswIndex, HnswConfig, L2};

// Build an index with the L2 metric
let mut index = HnswIndex::new(L2, HnswConfig::default());

// Insert some vectors
index.insert(vec![1.0, 0.0, 0.0]).unwrap();
index.insert(vec![0.0, 1.0, 0.0]).unwrap();
index.insert(vec![0.0, 0.0, 1.0]).unwrap();

// Search for the 2 nearest neighbors
let query = vec![0.9, 0.1, 0.0];
let neighbors = index.search(&query, 2);

for n in &neighbors {
    println!("id={} distance={}", n.id, n.distance);
}

Why this crate

Most Rust ANN crates hardcode their distance function — usually L2 or cosine. If you want to use a different metric (Sliced-Wasserstein, custom Hamming, learned distances, anything), you have to either fork the crate or build the index yourself.

This crate solves that. The Metric trait is the only API the index needs to know about your distance function:

pub trait Metric {
    type Point;
    fn distance(&self, a: &Self::Point, b: &Self::Point) -> f32;
    fn dim(&self, point: &Self::Point) -> usize;
}

Implement it for your type, plug it into HnswIndex<P, M>, and search.

Built-in metrics

  • L2 — squared Euclidean distance, suitable for embeddings
  • Cosine — cosine distance, suitable for direction-sensitive comparisons

For Sliced-Wasserstein over empirical distributions (point clouds), see the companion crate sliced-wasserstein.

Concurrent access

Use ConcurrentHnsw for multi-threaded reads with occasional writes:

use vector_index::{ConcurrentHnsw, HnswConfig, L2};
use std::sync::Arc;

let index = Arc::new(ConcurrentHnsw::new(L2, HnswConfig::default()));

// Multiple threads can search concurrently
let neighbors = index.search(&query, 10);

Configuration

HnswConfig exposes the standard HNSW parameters:

Parameter Default Notes
m 16 Max neighbors per node. Higher = better recall, more memory.
m_max0 32 Max neighbors at level 0. Default is 2 * m.
ef_construction 200 Build-time candidate pool. Higher = better recall, slower build.
ef_search 50 Search-time candidate pool. Higher = better recall, slower search.

For most workloads the defaults work well. Tune ef_search if you need different recall/latency tradeoffs.

Status

vector-index is at 0.1.0. The core HNSW algorithm is implemented (Algorithm 4 neighbor selection from the original paper), tested with recall verified on synthetic Gaussian benchmarks, and used in production by OmniPulse for distributional fingerprint retrieval.

Currently does not support:

  • Deletion (HNSW deletion is non-trivial; planned for 0.2.0)
  • Persistence to disk (in-memory only)
  • Serialization of the index structure

License

Licensed under either of:

at your option.