vector-index 0.1.0

Generic HNSW vector index with pluggable distance metrics.
Documentation
# vector-index

[![Crates.io](https://img.shields.io/crates/v/vector-index.svg)](https://crates.io/crates/vector-index)
[![Documentation](https://docs.rs/vector-index/badge.svg)](https://docs.rs/vector-index)
[![License](https://img.shields.io/crates/l/vector-index.svg)](https://github.com/elli0t-yash/vector-index#license)

Approximate nearest neighbor search in Rust, with pluggable distance metrics.

## What it is

An HNSW (Hierarchical Navigable Small World) index that's generic over the distance metric. Plug in L2, cosine, or any custom distance function — the index doesn't care.

## Quickstart

Add to `Cargo.toml`:

```toml
[dependencies]
vector-index = "0.1"
```

Use it:

```rust
use vector_index::{HnswIndex, HnswConfig, L2};

// Build an index with the L2 metric
let mut index = HnswIndex::new(L2, HnswConfig::default());

// Insert some vectors
index.insert(vec![1.0, 0.0, 0.0]).unwrap();
index.insert(vec![0.0, 1.0, 0.0]).unwrap();
index.insert(vec![0.0, 0.0, 1.0]).unwrap();

// Search for the 2 nearest neighbors
let query = vec![0.9, 0.1, 0.0];
let neighbors = index.search(&query, 2);

for n in &neighbors {
    println!("id={} distance={}", n.id, n.distance);
}
```

## Why this crate

Most Rust ANN crates hardcode their distance function — usually L2 or cosine. If you want to use a different metric (Sliced-Wasserstein, custom Hamming, learned distances, anything), you have to either fork the crate or build the index yourself.

This crate solves that. The `Metric` trait is the only API the index needs to know about your distance function:

```rust
pub trait Metric {
    type Point;
    fn distance(&self, a: &Self::Point, b: &Self::Point) -> f32;
    fn dim(&self, point: &Self::Point) -> usize;
}
```

Implement it for your type, plug it into `HnswIndex<P, M>`, and search.

## Built-in metrics

- `L2` — squared Euclidean distance, suitable for embeddings
- `Cosine` — cosine distance, suitable for direction-sensitive comparisons

For Sliced-Wasserstein over empirical distributions (point clouds), see the companion crate [`sliced-wasserstein`](https://crates.io/crates/sliced-wasserstein).

## Concurrent access

Use `ConcurrentHnsw` for multi-threaded reads with occasional writes:

```rust
use vector_index::{ConcurrentHnsw, HnswConfig, L2};
use std::sync::Arc;

let index = Arc::new(ConcurrentHnsw::new(L2, HnswConfig::default()));

// Multiple threads can search concurrently
let neighbors = index.search(&query, 10);
```

## Configuration

`HnswConfig` exposes the standard HNSW parameters:

| Parameter         | Default | Notes                                                          |
| ----------------- | ------- | -------------------------------------------------------------- |
| `m`               | 16      | Max neighbors per node. Higher = better recall, more memory.   |
| `m_max0`          | 32      | Max neighbors at level 0. Default is `2 * m`.                  |
| `ef_construction` | 200     | Build-time candidate pool. Higher = better recall, slower build. |
| `ef_search`       | 50      | Search-time candidate pool. Higher = better recall, slower search. |

For most workloads the defaults work well. Tune `ef_search` if you need different recall/latency tradeoffs.

## Status

`vector-index` is at 0.1.0. The core HNSW algorithm is implemented (Algorithm 4 neighbor selection from the original paper), tested with recall verified on synthetic Gaussian benchmarks, and used in production by [OmniPulse](https://github.com/elli0t-yash) for distributional fingerprint retrieval.

Currently does **not** support:

- Deletion (HNSW deletion is non-trivial; planned for 0.2.0)
- Persistence to disk (in-memory only)
- Serialization of the index structure

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE)
- MIT License ([LICENSE-MIT]LICENSE-MIT)

at your option.