# HNSW Vector Search 🚀

A high-performance semantic search engine using **Hierarchical Navigable Small World (HNSW)** graphs and **ONNX transformer embeddings**.

[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

## Features

- ⚡ **Fast search**: O(log N) approximate nearest neighbor search
- 🎯 **High recall**: 95%+ recall@10 with optimized parameters
- 🚀 **SIMD optimized**: 3-4x speedup using ARM NEON on Apple Silicon
- 🤖 **Real embeddings**: Integration with ONNX Runtime for transformer models
- 💾 **Persistent storage**: Memory-mapped indices for instant loading
- 🛠️ **CLI tool**: Easy-to-use command-line interface
- 📊 **Production-ready**: Comprehensive benchmarks and documentation

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/reemabdullah/hnsw-vector-search
cd hnsw-vector-search

# Build the project
cargo build --release

# The CLI tool will be at target/release/hnsw
```

### Download Models

```bash
mkdir -p models
cd models

# Download ONNX model (all-MiniLM-L6-v2)
curl -L -o all-MiniLM-L6-v2.onnx \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx"

# Download tokenizer
curl -L -o tokenizer.json \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"

cd ..
```

### Build an Index

Create a file `documents.txt` with one document per line:

```text
Rust is a systems programming language focused on safety
Python is popular for data science and machine learning
JavaScript runs in web browsers for interactive websites
```

Build the index:

```bash
./target/release/hnsw build \
  --input documents.txt \
  --output my_index.bin \
  --m 16 \
  --ef-construction 200
```

### Search

Interactive search:

```bash
./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt
```

Single query:

```bash
./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt \
  --query "programming languages" \
  --k 5
```

## Library Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
hnsw_vector_search = "0.1.0"
```

Example code:

```rust
use hnsw_vector_search::{HnswGraph, OnnxEmbedder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load embedding model
    let mut embedder = OnnxEmbedder::new(
        "models/all-MiniLM-L6-v2.onnx",
        "models/tokenizer.json"
    )?;

    // Create index
    let mut index = HnswGraph::new(16, 200);

    // Add documents
    let documents = vec![
        "Rust is fast and memory-safe",
        "Python is great for ML",
        "JavaScript powers the web",
    ];

    for doc in &documents {
        let embedding = embedder.embed(doc)?;
        index.insert(embedding);
    }

    // Search
    let query = embedder.embed("programming languages")?;
    let results = index.search(&query, 3, 50);

    for (i, (doc_id, distance)) in results.iter().enumerate() {
        println!("{}. {} (distance: {:.3})",
                 i + 1, documents[*doc_id], distance);
    }

    Ok(())
}
```

## Performance

Benchmarks on Apple M4 Pro (1000 vectors, 384 dimensions):

| Operation            | Time  | Throughput      |
| -------------------- | ----- | --------------- |
| Search (k=10, ef=50) | 12 µs | 80K queries/sec |
| Insertion            | 90 µs | 11K inserts/sec |
| Embedding            | 18 ms | 55 embeds/sec   |
| Index load (mmap)    | 2 ms  | -               |

### SIMD Speedup

ARM NEON vs scalar on Apple Silicon:

| Operation             | Fallback | NEON  | Speedup |
| --------------------- | -------- | ----- | ------- |
| Dot product (384-dim) | 45 ns    | 12 ns | 3.75x   |
| Normalize             | 120 ns   | 35 ns | 3.43x   |
| Euclidean distance    | 68 ns    | 18 ns | 3.78x   |
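The speedups above come from a standard pattern: keep several independent partial sums so the CPU can overlap multiply-adds, exactly as a 128-bit NEON register holds four f32 lanes. Below is a portable Rust sketch of that pattern (an illustration of the idea, not the crate's actual `src/simd` code; the function name is invented for this example):

```rust
// Portable sketch of the 4-accumulator dot product a NEON kernel uses:
// four independent partial sums hide multiply-add latency and map
// directly onto the four f32 lanes of a 128-bit vector register.
fn dot_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 4];
    let chunks = a.len() / 4;
    for i in 0..chunks {
        for lane in 0..4 {
            let j = i * 4 + lane;
            acc[lane] += a[j] * b[j]; // one lane of a vfmaq_f32
        }
    }
    // Horizontal reduction across lanes, then the scalar tail
    // for lengths that are not a multiple of 4.
    let mut sum = acc.iter().sum::<f32>();
    for j in chunks * 4..a.len() {
        sum += a[j] * b[j];
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..384).map(|i| i as f32 / 384.0).collect();
    let b = vec![1.0f32; 384];
    println!("dot = {}", dot_unrolled(&a, &b));
}
```

On aarch64 the inner loop corresponds to `vld1q_f32` loads feeding `vfmaq_f32`, with a final `vaddvq_f32` horizontal add; the scalar fallback above is what the autovectorizer or the NEON path replaces.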

## Algorithm

HNSW creates a multi-layer graph:

```
Layer 2:  A ← sparse, for long-range navigation
          |
Layer 1:  A-B-C ← medium density
          |/|/|
Layer 0:  A-B-C-D-E-F ← dense, all vectors
```

Search proceeds top-down:

1. Start at entry point in highest layer
2. Greedily navigate to nearest neighbor
3. Drop to next layer
4. Repeat until layer 0
5. Return k nearest neighbors

**Time Complexity**: O(log N) expected
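The top-down descent can be sketched in a few lines of plain Rust (a toy illustration of the greedy walk, not this crate's API; the layered adjacency lists and `euclidean` helper are invented for the example, and k is fixed at 1 for brevity):

```rust
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Greedily walk each layer toward the query, dropping a layer at
/// each local minimum, and return the node reached at layer 0.
fn greedy_search(
    layers: &[Vec<Vec<usize>>], // layers[l][node] = neighbor ids at layer l
    vectors: &[Vec<f32>],       // vectors[node] = stored embedding
    query: &[f32],
    entry: usize,
) -> usize {
    let mut current = entry;
    // Highest (sparsest) layer first, finishing at dense layer 0.
    for layer in layers.iter().rev() {
        loop {
            let best = layer[current].iter().copied().min_by(|&a, &b| {
                euclidean(&vectors[a], query).total_cmp(&euclidean(&vectors[b], query))
            });
            match best {
                Some(n)
                    if euclidean(&vectors[n], query)
                        < euclidean(&vectors[current], query) =>
                {
                    current = n // a neighbor is closer: keep walking
                }
                _ => break, // local minimum: drop to the next layer
            }
        }
    }
    current
}

fn main() {
    // Five points on a line; layer 1 links only the endpoints 0 and 4,
    // so a query near 3 takes the long-range hop 0 → 4, then refines.
    let vectors = vec![
        vec![0.0, 0.0], vec![1.0, 0.0], vec![2.0, 0.0],
        vec![3.0, 0.0], vec![4.0, 0.0],
    ];
    let layers = vec![
        vec![vec![1], vec![0, 2], vec![1, 3], vec![2, 4], vec![3]], // layer 0
        vec![vec![4], vec![], vec![], vec![], vec![0]],             // layer 1
    ];
    println!("nearest: {}", greedy_search(&layers, &vectors, &[3.2, 0.0], 0));
}
```

A real implementation keeps a candidate beam of size `ef` at layer 0 instead of a single walker, which is what trades speed for recall.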

## Parameters

### Build Parameters

- **M** (default: 16): Max connections per node
  - 8-12: Fast build, less memory, ~90% recall
  - 16-24: Balanced (recommended)
  - 32-48: High recall (~98%), more memory

- **ef_construction** (default: 200): Build quality
  - 100: Fast build, decent quality
  - 200: Balanced (recommended)
  - 400: Slow build, excellent quality
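M also dominates index memory. A rough back-of-envelope sketch (assumptions, not constants from this crate: f32 vector components, u32 neighbor ids, and the common convention that layer 0 allows 2×M links; upper layers add only a few percent and are ignored):

```rust
// Rough memory estimate for an HNSW index: stored vectors plus
// layer-0 adjacency lists (the usual convention budgets 2*M links
// per node at layer 0; sparse upper layers are ignored here).
fn estimated_bytes(num_vectors: usize, dims: usize, m: usize) -> usize {
    let vector_bytes = dims * 4; // f32 components
    let link_bytes = 2 * m * 4;  // u32 neighbor ids at layer 0
    num_vectors * (vector_bytes + link_bytes)
}

fn main() {
    // 1M vectors at 384 dims with M = 16 lands around 1.66 GB.
    let bytes = estimated_bytes(1_000_000, 384, 16);
    println!("{:.2} GB", bytes as f64 / 1e9);
}
```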

### Search Parameters

- **k**: Number of results
- **ef** (default: 50): Search quality
  - ef = k: Fast, may miss results
  - ef = 2×k: Balanced (recommended)
  - ef = 4×k: High recall, slower

## CLI Commands

### Build

```bash
hnsw build [OPTIONS]

Options:
  -i, --input <FILE>              Input documents (one per line)
  -o, --output <FILE>             Output index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
      --m <NUM>                   Max connections (default: 16)
      --ef-construction <NUM>     Build quality (default: 200)
```

### Search

```bash
hnsw search [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
  -d, --docs <FILE>               Documents file
  -k <NUM>                        Number of results (default: 10)
      --ef <NUM>                  Search quality (default: 50)
  -q, --query <TEXT>              Query (omit for interactive mode)
```

### Info

```bash
hnsw info --index <FILE>
```

### Benchmark

```bash
hnsw bench [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -n, --num-queries <NUM>         Number of queries (default: 1000)
```

## Project Structure

```
hnsw-vector-search/
├── src/
│   ├── simd/           # SIMD vector operations
│   │   ├── neon.rs     # ARM NEON intrinsics
│   │   └── fallback.rs # Portable fallback
│   ├── hnsw/           # HNSW algorithm
│   │   ├── graph.rs    # Graph structure
│   │   ├── search.rs   # Search algorithm
│   │   └── insert.rs   # Insertion algorithm
│   ├── embeddings/     # ONNX integration
│   ├── persistence/    # Serialization
│   └── cli/            # Command-line tool
├── benches/            # Criterion benchmarks
└── examples/           # Usage examples
```

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Run `cargo test` and `cargo clippy`
5. Submit a pull request

## License

MIT License - see LICENSE file for details

## References

- [HNSW Paper](https://arxiv.org/abs/1603.09320) - Malkov & Yashunin (2018)
- [ONNX Runtime](https://onnxruntime.ai/)
- [Sentence Transformers](https://www.sbert.net/)

## Acknowledgments

Built with โค๏ธ using Rust and inspired by production vector databases like Pinecone, Weaviate, and Qdrant.