
HNSW Vector Search 🚀

A high-performance semantic search engine using Hierarchical Navigable Small World (HNSW) graphs and ONNX transformer embeddings.


Features

  • ⚡ Fast search: O(log N) approximate nearest neighbor search
  • 🎯 High recall: 95%+ recall@10 with optimized parameters
  • 🚀 SIMD optimized: 3-4x speedup using ARM NEON on Apple Silicon
  • 🤖 Real embeddings: Integration with ONNX Runtime for transformer models
  • 💾 Persistent storage: Memory-mapped indices for instant loading
  • 🛠️ CLI tool: Easy-to-use command-line interface
  • 📊 Production-ready: Comprehensive benchmarks and documentation

Quick Start

Installation

# Clone the repository
git clone https://github.com/reemabdullah/hnsw-vector-search
cd hnsw-vector-search

# Build the project
cargo build --release

# The CLI tool will be at target/release/hnsw

Download Models

mkdir -p models
cd models

# Download ONNX model (all-MiniLM-L6-v2)
curl -L -o all-MiniLM-L6-v2.onnx \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx"

# Download tokenizer
curl -L -o tokenizer.json \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"

cd ..

Build an Index

Create a file documents.txt with one document per line:

Rust is a systems programming language focused on safety
Python is popular for data science and machine learning
JavaScript runs in web browsers for interactive websites

Build the index:

./target/release/hnsw build \
  --input documents.txt \
  --output my_index.bin \
  --m 16 \
  --ef-construction 200

Search

Interactive search:

./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt

Single query:

./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt \
  --query "programming languages" \
  --k 5

Library Usage

Add to your Cargo.toml:

[dependencies]
hnsw_vector_search = "0.1.0"

Example code:

use hnsw_vector_search::{HnswGraph, OnnxEmbedder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load embedding model
    let mut embedder = OnnxEmbedder::new(
        "models/all-MiniLM-L6-v2.onnx",
        "models/tokenizer.json"
    )?;

    // Create index
    let mut index = HnswGraph::new(16, 200);

    // Add documents
    let documents = vec![
        "Rust is fast and memory-safe",
        "Python is great for ML",
        "JavaScript powers the web",
    ];

    for doc in &documents {
        let embedding = embedder.embed(doc)?;
        index.insert(embedding);
    }

    // Search
    let query = embedder.embed("programming languages")?;
    let results = index.search(&query, 3, 50);

    for (i, (doc_id, distance)) in results.iter().enumerate() {
        println!("{}. {} (distance: {:.3})",
                 i + 1, documents[*doc_id], distance);
    }

    Ok(())
}

Performance

Benchmarks on Apple M4 Pro (1000 vectors, 384 dimensions):

Operation             Time    Throughput
Search (k=10, ef=50)  12 µs   80K queries/sec
Insertion             90 µs   11K inserts/sec
Embedding             18 ms   55 embeds/sec
Index load (mmap)     2 ms    -

SIMD Speedup

ARM NEON vs scalar on Apple Silicon:

Operation              Fallback  NEON   Speedup
Dot product (384-dim)  45 ns     12 ns  3.75x
Normalize              120 ns    35 ns  3.43x
Euclidean distance     68 ns     18 ns  3.78x
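The portable fallback path for these three operations can be sketched as plain Rust; the NEON path in src/simd/neon.rs would process four f32 lanes per instruction instead. This is an illustrative sketch, not the crate's actual API:

```rust
// Scalar fallback for the three hot vector operations. A NEON version
// would replace each inner loop with 128-bit lane intrinsics.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn normalize(v: &mut [f32]) {
    // Divide by the L2 norm so the vector has unit length.
    let norm = dot(v, v).sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let a = [1.0, 2.0, 2.0];
    let b = [2.0, 0.0, 1.0];
    println!("dot       = {}", dot(&a, &b)); // 2 + 0 + 2 = 4
    let mut v = a;
    normalize(&mut v);
    println!("unit norm = {}", dot(&v, &v).sqrt()); // ~1.0
    println!("euclidean = {}", euclidean(&a, &b)); // sqrt(1 + 4 + 1)
}
```

Normalizing vectors at insert time is what lets the index use the cheaper dot product as a cosine-similarity proxy during search.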

Algorithm

HNSW creates a multi-layer graph:

Layer 2:  A ← sparse, for long-range navigation
          |
Layer 1:  A-B-C ← medium density
          |/|/|
Layer 0:  A-B-C-D-E-F ← dense, all vectors

Search proceeds top-down:

  1. Start at entry point in highest layer
  2. Greedily navigate to nearest neighbor
  3. Drop to next layer
  4. Repeat until layer 0
  5. Return k nearest neighbors

Time Complexity: O(log N) expected
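The steps above can be sketched with deliberately simplified structures: one adjacency map per layer and squared-Euclidean distance. The names and types here are illustrative, not the crate's actual API, and a real implementation would expand the layer-0 step into an ef-sized candidate beam rather than a single greedy walk:

```rust
use std::collections::HashMap;

// One adjacency list per layer: node id -> neighbor ids.
type Layer = HashMap<usize, Vec<usize>>;

// Squared Euclidean distance (monotone in the true distance, so it is
// sufficient for comparisons).
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

// Top-down descent: on each layer, greedily hop to any neighbor closer
// to the query; when no neighbor improves, drop to the layer below.
fn greedy_descend(layers: &[Layer], vectors: &[Vec<f32>], query: &[f32], entry: usize) -> usize {
    let mut current = entry;
    // layers[0] is layer 0 (dense), so walk the slice in reverse.
    for layer in layers.iter().rev() {
        loop {
            let mut improved = false;
            if let Some(neighbors) = layer.get(&current) {
                for &n in neighbors {
                    if dist(&vectors[n], query) < dist(&vectors[current], query) {
                        current = n;
                        improved = true;
                    }
                }
            }
            if !improved {
                break;
            }
        }
    }
    current
}

fn main() {
    // Four 1-D vectors; layer 1 links 0-2, layer 0 links the chain 0-1-2-3.
    let vectors = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    let layer1: Layer = HashMap::from([(0, vec![2]), (2, vec![0])]);
    let layer0: Layer = HashMap::from([
        (0, vec![1]), (1, vec![0, 2]), (2, vec![1, 3]), (3, vec![2]),
    ]);
    let nearest = greedy_descend(&[layer0, layer1], &vectors, &[2.9], 0);
    println!("nearest = {nearest}"); // node 3, the vector closest to 2.9
}
```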

Parameters

Build Parameters

  • M (default: 16): Max connections per node

    • 8-12: Fast build, less memory, ~90% recall
    • 16-24: Balanced (recommended)
    • 32-48: High recall (~98%), more memory
  • ef_construction (default: 200): Build quality

    • 100: Fast build, decent quality
    • 200: Balanced (recommended)
    • 400: Slow build, excellent quality

Search Parameters

  • k: Number of results
  • ef (default: 50): Search quality
    • ef = k: Fast, may miss results
    • ef = 2×k: Balanced (recommended)
    • ef = 4×k: High recall, slower
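The rule of thumb above can be folded into a small helper. `recommended_ef` is an illustrative function, not part of the crate; it applies the balanced 2×k rule while never dropping below the CLI default of 50:

```rust
// Pick ef from k: balanced 2×k rule, floored at the default of 50.
fn recommended_ef(k: usize) -> usize {
    (2 * k).max(50)
}

fn main() {
    println!("{}", recommended_ef(10)); // 50 (floor applies)
    println!("{}", recommended_ef(40)); // 80 (2×k)
}
```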

CLI Commands

Build

hnsw build [OPTIONS]

Options:
  -i, --input <FILE>              Input documents (one per line)
  -o, --output <FILE>             Output index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
      --m <NUM>                   Max connections (default: 16)
      --ef-construction <NUM>     Build quality (default: 200)

Search

hnsw search [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
  -d, --docs <FILE>               Documents file
  -k <NUM>                        Number of results (default: 10)
      --ef <NUM>                  Search quality (default: 50)
  -q, --query <TEXT>              Query (omit for interactive mode)

Info

hnsw info --index <FILE>

Benchmark

hnsw bench [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -n, --num-queries <NUM>         Number of queries (default: 1000)

Project Structure

hnsw-vector-search/
├── src/
│   ├── simd/           # SIMD vector operations
│   │   ├── neon.rs     # ARM NEON intrinsics
│   │   └── fallback.rs # Portable fallback
│   ├── hnsw/           # HNSW algorithm
│   │   ├── graph.rs    # Graph structure
│   │   ├── search.rs   # Search algorithm
│   │   └── insert.rs   # Insertion algorithm
│   ├── embeddings/     # ONNX integration
│   ├── persistence/    # Serialization
│   └── cli/            # Command-line tool
├── benches/            # Criterion benchmarks
└── examples/           # Usage examples

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Run cargo test and cargo clippy
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Built with ❤️ using Rust and inspired by production vector databases like Pinecone, Weaviate, and Qdrant.