
HNSW Vector Search 🚀

A high-performance semantic search engine using Hierarchical Navigable Small World (HNSW) graphs and ONNX transformer embeddings.


Features

  • ⚡ Fast search: O(log N) approximate nearest neighbor search
  • 🎯 High recall: 95%+ recall@10 with optimized parameters
  • 🚀 SIMD optimized: 3-4x speedup using ARM NEON on Apple Silicon
  • 🤖 Real embeddings: Integration with ONNX Runtime for transformer models
  • 💾 Persistent storage: Memory-mapped indices for instant loading
  • 🛠️ CLI tool: Easy-to-use command-line interface
  • 📊 Production-ready: Comprehensive benchmarks and documentation

Quick Start

Installation

# Clone the repository
git clone https://github.com/reemabdullah/hnsw-vector-search
cd hnsw-vector-search

# Build the project
cargo build --release

# The CLI tool will be at target/release/hnsw

Download Models

mkdir -p models
cd models

# Download ONNX model (all-MiniLM-L6-v2)
curl -L -o all-MiniLM-L6-v2.onnx \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx"

# Download tokenizer
curl -L -o tokenizer.json \
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"

cd ..

Build an Index

Create a file documents.txt with one document per line:

Rust is a systems programming language focused on safety
Python is popular for data science and machine learning
JavaScript runs in web browsers for interactive websites

Build the index:

./target/release/hnsw build \
  --input documents.txt \
  --output my_index.bin \
  --m 16 \
  --ef-construction 200

Search

Interactive search:

./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt

Single query:

./target/release/hnsw search \
  --index my_index.bin \
  --docs my_index_docs.txt \
  --query "programming languages" \
  --k 5

Library Usage

Add to your Cargo.toml:

[dependencies]
hnsw_vector_search = "0.1.0"

Example code:

use hnsw_vector_search::{HnswGraph, OnnxEmbedder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load embedding model
    let mut embedder = OnnxEmbedder::new(
        "models/all-MiniLM-L6-v2.onnx",
        "models/tokenizer.json"
    )?;

    // Create index
    let mut index = HnswGraph::new(16, 200);

    // Add documents
    let documents = vec![
        "Rust is fast and memory-safe",
        "Python is great for ML",
        "JavaScript powers the web",
    ];

    for doc in &documents {
        let embedding = embedder.embed(doc)?;
        index.insert(embedding);
    }

    // Search
    let query = embedder.embed("programming languages")?;
    let results = index.search(&query, 3, 50);

    for (i, (doc_id, distance)) in results.iter().enumerate() {
        println!("{}. {} (distance: {:.3})",
                 i + 1, documents[*doc_id], distance);
    }

    Ok(())
}

Performance

Benchmarks on Apple M4 Pro (1000 vectors, 384 dimensions):

Operation             Time    Throughput
Search (k=10, ef=50)  12 µs   80K queries/sec
Insertion             90 µs   11K inserts/sec
Embedding             18 ms   55 embeds/sec
Index load (mmap)     2 ms    -

SIMD Speedup

ARM NEON vs scalar on Apple Silicon:

Operation              Fallback  NEON   Speedup
Dot product (384-dim)  45 ns     12 ns  3.75x
Normalize              120 ns    35 ns  3.43x
Euclidean distance     68 ns     18 ns  3.78x
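The portable fallback path for these three operations can be sketched as plain Rust; the NEON path in src/simd/neon.rs would process four f32 lanes per instruction instead. This is an illustrative sketch, not the crate's actual API:

```rust
// Scalar fallback for the three hot vector operations. A NEON version
// would replace each inner loop with 128-bit lane intrinsics.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn normalize(v: &mut [f32]) {
    // Divide by the L2 norm so the vector has unit length.
    let norm = dot(v, v).sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let a = [1.0, 2.0, 2.0];
    let b = [2.0, 0.0, 1.0];
    println!("dot       = {}", dot(&a, &b)); // 2 + 0 + 2 = 4
    let mut v = a;
    normalize(&mut v);
    println!("unit norm = {}", dot(&v, &v).sqrt()); // ~1.0
    println!("euclidean = {}", euclidean(&a, &b)); // sqrt(1 + 4 + 1)
}
```

Normalizing vectors at insert time is what lets the index use the cheaper dot product as a cosine-similarity proxy during search.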

Algorithm

HNSW creates a multi-layer graph:

Layer 2:  A ← sparse, for long-range navigation
          |
Layer 1:  A-B-C ← medium density
          |/|/|
Layer 0:  A-B-C-D-E-F ← dense, all vectors

Search proceeds top-down:

  1. Start at entry point in highest layer
  2. Greedily navigate to nearest neighbor
  3. Drop to next layer
  4. Repeat until layer 0
  5. Return k nearest neighbors

Time Complexity: O(log N) expected
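The steps above can be sketched with deliberately simplified structures: one adjacency map per layer and squared-Euclidean distance. The names and types here are illustrative, not the crate's actual API, and a real implementation would expand the layer-0 step into an ef-sized candidate beam rather than a single greedy walk:

```rust
use std::collections::HashMap;

// One adjacency list per layer: node id -> neighbor ids.
type Layer = HashMap<usize, Vec<usize>>;

// Squared Euclidean distance (monotone in the true distance, so it is
// sufficient for comparisons).
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

// Top-down descent: on each layer, greedily hop to any neighbor closer
// to the query; when no neighbor improves, drop to the layer below.
fn greedy_descend(layers: &[Layer], vectors: &[Vec<f32>], query: &[f32], entry: usize) -> usize {
    let mut current = entry;
    // layers[0] is layer 0 (dense), so walk the slice in reverse.
    for layer in layers.iter().rev() {
        loop {
            let mut improved = false;
            if let Some(neighbors) = layer.get(&current) {
                for &n in neighbors {
                    if dist(&vectors[n], query) < dist(&vectors[current], query) {
                        current = n;
                        improved = true;
                    }
                }
            }
            if !improved {
                break;
            }
        }
    }
    current
}

fn main() {
    // Four 1-D vectors; layer 1 links 0-2, layer 0 links the chain 0-1-2-3.
    let vectors = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    let layer1: Layer = HashMap::from([(0, vec![2]), (2, vec![0])]);
    let layer0: Layer = HashMap::from([
        (0, vec![1]), (1, vec![0, 2]), (2, vec![1, 3]), (3, vec![2]),
    ]);
    let nearest = greedy_descend(&[layer0, layer1], &vectors, &[2.9], 0);
    println!("nearest = {nearest}"); // node 3, the vector closest to 2.9
}
```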

Parameters

Build Parameters

  • M (default: 16): Max connections per node

    • 8-12: Fast build, less memory, ~90% recall
    • 16-24: Balanced (recommended)
    • 32-48: High recall (~98%), more memory
  • ef_construction (default: 200): Build quality

    • 100: Fast build, decent quality
    • 200: Balanced (recommended)
    • 400: Slow build, excellent quality

Search Parameters

  • k: Number of results
  • ef (default: 50): Search quality
    • ef = k: Fast, may miss results
    • ef = 2×k: Balanced (recommended)
    • ef = 4×k: High recall, slower
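The rule of thumb above can be folded into a small helper. `recommended_ef` is an illustrative function, not part of the crate; it applies the balanced 2×k rule while never dropping below the CLI default of 50:

```rust
// Pick ef from k: balanced 2×k rule, floored at the default of 50.
fn recommended_ef(k: usize) -> usize {
    (2 * k).max(50)
}

fn main() {
    println!("{}", recommended_ef(10)); // 50 (floor applies)
    println!("{}", recommended_ef(40)); // 80 (2×k)
}
```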

CLI Commands

Build

hnsw build [OPTIONS]

Options:
  -i, --input <FILE>              Input documents (one per line)
  -o, --output <FILE>             Output index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
      --m <NUM>                   Max connections (default: 16)
      --ef-construction <NUM>     Build quality (default: 200)

Search

hnsw search [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -m, --model <FILE>              ONNX model path
  -t, --tokenizer <FILE>          Tokenizer path
  -d, --docs <FILE>               Documents file
  -k <NUM>                        Number of results (default: 10)
      --ef <NUM>                  Search quality (default: 50)
  -q, --query <TEXT>              Query (omit for interactive mode)

Info

hnsw info --index <FILE>

Benchmark

hnsw bench [OPTIONS]

Options:
  -i, --index <FILE>              Index file
  -n, --num-queries <NUM>         Number of queries (default: 1000)

Project Structure

hnsw-vector-search/
├── src/
│   ├── simd/           # SIMD vector operations
│   │   ├── neon.rs     # ARM NEON intrinsics
│   │   └── fallback.rs # Portable fallback
│   ├── hnsw/           # HNSW algorithm
│   │   ├── graph.rs    # Graph structure
│   │   ├── search.rs   # Search algorithm
│   │   └── insert.rs   # Insertion algorithm
│   ├── embeddings/     # ONNX integration
│   ├── persistence/    # Serialization
│   └── cli/            # Command-line tool
├── benches/            # Criterion benchmarks
└── examples/           # Usage examples

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Run cargo test and cargo clippy
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Built with ❤️ using Rust and inspired by production vector databases like Pinecone, Weaviate, and Qdrant.