DiskANN Implementation in Rust

A Rust implementation of DiskANN (Disk-based Approximate Nearest Neighbor search) using the Vamana graph algorithm. It provides efficient, large-scale vector similarity search with a minimal memory footprint: graph traversal over memory-mapped storage enables billion-scale search with very little RAM.

When to Use diskann-rs

| Use diskann-rs when... | Use in-memory indexes (hnsw_rs) when... |
|---|---|
| Index is larger than available RAM | Index fits comfortably in RAM |
| You need incremental updates without rebuilding | Build time is critical (one-time cost) |
| Memory-constrained environments (containers, edge) | Maximum recall needed (98%+) |
| Multiple large indexes on same machine | Single index, dedicated resources |
| Cost-sensitive deployments | Latency-critical applications |

TL;DR: diskann-rs trades ~60% slower build time for 6-10x lower memory usage and 15x faster incremental updates.

Key Features

| Feature | Description |
|---|---|
| Incremental Updates | Add/delete vectors without rebuilding the entire index |
| Filtered Search | Query with metadata predicates (e.g., category filters) |
| Composable Features | Combine incremental + filtered + quantized on a single index |
| SIMD Acceleration | Optimized distance calculations (AVX2, SSE4.1, NEON) |
| Product Quantization | Compress vectors up to 64x with PQ encoding |
| Scalar Quantization | F16 (2x) and Int8 (4x) compression with SIMD-accelerated distance |
| Memory-Mapped I/O | Single-file storage with minimal RAM footprint |
| Byte Serialization | Load indexes from bytes (network, embedded, no filesystem) |
| Benchmark Formats | Read/write fvecs, ivecs, bvecs (standard ANN benchmark formats) |
| Parallel Processing | Concurrent index building and batch queries |

Quick Start

Basic Index Operations

```rust
use anndists::dist::DistL2;
use diskann_rs::DiskANN;

// Build index
let vectors: Vec<Vec<f32>> = vec![vec![0.1, 0.2, 0.3], vec![0.4, 0.5, 0.6]];
let index = DiskANN::<DistL2>::build_index_default(&vectors, DistL2 {}, "index.db")?;

// Search: k = 10 neighbors, beam width = 256
let query = vec![0.1, 0.2, 0.4];
let neighbors: Vec<u32> = index.search(&query, 10, 256);
```

Incremental Updates (No Rebuild Required)

```rust
use anndists::dist::DistL2;
use diskann_rs::IncrementalDiskANN;

// Build initial index
let vectors = vec![vec![0.0; 128]; 1000];
let index = IncrementalDiskANN::<DistL2>::build_default(&vectors, "index.db")?;

// Add new vectors without rebuilding
let new_vectors = vec![vec![1.0; 128]; 100];
index.add_vectors(&new_vectors)?;

// Delete vectors (instant tombstoning)
index.delete_vectors(&[0, 1, 2])?;

// Compact when needed (merges delta layer)
if index.should_compact() {
    index.compact("index_v2.db")?;
}
```

Filtered Search (Metadata Predicates)

```rust
use anndists::dist::DistL2;
use diskann_rs::{FilteredDiskANN, Filter};

// Build with labels (e.g., category IDs)
let vectors = vec![vec![0.0; 128]; 1000];
let labels: Vec<Vec<u64>> = (0..1000).map(|i| vec![i % 10]).collect(); // 10 categories

let index = FilteredDiskANN::<DistL2>::build(&vectors, &labels, "filtered.db")?;

// Search only category 5
let query = vec![0.0; 128];
let filter = Filter::label_eq(0, 5);
let results = index.search_filtered(&query, 10, 128, &filter);

// Complex filters
let filter = Filter::and(vec![
    Filter::label_eq(0, 5),           // category == 5
    Filter::label_range(1, 10, 100),  // price in [10, 100]
]);
```

Composable Incremental Index (Filtered + Quantized + Incremental)

```rust
use anndists::dist::DistL2;
use diskann_rs::{IncrementalDiskANN, IncrementalQuantizedConfig, QuantizerKind, Filter};

// Build an incremental index with labels and F16 quantization
let vectors = vec![vec![0.0; 128]; 1000];
let labels: Vec<Vec<u64>> = (0..1000).map(|i| vec![i % 5]).collect();
let quant_config = IncrementalQuantizedConfig { rerank_size: 50 };

let index = IncrementalDiskANN::<DistL2>::build_full(
    &vectors, &labels, "composable.db",
    Default::default(),              // IncrementalConfig
    QuantizerKind::F16,
    quant_config,
)?;

// Filtered search on the incremental index
let query = vec![0.0; 128];
let filter = Filter::label_eq(0, 3);
let results = index.search_filtered(&query, 10, 128, &filter);

// Add labeled vectors without rebuilding
let new_vecs = vec![vec![1.0; 128]; 50];
let new_labels = vec![vec![2u64]; 50];
index.add_vectors_with_labels(&new_vecs, &new_labels)?;

// Delete, compact, serialize — all features compose
index.delete_vectors(&[0, 1, 2])?;
let bytes = index.to_bytes();
```

Product Quantization (64x Compression)

```rust
use diskann_rs::pq::{ProductQuantizer, PQConfig};

// Training data (placeholder vectors here)
let vectors = vec![vec![0.0f32; 128]; 1000];

// Train quantizer
let config = PQConfig {
    num_subspaces: 8,      // M = 8 segments
    num_centroids: 256,    // K = 256 codes per segment
    ..Default::default()
};
let pq = ProductQuantizer::train(&vectors, config)?;

// Encode vectors (128-dim f32 -> 8 bytes)
let codes: Vec<Vec<u8>> = pq.encode_batch(&vectors);

// Fast approximate distance using lookup table
let query = vec![0.0f32; 128];
let table = pq.create_distance_table(&query);
let dist = pq.distance_with_table(&table, &codes[0]);
```

SIMD-Accelerated Distance

```rust
use diskann_rs::{SimdL2, DiskANN, simd_info};

// Check available SIMD features
println!("{}", simd_info()); // "SIMD: NEON" or "SIMD: AVX2, SSE4.1"

// Use SIMD-optimized L2 distance
let vectors = vec![vec![0.0f32; 128]; 1000];
let index = DiskANN::<SimdL2>::build_index_default(&vectors, SimdL2, "index.db")?;

// Or use SIMD directly
use diskann_rs::simd::{l2_squared, dot_product, cosine_distance};
let (vec_a, vec_b) = (vec![0.0f32; 128], vec![1.0f32; 128]);
let dist = l2_squared(&vec_a, &vec_b);
```

Scalar Quantization (F16 / Int8)

```rust
use diskann_rs::{F16Quantizer, Int8Quantizer, VectorQuantizer};

let vector = vec![0.0f32; 128];
let query = vec![0.0f32; 128];

// F16: 2x compression, nearly lossless
let f16q = F16Quantizer::new(128);
let codes = f16q.encode(&vector);           // 128 dims -> 256 bytes
let decoded = f16q.decode(&codes);          // Back to f32
let dist = f16q.asymmetric_distance(&query, &codes);  // SIMD-accelerated

// Int8: 4x compression, trained per-dimension scaling
let training_vectors = vec![vec![0.0f32; 128]; 1000];
let int8q = Int8Quantizer::train(&training_vectors)?;
let codes = int8q.encode(&vector);          // 128 dims -> 128 bytes
let dist = int8q.asymmetric_distance(&query, &codes);

// All quantizers implement the VectorQuantizer trait
fn search_with_quantizer(q: &dyn VectorQuantizer, query: &[f32], codes: &[u8]) -> f32 {
    q.asymmetric_distance(query, codes)
}
```

Byte Loading (No Filesystem Required)

```rust
use anndists::dist::DistL2;
use diskann_rs::DiskANN;
use std::sync::Arc;

// Build and serialize to bytes
let vectors = vec![vec![0.0f32; 128]; 1000];
let index = DiskANN::<DistL2>::build_index_default(&vectors, DistL2{}, "index.db")?;
let bytes: Vec<u8> = index.to_bytes();

// Load from owned bytes (e.g., downloaded from network)
let index = DiskANN::<DistL2>::from_bytes(bytes, DistL2{})?;

// Load from shared bytes (multi-reader, zero-copy)
let shared: Arc<[u8]> = load_from_somewhere().into();
let index = DiskANN::<DistL2>::from_shared_bytes(shared, DistL2{})?;

// Works for all index types
let filtered_bytes = filtered_index.to_bytes();
let incremental_bytes = incremental_index.to_bytes();
```

Benchmark Format Support (fvecs/ivecs/bvecs)

```rust
use diskann_rs::formats::{read_fvecs, write_fvecs, read_ivecs, read_bvecs_as_f32};

// Load standard ANN benchmark datasets (SIFT, GIST, GloVe, etc.)
let base_vectors = read_fvecs("sift_base.fvecs")?;        // Vec<Vec<f32>>
let ground_truth = read_ivecs("sift_groundtruth.ivecs")?; // Vec<Vec<i32>>
let queries = read_fvecs("sift_query.fvecs")?;

// Load byte vectors as normalized floats
let mnist = read_bvecs_as_f32("mnist.bvecs")?;  // u8 [0,255] -> f32 [0,1]

// Save your own vectors
write_fvecs("my_vectors.fvecs", &base_vectors)?;
```
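The fvecs layout itself is simple: each vector is stored as a little-endian i32 dimension followed by that many little-endian f32 components (ivecs and bvecs have the same shape with i32 and u8 payloads). A dependency-free round-trip sketch; `write_fvec`/`read_fvec` are illustrative stand-ins, not the crate's functions:

```rust
use std::io::{Cursor, Read, Write};

// Write one vector in fvecs layout: i32 dim, then `dim` f32 values.
fn write_fvec(out: &mut impl Write, v: &[f32]) -> std::io::Result<()> {
    out.write_all(&(v.len() as i32).to_le_bytes())?;
    for x in v {
        out.write_all(&x.to_le_bytes())?;
    }
    Ok(())
}

// Read one vector back, using the leading dimension as the length.
fn read_fvec(inp: &mut impl Read) -> std::io::Result<Vec<f32>> {
    let mut len = [0u8; 4];
    inp.read_exact(&mut len)?;
    let dim = i32::from_le_bytes(len) as usize;
    let mut v = Vec::with_capacity(dim);
    for _ in 0..dim {
        let mut b = [0u8; 4];
        inp.read_exact(&mut b)?;
        v.push(f32::from_le_bytes(b));
    }
    Ok(v)
}

fn main() {
    let mut buf = Vec::new();
    write_fvec(&mut buf, &[0.5, 1.5]).unwrap();
    assert_eq!(buf.len(), 4 + 2 * 4); // 4-byte dim + two 4-byte floats
    let v = read_fvec(&mut Cursor::new(buf)).unwrap();
    assert_eq!(v, vec![0.5, 1.5]);
}
```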

Performance

Why diskann-rs?

Memory-Mapped I/O

Unlike in-memory indexes that require loading the entire graph into RAM, diskann-rs uses memory-mapped files. The OS loads only the pages you access, making it ideal for large-scale deployments:

| Workload | diskann-rs | hnsw_rs | Savings |
|---|---|---|---|
| Light (10 queries) | 90 MB | 896 MB | 10x less RAM |
| Medium (100 queries) | 136 MB | 896 MB | 6.6x less RAM |
| Heavy (1K queries) | 147 MB | 896 MB | 6x less RAM |
| Stress (5K queries) | 139 MB | 896 MB | 6.4x less RAM |

Tested with 200K vectors, 128 dimensions. hnsw_rs must hold the full index in RAM; diskann-rs loads pages on-demand.

Benchmark Comparisons (vs hnsw_rs)

| Metric | diskann-rs | hnsw_rs | Winner |
|---|---|---|---|
| QPS at 93% recall | 586 | 170 | diskann-rs (3.4x) |
| Add vectors | 31,000 vec/s | 2,000 vec/s | diskann-rs (15x) |
| Delete vectors | Instant (tombstone) | Full rebuild | diskann-rs |
| Build time | 33s / 100K | 21s / 100K | hnsw_rs (1.6x) |
| Max recall | 97.5% | 99.2% | hnsw_rs |

Dataset Benchmarks

Benchmarks on Apple M1 Max:

| Dataset | Vectors | Build Time | QPS | Recall@10 |
|---|---|---|---|---|
| SIFT-1M | 1,000,000 | 295s | 8,590 | 99.6% |
| Fashion-MNIST | 60,000 | 111s | 18,000 | 98.8% |
| Random-50K | 50,000 | 38s | 2,200 | 85.6% |

Memory Efficiency

- ~330 MB RAM for a 2 GB index (16% of file size)
- Product Quantization: 64x compression (512 bytes → 8 bytes per vector)
- Int8 Quantization: 4x compression (512 bytes → 128 bytes per vector)
- F16 Quantization: 2x compression (512 bytes → 256 bytes per vector)
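The byte counts above all start from a 128-dim f32 vector (128 × 4 = 512 bytes); each ratio just divides that. As a quick sanity check (`compressed_bytes` is an illustrative helper, not a crate function):

```rust
// Bytes per vector after compressing `dim` f32 values by `ratio`.
fn compressed_bytes(dim: usize, ratio: usize) -> usize {
    dim * 4 / ratio // f32 = 4 bytes
}

fn main() {
    assert_eq!(compressed_bytes(128, 1), 512);  // uncompressed f32
    assert_eq!(compressed_bytes(128, 2), 256);  // F16
    assert_eq!(compressed_bytes(128, 4), 128);  // Int8
    assert_eq!(compressed_bytes(128, 64), 8);   // PQ with 8 subspaces
}
```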

Quantization Trade-offs

| Method | Compression | Recall@10 | Use Case |
|---|---|---|---|
| None (f32) | 1x | 100% | Maximum accuracy |
| F16 | 2x | 100% | General purpose, nearly lossless |
| Int8 | 4x | 99% | Memory-constrained, high accuracy |
| PQ-32 | 16x | ~62% | Large-scale, re-ranking |
| PQ-8 | 64x | ~12% | Massive scale, coarse filtering |
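The low raw recall of aggressive PQ is usually recovered by re-ranking: fetch a generous candidate set by approximate distance, then re-order just those candidates with exact distances. A self-contained sketch of that pattern (the functions are illustrative, not the crate's API; the "quantized" metric here just rounds coordinates to mimic lossy codes):

```rust
// Exact squared L2 distance.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Coarse distance: database coordinates rounded to integers, mimicking lossy codes.
fn approx_l2(q: &[f32], v: &[f32]) -> f32 {
    q.iter().zip(v).map(|(x, y)| (x - y.round()) * (x - y.round())).sum()
}

// Keep the `rerank` best candidates under the approximate metric,
// then order those by the exact metric and return the top `k` ids.
fn rerank_top_k(query: &[f32], db: &[Vec<f32>], rerank: usize, k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..db.len()).collect();
    ids.sort_by(|&a, &b| approx_l2(query, &db[a]).total_cmp(&approx_l2(query, &db[b])));
    ids.truncate(rerank);
    ids.sort_by(|&a, &b| l2(query, &db[a]).total_cmp(&l2(query, &db[b])));
    ids.truncate(k);
    ids
}

fn main() {
    let db = vec![vec![0.0, 0.0], vec![0.4, 0.0], vec![0.7, 0.0], vec![2.0, 0.0]];
    let query = vec![0.5, 0.0];
    // Under the coarse metric ids 0, 1, 2 tie; the exact pass picks the true NN.
    assert_eq!(rerank_top_k(&query, &db, 3, 1), vec![1]);
}
```

This is what `rerank_size` in `IncrementalQuantizedConfig` controls conceptually: a larger candidate pool costs more exact-distance work but recovers more recall.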

Architecture

Storage Abstraction

```
┌──────────────────────────────────────────────────────────────┐
│                        Storage                                │
├──────────────────────────────────────────────────────────────┤
│  Mmap(Mmap)        - Memory-mapped file (default, lazy I/O)  │
│  Owned(Vec<u8>)    - Owned bytes (network, embedded)         │
│  Shared(Arc<[u8]>) - Reference-counted (multi-reader)        │
└──────────────────────────────────────────────────────────────┘
```

All variants deref to &[u8], so search logic is unified.
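This "enum of backings behind one `Deref`" pattern can be sketched in a few lines. A simplified mirror of the design (the `Mmap` variant is omitted here to stay dependency-free; this is not the crate's actual definition):

```rust
use std::ops::Deref;
use std::sync::Arc;

// Two of the three backings from the diagram above.
enum Storage {
    Owned(Vec<u8>),
    Shared(Arc<[u8]>),
}

impl Deref for Storage {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            Storage::Owned(v) => v,
            Storage::Shared(a) => a,
        }
    }
}

// Search code only ever sees &[u8], whatever the backing store is.
fn first_byte(bytes: &[u8]) -> u8 {
    bytes[0]
}

fn main() {
    let owned = Storage::Owned(vec![1, 2, 3]);
    let shared = Storage::Shared(Arc::from(vec![1u8, 2, 3].into_boxed_slice()));
    // Deref coercion turns &Storage into &[u8] at the call site.
    assert_eq!(first_byte(&owned), first_byte(&shared));
}
```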

File Layout

```
[ metadata_len:u64 ][ metadata (bincode) ][ padding to 1 MiB ]
[ vectors (n × dim × f32) ][ adjacency (n × max_degree × u32) ]
```
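Because both regions are fixed-stride, any vector or adjacency list can be located with pure arithmetic, which is what makes lazy mmap access possible. A sketch of the implied offsets, assuming the data region starts at the documented 1 MiB boundary (the constant and helper names are assumptions, not the crate's internals):

```rust
// Data region starts after the metadata block is padded to 1 MiB.
const DATA_START: u64 = 1 << 20;

// Byte offset of vector i (dim f32 values, 4 bytes each).
fn vector_offset(i: u64, dim: u64) -> u64 {
    DATA_START + i * dim * 4
}

// Byte offset of node i's adjacency list (max_degree u32 ids, 4 bytes each),
// which begins right after all n vectors.
fn adjacency_offset(i: u64, n: u64, dim: u64, max_degree: u64) -> u64 {
    DATA_START + n * dim * 4 + i * max_degree * 4
}

fn main() {
    let (n, dim, deg) = (1_000_000u64, 128u64, 64u64);
    assert_eq!(vector_offset(0, dim), 1 << 20);
    // The vector region for 1M × 128-dim f32 is 512 MB wide:
    assert_eq!(adjacency_offset(0, n, dim, deg) - vector_offset(0, dim), 512_000_000);
}
```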

Quantization (Composable)

```
┌─────────────────────────────────────────────────────────────┐
│                   VectorQuantizer trait                      │
├─────────────────────────────────────────────────────────────┤
│  encode(&[f32]) -> Vec<u8>     - Compress vector            │
│  decode(&[u8]) -> Vec<f32>     - Decompress vector          │
│  asymmetric_distance(q, c)     - Query vs compressed        │
│  compression_ratio(dim)        - Bytes saved                │
├─────────────────────────────────────────────────────────────┤
│  Implementations:                                            │
│    F16Quantizer      - 2x compression, ~lossless            │
│    Int8Quantizer     - 4x compression, trained scaling      │
│    ProductQuantizer  - 64x compression, codebook-based      │
└─────────────────────────────────────────────────────────────┘
```

Incremental Updates (Composable Delta Layer)

```
┌──────────────────────────────────────────────────────────────┐
│                   IncrementalDiskANN                          │
├──────────────────────────────────────────────────────────────┤
│  ┌───────────────┐  ┌───────────────┐  ┌──────────────────┐  │
│  │  Base Index   │  │  Delta Layer  │  │   Tombstones     │  │
│  │  (mmap file)  │  │  (in-memory)  │  │   (HashSet)      │  │
│  │  + labels?    │  │  + mini-graph │  │                  │  │
│  │  + codes?     │  │  + labels?    │  │                  │  │
│  └───────┬───────┘  └───────┬───────┘  └──────────────────┘  │
│          └──────────┬───────┘                                │
│              UnifiedView (GraphIndex trait)                   │
│          multi-seed beam search over both                    │
├──────────────────────────────────────────────────────────────┤
│  Optional:  Labels → search_filtered()                       │
│             Quantizer → quantized distance (base codes)      │
│             Both → filtered + quantized search               │
└──────────────────────────────────────────────────────────────┘
```
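The final step of a query against this layout is a merge: candidates from the base graph and the delta mini-graph are combined, tombstoned ids are dropped, and the union is re-sorted by distance. A toy sketch of that merge (names and signatures are illustrative, not the crate's API):

```rust
use std::collections::HashSet;

// Merge (id, distance) candidates from the base index and the delta layer,
// skip deleted ids, and keep the k closest overall.
fn merge_results(
    base: Vec<(u32, f32)>,   // from the mmap'd base graph
    delta: Vec<(u32, f32)>,  // from the in-memory delta mini-graph
    tombstones: &HashSet<u32>,
    k: usize,
) -> Vec<(u32, f32)> {
    let mut all: Vec<(u32, f32)> = base
        .into_iter()
        .chain(delta)
        .filter(|(id, _)| !tombstones.contains(id))
        .collect();
    all.sort_by(|a, b| a.1.total_cmp(&b.1));
    all.truncate(k);
    all
}

fn main() {
    let tombstones: HashSet<u32> = [1].into_iter().collect();
    let out = merge_results(
        vec![(0, 0.5), (1, 0.1)], // id 1 was deleted, so its hit is discarded
        vec![(7, 0.2)],
        &tombstones,
        2,
    );
    assert_eq!(out, vec![(7, 0.2), (0, 0.5)]);
}
```

This also shows why deletes are instant: a tombstone only needs to be inserted into the set, and filtering happens at query time until `compact()` rewrites the file.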

Parameters

Build Parameters

| Parameter | Default | Description |
|---|---|---|
| max_degree | 64 | Maximum neighbors per node (32-64) |
| build_beam_width | 128 | Construction beam width (128-256) |
| alpha | 1.2 | Pruning diversity factor (1.2-2.0) |

Search Parameters

| Parameter | Typical | Trade-off |
|---|---|---|
| beam_width | 128-512 | Higher = better recall, slower |
| k | 10-100 | Number of neighbors to return |
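When tuning `beam_width`, the useful metric is recall against exact ground truth, as reported in the benchmark tables above. Recall@k is just the fraction of the true top-k ids that the index returned; a small helper for sweeps (illustrative, not part of the crate):

```rust
// Recall@k: fraction of ground-truth ids present in the returned result set.
fn recall_at_k(result: &[u32], truth: &[u32]) -> f32 {
    let hits = result.iter().filter(|&&id| truth.contains(&id)).count();
    hits as f32 / truth.len() as f32
}

fn main() {
    // 3 of the 4 true neighbors were found:
    assert_eq!(recall_at_k(&[1, 2, 3, 9], &[1, 2, 3, 4]), 0.75);
}
```

Sweeping `beam_width` upward until recall plateaus, then backing off to the knee of the curve, is the usual way to pick a value.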

Building and Testing

```sh
# Build
cargo build --release

# Run tests
cargo test --lib

# Run benchmarks
cargo bench --bench benchmark

# Large benchmarks (slower)
DISKANN_BENCH_LARGE=1 cargo bench --bench benchmark
```

Comparison with Other Libraries

vs rust-diskann

| Feature | diskann-rs | rust-diskann |
|---|---|---|
| Incremental updates | Yes | No |
| Filtered search | Yes | No |
| Composable features | Yes (incremental + filtered + quantized) | No |
| Product Quantization | Yes (64x) | No |
| Scalar Quantization | Yes (F16 2x, Int8 4x) | No |
| Byte loading (no files) | Yes | No |
| Benchmark formats | Yes (fvecs/ivecs/bvecs) | No |
| SIMD acceleration | Yes | Uses anndists |
| Memory-mapped I/O | Yes | Yes |
| Generic vector types | f32 | f32, u64, etc. |

vs hnsw_rs

| Feature | diskann-rs | hnsw_rs |
|---|---|---|
| Memory usage | 6-10x lower | Full index in RAM |
| Incremental add | 15x faster | Slower |
| Incremental delete | Instant (tombstone) | Full rebuild |
| Build time | 1.6x slower | Faster |
| Max recall | ~97% | ~99% |
| Disk-based | Yes (mmap) | No (in-memory only) |

vs Faiss (C++/Python)

| Feature | diskann-rs | Faiss |
|---|---|---|
| Language | Pure Rust | C++/Python |
| Deployment | Single binary | Complex deps |
| Memory-mapped | Native | Limited |
| GPU support | No | Yes |
| Index types | Vamana graph | IVF, HNSW, PQ, etc. |
| Maturity | Growing | Production-proven |

When to choose diskann-rs: pure-Rust deployment, memory-constrained environments, or a need for incremental updates.

When to choose Faiss: GPU acceleration, specialized index types, or the Python ecosystem.

License

MIT License - see LICENSE for details.
