# crvecdb

A fast vector database library with HNSW indexing for Rust.
## Features

- **HNSW Indexing** - Hierarchical Navigable Small World graphs for fast approximate nearest-neighbor search
- **Multiple Distance Metrics** - Cosine, Euclidean (L2), Dot Product
- **SIMD Acceleration** - Cross-platform support for ARM NEON and x86 SSE/AVX2
- **Memory-Mapped Storage** - Persistent indexes with automatic memory mapping
- **Parallel Operations** - Rayon-powered parallel insert and search
## Installation

```toml
[dependencies]
crvecdb = "0.1"
```
## Quick Start

```rust
use crvecdb::{Index, Metric};

// Create an in-memory index
let index = Index::builder(128) // 128 dimensions
    .metric(Metric::Cosine)
    .m(16)                      // HNSW connections per node
    .ef_construction(200)       // Build-time search width
    .capacity(10_000)
    .build()
    .unwrap();

// Insert vectors
index.insert(0, &[0.1; 128]).unwrap();
index.insert(1, &[0.2; 128]).unwrap();

// Search for nearest neighbors
let results = index.search(&[0.1; 128], 10).unwrap();
for result in results {
    println!("id={} distance={}", result.id, result.distance);
}
```
## Parallel Bulk Insert

```rust
use crvecdb::{Index, Metric};

let index = Index::builder(128)
    .metric(Metric::Euclidean)
    .capacity(100_000)
    .build()
    .unwrap();

// Prepare batch
let vectors: Vec<(u64, Vec<f32>)> = (0..100_000u64)
    .map(|i| (i, vec![i as f32; 128]))
    .collect();

// Parallel insert - uses all CPU cores
index.insert_parallel(&vectors).unwrap();
```
## Persistent Storage

```rust
use crvecdb::{open_mmap, Index, Metric};

// Create a memory-mapped index
let index = Index::builder(128)
    .metric(Metric::Cosine)
    .capacity(10_000)
    .build_mmap("index.db")
    .unwrap();

// Data persists automatically
index.insert(0, &[0.1; 128]).unwrap();
index.flush().unwrap(); // Saves both vectors and HNSW graph

// Reopen later
let index = open_mmap("index.db").unwrap();
// Graph is restored - no rebuild needed!
```
## Distance Metrics

| Metric | Description | Use Case |
|---|---|---|
| `Cosine` | Normalized angular distance | Text embeddings, semantic search |
| `Euclidean` | L2 distance | Image features, spatial data |
| `DotProduct` | Inner product | Recommendation systems |
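As a quick intuition for the table above, here is a small plain-Rust sketch (independent of the crvecdb API) showing how the three metrics score the same pair of vectors differently:

```rust
// Standalone illustration of the three metrics; not part of the crvecdb API.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(a: &[f32]) -> f32 {
    dot(a, a).sqrt()
}

/// Cosine distance: 1 - cos(theta); ignores vector magnitude.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b) / (norm(a) * norm(b))
}

/// Euclidean (L2) distance; sensitive to magnitude.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let a = [1.0, 0.0];
    let b = [2.0, 0.0];
    // Same direction, different magnitude: cosine treats them as identical...
    assert!(cosine_distance(&a, &b).abs() < 1e-6);
    // ...while Euclidean does not.
    assert!((euclidean_distance(&a, &b) - 1.0).abs() < 1e-6);
    // Dot product rewards magnitude, which suits recommendation scoring.
    assert!((dot(&a, &b) - 2.0).abs() < 1e-6);
}
```

This is why cosine is the usual choice for text embeddings, where direction carries the meaning and magnitude is noise.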
## HNSW Parameters

| Parameter | Default | Description |
|---|---|---|
| `m` | 16 | Max connections per node. Higher = better recall, more memory |
| `ef_construction` | 200 | Search width during build. Higher = better graph, slower insert |
| `ef_search` | 50 | Search width at query time. Higher = better recall, slower search |
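To make the "more memory" trade-off for `m` concrete, here is a back-of-the-envelope estimate. It assumes a standard HNSW layout where layer 0 stores up to `2*m` neighbor ids of 4 bytes each; crvecdb's actual on-disk layout may differ:

```rust
fn main() {
    // SIFT1M-scale index with the default m from the table above.
    let n: u64 = 1_000_000; // vectors
    let dim: u64 = 128;     // dimensions
    let m: u64 = 16;        // max connections per node

    // f32 vector data: n * dim * 4 bytes
    let vector_mb = n * dim * 4 / (1024 * 1024);
    // Layer-0 neighbor lists dominate graph memory: up to 2*m u32 ids per node
    let graph_mb = n * 2 * m * 4 / (1024 * 1024);

    println!("vector data: ~{} MB", vector_mb); // ~488 MB
    println!("graph links: ~{} MB", graph_mb);  // ~122 MB
}
```

Under these assumptions, doubling `m` to 32 roughly doubles the link memory while the vector data itself is unchanged.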
## Feature Flags

```toml
[features]
default = ["simd", "parallel"]
simd = ["simdeez"]   # SIMD acceleration
parallel = ["rayon"] # Parallel insert and search
```

The `parallel` feature enables multi-threaded operations:

- `insert_parallel()` uses all CPU cores for bulk loading
- Search benchmarks run queries in parallel

Disable it for single-threaded builds:

```toml
[dependencies]
crvecdb = { version = "0.1", default-features = false, features = ["simd"] }
```
## Performance

SIFT1M benchmark (1M vectors, 128 dimensions, Euclidean distance):

| Operation | Throughput | Notes |
|---|---|---|
| Parallel Insert | 4,000 vectors/sec | m=16, ef_construction=200 |
| Parallel Search (k=10) | 4,000 QPS | 97% recall@10 |
| Single Query Latency | ~1 ms p50 | k=10, single-threaded |
## Benchmarks

### SIFT1M Benchmark

Download the SIFT1M dataset from the INRIA TEXMEX corpus (it is not included in the repo), then run the benchmark.
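The exact commands depend on how this repo lays out its examples; a typical invocation might look like the following (the `sift1m` example name and the `data/` directory are assumptions, only the dataset URL is the well-known TEXMEX location):

```shell
# Fetch and unpack SIFT1M from the INRIA TEXMEX corpus
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
mkdir -p data && tar -xzf sift.tar.gz -C data/

# Run the benchmark in release mode (example name is hypothetical)
cargo run --release --example sift1m
```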
Expected output:

```text
=== SIFT1M Benchmark ===
[1/4] Loading dataset...
  Base vectors:  1000000 x 128
  Query vectors: 10000 x 128
  Ground truth:  10000 x 100
[2/4] Building index (parallel)...
  Build time:  ~4 minutes
  Vectors/sec: ~4000
[3/4] Benchmarking search (parallel)...
  Recall@1:   96.7% | QPS: ~4000
  Recall@10:  97.1% | QPS: ~4000
  Recall@100: 94.0% | QPS: ~4000
[4/4] Latency distribution (k=10, single-threaded)...
  Avg: ~1.0 ms
  P50: ~1.0 ms
  P95: ~1.5 ms
  P99: ~1.7 ms
```
## License

MIT OR Apache-2.0