RuVector Core
The pure-Rust vector database engine behind RuVector -- HNSW indexing, quantization, and SIMD acceleration in a single crate.
ruvector-core is the foundational library that powers the entire RuVector ecosystem. It gives you a production-grade vector database you can embed directly into any Rust application: insert vectors, search them in under a millisecond, filter by metadata, and compress storage up to 32x -- all without external services. If you need vector search as a library instead of a server, this is the crate.
| | ruvector-core | Typical Vector Database |
|---|---|---|
| Deployment | Embed as a Rust dependency -- no server, no network calls | Run a separate service, manage connections |
| Query latency | <0.5 ms p50 at 1M vectors with HNSW | ~1-5 ms depending on network and index |
| Memory compression | Scalar (4x), Product (8-32x), Binary (32x) quantization built in | Often requires paid tiers or external tools |
| SIMD acceleration | SimSIMD hardware-optimized distance calculations, automatic | Manual tuning or not available |
| Search modes | Dense vectors, sparse BM25, hybrid, MMR diversity, filtered -- all in one API | Typically dense-only; hybrid and filtering are add-ons |
| Storage | Zero-copy mmap with redb -- instant loading, no deserialization | Load time scales with dataset size |
| Concurrency | Lock-free indexing with parallel batch processing via Rayon | Varies; many require single-writer locks |
| Dependencies | Minimal -- pure Rust, compiles anywhere rustc runs | Often depends on C/C++ libraries (BLAS, LAPACK) |
| Cost | Free forever -- open source (MIT) | Per-vector or per-query pricing on managed tiers |
Installation
Add ruvector-core to your Cargo.toml:
```toml
[dependencies]
ruvector-core = "0.1.0"
```
Feature Flags
```toml
[dependencies]
ruvector-core = { version = "0.1.0", features = ["simd", "uuid-support"] }
```
Available features:
- `simd` (default): Enable SIMD-optimized distance calculations
- `uuid-support` (default): Enable UUID generation for vector IDs
Key Features
| Feature | What It Does | Why It Matters |
|---|---|---|
| HNSW Indexing | Hierarchical Navigable Small World graphs for O(log n) approximate nearest neighbor search | Sub-millisecond queries at million-vector scale |
| Multiple Distance Metrics | Euclidean, Cosine, Dot Product, Manhattan | Match the metric to your embedding model without conversion |
| Scalar Quantization | Compress vectors to 8-bit integers (4x reduction) | Cut memory by 75% with 98% recall preserved |
| Product Quantization | Split vectors into subspaces with codebooks (8-32x reduction) | Store millions of vectors on a single machine |
| Binary Quantization | 1-bit representation (32x reduction) | Ultra-fast screening pass for massive datasets |
| SIMD Distance | Hardware-accelerated distance via SimSIMD | Up to 80K QPS on 8 cores without code changes |
| Zero-Copy I/O | Memory-mapped storage loads instantly | No deserialization step -- open a file and search immediately |
| Hybrid Search | Combine dense vector similarity with sparse BM25 text scoring | One query handles both semantic and keyword matching |
| Metadata Filtering | Apply key-value filters during search | No post-filtering needed -- results are already filtered |
| MMR Diversification | Maximal Marginal Relevance re-ranking | Avoid redundant results when top-K are too similar |
| Conformal Prediction | Uncertainty quantification on search results | Know when to trust (or distrust) a match |
| Lock-Free Indexing | Concurrent reads and writes without blocking | High-throughput ingestion while serving queries |
| Batch Processing | Parallel insert and search via Rayon | Saturate all cores for bulk operations |
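To make the scalar quantization row concrete, here is a minimal sketch of the idea in plain Rust: each f32 (4 bytes) is mapped to a u8 code (1 byte) against a min/max range, giving the 4x reduction. This is a simplified model, not ruvector-core's actual encoder.

```rust
// Simplified scalar quantization: map each f32 to a u8 code (4x smaller).
// Illustrative only; ruvector-core's real implementation differs.

fn quantize(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = 255.0 / (max - min);
    v.iter()
        .map(|&x| ((x - min) * scale).round().clamp(0.0, 255.0) as u8)
        .collect()
}

fn dequantize(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let scale = (max - min) / 255.0;
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

fn main() {
    let v = vec![0.0f32, 0.25, 0.5, 0.75, 1.0];
    let codes = quantize(&v, 0.0, 1.0);
    let recon = dequantize(&codes, 0.0, 1.0);
    // Reconstruction error is bounded by one quantization step (1/255).
    for (a, b) in v.iter().zip(recon.iter()) {
        assert!((a - b).abs() < 1.0 / 255.0 + 1e-6);
    }
    println!("codes = {:?}", codes);
}
```

The bounded per-dimension error is why recall stays high (~98% in the table above) despite the 4x compression.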
Quick Start
Basic Usage
```rust
use ruvector_core::{DbOptions, SearchQuery, VectorDB, VectorEntry};

// Type and field names follow the patterns used elsewhere in this README;
// check the docs.rs API reference for exact signatures.
let mut options = DbOptions::default();
options.dimensions = 384;

let db = VectorDB::new(options)?;

// Insert a vector
let id = db.insert(VectorEntry {
    vector: vec![0.1; 384],
    ..Default::default()
})?;

// Search for the 10 nearest neighbors
let results = db.search(SearchQuery {
    vector: vec![0.1; 384],
    k: 10,
    ..Default::default()
})?;
```
Batch Operations
```rust
use ruvector_core::VectorEntry;

// Insert multiple vectors efficiently
let entries = vec![
    VectorEntry { vector: vec![0.1; 384], ..Default::default() },
    VectorEntry { vector: vec![0.2; 384], ..Default::default() },
];
let ids = db.insert_batch(entries)?;
println!("Inserted {} vectors", ids.len());
```
With Metadata Filtering
```rust
use std::collections::HashMap;
use serde_json::json;

// Insert with metadata
db.insert(VectorEntry {
    vector: vec![0.1; 384],
    metadata: Some(HashMap::from([("category".into(), json!("books"))])),
    ..Default::default()
})?;

// Search with a metadata filter
let results = db.search(SearchQuery {
    vector: vec![0.1; 384],
    k: 10,
    filter: Some(HashMap::from([("category".into(), json!("books"))])),
    ..Default::default()
})?;
```
HNSW Configuration
```rust
use ruvector_core::{DbOptions, DistanceMetric, HnswConfig, VectorDB};

let mut options = DbOptions::default();
options.dimensions = 384;
options.distance_metric = DistanceMetric::Cosine;

// Configure HNSW index parameters (values shown are illustrative)
options.hnsw_config = Some(HnswConfig {
    m: 16,                // graph connectivity per node
    ef_construction: 200, // candidate list size at build time
    ef_search: 100,       // candidate list size at query time
    ..Default::default()
});

let db = VectorDB::new(options)?;
```
Quantization
```rust
use ruvector_core::{DbOptions, QuantizationConfig, VectorDB};

// Variant names are illustrative; see the API docs for the exact enum.
let mut options = DbOptions::default();
options.dimensions = 384;

// Enable scalar quantization (4x compression)
options.quantization = Some(QuantizationConfig::Scalar);

// Or product quantization (8-32x compression)
options.quantization = Some(QuantizationConfig::Product { subvectors: 48, bits: 8 });

let db = VectorDB::new(options)?;
```
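To see why binary quantization enables an ultra-fast screening pass, here is a self-contained sketch (not the crate's internal layout): each dimension is reduced to its sign bit, and candidates are compared with a Hamming distance computed via hardware popcount.

```rust
// Minimal binary quantization sketch: 1 bit per dimension (32x smaller than f32).
// Illustrative only; ruvector-core's internal representation will differ.

fn binarize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

// Hamming distance: count differing bits with popcount.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    let a = binarize(&[0.9, -0.2, 0.4, -0.7]);
    let b = binarize(&[0.8, 0.1, -0.3, -0.6]);
    // Dimensions 1 and 2 disagree in sign, so the distance is 2.
    assert_eq!(hamming(&a, &b), 2);
}
```

Because a 384-dimensional vector fits in six u64 words, one candidate comparison is a handful of XOR + popcount instructions, which is what makes a first-pass screen over millions of vectors cheap.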
API Overview
Core Types
```rust
// Main database interface
pub struct VectorDB { /* ... */ }

// Vector entry with optional ID and metadata
pub struct VectorEntry { /* ... */ }

// Search query parameters
pub struct SearchQuery { /* ... */ }

// Search result with score
pub struct SearchResult { /* ... */ }
```
Main Operations
- `insert` / `insert_batch` -- add one vector or many in parallel
- `search` -- query the k nearest neighbors, optionally filtered
Distance Metrics
- `Euclidean`, `Cosine`, `DotProduct`, `Manhattan`
Advanced Features
```rust
// Type names below are illustrative; see the API docs for exact names.

// Hybrid search (dense + sparse BM25)
use ruvector_core::HybridSearch;
let hybrid = HybridSearch::new(dense_weight, sparse_weight);

// Filtered search with expressions
use ruvector_core::{FilterExpr, FilteredSearch};
let filtered = FilteredSearch::new();
let expr = FilterExpr::And(vec![/* conditions */]);

// MMR diversification
use ruvector_core::MmrSearch;
let mmr = MmrSearch::new(lambda);
```
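The MMR re-ranking step can be illustrated with a self-contained sketch of the standard algorithm (greedy selection balancing relevance against redundancy); this is the general technique, not ruvector-core's own code:

```rust
// Maximal Marginal Relevance (MMR) sketch over cosine similarity.
// Greedily pick results, trading relevance to the query (lambda)
// against similarity to results already picked (1 - lambda).

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn mmr(query: &[f32], candidates: &[Vec<f32>], k: usize, lambda: f32) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < k.min(candidates.len()) {
        let best = (0..candidates.len())
            .filter(|i| !selected.contains(i))
            .max_by(|&i, &j| {
                let score = |c: usize| -> f32 {
                    let rel = cosine(query, &candidates[c]);
                    // Redundancy = max similarity to anything already selected.
                    let red = selected
                        .iter()
                        .map(|&s| cosine(&candidates[c], &candidates[s]))
                        .fold(0.0_f32, f32::max);
                    lambda * rel - (1.0 - lambda) * red
                };
                score(i).partial_cmp(&score(j)).unwrap()
            })
            .unwrap();
        selected.push(best);
    }
    selected
}

fn main() {
    let query = vec![1.0, 0.0];
    let candidates = vec![
        vec![0.9, 0.1],  // most relevant
        vec![0.9, 0.12], // near-duplicate of the first
        vec![0.5, -0.5], // less relevant but diverse
    ];
    // Pure top-k would return the near-duplicate second;
    // MMR with lambda = 0.5 prefers the diverse candidate.
    assert_eq!(mmr(&query, &candidates, 2, 0.5), vec![0, 2]);
}
```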
Performance
Latency (Single Query)
| Operation | Flat Index | HNSW Index |
|---|---|---|
| Search (1K vecs) | ~0.1ms | ~0.2ms |
| Search (100K vecs) | ~10ms | ~0.5ms |
| Search (1M vecs) | ~100ms | <1ms |
| Insert | ~0.1ms | ~1ms |
| Batch (1000) | ~50ms | ~500ms |
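The flat-index column grows linearly because a flat search scans every stored vector, while HNSW visits roughly O(log n) nodes. A minimal brute-force k-NN sketch shows the O(n)-per-query behavior:

```rust
// Brute-force (flat) k-NN: score every vector, O(n) per query.
// This is why flat-index latency scales linearly with dataset size.

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn flat_knn(query: &[f32], data: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, squared_l2(query, v)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let data = vec![
        vec![0.0, 0.0],
        vec![1.0, 1.0],
        vec![0.1, 0.1],
        vec![5.0, 5.0],
    ];
    // The two vectors nearest the origin are at indices 0 and 2.
    assert_eq!(flat_knn(&[0.0, 0.0], &data, 2), vec![0, 2]);
}
```

Flat search is exact, so it remains the right choice for small collections (the ~0.1ms row at 1K vectors); HNSW's approximate graph traversal only pays off as n grows.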
Memory Usage (1M Vectors, 384 Dimensions)
| Configuration | Memory | Recall |
|---|---|---|
| Full Precision (f32) | ~1.5GB | 100% |
| Scalar Quantization | ~400MB | 98% |
| Product Quantization | ~200MB | 95% |
| Binary Quantization | ~50MB | 85% |
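The memory figures above follow directly from the per-dimension storage cost; a quick back-of-the-envelope check (raw vector data only, ignoring HNSW graph and metadata overhead, which is why the table's numbers run slightly higher):

```rust
// Back-of-the-envelope memory estimate for 1M vectors of 384 dimensions.
// Counts raw vector data only; index and metadata add overhead on top.

fn raw_bytes(num_vectors: usize, dims: usize, bits_per_dim: usize) -> usize {
    num_vectors * dims * bits_per_dim / 8
}

fn main() {
    let (n, d) = (1_000_000, 384);
    // f32: 32 bits/dim -> 1.536 GB (the ~1.5GB row)
    assert_eq!(raw_bytes(n, d, 32), 1_536_000_000);
    // Scalar quantization: 8 bits/dim -> 384 MB (4x smaller)
    assert_eq!(raw_bytes(n, d, 8), 384_000_000);
    // Binary quantization: 1 bit/dim -> 48 MB (32x smaller)
    assert_eq!(raw_bytes(n, d, 1), 48_000_000);
}
```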
Throughput (Queries Per Second)
| Configuration | QPS | Latency (p50) |
|---|---|---|
| Single Thread | ~2,000 | ~0.5ms |
| Multi-Thread (8 cores) | ~50,000 | <0.5ms |
| With SIMD | ~80,000 | <0.3ms |
| With Quantization | ~100,000 | <0.2ms |
Configuration Guide
For Maximum Accuracy
```rust
// Illustrative; exact field names may differ -- see the API docs.
let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::Cosine,
    hnsw_config: Some(HnswConfig {
        m: 32,
        ef_construction: 400,
        ef_search: 200,
        ..Default::default()
    }),
    quantization: None, // full precision
    ..Default::default()
};
```
For Maximum Speed
```rust
// Illustrative; exact field names may differ -- see the API docs.
let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::Cosine,
    hnsw_config: Some(HnswConfig {
        m: 8,
        ef_construction: 100,
        ef_search: 50,
        ..Default::default()
    }),
    quantization: Some(QuantizationConfig::Scalar), // trade some recall for speed
    ..Default::default()
};
```
For Balanced Performance
```rust
let options = DbOptions::default(); // Recommended defaults
```
Building and Testing
Build
```sh
# Build with default features
cargo build --release

# Build without SIMD
cargo build --release --no-default-features

# Build for specific target with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
Testing
```sh
# Run all tests
cargo test

# Run with specific features
cargo test --features simd

# Run with logging
RUST_LOG=debug cargo test
```
Benchmarks
```sh
# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench hnsw_search

# Run with features
cargo bench --features simd
```
Available benchmarks:
- `distance_metrics` - SIMD-optimized distance calculations
- `hnsw_search` - HNSW index search performance
- `quantization_bench` - Quantization techniques
- `batch_operations` - Batch insert/search operations
- `comprehensive_bench` - Full system benchmarks
Related Crates
ruvector-core is the foundation for platform-specific bindings:
- ruvector-node - Node.js bindings via NAPI-RS
- ruvector-wasm - WebAssembly bindings for browsers
- ruvector-gnn - Graph Neural Network layer for learned search
- ruvector-cli - Command-line interface
- ruvector-bench - Performance benchmarks
Documentation
- Main README - Complete project overview
- Getting Started Guide - Quick start tutorial
- Rust API Reference - Detailed API documentation
- Advanced Features Guide - Quantization, indexing, tuning
- Performance Tuning - Optimization strategies
- API Documentation - Full API reference on docs.rs
Acknowledgments
Built with state-of-the-art algorithms and libraries:
- hnsw_rs - HNSW implementation
- simsimd - SIMD distance calculations
- redb - Embedded database
- rayon - Data parallelism
- memmap2 - Memory-mapped files
License
MIT License - see LICENSE for details.