ruvector-core 2.0.6

High-performance Rust vector database core with HNSW indexing
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
# Ruvector Core

[![Crates.io](https://img.shields.io/crates/v/ruvector-core.svg)](https://crates.io/crates/ruvector-core)
[![Documentation](https://docs.rs/ruvector-core/badge.svg)](https://docs.rs/ruvector-core)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Rust](https://img.shields.io/badge/rust-1.77%2B-orange.svg)](https://www.rust-lang.org)

**The pure-Rust vector database engine behind RuVector -- HNSW indexing, quantization, and SIMD acceleration in a single crate.**

`ruvector-core` is the foundational library that powers the entire [RuVector](https://github.com/ruvnet/ruvector) ecosystem. It gives you a production-grade vector database you can embed directly into any Rust application: insert vectors, search them in under a millisecond, filter by metadata, and compress storage up to 32x -- all without external services. If you need vector search as a library instead of a server, this is the crate.

| | ruvector-core | Typical Vector Database |
|---|---|---|
| **Deployment** | Embed as a Rust dependency -- no server, no network calls | Run a separate service, manage connections |
| **Query latency** | <0.5 ms p50 at 1M vectors with HNSW | ~1-5 ms depending on network and index |
| **Memory compression** | Scalar (4x), Product (8-32x), Binary (32x) quantization built in | Often requires paid tiers or external tools |
| **SIMD acceleration** | SimSIMD hardware-optimized distance calculations, automatic | Manual tuning or not available |
| **Search modes** | Dense vectors, sparse BM25, hybrid, MMR diversity, filtered -- all in one API | Typically dense-only; hybrid and filtering are add-ons |
| **Storage** | Zero-copy mmap with `redb` -- instant loading, no deserialization | Load time scales with dataset size |
| **Concurrency** | Lock-free indexing with parallel batch processing via Rayon | Varies; many require single-writer locks |
| **Dependencies** | Minimal -- pure Rust, compiles anywhere `rustc` runs | Often depends on C/C++ libraries (BLAS, LAPACK) |
| **Cost** | Free forever -- open source (MIT) | Per-vector or per-query pricing on managed tiers |

## Installation

Add `ruvector-core` to your `Cargo.toml`:

```toml
[dependencies]
ruvector-core = "0.1.0"
```

### Feature Flags

```toml
[dependencies]
ruvector-core = { version = "0.1.0", features = ["simd", "uuid-support"] }
```

Available features:
- `simd` (default): Enable SIMD-optimized distance calculations
- `uuid-support` (default): Enable UUID generation for vector IDs

## Key Features

| Feature | What It Does | Why It Matters |
|---------|-------------|----------------|
| **HNSW Indexing** | Hierarchical Navigable Small World graphs for O(log n) approximate nearest neighbor search | Sub-millisecond queries at million-vector scale |
| **Multiple Distance Metrics** | Euclidean, Cosine, Dot Product, Manhattan | Match the metric to your embedding model without conversion |
| **Scalar Quantization** | Compress vectors to 8-bit integers (4x reduction) | Cut memory by 75% with 98% recall preserved |
| **Product Quantization** | Split vectors into subspaces with codebooks (8-32x reduction) | Store millions of vectors on a single machine |
| **Binary Quantization** | 1-bit representation (32x reduction) | Ultra-fast screening pass for massive datasets |
| **SIMD Distance** | Hardware-accelerated distance via SimSIMD | Up to 80K QPS on 8 cores without code changes |
| **Zero-Copy I/O** | Memory-mapped storage loads instantly | No deserialization step -- open a file and search immediately |
| **Hybrid Search** | Combine dense vector similarity with sparse BM25 text scoring | One query handles both semantic and keyword matching |
| **Metadata Filtering** | Apply key-value filters during search | No post-filtering needed -- results are already filtered |
| **MMR Diversification** | Maximal Marginal Relevance re-ranking | Avoid redundant results when top-K are too similar |
| **Conformal Prediction** | Uncertainty quantification on search results | Know when to trust (or distrust) a match |
| **Lock-Free Indexing** | Concurrent reads and writes without blocking | High-throughput ingestion while serving queries |
| **Batch Processing** | Parallel insert and search via Rayon | Saturate all cores for bulk operations |

## Quick Start

### Basic Usage

```rust
use ruvector_core::{VectorDB, DbOptions, VectorEntry, SearchQuery, DistanceMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new vector database
    let mut options = DbOptions::default();
    options.dimensions = 384;  // Vector dimensions
    options.storage_path = "./my_vectors.db".to_string();
    options.distance_metric = DistanceMetric::Cosine;

    let db = VectorDB::new(options)?;

    // Insert vectors
    db.insert(VectorEntry {
        id: Some("doc1".to_string()),
        vector: vec![0.1, 0.2, 0.3, /* ... 384 dimensions */],
        metadata: None,
    })?;

    db.insert(VectorEntry {
        id: Some("doc2".to_string()),
        vector: vec![0.4, 0.5, 0.6, /* ... 384 dimensions */],
        metadata: None,
    })?;

    // Search for similar vectors
    let results = db.search(SearchQuery {
        vector: vec![0.1, 0.2, 0.3, /* ... 384 dimensions */],
        k: 10,  // Return top 10 results
        filter: None,
        ef_search: None,
    })?;

    for result in results {
        println!("ID: {}, Score: {}", result.id, result.score);
    }

    Ok(())
}
```

### Batch Operations

```rust
use ruvector_core::{VectorDB, VectorEntry};

// Insert multiple vectors efficiently
let entries = vec![
    VectorEntry {
        id: Some("doc1".to_string()),
        vector: vec![0.1, 0.2, 0.3],
        metadata: None,
    },
    VectorEntry {
        id: Some("doc2".to_string()),
        vector: vec![0.4, 0.5, 0.6],
        metadata: None,
    },
];

let ids = db.insert_batch(entries)?;
println!("Inserted {} vectors", ids.len());
```

### With Metadata Filtering

```rust
use std::collections::HashMap;
use serde_json::json;

// Insert with metadata
db.insert(VectorEntry {
    id: Some("product1".to_string()),
    vector: vec![0.1, 0.2, 0.3],
    metadata: Some(HashMap::from([
        ("category".to_string(), json!("electronics")),
        ("price".to_string(), json!(299.99)),
    ])),
})?;

// Search with metadata filter
let results = db.search(SearchQuery {
    vector: vec![0.1, 0.2, 0.3],
    k: 10,
    filter: Some(HashMap::from([
        ("category".to_string(), json!("electronics")),
    ])),
    ef_search: None,
})?;
```

### HNSW Configuration

```rust
use ruvector_core::{DbOptions, HnswConfig, DistanceMetric};

let mut options = DbOptions::default();
options.dimensions = 384;
options.distance_metric = DistanceMetric::Cosine;

// Configure HNSW index parameters
options.hnsw_config = Some(HnswConfig {
    m: 32,                    // Connections per layer (16-64 typical)
    ef_construction: 200,     // Build-time accuracy (100-500 typical)
    ef_search: 100,          // Search-time accuracy (50-200 typical)
    max_elements: 10_000_000, // Maximum vectors
});

let db = VectorDB::new(options)?;
```

### Quantization

```rust
use ruvector_core::{DbOptions, QuantizationConfig};

let mut options = DbOptions::default();
options.dimensions = 384;

// Enable scalar quantization (4x compression)
options.quantization = Some(QuantizationConfig::Scalar);

// Or product quantization (8-32x compression)
options.quantization = Some(QuantizationConfig::Product {
    subspaces: 8,  // Number of subspaces
    k: 256,        // Codebook size
});

let db = VectorDB::new(options)?;
```

## API Overview

### Core Types

```rust
// Main database interface
pub struct VectorDB { /* ... */ }

// Vector entry with optional ID and metadata
pub struct VectorEntry {
    pub id: Option<VectorId>,
    pub vector: Vec<f32>,
    pub metadata: Option<HashMap<String, serde_json::Value>>,
}

// Search query parameters
pub struct SearchQuery {
    pub vector: Vec<f32>,
    pub k: usize,
    pub filter: Option<HashMap<String, serde_json::Value>>,
    pub ef_search: Option<usize>,
}

// Search result with score
pub struct SearchResult {
    pub id: VectorId,
    pub score: f32,
    pub vector: Option<Vec<f32>>,
    pub metadata: Option<HashMap<String, serde_json::Value>>,
}
```

### Main Operations

```rust
impl VectorDB {
    // Create new database with options
    pub fn new(options: DbOptions) -> Result<Self>;

    // Create with just dimensions (uses defaults)
    pub fn with_dimensions(dimensions: usize) -> Result<Self>;

    // Insert single vector
    pub fn insert(&self, entry: VectorEntry) -> Result<VectorId>;

    // Insert multiple vectors
    pub fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<VectorId>>;

    // Search for similar vectors
    pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>;

    // Delete vector by ID
    pub fn delete(&self, id: &str) -> Result<bool>;

    // Get vector by ID
    pub fn get(&self, id: &str) -> Result<Option<VectorEntry>>;

    // Get total count
    pub fn len(&self) -> Result<usize>;

    // Check if empty
    pub fn is_empty(&self) -> Result<bool>;
}
```

### Distance Metrics

```rust
pub enum DistanceMetric {
    Euclidean,   // L2 distance - default for embeddings
    Cosine,      // Cosine similarity (1 - similarity)
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance
}
```

### Advanced Features

```rust
// Hybrid search (dense + sparse)
use ruvector_core::{HybridSearch, HybridConfig};

let hybrid = HybridSearch::new(HybridConfig {
    alpha: 0.7,  // Balance between dense (0.7) and sparse (0.3)
    ..Default::default()
});

// Filtered search with expressions
use ruvector_core::{FilteredSearch, FilterExpression};

let filtered = FilteredSearch::new(db);
let expr = FilterExpression::And(vec![
    FilterExpression::Equals("category".to_string(), json!("books")),
    FilterExpression::GreaterThan("price".to_string(), json!(10.0)),
]);

// MMR diversification
use ruvector_core::{MMRSearch, MMRConfig};

let mmr = MMRSearch::new(MMRConfig {
    lambda: 0.5,  // Balance relevance (0.5) and diversity (0.5)
    ..Default::default()
});
```

## Performance

### Latency (Single Query)

```
Operation           Flat Index    HNSW Index
---------------------------------------------
Search (1K vecs)    ~0.1ms       ~0.2ms
Search (100K vecs)  ~10ms        ~0.5ms
Search (1M vecs)    ~100ms       <1ms
Insert              ~0.1ms       ~1ms
Batch (1000)        ~50ms        ~500ms
```

### Memory Usage (1M Vectors, 384 Dimensions)

```
Configuration              Memory      Recall
---------------------------------------------
Full Precision (f32)       ~1.5GB      100%
Scalar Quantization        ~400MB      98%
Product Quantization       ~200MB      95%
Binary Quantization        ~50MB       85%
```

### Throughput (Queries Per Second)

```
Configuration              QPS         Latency (p50)
-----------------------------------------------------
Single Thread             ~2,000      ~0.5ms
Multi-Thread (8 cores)    ~50,000     <0.5ms
With SIMD                 ~80,000     <0.3ms
With Quantization         ~100,000    <0.2ms
```

## Configuration Guide

### For Maximum Accuracy

```rust
let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::Cosine,
    hnsw_config: Some(HnswConfig {
        m: 64,
        ef_construction: 500,
        ef_search: 200,
        max_elements: 10_000_000,
    }),
    quantization: None,  // Full precision
    ..Default::default()
};
```

### For Maximum Speed

```rust
let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::DotProduct,
    hnsw_config: Some(HnswConfig {
        m: 16,
        ef_construction: 100,
        ef_search: 50,
        max_elements: 10_000_000,
    }),
    quantization: Some(QuantizationConfig::Binary),
    ..Default::default()
};
```

### For Balanced Performance

```rust
let options = DbOptions::default(); // Recommended defaults
```

## Building and Testing

### Build

```bash
# Build with default features
cargo build --release

# Build without SIMD
cargo build --release --no-default-features --features uuid-support

# Build for specific target with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

### Testing

```bash
# Run all tests
cargo test

# Run with specific features
cargo test --features simd

# Run with logging
RUST_LOG=debug cargo test
```

### Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench hnsw_search

# Run with features
cargo bench --features simd
```

Available benchmarks:
- `distance_metrics` - SIMD-optimized distance calculations
- `hnsw_search` - HNSW index search performance
- `quantization_bench` - Quantization techniques
- `batch_operations` - Batch insert/search operations
- `comprehensive_bench` - Full system benchmarks

## Related Crates

`ruvector-core` is the foundation for platform-specific bindings:

- **[ruvector-node]../ruvector-node/** - Node.js bindings via NAPI-RS
- **[ruvector-wasm]../ruvector-wasm/** - WebAssembly bindings for browsers
- **[ruvector-gnn]../ruvector-gnn/** - Graph Neural Network layer for learned search
- **[ruvector-cli]../ruvector-cli/** - Command-line interface
- **[ruvector-bench]../ruvector-bench/** - Performance benchmarks

## Documentation

- **[Main README]../../README.md** - Complete project overview
- **[Getting Started Guide]../../docs/guide/GETTING_STARTED.md** - Quick start tutorial
- **[Rust API Reference]../../docs/api/RUST_API.md** - Detailed API documentation
- **[Advanced Features Guide]../../docs/guide/ADVANCED_FEATURES.md** - Quantization, indexing, tuning
- **[Performance Tuning]../../docs/optimization/PERFORMANCE_TUNING_GUIDE.md** - Optimization strategies
- **[API Documentation]https://docs.rs/ruvector-core** - Full API reference on docs.rs

## Acknowledgments

Built with state-of-the-art algorithms and libraries:

- **[hnsw_rs]https://crates.io/crates/hnsw_rs** - HNSW implementation
- **[simsimd]https://crates.io/crates/simsimd** - SIMD distance calculations
- **[redb]https://crates.io/crates/redb** - Embedded database
- **[rayon]https://crates.io/crates/rayon** - Data parallelism
- **[memmap2]https://crates.io/crates/memmap2** - Memory-mapped files

## License

**MIT License** - see [LICENSE](../../LICENSE) for details.

---

<div align="center">

**Part of [RuVector](https://github.com/ruvnet/ruvector) - Built by [rUv](https://ruv.io)**

[![Star on GitHub](https://img.shields.io/github/stars/ruvnet/ruvector?style=social)](https://github.com/ruvnet/ruvector)

[Documentation]https://docs.rs/ruvector-core | [Crates.io]https://crates.io/crates/ruvector-core | [GitHub]https://github.com/ruvnet/ruvector

</div>