# oxify-vector

In-memory vector similarity search for RAG (Retrieval-Augmented Generation) in OxiFY.

## Overview

`oxify-vector` provides fast, efficient vector similarity search for building RAG workflows. Its core index uses exact search with optional parallel processing, making it ideal for small to medium datasets (<100k vectors); the approximate indexes described below (HNSW, IVF-PQ) extend it to larger ones.

**Ported from**: [OxiRS](https://github.com/cool-japan/oxirs) - Battle-tested in production semantic web applications.

## Features

### Core Search
- **Multiple Distance Metrics**: Cosine, Euclidean, Dot Product, Manhattan
- **Parallel Search**: Multi-threaded search using Rayon
- **Exact Search**: Brute-force search for guaranteed best results
- **Incremental Updates**: Add, remove, and update vectors without rebuilding

### Advanced Algorithms
- **HNSW Index**: Hierarchical Navigable Small World for fast approximate search
- **IVF-PQ Index**: Inverted File with Product Quantization for memory-efficient large-scale search
- **Distributed Index**: Consistent hashing across multiple shards
- **ColBERT**: Multi-vector search for token-level matching

### Optimizations
- **Query Optimizer**: Automatic strategy selection (brute-force vs HNSW vs IVF-PQ)
- **Scalar Quantization**: 4x memory reduction (float32 → uint8) with minimal accuracy loss
- **SIMD Acceleration**: AVX2 optimizations for distance computations
- **Multi-Index Search**: Search across multiple indexes in parallel

### Filtering & Search
- **Filtered Search**: Metadata-based filtering with pre/post-filtering strategies
- **Hybrid Search**: Vector + BM25 keyword search with RRF fusion (see the sketch after this list)
- **Batch Search**: Process multiple queries efficiently
- **Radius Search**: Find all neighbors within a distance threshold
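
Reciprocal Rank Fusion (RRF) merges the vector and keyword rankings by scoring each document as the sum of `1 / (k + rank)` across the ranked lists. A minimal sketch of that fusion step in plain Rust, independent of the crate's hybrid-search API (`k = 60` is the constant from the original RRF paper):

```rust
use std::collections::HashMap;

/// RRF: score(d) = sum over ranked lists of 1 / (k + rank of d in that list).
/// Each list holds result IDs, best first; ranks are 1-indexed.
fn rrf_fuse(lists: &[Vec<String>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k + (i as f32 + 1.0));
        }
    }
    let mut fused: Vec<(String, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); // best first
    fused
}

fn main() {
    let vector_hits = vec!["doc2".to_string(), "doc1".to_string(), "doc3".to_string()];
    let bm25_hits = vec!["doc1".to_string(), "doc2".to_string()];
    for (id, score) in rrf_fuse(&[vector_hits, bm25_hits], 60.0) {
        println!("{id}: {score:.4}"); // doc1 and doc2 outrank doc3, which appears in only one list
    }
}
```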

### Integration
- **Embeddings**: OpenAI and Ollama embedding providers with caching
- **Persistence**: Save/load indexes to disk with optional memory-mapping
- **OpenTelemetry**: Optional distributed tracing support
- **Type-Safe**: Strongly-typed API with compile-time guarantees

## Installation

```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector" }

# Or enable parallel search instead (duplicate keys are invalid TOML, so pick one):
# oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```

### Feature Flags

- `parallel`: Enable multi-threaded search using Rayon

## Quick Start

### Basic Vector Search

```rust
use oxify_vector::{VectorSearchIndex, SearchConfig, DistanceMetric};
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embeddings (typically from an embedding model)
    let mut embeddings = HashMap::new();
    embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3, 0.4]);
    embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4, 0.5]);
    embeddings.insert("doc3".to_string(), vec![0.3, 0.4, 0.5, 0.6]);

    // Configure search
    let mut config = SearchConfig::default();
    config.metric = DistanceMetric::Cosine;

    // Build index
    let mut index = VectorSearchIndex::new(config);
    index.build(&embeddings)?;

    // Search for similar vectors
    let query = vec![0.15, 0.25, 0.35, 0.45];
    let results = index.search(&query, 2)?;

    for result in results {
        println!("Entity: {}, Score: {:.4}", result.entity_id, result.score);
    }

    Ok(())
}
```

## Core Components

### `VectorSearchIndex`

Main interface for vector search operations.

```rust
pub struct VectorSearchIndex {
    config: SearchConfig,
    entity_ids: Vec<String>,
    embedding_matrix: Option<Vec<Vec<f32>>>,
}

impl VectorSearchIndex {
    pub fn new(config: SearchConfig) -> Self;
    pub fn build(&mut self, embeddings: &HashMap<String, Vec<f32>>) -> Result<()>;
    pub fn search(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    pub fn add(&mut self, entity_id: String, embedding: Vec<f32>) -> Result<()>;
    pub fn remove(&mut self, entity_id: &str) -> Result<()>;
}
```

### `SearchConfig`

Configuration for search behavior.

```rust
pub struct SearchConfig {
    pub metric: DistanceMetric,
    pub normalize: bool,
    pub parallel: bool,
}

impl Default for SearchConfig {
    fn default() -> Self {
        Self {
            metric: DistanceMetric::Cosine,
            normalize: true,   // Auto-normalize for cosine similarity
            parallel: false,   // Single-threaded by default
        }
    }
}
```

### `DistanceMetric`

Supported distance/similarity metrics.

```rust
pub enum DistanceMetric {
    Cosine,      // Cosine similarity (default for RAG)
    Euclidean,   // L2 distance
    DotProduct,  // Dot product similarity
    Manhattan,   // L1 distance (Manhattan/Taxicab)
}
```

### `SearchResult`

Result from a similarity search.

```rust
pub struct SearchResult {
    pub entity_id: String,
    pub score: f32,      // Higher is better (similarity score)
    pub distance: f32,   // Lower is better (distance)
    pub rank: usize,     // 1-indexed rank
}
```

## Distance Metrics

### Cosine Similarity (Recommended for RAG)

Measures the cosine of the angle between vectors. Range: [-1, 1], higher is more similar.

**Use when**: Magnitude doesn't matter, only direction (typical for text embeddings)

```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Cosine;
```

**Example**:
- `query: [0.5, 0.5, 0.0]`
- `doc1: [1.0, 1.0, 0.0]` → score: 1.0 (identical direction)
- `doc2: [0.0, 1.0, 0.0]` → score: 0.71 (45° angle)
- `doc3: [-1.0, -1.0, 0.0]` → score: -1.0 (opposite direction)

### Euclidean Distance (L2)

Straight-line distance between vectors. Range: [0, ∞), lower is more similar.

**Use when**: Absolute magnitude matters

```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Euclidean;
config.normalize = false;  // Preserve magnitude
```

**Example**:
- `query: [1.0, 1.0]`
- `doc1: [1.0, 1.0]` → distance: 0.0 (identical)
- `doc2: [2.0, 2.0]` → distance: 1.41
- `doc3: [0.0, 0.0]` → distance: 1.41

### Dot Product

Inner product of vectors. Range: (-∞, ∞), higher is more similar.

**Use when**: Combining similarity and magnitude

```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::DotProduct;
config.normalize = false;
```

### Manhattan Distance (L1)

Sum of absolute differences. Range: [0, ∞), lower is more similar.

**Use when**: Grid-based distances or robustness to outliers

```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Manhattan;
```
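
For reference, each metric is a few lines of plain Rust. The sketch below gives standalone implementations of the formulas (not the crate's SIMD-accelerated code) and reproduces the worked examples above:

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    // Cosine example above: 45° between the vectors → ~0.71.
    println!("{:.2}", cosine(&[0.5, 0.5, 0.0], &[0.0, 1.0, 0.0]));
    // Euclidean example above: [1.0, 1.0] vs [2.0, 2.0] → ~1.41.
    println!("{:.2}", euclidean(&[1.0, 1.0], &[2.0, 2.0]));
    println!("{:.2}", dot_product(&[1.0, 2.0], &[3.0, 4.0])); // 11.00
    println!("{:.2}", manhattan(&[1.0, 1.0], &[2.0, 3.0]));   // 3.00
}
```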

## RAG Workflow Example

```rust
use oxify_vector::{VectorSearchIndex, SearchConfig, DistanceMetric};
use std::collections::HashMap;

#[derive(Debug)]
struct Document {
    id: String,
    content: String,
    embedding: Vec<f32>,
}

async fn rag_pipeline(
    query: &str,
    documents: Vec<Document>,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    // 1. Build vector index
    let mut embeddings = HashMap::new();
    for doc in &documents {
        embeddings.insert(doc.id.clone(), doc.embedding.clone());
    }

    let mut config = SearchConfig::default();
    config.metric = DistanceMetric::Cosine;

    let mut index = VectorSearchIndex::new(config);
    index.build(&embeddings)?;

    // 2. Get query embedding (from embedding model)
    let query_embedding = get_embedding(query).await?;

    // 3. Search for relevant documents
    let results = index.search(&query_embedding, 3)?;

    // 4. Retrieve document content
    let mut context_docs = Vec::new();
    for result in results {
        if let Some(doc) = documents.iter().find(|d| d.id == result.entity_id) {
            context_docs.push(doc.content.clone());
        }
    }

    Ok(context_docs)
}

async fn get_embedding(text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Call embedding API (OpenAI, Cohere, etc.)
    // For example: text-embedding-ada-002
    Ok(vec![0.1, 0.2, 0.3, 0.4])  // Placeholder
}
```

## New Features

### Adaptive Index (NEW)

The easiest way to use oxify-vector: it automatically selects the best index type and optimizes performance as the dataset grows:

```rust
use oxify_vector::{AdaptiveIndex, AdaptiveConfig};
use std::collections::HashMap;

// Create adaptive index - starts simple, upgrades as needed
let mut index = AdaptiveIndex::new(AdaptiveConfig::default());

// Build from embeddings
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4]);
index.build(&embeddings)?;

// Search - automatically uses best strategy
let query = vec![0.15, 0.25, 0.35];
let results = index.search(&query, 10)?;

// Add more data - may trigger automatic upgrade
for i in 0..10000 {
    index.add_vector(format!("doc_{}", i), vec![/* ... */])?;
}

// Check current strategy and performance
let stats = index.stats();
println!("Strategy: {:?}", stats.current_strategy);
println!("Avg latency: {:.2}ms", stats.avg_latency_ms);
println!("P95 latency: {:.2}ms", stats.p95_latency_ms);
```

**Features:**
- **Automatic upgrades**: Starts with brute-force, upgrades to HNSW as dataset grows
- **Performance tracking**: Monitors latency and optimizes automatically
- **Simple API**: One interface for all index types
- **Configurable**: Presets for high accuracy or low latency

**Configuration presets:**
```rust
// High accuracy (slower, more accurate)
let config = AdaptiveConfig::high_accuracy();
let mut index = AdaptiveIndex::new(config);

// Low latency (faster, good enough accuracy)
let config = AdaptiveConfig::low_latency();
let mut index = AdaptiveIndex::new(config);
```

**When to use:**
- You want optimal performance without manual tuning
- Dataset size changes over time
- You need automatic performance optimization
- You want a simple "it just works" API

### Incremental Index Updates

Add, remove, and update vectors without rebuilding the entire index:

```rust
use oxify_vector::{VectorSearchIndex, SearchConfig};
use std::collections::HashMap;

let mut index = VectorSearchIndex::new(SearchConfig::default());

// Build initial index
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4]);
index.build(&embeddings)?;

// Add a single vector
index.add_vector("doc3".to_string(), vec![0.3, 0.4, 0.5])?;

// Add multiple vectors
let mut new_docs = HashMap::new();
new_docs.insert("doc4".to_string(), vec![0.4, 0.5, 0.6]);
new_docs.insert("doc5".to_string(), vec![0.5, 0.6, 0.7]);
index.add_vectors(&new_docs)?;

// Update an existing vector
index.update_vector("doc1", vec![0.9, 0.9, 0.9])?;

// Remove a vector
index.remove_vector("doc2")?;
```

### Query Optimizer

Automatically select the best search strategy based on dataset size and requirements:

```rust
use oxify_vector::optimizer::{QueryOptimizer, OptimizerConfig, SearchStrategy};

let optimizer = QueryOptimizer::new(OptimizerConfig::default());

// Recommend strategy based on dataset size and required recall
let num_vectors = 100_000;
let required_recall = 0.95;
let strategy = optimizer.recommend_strategy(num_vectors, required_recall);

match strategy {
    SearchStrategy::BruteForce => println!("Use exact search (< 10K vectors)"),
    SearchStrategy::Hnsw => println!("Use HNSW (10K - 1M vectors)"),
    SearchStrategy::IvfPq => println!("Use IVF-PQ (> 1M vectors)"),
    SearchStrategy::Distributed => println!("Use distributed search (> 10M vectors)"),
}

// Optimize pre/post-filtering
let filter_selectivity = 0.05; // 5% of vectors match filter
let use_prefilter = optimizer.recommend_prefiltering(num_vectors, filter_selectivity);

// Optimize batch size
let batch_size = optimizer.recommend_batch_size(1000, num_vectors);
```

**Presets for common scenarios:**

```rust
use oxify_vector::OptimizerConfig;

// High accuracy (use exact search longer, higher recall threshold)
let config = OptimizerConfig::high_accuracy();

// High speed (switch to ANN earlier, lower recall threshold)
let config = OptimizerConfig::high_speed();

// Memory efficient (use quantization earlier, disable caching)
let config = OptimizerConfig::memory_efficient();
```

### Scalar Quantization

Reduce memory usage by 75% with minimal accuracy loss:

```rust
use oxify_vector::quantization::{QuantizedVectorIndex, QuantizationConfig};

// Generate dataset
let vectors: Vec<(String, Vec<f32>)> = (0..10000)
    .map(|i| (format!("doc_{}", i), vec![/* ... */]))
    .collect();

// Build quantized index
let mut index = QuantizedVectorIndex::new(QuantizationConfig::default());
index.build(&vectors)?;

// Get statistics
let stats = index.stats();
println!("Original size: {} bytes", stats.original_bytes);
println!("Quantized size: {} bytes", stats.quantized_bytes);
println!("Compression: {:.2}x", stats.compression_ratio);
println!("Memory saved: {:.1}%", stats.memory_savings * 100.0);

// Search (automatically uses quantized distance)
let query = vec![0.5, 0.5, 0.5];
let results = index.search(&query, 10)?;
```

**Benefits:**
- **Memory**: 4x reduction (float32 → uint8)
- **Speed**: Faster distance computations with integer math
- **Accuracy**: ~1-2% recall degradation for most datasets
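
The underlying transform is plain min/max scaling per vector: store the minimum and scale, then map each float to a byte. A minimal sketch of the idea (the crate's internal quantizer may differ, e.g. quantizing per dimension):

```rust
/// Scalar quantization: map each f32 to a u8 via min/max scaling (4 bytes → 1 byte).
fn quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|x| ((x - min) * scale).round() as u8).collect();
    (codes, min, scale)
}

/// Approximate reconstruction; error is at most half a quantization step.
fn dequantize(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    if scale == 0.0 {
        return vec![min; codes.len()];
    }
    codes.iter().map(|&c| min + c as f32 / scale).collect()
}

fn main() {
    let v = vec![0.1_f32, 0.5, 0.9, -0.3];
    let (codes, min, scale) = quantize(&v);
    println!("{:?} -> {:?} -> {:?}", v, codes, dequantize(&codes, min, scale));
}
```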

### Multi-Index Search

Search across multiple indexes in parallel and merge results:

```rust
use oxify_vector::{MultiIndexSearch, MultiIndexConfig, ScoreMergeStrategy};
use std::collections::HashMap;

// Create multiple indexes (e.g., different data shards or time periods)
let mut index1 = VectorSearchIndex::new(SearchConfig::default());
let mut index2 = VectorSearchIndex::new(SearchConfig::default());

// Build indexes...
index1.build(&embeddings1)?;
index2.build(&embeddings2)?;

// Configure multi-index search
let config = MultiIndexConfig {
    parallel: true,                           // Search indexes in parallel
    deduplicate: true,                        // Remove duplicate entity_ids
    merge_strategy: ScoreMergeStrategy::Max,  // Take max score for duplicates
};

let multi_search = MultiIndexSearch::with_config(config);

// Search across both indexes
let query = vec![0.5, 0.5, 0.5];
let results = multi_search.search(&[&index1, &index2], &query, 10)?;
```

**Score merge strategies:**
- `ScoreMergeStrategy::Max` - Take highest score (recommended)
- `ScoreMergeStrategy::Min` - Take lowest score
- `ScoreMergeStrategy::Average` - Average scores
- `ScoreMergeStrategy::First` - Take first occurrence
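
To make the `Max` semantics concrete, here is an illustrative sketch of the deduplication step over `(entity_id, score)` pairs (not the crate's internal code):

```rust
use std::collections::HashMap;

/// Collapse duplicate entity_ids, keeping the highest score (ScoreMergeStrategy::Max).
fn merge_max(hits: Vec<(String, f32)>) -> Vec<(String, f32)> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for (id, score) in hits {
        let entry = best.entry(id).or_insert(f32::NEG_INFINITY);
        if score > *entry {
            *entry = score;
        }
    }
    let mut merged: Vec<(String, f32)> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); // best score first
    merged
}

fn main() {
    // "doc1" appears in both indexes; Max keeps the 0.92.
    let hits = vec![
        ("doc1".to_string(), 0.85),
        ("doc2".to_string(), 0.80),
        ("doc1".to_string(), 0.92),
    ];
    println!("{:?}", merge_max(hits)); // [("doc1", 0.92), ("doc2", 0.8)]
}
```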

## Parallel Search

Enable parallel processing for large datasets:

```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```

```rust
let mut config = SearchConfig::default();
config.parallel = true;  // Use Rayon for parallel search

let mut index = VectorSearchIndex::new(config);
index.build(&embeddings)?;

// Search uses all CPU cores
let results = index.search(&query, 10)?;
```

**Performance** (1k vectors, 768 dimensions; see the Performance Benchmarks section for the full table):
- Single-threaded: ~1.2ms per query
- Parallel: ~0.3ms per query
- Speedup scales with core count
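
To reproduce these numbers on your own hardware, a small timing harness using the API shown above (figures vary by machine; the synthetic vectors here are deterministic placeholders):

```rust
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1k deterministic 768-dimensional vectors (no rand dependency needed).
    let mut embeddings = HashMap::new();
    for i in 0..1_000u32 {
        let v: Vec<f32> = (0..768).map(|j| ((i * 31 + j) % 97) as f32 / 97.0).collect();
        embeddings.insert(format!("doc_{i}"), v);
    }

    let mut config = SearchConfig::default();
    config.parallel = true; // requires the "parallel" feature
    let mut index = VectorSearchIndex::new(config);
    index.build(&embeddings)?;

    let query: Vec<f32> = (0..768).map(|j| (j % 97) as f32 / 97.0).collect();
    let start = Instant::now();
    let _results = index.search(&query, 10)?;
    println!("search took {:?}", start.elapsed());
    Ok(())
}
```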

## Integration with LLM Workflows

### Axum API Endpoint

```rust
use oxify_vector::{SearchConfig, SearchResult, VectorSearchIndex};
use axum::{extract::State, http::StatusCode, Json};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Clone)]
struct AppState {
    vector_index: Arc<RwLock<VectorSearchIndex>>,
}

#[derive(Deserialize)]
struct SearchRequest {
    query: Vec<f32>,
    k: usize,
}

#[derive(Serialize)]
struct SearchResponse {
    results: Vec<SearchResult>, // assumes SearchResult implements Serialize
}

async fn search_handler(
    State(state): State<AppState>,
    Json(req): Json<SearchRequest>,
) -> Result<Json<SearchResponse>, StatusCode> {
    let index = state.vector_index.read().await;

    let results = index
        .search(&req.query, req.k)
        .map_err(|_| StatusCode::BAD_REQUEST)?;

    Ok(Json(SearchResponse { results }))
}
```

### Dynamic Index Updates

```rust
use oxify_vector::VectorSearchIndex;
use tokio::sync::RwLock;
use std::sync::Arc;

async fn add_document(
    index: Arc<RwLock<VectorSearchIndex>>,
    doc_id: String,
    embedding: Vec<f32>,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut index = index.write().await;
    index.add(doc_id, embedding)?;
    Ok(())
}

async fn remove_document(
    index: Arc<RwLock<VectorSearchIndex>>,
    doc_id: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut index = index.write().await;
    index.remove(doc_id)?;
    Ok(())
}
```

## Limitations

### When to Use

- **Small to medium datasets**: <100k vectors
- **Development and prototyping**: Fast iteration without external dependencies
- **Exact search required**: Guaranteed best results
- **Low latency**: Sub-millisecond search

### When NOT to Use

These limits apply to the basic exact-search `VectorSearchIndex`; the HNSW, IVF-PQ, persistence, and distributed modules described above lift some of them:

- **Large datasets**: >100k vectors with exact search (use the built-in approximate indexes, or Qdrant, Pinecone, Weaviate)
- **Approximate search**: if ~99% recall is acceptable, the HNSW or IVF-PQ indexes are faster
- **Persistence**: the basic index is in-memory only (use the persistence support, or serialize manually)
- **Distributed search**: the basic index is single-node (use the distributed index for sharding)

## Scaling to Production

For production RAG at scale, integrate with external vector databases:

```rust
// oxify-vector for development
#[cfg(debug_assertions)]
use oxify_vector::VectorSearchIndex;

// Qdrant for production
#[cfg(not(debug_assertions))]
use oxify_connect_vector::QdrantClient;
```

Or use `oxify-connect-vector` for Qdrant/pgvector:

```rust
use oxify_connect_vector::{VectorProvider, QdrantProvider};

let provider = QdrantProvider::new("http://localhost:6334").await?;
provider.search("collection_name", &query, 10).await?;
```

## Performance Benchmarks

**Hardware**: Intel i7-12700K (12 cores), 32GB RAM

| Vectors | Dimensions | Metric | Parallel | Time |
|---------|------------|--------|----------|------|
| 1k | 768 | Cosine | No | 1.2ms |
| 1k | 768 | Cosine | Yes | 0.3ms |
| 10k | 768 | Cosine | No | 12ms |
| 10k | 768 | Cosine | Yes | 2.5ms |
| 100k | 768 | Cosine | No | 120ms |
| 100k | 768 | Cosine | Yes | 25ms |

**Memory usage**: ~4 bytes/dimension/vector (768D = 3KB per vector)

## Testing

Run the test suite:

```bash
cd crates/engine/oxify-vector
cargo test

# Run with parallel feature
cargo test --features parallel
```

All tests pass with zero warnings.

## Dependencies

Core dependencies:
- `rayon` - Parallel processing (optional)

No external vector database required.

## Migration from Other Libraries

### From FAISS

```python
# FAISS (Python)
import faiss
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
distances, indices = index.search(query, k)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig {
    metric: DistanceMetric::Euclidean,
    ..Default::default()
});
index.build(&embeddings)?;
let results = index.search(&query, k)?;
```

### From ChromaDB

```python
# ChromaDB (Python)
collection = client.create_collection("docs")
collection.add(documents=docs, embeddings=embeddings, ids=ids)
results = collection.query(query_embeddings=[query], n_results=k)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(&embeddings)?;
let results = index.search(&query, k)?;
```

## Implemented Features

All core features are production-ready:

- ✅ **Adaptive Index**: Automatic performance optimization (NEW)
- ✅ Approximate search algorithms (HNSW, IVF-PQ)
- ✅ Serialization/deserialization for persistence
- ✅ Filtered search (metadata filtering)
- ✅ Hybrid search (vector + BM25 keyword)
- ✅ Batch search optimization
- ✅ Incremental index updates
- ✅ Query optimizer for automatic strategy selection
- ✅ Scalar quantization for memory efficiency
- ✅ Multi-index search
- ✅ Distributed search with sharding
- ✅ ColBERT multi-vector search
- ✅ SIMD optimizations (AVX2)
- ✅ OpenTelemetry tracing support

## Future Enhancements

- [ ] GPU acceleration (CUDA/ROCm)
- [ ] Product Quantization (PQ) for extreme compression
- [ ] Learned indexes (AI-optimized data structures)
- [ ] Streaming index updates (real-time ingestion)

## License

Apache-2.0

## Attribution

Ported from [OxiRS](https://github.com/cool-japan/oxirs) with permission. Original implementation by the OxiLabs team.