oxify-vector
In-memory vector similarity search for RAG (Retrieval-Augmented Generation) in OxiFY.
Overview
oxify-vector provides fast, efficient vector similarity search for building RAG workflows. It uses exact search algorithms with optional parallel processing, making it ideal for small to medium datasets (<100k vectors).
Ported from OxiRS, where it was battle-tested in production semantic web applications.
Features
Core Search
- Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Manhattan
- Parallel Search: Multi-threaded search using Rayon
- Exact Search: Brute-force search for guaranteed best results
- Incremental Updates: Add, remove, and update vectors without rebuilding
Advanced Algorithms
- HNSW Index: Hierarchical Navigable Small World for fast approximate search
- IVF-PQ Index: Inverted File with Product Quantization for memory-efficient large-scale search
- Distributed Index: Consistent hashing across multiple shards
- ColBERT: Multi-vector search for token-level matching
Optimizations
- Query Optimizer: Automatic strategy selection (brute-force vs HNSW vs IVF-PQ)
- Scalar Quantization: 4x memory reduction (float32 → uint8) with minimal accuracy loss
- SIMD Acceleration: AVX2 optimizations for distance computations
- Multi-Index Search: Search across multiple indexes in parallel
Filtering & Search
- Filtered Search: Metadata-based filtering with pre/post-filtering strategies
- Hybrid Search: Vector + BM25 keyword search with RRF fusion (see the RRF sketch after this list)
- Batch Search: Process multiple queries efficiently
- Radius Search: Find all neighbors within a distance threshold
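For intuition, Reciprocal Rank Fusion (the RRF step named above) is easy to sketch standalone. This is the standard formula, not the crate's internal implementation:

```rust
use std::collections::HashMap;

/// Standard RRF: each ranked list contributes 1 / (k + rank) per document.
/// k = 60 is the conventional smoothing constant from the original RRF paper.
fn rrf_fuse(rankings: &[Vec<String>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1)); // best fused score first
    fused
}

// Usage: fuse a vector-search ranking with a BM25 keyword ranking.
// let fused = rrf_fuse(&[vector_ids, bm25_ids], 60.0);
```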
Integration
- Embeddings: OpenAI and Ollama embedding providers with caching
- Persistence: Save/load indexes to disk with optional memory-mapping
- OpenTelemetry: Optional distributed tracing support
- Type-Safe: Strongly-typed API with compile-time guarantees
Installation
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector" }

# Or with parallel search
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```
Feature Flags
- parallel: Enable multi-threaded search using Rayon
Quick Start
Basic Vector Search
```rust
// Paths and signatures are illustrative; check the crate docs for the exact API.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Build an index over a few document embeddings
    let mut index = VectorSearchIndex::new(SearchConfig::default());
    let mut embeddings = HashMap::new();
    embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
    embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
    index.build(embeddings)?;

    // Find the closest document to a query vector
    let query = vec![0.1, 0.2, 0.35];
    let results = index.search(&query, 1)?;
    for r in &results {
        println!("{}: {:.3}", r.id, r.score);
    }
    Ok(())
}
```
Core Components
VectorSearchIndex
Main interface for vector search operations.
SearchConfig
Configuration for search behavior.
DistanceMetric
Supported distance/similarity metrics.
SearchResult
Result from a similarity search.
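As a quick orientation, the examples below assume these rough shapes; this is an illustrative sketch inferred from this README, not the authoritative definitions:

```rust
// Illustrative sketch only — field and variant names are assumptions.
pub struct SearchConfig {
    pub metric: DistanceMetric, // distance/similarity function
    pub normalize: bool,        // L2-normalize vectors before comparing
    pub parallel: bool,         // use Rayon for multi-threaded search
}

pub enum DistanceMetric {
    Cosine,
    Euclidean,
    DotProduct,
    Manhattan,
}

pub struct SearchResult {
    pub id: String, // document identifier
    pub score: f32, // similarity (higher = better) or distance (lower = better)
}
```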
Distance Metrics
Cosine Similarity (Recommended for RAG)
Measures the cosine of the angle between vectors. Range: [-1, 1], higher is more similar.
Use when: Magnitude doesn't matter, only direction (typical for text embeddings)
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Cosine;
```
Example:
query: [0.5, 0.5, 0.0]
doc1:  [1.0, 1.0, 0.0]   → score: 1.0 (identical direction)
doc2:  [0.0, 1.0, 0.0]   → score: 0.71 (45° angle)
doc3:  [-1.0, -1.0, 0.0] → score: -1.0 (opposite direction)
Euclidean Distance (L2)
Straight-line distance between vectors. Range: [0, ∞), lower is more similar.
Use when: Absolute magnitude matters
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Euclidean;
config.normalize = false; // Preserve magnitude
```
Example:
query: [1.0, 1.0]
doc1:  [1.0, 1.0] → distance: 0.0 (identical)
doc2:  [2.0, 2.0] → distance: 1.41
doc3:  [0.0, 0.0] → distance: 1.41
Dot Product
Inner product of vectors. Range: (-∞, ∞), higher is more similar.
Use when: Combining similarity and magnitude
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::DotProduct;
config.normalize = false;
```
Manhattan Distance (L1)
Sum of absolute differences. Range: [0, ∞), lower is more similar.
Use when: Grid-based distances or robustness to outliers
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Manhattan;
```
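For intuition, here are plain-Rust reference implementations of the four metrics (the crate's own versions are SIMD-accelerated); they reproduce the worked examples above:

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    // Cosine example above: 45° between query and doc2 → 0.71
    assert!((cosine(&[0.5, 0.5, 0.0], &[0.0, 1.0, 0.0]) - 0.71).abs() < 0.01);
    // Euclidean example above: [1,1] vs [2,2] → sqrt(2) ≈ 1.41
    assert!((euclidean(&[1.0, 1.0], &[2.0, 2.0]) - 1.41).abs() < 0.01);
}
```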
RAG Workflow Example
```rust
// Illustrative sketch — the embedding step and exact index API are assumptions.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

// Placeholder for a real embedding provider (e.g. the OpenAI or Ollama integration).
async fn embed_text(_text: &str) -> anyhow::Result<Vec<f32>> {
    Ok(vec![0.0; 768])
}

/// Embed and index a document collection.
async fn build_knowledge_base(docs: &[(String, String)]) -> anyhow::Result<VectorSearchIndex> {
    let mut index = VectorSearchIndex::new(SearchConfig::default());
    let mut embeddings = HashMap::new();
    for (id, text) in docs {
        embeddings.insert(id.clone(), embed_text(text).await?);
    }
    index.build(embeddings)?;
    Ok(index)
}

/// Retrieve the top-k chunks for a question to use as LLM context.
async fn retrieve_context(index: &VectorSearchIndex, question: &str) -> anyhow::Result<Vec<String>> {
    let query = embed_text(question).await?;
    let hits = index.search(&query, 5)?;
    // Feed the retrieved chunk ids (or their text) into the LLM prompt.
    Ok(hits.into_iter().map(|h| h.id).collect())
}
```
New Features
Adaptive Index (NEW)
The easiest way to use oxify-vector: it automatically selects the best index type and optimizes performance:
```rust
// Illustrative sketch — type and method names are assumptions based on this README.
use oxify_vector::AdaptiveIndex;
use std::collections::HashMap;

// Create adaptive index - starts simple, upgrades as needed
let mut index = AdaptiveIndex::new();

// Build from embeddings
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
index.build(embeddings)?;

// Search - automatically uses best strategy
let query = vec![0.1, 0.2, 0.3];
let results = index.search(&query, 10)?;

// Add more data - may trigger automatic upgrade
for i in 0..10000 {
    index.add_vector(format!("doc{i}"), vec![i as f32, 0.0, 0.0])?;
}

// Check current strategy and performance
let stats = index.stats();
println!("strategy: {:?}", stats.strategy);
println!("vectors: {}", stats.num_vectors);
println!("avg search latency: {:?}", stats.avg_latency);
```
Features:
- Automatic upgrades: Starts with brute-force, upgrades to HNSW as dataset grows
- Performance tracking: Monitors latency and optimizes automatically
- Simple API: One interface for all index types
- Configurable: Presets for high accuracy or low latency
Configuration presets:
```rust
// AdaptiveConfig and with_config are assumed names.
// High accuracy (slower, more accurate)
let config = AdaptiveConfig::high_accuracy();
let mut index = AdaptiveIndex::with_config(config);

// Low latency (faster, good enough accuracy)
let config = AdaptiveConfig::low_latency();
let mut index = AdaptiveIndex::with_config(config);
```
When to use:
- You want optimal performance without manual tuning
- Dataset size changes over time
- You need automatic performance optimization
- You want a simple "it just works" API
Incremental Index Updates
Add, remove, and update vectors without rebuilding the entire index:
```rust
// Illustrative sketch — method signatures are assumptions.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

let mut index = VectorSearchIndex::new(SearchConfig::default());

// Build initial index
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
index.build(embeddings)?;

// Add a single vector
index.add_vector("doc3".to_string(), vec![0.7, 0.8, 0.9])?;

// Add multiple vectors
let mut new_docs = HashMap::new();
new_docs.insert("doc4".to_string(), vec![0.2, 0.3, 0.4]);
new_docs.insert("doc5".to_string(), vec![0.5, 0.6, 0.7]);
index.add_vectors(new_docs)?;

// Update an existing vector
index.update_vector("doc1", vec![0.15, 0.25, 0.35])?;

// Remove a vector
index.remove_vector("doc2")?;
```
Query Optimizer
Automatically select the best search strategy based on dataset size and requirements:
```rust
// Illustrative sketch — QueryOptimizer and the strategy variants are assumed names.
use oxify_vector::{QueryOptimizer, SearchStrategy};

let optimizer = QueryOptimizer::new();

// Recommend strategy based on dataset size and required recall
let num_vectors = 100_000;
let required_recall = 0.95;
let strategy = optimizer.recommend_strategy(num_vectors, required_recall);
match strategy {
    SearchStrategy::BruteForce => println!("exact search"),
    SearchStrategy::Hnsw => println!("HNSW approximate search"),
    SearchStrategy::IvfPq => println!("IVF-PQ quantized search"),
}

// Optimize pre/post-filtering
let filter_selectivity = 0.05; // 5% of vectors match filter
let use_prefilter = optimizer.recommend_prefiltering(filter_selectivity);

// Optimize batch size
let batch_size = optimizer.recommend_batch_size(num_vectors);
```
Presets for common scenarios:
```rust
use oxify_vector::OptimizerConfig;

// High accuracy (use exact search longer, higher recall threshold)
let config = OptimizerConfig::high_accuracy();

// High speed (switch to ANN earlier, lower recall threshold)
let config = OptimizerConfig::high_speed();

// Memory efficient (use quantization earlier, disable caching)
let config = OptimizerConfig::memory_efficient();
```
Scalar Quantization
Reduce memory usage by 75% with minimal accuracy loss:
```rust
// Illustrative sketch — QuantizedIndex and its stats fields are assumed names.
use oxify_vector::QuantizedIndex;

// Generate dataset
let vectors: Vec<(String, Vec<f32>)> = (0..10_000)
    .map(|i| (format!("doc{i}"), vec![(i % 100) as f32 / 100.0; 768]))
    .collect();

// Build quantized index
let mut index = QuantizedIndex::new();
index.build(vectors)?;

// Get statistics
let stats = index.stats();
println!("vectors: {}", stats.num_vectors);
println!("original size: {} bytes", stats.original_bytes);
println!("quantized size: {} bytes", stats.quantized_bytes);
println!("compression: {:.1}x", stats.compression_ratio);

// Search (automatically uses quantized distance)
let query = vec![0.5; 768];
let results = index.search(&query, 10)?;
```
Benefits:
- Memory: 4x reduction (float32 → uint8)
- Speed: Faster distance computations with integer math
- Accuracy: ~1-2% recall degradation for most datasets
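The underlying idea is plain min-max scalar quantization; a minimal standalone sketch of the technique (not necessarily the crate's exact scheme):

```rust
/// Quantize float32 values into uint8 codes using min-max scaling.
fn quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let q = v.iter().map(|x| ((x - min) * scale).round() as u8).collect();
    (q, min, max)
}

/// Reconstruct approximate float32 values from the uint8 codes.
fn dequantize(q: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    q.iter().map(|&c| min + c as f32 * step).collect()
}

fn main() {
    let v = vec![0.0_f32, 0.25, 0.5, 1.0];
    let (q, min, max) = quantize(&v); // 4 bytes instead of 16
    let back = dequantize(&q, min, max);
    // Each value is recovered to within one quantization step (~0.004 here)
    for (a, b) in v.iter().zip(&back) {
        assert!((a - b).abs() <= (max - min) / 255.0);
    }
}
```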
Multi-Index Search
Search across multiple indexes in parallel and merge results:
```rust
// Illustrative sketch — MultiIndexSearch construction is an assumption.
use oxify_vector::{
    MultiIndexConfig, MultiIndexSearch, ScoreMergeStrategy, SearchConfig, VectorSearchIndex,
};
use std::collections::HashMap;

// Create multiple indexes (e.g., different data shards or time periods)
let mut index1 = VectorSearchIndex::new(SearchConfig::default());
let mut index2 = VectorSearchIndex::new(SearchConfig::default());

// Build indexes...
let shard1_embeddings: HashMap<String, Vec<f32>> = HashMap::new(); // fill with real data
let shard2_embeddings: HashMap<String, Vec<f32>> = HashMap::new();
index1.build(shard1_embeddings)?;
index2.build(shard2_embeddings)?;

// Configure multi-index search
let config = MultiIndexConfig {
    merge_strategy: ScoreMergeStrategy::Max,
    ..Default::default()
};
let multi_search = MultiIndexSearch::with_config(vec![index1, index2], config);

// Search across both indexes
let query = vec![0.1, 0.2, 0.3];
let results = multi_search.search(&query, 10)?;
```
Score merge strategies:
- ScoreMergeStrategy::Max - Take highest score (recommended)
- ScoreMergeStrategy::Min - Take lowest score
- ScoreMergeStrategy::Average - Average scores
- ScoreMergeStrategy::First - Take first occurrence
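To make the strategies concrete, here is a hypothetical standalone sketch of Max-merging when the same document appears in several shard result lists (the crate performs this merging internally):

```rust
use std::collections::HashMap;

/// Merge (id, score) hits from several shards, keeping the highest score per id.
fn merge_max(shard_hits: Vec<Vec<(String, f32)>>) -> Vec<(String, f32)> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for hits in shard_hits {
        for (id, score) in hits {
            // Max strategy: keep the highest score seen for each id
            best.entry(id)
                .and_modify(|s| *s = s.max(score))
                .or_insert(score);
        }
    }
    let mut merged: Vec<_> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest score first
    merged
}
```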
Parallel Search
Enable parallel processing for large datasets:
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```

```rust
let mut config = SearchConfig::default();
config.parallel = true; // Use Rayon for parallel search

let mut index = VectorSearchIndex::new(config);
index.build(embeddings)?;

// Search uses all CPU cores
let results = index.search(&query, 10)?;
```
Performance:
- Single-threaded: ~1ms for 1k vectors
- Parallel (8 cores): ~0.2ms for 1k vectors
- Speedup scales with core count
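Conceptually, parallel brute-force search is a Rayon map-reduce over the stored vectors. A self-contained sketch of the idea (not the crate's actual code):

```rust
use rayon::prelude::*;

/// Parallel brute-force top-k by cosine score over (id, vector) pairs.
fn parallel_top_k<'a>(
    vectors: &'a [(String, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = vectors
        .par_iter() // score every vector on the Rayon thread pool
        .map(|(id, v)| (id.as_str(), cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```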
Integration with LLM Workflows
Axum API Endpoint
```rust
// Illustrative sketch — request/response types and state wiring are assumptions.
use axum::{extract::State, routing::post, Json, Router};
use oxify_vector::{SearchConfig, VectorSearchIndex};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Deserialize)]
struct SearchRequest {
    embedding: Vec<f32>,
    top_k: usize,
}

#[derive(Serialize)]
struct Hit {
    id: String,
    score: f32,
}

type SharedIndex = Arc<RwLock<VectorSearchIndex>>;

async fn search_handler(
    State(index): State<SharedIndex>,
    Json(req): Json<SearchRequest>,
) -> Json<Vec<Hit>> {
    let index = index.read().await; // many concurrent readers are fine
    let results = index.search(&req.embedding, req.top_k).unwrap_or_default();
    Json(results.into_iter().map(|r| Hit { id: r.id, score: r.score }).collect())
}

#[tokio::main]
async fn main() {
    let index: SharedIndex = Arc::new(RwLock::new(VectorSearchIndex::new(SearchConfig::default())));
    let app = Router::new()
        .route("/search", post(search_handler))
        .with_state(index);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```
Dynamic Index Updates
```rust
// Illustrative sketch — helper names are hypothetical.
use oxify_vector::VectorSearchIndex;
use std::sync::Arc;
use tokio::sync::RwLock;

/// Add a document to a live, shared index.
async fn add_document(
    index: &Arc<RwLock<VectorSearchIndex>>,
    id: String,
    embedding: Vec<f32>,
) -> anyhow::Result<()> {
    index.write().await.add_vector(id, embedding)?;
    Ok(())
}

/// Remove a document from a live, shared index.
async fn remove_document(
    index: &Arc<RwLock<VectorSearchIndex>>,
    id: &str,
) -> anyhow::Result<()> {
    index.write().await.remove_vector(id)?;
    Ok(())
}
```
Limitations
When to Use
- Small to medium datasets: <100k vectors
- Development and prototyping: Fast iteration without external dependencies
- Exact search required: Guaranteed best results
- Low latency: Sub-millisecond search
When NOT to Use
- Large datasets: >100k vectors with exact search (switch to the built-in HNSW/IVF-PQ indexes, or use Qdrant, Pinecone, Weaviate)
- Memory-constrained deployments: indexes live entirely in RAM; persistence covers saving/loading snapshots to disk, not out-of-core search
- Multi-node search: sharding in the distributed index is in-process; there is no multi-node support
Scaling to Production
For production RAG at scale, integrate with external vector databases:
```rust
// oxify-vector for development
use oxify_vector::VectorSearchIndex;

// Qdrant for production
use qdrant_client::client::QdrantClient;
```

Or use oxify-connect-vector for Qdrant/pgvector:

```rust
// Provider name and signatures below are assumptions.
use oxify_connect_vector::QdrantProvider;

let provider = QdrantProvider::new(qdrant_config).await?; // connection settings for your Qdrant
provider.search("docs", &query, 10).await?;
```
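One way to keep that swap painless is to hide the store behind a small trait so development (oxify-vector) and production (Qdrant) share one call site. A hypothetical sketch:

```rust
// Hypothetical abstraction — neither the trait nor the impls ship with the crate.
#[async_trait::async_trait]
trait VectorStore {
    async fn search(&self, query: &[f32], k: usize) -> anyhow::Result<Vec<(String, f32)>>;
}

// Implement VectorStore for the in-memory index in development and for a
// Qdrant-backed client in production; the RAG pipeline only sees `dyn VectorStore`.
async fn retrieve(store: &dyn VectorStore, query: &[f32]) -> anyhow::Result<Vec<(String, f32)>> {
    store.search(query, 5).await
}
```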
Performance Benchmarks
Hardware: Intel i7-12700K (12 cores), 32GB RAM
| Vectors | Dimensions | Metric | Parallel | Time |
|---|---|---|---|---|
| 1k | 768 | Cosine | No | 1.2ms |
| 1k | 768 | Cosine | Yes | 0.3ms |
| 10k | 768 | Cosine | No | 12ms |
| 10k | 768 | Cosine | Yes | 2.5ms |
| 100k | 768 | Cosine | No | 120ms |
| 100k | 768 | Cosine | Yes | 25ms |
Memory usage: ~4 bytes/dimension/vector (768D = 3KB per vector)
Testing
Run the test suite:
```bash
cargo test -p oxify-vector

# Run with parallel feature
cargo test -p oxify-vector --features parallel
```
All tests pass with zero warnings.
Dependencies
Core dependencies:
- rayon - Parallel processing (optional)
No external vector database required.
Migration from Other Libraries
From FAISS
```python
# FAISS (Python)
index = faiss.IndexFlatL2(768)
distances, indices = index.search(query, k)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(embeddings)?;
let results = index.search(&query, 10)?;
```
From ChromaDB
```python
# ChromaDB (Python)
collection = client.create_collection("docs")
results = collection.query(query_embeddings=[query], n_results=10)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(embeddings)?;
let results = index.search(&query, 10)?;
```
Implemented Features
All core features are production-ready:
- ✅ Adaptive Index: Automatic performance optimization (NEW)
- ✅ Approximate search algorithms (HNSW, IVF-PQ)
- ✅ Serialization/deserialization for persistence
- ✅ Filtered search (metadata filtering)
- ✅ Hybrid search (vector + BM25 keyword)
- ✅ Batch search optimization
- ✅ Incremental index updates
- ✅ Query optimizer for automatic strategy selection
- ✅ Scalar quantization for memory efficiency
- ✅ Multi-index search
- ✅ Distributed search with sharding
- ✅ ColBERT multi-vector search
- ✅ SIMD optimizations (AVX2)
- ✅ OpenTelemetry tracing support
Future Enhancements
- GPU acceleration (CUDA/ROCm)
- Product Quantization (PQ) for extreme compression
- Learned indexes (AI-optimized data structures)
- Streaming index updates (real-time ingestion)
License
Apache-2.0
Attribution
Ported from OxiRS with permission. Original implementation by the OxiLabs team.