oxify-vector
In-memory vector similarity search for RAG (Retrieval-Augmented Generation) in OxiFY.
Overview
oxify-vector provides fast, efficient vector similarity search for building RAG workflows. It uses exact search algorithms with optional parallel processing, making it ideal for small to medium datasets (<100k vectors).
Ported from OxiRS, where it was battle-tested in production semantic web applications.
Features
Core Search
- Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Manhattan
- Parallel Search: Multi-threaded search using Rayon
- Exact Search: Brute-force search for guaranteed best results
- Incremental Updates: Add, remove, and update vectors without rebuilding
Advanced Algorithms
- HNSW Index: Hierarchical Navigable Small World for fast approximate search
- IVF-PQ Index: Inverted File with Product Quantization for memory-efficient large-scale search
- Distributed Index: Consistent hashing across multiple shards
- ColBERT: Multi-vector search for token-level matching
Optimizations
- Query Optimizer: Automatic strategy selection (brute-force vs HNSW vs IVF-PQ)
- Scalar Quantization: 4x memory reduction (float32 → uint8) with minimal accuracy loss
- SIMD Acceleration: AVX2 optimizations for distance computations
- Multi-Index Search: Search across multiple indexes in parallel
Filtering & Search
- Filtered Search: Metadata-based filtering with pre/post-filtering strategies
- Hybrid Search: Vector + BM25 keyword search with RRF fusion (see the RRF sketch after this list)
- Batch Search: Process multiple queries efficiently
- Radius Search: Find all neighbors within a distance threshold
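For intuition, Reciprocal Rank Fusion (the RRF step named above) is easy to sketch standalone. This is the standard formula, not the crate's internal implementation:

```rust
use std::collections::HashMap;

/// Standard RRF: each ranked list contributes 1 / (k + rank) per document.
/// k = 60 is the conventional smoothing constant from the original RRF paper.
fn rrf_fuse(rankings: &[Vec<String>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1)); // best fused score first
    fused
}

// Usage: fuse a vector-search ranking with a BM25 keyword ranking.
// let fused = rrf_fuse(&[vector_ids, bm25_ids], 60.0);
```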
Integration
- Embeddings: OpenAI and Ollama embedding providers with caching
- Persistence: Save/load indexes to disk with optional memory-mapping
- OpenTelemetry: Optional distributed tracing support
- Type-Safe: Strongly-typed API with compile-time guarantees
Installation
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector" }

# Or with parallel search
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```
Feature Flags
- parallel: Enable multi-threaded search using Rayon
Quick Start
Basic Vector Search
```rust
// Paths and signatures are illustrative; check the crate docs for the exact API.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Build an index over a few document embeddings
    let mut index = VectorSearchIndex::new(SearchConfig::default());
    let mut embeddings = HashMap::new();
    embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
    embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
    index.build(embeddings)?;

    // Find the closest document to a query vector
    let query = vec![0.1, 0.2, 0.35];
    let results = index.search(&query, 1)?;
    for r in &results {
        println!("{}: {:.3}", r.id, r.score);
    }
    Ok(())
}
```
Core Components
VectorSearchIndex
Main interface for vector search operations.
SearchConfig
Configuration for search behavior.
DistanceMetric
Supported distance/similarity metrics.
SearchResult
Result from a similarity search.
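As a quick orientation, the examples below assume these rough shapes; this is an illustrative sketch inferred from this README, not the authoritative definitions:

```rust
// Illustrative sketch only — field and variant names are assumptions.
pub struct SearchConfig {
    pub metric: DistanceMetric, // distance/similarity function
    pub normalize: bool,        // L2-normalize vectors before comparing
    pub parallel: bool,         // use Rayon for multi-threaded search
}

pub enum DistanceMetric {
    Cosine,
    Euclidean,
    DotProduct,
    Manhattan,
}

pub struct SearchResult {
    pub id: String, // document identifier
    pub score: f32, // similarity (higher = better) or distance (lower = better)
}
```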
Distance Metrics
Cosine Similarity (Recommended for RAG)
Measures the cosine of the angle between vectors. Range: [-1, 1], higher is more similar.
Use when: Magnitude doesn't matter, only direction (typical for text embeddings)
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Cosine;
```
Example:
query: [0.5, 0.5, 0.0]
doc1:  [1.0, 1.0, 0.0]   → score: 1.0 (identical direction)
doc2:  [0.0, 1.0, 0.0]   → score: 0.71 (45° angle)
doc3:  [-1.0, -1.0, 0.0] → score: -1.0 (opposite direction)
Euclidean Distance (L2)
Straight-line distance between vectors. Range: [0, ∞), lower is more similar.
Use when: Absolute magnitude matters
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Euclidean;
config.normalize = false; // Preserve magnitude
```
Example:
query: [1.0, 1.0]
doc1:  [1.0, 1.0] → distance: 0.0 (identical)
doc2:  [2.0, 2.0] → distance: 1.41
doc3:  [0.0, 0.0] → distance: 1.41
Dot Product
Inner product of vectors. Range: (-∞, ∞), higher is more similar.
Use when: Combining similarity and magnitude
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::DotProduct;
config.normalize = false;
```
Manhattan Distance (L1)
Sum of absolute differences. Range: [0, ∞), lower is more similar.
Use when: Grid-based distances or robustness to outliers
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Manhattan;
```
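For intuition, here are plain-Rust reference implementations of the four metrics (the crate's own versions are SIMD-accelerated); they reproduce the worked examples above:

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    // Cosine example above: 45° between query and doc2 → 0.71
    assert!((cosine(&[0.5, 0.5, 0.0], &[0.0, 1.0, 0.0]) - 0.71).abs() < 0.01);
    // Euclidean example above: [1,1] vs [2,2] → sqrt(2) ≈ 1.41
    assert!((euclidean(&[1.0, 1.0], &[2.0, 2.0]) - 1.41).abs() < 0.01);
}
```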
RAG Workflow Example
```rust
// Illustrative sketch — the embedding step and exact index API are assumptions.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

// Placeholder for a real embedding provider (e.g. the OpenAI or Ollama integration).
async fn embed_text(_text: &str) -> anyhow::Result<Vec<f32>> {
    Ok(vec![0.0; 768])
}

/// Embed and index a document collection.
async fn build_knowledge_base(docs: &[(String, String)]) -> anyhow::Result<VectorSearchIndex> {
    let mut index = VectorSearchIndex::new(SearchConfig::default());
    let mut embeddings = HashMap::new();
    for (id, text) in docs {
        embeddings.insert(id.clone(), embed_text(text).await?);
    }
    index.build(embeddings)?;
    Ok(index)
}

/// Retrieve the top-k chunks for a question to use as LLM context.
async fn retrieve_context(index: &VectorSearchIndex, question: &str) -> anyhow::Result<Vec<String>> {
    let query = embed_text(question).await?;
    let hits = index.search(&query, 5)?;
    // Feed the retrieved chunk ids (or their text) into the LLM prompt.
    Ok(hits.into_iter().map(|h| h.id).collect())
}
```
New Features
Adaptive Index (NEW)
The easiest way to use oxify-vector: it automatically selects the best index type and optimizes performance:
```rust
// Illustrative sketch — type and method names are assumptions based on this README.
use oxify_vector::AdaptiveIndex;
use std::collections::HashMap;

// Create adaptive index - starts simple, upgrades as needed
let mut index = AdaptiveIndex::new();

// Build from embeddings
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
index.build(embeddings)?;

// Search - automatically uses best strategy
let query = vec![0.1, 0.2, 0.3];
let results = index.search(&query, 10)?;

// Add more data - may trigger automatic upgrade
for i in 0..10000 {
    index.add_vector(format!("doc{i}"), vec![i as f32, 0.0, 0.0])?;
}

// Check current strategy and performance
let stats = index.stats();
println!("strategy: {:?}", stats.strategy);
println!("vectors: {}", stats.num_vectors);
println!("avg search latency: {:?}", stats.avg_latency);
```
Features:
- Automatic upgrades: Starts with brute-force, upgrades to HNSW as dataset grows
- Performance tracking: Monitors latency and optimizes automatically
- Simple API: One interface for all index types
- Configurable: Presets for high accuracy or low latency
Configuration presets:
```rust
// AdaptiveConfig and with_config are assumed names.
// High accuracy (slower, more accurate)
let config = AdaptiveConfig::high_accuracy();
let mut index = AdaptiveIndex::with_config(config);

// Low latency (faster, good enough accuracy)
let config = AdaptiveConfig::low_latency();
let mut index = AdaptiveIndex::with_config(config);
```
When to use:
- You want optimal performance without manual tuning
- Dataset size changes over time
- You need automatic performance optimization
- You want a simple "it just works" API
Incremental Index Updates
Add, remove, and update vectors without rebuilding the entire index:
```rust
// Illustrative sketch — method signatures are assumptions.
use oxify_vector::{SearchConfig, VectorSearchIndex};
use std::collections::HashMap;

let mut index = VectorSearchIndex::new(SearchConfig::default());

// Build initial index
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.4, 0.5, 0.6]);
index.build(embeddings)?;

// Add a single vector
index.add_vector("doc3".to_string(), vec![0.7, 0.8, 0.9])?;

// Add multiple vectors
let mut new_docs = HashMap::new();
new_docs.insert("doc4".to_string(), vec![0.2, 0.3, 0.4]);
new_docs.insert("doc5".to_string(), vec![0.5, 0.6, 0.7]);
index.add_vectors(new_docs)?;

// Update an existing vector
index.update_vector("doc1", vec![0.15, 0.25, 0.35])?;

// Remove a vector
index.remove_vector("doc2")?;
```
Query Optimizer
Automatically select the best search strategy based on dataset size and requirements:
```rust
// Illustrative sketch — QueryOptimizer and the strategy variants are assumed names.
use oxify_vector::{QueryOptimizer, SearchStrategy};

let optimizer = QueryOptimizer::new();

// Recommend strategy based on dataset size and required recall
let num_vectors = 100_000;
let required_recall = 0.95;
let strategy = optimizer.recommend_strategy(num_vectors, required_recall);
match strategy {
    SearchStrategy::BruteForce => println!("exact search"),
    SearchStrategy::Hnsw => println!("HNSW approximate search"),
    SearchStrategy::IvfPq => println!("IVF-PQ quantized search"),
}

// Optimize pre/post-filtering
let filter_selectivity = 0.05; // 5% of vectors match filter
let use_prefilter = optimizer.recommend_prefiltering(filter_selectivity);

// Optimize batch size
let batch_size = optimizer.recommend_batch_size(num_vectors);
```
Presets for common scenarios:
```rust
use oxify_vector::OptimizerConfig;

// High accuracy (use exact search longer, higher recall threshold)
let config = OptimizerConfig::high_accuracy();

// High speed (switch to ANN earlier, lower recall threshold)
let config = OptimizerConfig::high_speed();

// Memory efficient (use quantization earlier, disable caching)
let config = OptimizerConfig::memory_efficient();
```
Scalar Quantization
Reduce memory usage by 75% with minimal accuracy loss:
```rust
// Illustrative sketch — QuantizedIndex and its stats fields are assumed names.
use oxify_vector::QuantizedIndex;

// Generate dataset
let vectors: Vec<(String, Vec<f32>)> = (0..10_000)
    .map(|i| (format!("doc{i}"), vec![(i % 100) as f32 / 100.0; 768]))
    .collect();

// Build quantized index
let mut index = QuantizedIndex::new();
index.build(vectors)?;

// Get statistics
let stats = index.stats();
println!("vectors: {}", stats.num_vectors);
println!("original size: {} bytes", stats.original_bytes);
println!("quantized size: {} bytes", stats.quantized_bytes);
println!("compression: {:.1}x", stats.compression_ratio);

// Search (automatically uses quantized distance)
let query = vec![0.5; 768];
let results = index.search(&query, 10)?;
```
Benefits:
- Memory: 4x reduction (float32 → uint8)
- Speed: Faster distance computations with integer math
- Accuracy: ~1-2% recall degradation for most datasets
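The underlying idea is plain min-max scalar quantization; a minimal standalone sketch of the technique (not necessarily the crate's exact scheme):

```rust
/// Quantize float32 values into uint8 codes using min-max scaling.
fn quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let q = v.iter().map(|x| ((x - min) * scale).round() as u8).collect();
    (q, min, max)
}

/// Reconstruct approximate float32 values from the uint8 codes.
fn dequantize(q: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    q.iter().map(|&c| min + c as f32 * step).collect()
}

fn main() {
    let v = vec![0.0_f32, 0.25, 0.5, 1.0];
    let (q, min, max) = quantize(&v); // 4 bytes instead of 16
    let back = dequantize(&q, min, max);
    // Each value is recovered to within one quantization step (~0.004 here)
    for (a, b) in v.iter().zip(&back) {
        assert!((a - b).abs() <= (max - min) / 255.0);
    }
}
```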
Multi-Index Search
Search across multiple indexes in parallel and merge results:
```rust
// Illustrative sketch — MultiIndexSearch construction is an assumption.
use oxify_vector::{
    MultiIndexConfig, MultiIndexSearch, ScoreMergeStrategy, SearchConfig, VectorSearchIndex,
};
use std::collections::HashMap;

// Create multiple indexes (e.g., different data shards or time periods)
let mut index1 = VectorSearchIndex::new(SearchConfig::default());
let mut index2 = VectorSearchIndex::new(SearchConfig::default());

// Build indexes...
let shard1_embeddings: HashMap<String, Vec<f32>> = HashMap::new(); // fill with real data
let shard2_embeddings: HashMap<String, Vec<f32>> = HashMap::new();
index1.build(shard1_embeddings)?;
index2.build(shard2_embeddings)?;

// Configure multi-index search
let config = MultiIndexConfig {
    merge_strategy: ScoreMergeStrategy::Max,
    ..Default::default()
};
let multi_search = MultiIndexSearch::with_config(vec![index1, index2], config);

// Search across both indexes
let query = vec![0.1, 0.2, 0.3];
let results = multi_search.search(&query, 10)?;
```
Score merge strategies:
- ScoreMergeStrategy::Max - Take highest score (recommended)
- ScoreMergeStrategy::Min - Take lowest score
- ScoreMergeStrategy::Average - Average scores
- ScoreMergeStrategy::First - Take first occurrence
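To make the strategies concrete, here is a hypothetical standalone sketch of Max-merging when the same document appears in several shard result lists (the crate performs this merging internally):

```rust
use std::collections::HashMap;

/// Merge (id, score) hits from several shards, keeping the highest score per id.
fn merge_max(shard_hits: Vec<Vec<(String, f32)>>) -> Vec<(String, f32)> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for hits in shard_hits {
        for (id, score) in hits {
            // Max strategy: keep the highest score seen for each id
            best.entry(id)
                .and_modify(|s| *s = s.max(score))
                .or_insert(score);
        }
    }
    let mut merged: Vec<_> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest score first
    merged
}
```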
Parallel Search
Enable parallel processing for large datasets:
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```

```rust
let mut config = SearchConfig::default();
config.parallel = true; // Use Rayon for parallel search

let mut index = VectorSearchIndex::new(config);
index.build(embeddings)?;

// Search uses all CPU cores
let results = index.search(&query, 10)?;
```
Performance:
- Single-threaded: ~1ms for 1k vectors
- Parallel (8 cores): ~0.2ms for 1k vectors
- Speedup scales with core count
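Conceptually, parallel brute-force search is a Rayon map-reduce over the stored vectors. A self-contained sketch of the idea (not the crate's actual code):

```rust
use rayon::prelude::*;

/// Parallel brute-force top-k by cosine score over (id, vector) pairs.
fn parallel_top_k<'a>(
    vectors: &'a [(String, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = vectors
        .par_iter() // score every vector on the Rayon thread pool
        .map(|(id, v)| (id.as_str(), cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```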
Integration with LLM Workflows
Axum API Endpoint
```rust
// Illustrative sketch — request/response types and state wiring are assumptions.
use axum::{extract::State, routing::post, Json, Router};
use oxify_vector::{SearchConfig, VectorSearchIndex};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Deserialize)]
struct SearchRequest {
    embedding: Vec<f32>,
    top_k: usize,
}

#[derive(Serialize)]
struct Hit {
    id: String,
    score: f32,
}

type SharedIndex = Arc<RwLock<VectorSearchIndex>>;

async fn search_handler(
    State(index): State<SharedIndex>,
    Json(req): Json<SearchRequest>,
) -> Json<Vec<Hit>> {
    let index = index.read().await; // many concurrent readers are fine
    let results = index.search(&req.embedding, req.top_k).unwrap_or_default();
    Json(results.into_iter().map(|r| Hit { id: r.id, score: r.score }).collect())
}

#[tokio::main]
async fn main() {
    let index: SharedIndex = Arc::new(RwLock::new(VectorSearchIndex::new(SearchConfig::default())));
    let app = Router::new()
        .route("/search", post(search_handler))
        .with_state(index);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```
Dynamic Index Updates
```rust
// Illustrative sketch — helper names are hypothetical.
use oxify_vector::VectorSearchIndex;
use std::sync::Arc;
use tokio::sync::RwLock;

/// Add a document to a live, shared index.
async fn add_document(
    index: &Arc<RwLock<VectorSearchIndex>>,
    id: String,
    embedding: Vec<f32>,
) -> anyhow::Result<()> {
    index.write().await.add_vector(id, embedding)?;
    Ok(())
}

/// Remove a document from a live, shared index.
async fn remove_document(
    index: &Arc<RwLock<VectorSearchIndex>>,
    id: &str,
) -> anyhow::Result<()> {
    index.write().await.remove_vector(id)?;
    Ok(())
}
```
Limitations
When to Use
- Small to medium datasets: <100k vectors
- Development and prototyping: Fast iteration without external dependencies
- Exact search required: Guaranteed best results
- Low latency: Sub-millisecond search
When NOT to Use
- Large datasets: >100k vectors with exact search (switch to the built-in HNSW/IVF-PQ indexes, or use Qdrant, Pinecone, Weaviate)
- Memory-constrained deployments: indexes live entirely in RAM; persistence covers saving/loading snapshots to disk, not out-of-core search
- Multi-node search: sharding in the distributed index is in-process; there is no multi-node support
Scaling to Production
For production RAG at scale, integrate with external vector databases:
```rust
// oxify-vector for development
use oxify_vector::VectorSearchIndex;

// Qdrant for production
use qdrant_client::client::QdrantClient;
```

Or use oxify-connect-vector for Qdrant/pgvector:

```rust
// Provider name and signatures below are assumptions.
use oxify_connect_vector::QdrantProvider;

let provider = QdrantProvider::new(qdrant_config).await?; // connection settings for your Qdrant
provider.search("docs", &query, 10).await?;
```
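One way to keep that swap painless is to hide the store behind a small trait so development (oxify-vector) and production (Qdrant) share one call site. A hypothetical sketch:

```rust
// Hypothetical abstraction — neither the trait nor the impls ship with the crate.
#[async_trait::async_trait]
trait VectorStore {
    async fn search(&self, query: &[f32], k: usize) -> anyhow::Result<Vec<(String, f32)>>;
}

// Implement VectorStore for the in-memory index in development and for a
// Qdrant-backed client in production; the RAG pipeline only sees `dyn VectorStore`.
async fn retrieve(store: &dyn VectorStore, query: &[f32]) -> anyhow::Result<Vec<(String, f32)>> {
    store.search(query, 5).await
}
```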
Performance Benchmarks
Hardware: Intel i7-12700K (12 cores), 32GB RAM
| Vectors | Dimensions | Metric | Parallel | Time |
|---|---|---|---|---|
| 1k | 768 | Cosine | No | 1.2ms |
| 1k | 768 | Cosine | Yes | 0.3ms |
| 10k | 768 | Cosine | No | 12ms |
| 10k | 768 | Cosine | Yes | 2.5ms |
| 100k | 768 | Cosine | No | 120ms |
| 100k | 768 | Cosine | Yes | 25ms |
Memory usage: ~4 bytes/dimension/vector (768D = 3KB per vector)
Testing
Run the test suite:
```bash
cargo test -p oxify-vector

# Run with parallel feature
cargo test -p oxify-vector --features parallel
```
All tests pass with zero warnings.
Dependencies
Core dependencies:
- rayon - Parallel processing (optional)
No external vector database required.
Migration from Other Libraries
From FAISS
```python
# FAISS (Python)
index = faiss.IndexFlatL2(768)
distances, indices = index.search(query, k)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(embeddings)?;
let results = index.search(&query, 10)?;
```
From ChromaDB
```python
# ChromaDB (Python)
collection = client.create_collection("docs")
results = collection.query(query_embeddings=[query], n_results=10)
```

```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(embeddings)?;
let results = index.search(&query, 10)?;
```
Implemented Features
All core features are production-ready:
- ✅ Adaptive Index: Automatic performance optimization (NEW)
- ✅ Approximate search algorithms (HNSW, IVF-PQ)
- ✅ Serialization/deserialization for persistence
- ✅ Filtered search (metadata filtering)
- ✅ Hybrid search (vector + BM25 keyword)
- ✅ Batch search optimization
- ✅ Incremental index updates
- ✅ Query optimizer for automatic strategy selection
- ✅ Scalar quantization for memory efficiency
- ✅ Multi-index search
- ✅ Distributed search with sharding
- ✅ ColBERT multi-vector search
- ✅ SIMD optimizations (AVX2)
- ✅ OpenTelemetry tracing support
Future Enhancements
- GPU acceleration (CUDA/ROCm)
- Product Quantization (PQ) for extreme compression
- Learned indexes (AI-optimized data structures)
- Streaming index updates (real-time ingestion)
License
Apache-2.0
Attribution
Ported from OxiRS with permission. Original implementation by the OxiLabs team.