# oxify-vector
In-memory vector similarity search for RAG (Retrieval-Augmented Generation) in OxiFY.
## Overview
`oxify-vector` provides fast, efficient vector similarity search for building RAG workflows. Its core is exact search with optional parallel processing, ideal for small to medium datasets (<100k vectors); approximate indexes (HNSW, IVF-PQ) extend it to larger workloads.
**Ported from**: [OxiRS](https://github.com/cool-japan/oxirs) - Battle-tested in production semantic web applications.
## Features
### Core Search
- **Multiple Distance Metrics**: Cosine, Euclidean, Dot Product, Manhattan
- **Parallel Search**: Multi-threaded search using Rayon
- **Exact Search**: Brute-force search for guaranteed best results
- **Incremental Updates**: Add, remove, and update vectors without rebuilding
### Advanced Algorithms
- **HNSW Index**: Hierarchical Navigable Small World for fast approximate search
- **IVF-PQ Index**: Inverted File with Product Quantization for memory-efficient large-scale search
- **Distributed Index**: Consistent hashing across multiple shards
- **ColBERT**: Multi-vector search for token-level matching
### Optimizations
- **Query Optimizer**: Automatic strategy selection (brute-force vs HNSW vs IVF-PQ)
- **Scalar Quantization**: 4x memory reduction (float32 → uint8) with minimal accuracy loss
- **SIMD Acceleration**: AVX2 optimizations for distance computations
- **Multi-Index Search**: Search across multiple indexes in parallel
### Filtering & Search
- **Filtered Search**: Metadata-based filtering with pre/post-filtering strategies
- **Hybrid Search**: Vector + BM25 keyword search with RRF fusion (see the sketch after this list)
- **Batch Search**: Process multiple queries efficiently
- **Radius Search**: Find all neighbors within a distance threshold
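The RRF (Reciprocal Rank Fusion) step referenced above combines the vector and keyword result lists by rank rather than raw score: each document receives a score of Σ 1/(k + rankᵢ) over the lists it appears in, with k conventionally set to 60. A minimal standalone sketch of the formula (not the crate's API):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: merge several ranked lists of document IDs.
/// Each appearance contributes 1 / (k + rank); k = 60 is the usual constant.
fn rrf_fuse(ranked_lists: &[Vec<&str>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for list in ranked_lists {
        for (i, doc_id) in list.iter().enumerate() {
            // Ranks are 1-indexed in the RRF formula.
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + (i as f32 + 1.0));
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```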
### Integration
- **Embeddings**: OpenAI and Ollama embedding providers with caching
- **Persistence**: Save/load indexes to disk with optional memory-mapping
- **OpenTelemetry**: Optional distributed tracing support
- **Type-Safe**: Strongly-typed API with compile-time guarantees
## Installation
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector" }

# Or, with parallel search enabled:
# oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```
### Feature Flags
- `parallel`: Enable multi-threaded search using Rayon
## Quick Start
### Basic Vector Search
```rust
use oxify_vector::{VectorSearchIndex, SearchConfig, DistanceMetric};
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embeddings (typically produced by an embedding model)
    let mut embeddings = HashMap::new();
    embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3, 0.4]);
    embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4, 0.5]);
    embeddings.insert("doc3".to_string(), vec![0.3, 0.4, 0.5, 0.6]);

    // Configure search
    let mut config = SearchConfig::default();
    config.metric = DistanceMetric::Cosine;

    // Build index
    let mut index = VectorSearchIndex::new(config);
    index.build(&embeddings)?;

    // Search for similar vectors
    let query = vec![0.15, 0.25, 0.35, 0.45];
    let results = index.search(&query, 2)?;
    for result in results {
        println!("Entity: {}, Score: {:.4}", result.entity_id, result.score);
    }

    Ok(())
}
```
## Core Components
### `VectorSearchIndex`
Main interface for vector search operations.
```rust
pub struct VectorSearchIndex {
    config: SearchConfig,
    entity_ids: Vec<String>,
    embedding_matrix: Option<Vec<Vec<f32>>>,
}

impl VectorSearchIndex {
    pub fn new(config: SearchConfig) -> Self;
    pub fn build(&mut self, embeddings: &HashMap<String, Vec<f32>>) -> Result<()>;
    pub fn search(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    pub fn add(&mut self, entity_id: String, embedding: Vec<f32>) -> Result<()>;
    pub fn remove(&mut self, entity_id: &str) -> Result<()>;
}
```
### `SearchConfig`
Configuration for search behavior.
```rust
pub struct SearchConfig {
    pub metric: DistanceMetric,
    pub normalize: bool,
    pub parallel: bool,
}

impl Default for SearchConfig {
    fn default() -> Self {
        Self {
            metric: DistanceMetric::Cosine,
            normalize: true,  // Auto-normalize for cosine similarity
            parallel: false,  // Single-threaded by default
        }
    }
}
```
### `DistanceMetric`
Supported distance/similarity metrics.
```rust
pub enum DistanceMetric {
    Cosine,     // Cosine similarity (default for RAG)
    Euclidean,  // L2 distance
    DotProduct, // Dot product similarity
    Manhattan,  // L1 distance (Manhattan/Taxicab)
}
```
### `SearchResult`
Result from a similarity search.
```rust
pub struct SearchResult {
    pub entity_id: String,
    pub score: f32,    // Higher is better (similarity score)
    pub distance: f32, // Lower is better (distance)
    pub rank: usize,   // 1-indexed rank
}
```
## Distance Metrics
### Cosine Similarity (Recommended for RAG)
Measures the cosine of the angle between vectors. Range: [-1, 1], higher is more similar.
**Use when**: Magnitude doesn't matter, only direction (typical for text embeddings)
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Cosine;
```
**Example**:
- `query: [0.5, 0.5, 0.0]`
- `doc1: [1.0, 1.0, 0.0]` → score: 1.0 (identical direction)
- `doc2: [0.0, 1.0, 0.0]` → score: 0.71 (45° angle)
- `doc3: [-1.0, -1.0, 0.0]` → score: -1.0 (opposite direction)
### Euclidean Distance (L2)
Straight-line distance between vectors. Range: [0, ∞), lower is more similar.
**Use when**: Absolute magnitude matters
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Euclidean;
config.normalize = false; // Preserve magnitude
```
**Example**:
- `query: [1.0, 1.0]`
- `doc1: [1.0, 1.0]` → distance: 0.0 (identical)
- `doc2: [2.0, 2.0]` → distance: 1.41
- `doc3: [0.0, 0.0]` → distance: 1.41
### Dot Product
Inner product of vectors. Range: (-∞, ∞), higher is more similar.
**Use when**: Combining similarity and magnitude
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::DotProduct;
config.normalize = false;
```
### Manhattan Distance (L1)
Sum of absolute differences. Range: [0, ∞), lower is more similar.
**Use when**: Grid-based distances or robustness to outliers
```rust
let mut config = SearchConfig::default();
config.metric = DistanceMetric::Manhattan;
```
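To make the metrics concrete, here is a self-contained sketch of all four (plain Rust, independent of the crate) that reproduces the worked numbers above:

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    // Cosine example above: 45° angle → ~0.71
    println!("{:.2}", cosine(&[0.5, 0.5, 0.0], &[0.0, 1.0, 0.0]));
    // Euclidean example above: [1,1] vs [2,2] → ~1.41
    println!("{:.2}", euclidean(&[1.0, 1.0], &[2.0, 2.0]));
    // Manhattan on the same pair: |1-2| + |1-2| = 2.0
    println!("{:.1}", manhattan(&[1.0, 1.0], &[2.0, 2.0]));
}
```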
## RAG Workflow Example
```rust
use oxify_vector::{VectorSearchIndex, SearchConfig, DistanceMetric};
use std::collections::HashMap;
#[derive(Debug)]
struct Document {
    id: String,
    content: String,
    embedding: Vec<f32>,
}

async fn rag_pipeline(
    query: &str,
    documents: Vec<Document>,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    // 1. Build vector index
    let mut embeddings = HashMap::new();
    for doc in &documents {
        embeddings.insert(doc.id.clone(), doc.embedding.clone());
    }
    let mut config = SearchConfig::default();
    config.metric = DistanceMetric::Cosine;
    let mut index = VectorSearchIndex::new(config);
    index.build(&embeddings)?;

    // 2. Get the query embedding (from an embedding model)
    let query_embedding = get_embedding(query).await?;

    // 3. Search for relevant documents
    let results = index.search(&query_embedding, 3)?;

    // 4. Retrieve document content
    let mut context_docs = Vec::new();
    for result in results {
        if let Some(doc) = documents.iter().find(|d| d.id == result.entity_id) {
            context_docs.push(doc.content.clone());
        }
    }
    Ok(context_docs)
}

async fn get_embedding(_text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Call an embedding API (OpenAI, Cohere, etc.),
    // e.g. text-embedding-ada-002.
    Ok(vec![0.1, 0.2, 0.3, 0.4]) // Placeholder
}
```
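For a working `get_embedding`, one option is to call the OpenAI embeddings endpoint directly. A minimal sketch, assuming the `reqwest` crate (with its `json` feature), `serde_json`, and an `OPENAI_API_KEY` environment variable; the crate's own embedding providers (see Features) are the integrated alternative:

```rust
use serde_json::json;

// Hypothetical get_embedding against the OpenAI REST API.
async fn get_embedding(text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let response: serde_json::Value = reqwest::Client::new()
        .post("https://api.openai.com/v1/embeddings")
        .bearer_auth(api_key)
        .json(&json!({ "model": "text-embedding-3-small", "input": text }))
        .send()
        .await?
        .json()
        .await?;
    // The vector lives at data[0].embedding as an array of numbers.
    let embedding = response["data"][0]["embedding"]
        .as_array()
        .ok_or("unexpected response shape")?
        .iter()
        .filter_map(|v| v.as_f64().map(|f| f as f32))
        .collect();
    Ok(embedding)
}
```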
## New Features
### Adaptive Index (NEW)
The easiest way to use oxify-vector: it automatically selects the best index type and optimizes performance as your data grows:
```rust
use oxify_vector::{AdaptiveIndex, AdaptiveConfig};
use std::collections::HashMap;
// Create adaptive index - starts simple, upgrades as needed
let mut index = AdaptiveIndex::new(AdaptiveConfig::default());
// Build from embeddings
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4]);
index.build(&embeddings)?;
// Search - automatically uses best strategy
let query = vec![0.15, 0.25, 0.35];
let results = index.search(&query, 10)?;
// Add more data - may trigger automatic upgrade
for i in 0..10000 {
    index.add_vector(format!("doc_{}", i), vec![/* ... */])?;
}
// Check current strategy and performance
let stats = index.stats();
println!("Strategy: {:?}", stats.current_strategy);
println!("Avg latency: {:.2}ms", stats.avg_latency_ms);
println!("P95 latency: {:.2}ms", stats.p95_latency_ms);
```
**Features:**
- **Automatic upgrades**: Starts with brute-force, upgrades to HNSW as dataset grows
- **Performance tracking**: Monitors latency and optimizes automatically
- **Simple API**: One interface for all index types
- **Configurable**: Presets for high accuracy or low latency
**Configuration presets:**
```rust
// High accuracy (slower, more accurate)
let config = AdaptiveConfig::high_accuracy();
let mut index = AdaptiveIndex::new(config);
// Low latency (faster, good enough accuracy)
let config = AdaptiveConfig::low_latency();
let mut index = AdaptiveIndex::new(config);
```
**When to use:**
- You want optimal performance without manual tuning
- Dataset size changes over time
- You need automatic performance optimization
- You want a simple "it just works" API
### Incremental Index Updates
Add, remove, and update vectors without rebuilding the entire index:
```rust
use oxify_vector::{VectorSearchIndex, SearchConfig};
use std::collections::HashMap;
let mut index = VectorSearchIndex::new(SearchConfig::default());
// Build initial index
let mut embeddings = HashMap::new();
embeddings.insert("doc1".to_string(), vec![0.1, 0.2, 0.3]);
embeddings.insert("doc2".to_string(), vec![0.2, 0.3, 0.4]);
index.build(&embeddings)?;
// Add a single vector
index.add_vector("doc3".to_string(), vec![0.3, 0.4, 0.5])?;
// Add multiple vectors
let mut new_docs = HashMap::new();
new_docs.insert("doc4".to_string(), vec![0.4, 0.5, 0.6]);
new_docs.insert("doc5".to_string(), vec![0.5, 0.6, 0.7]);
index.add_vectors(&new_docs)?;
// Update an existing vector
index.update_vector("doc1", vec![0.9, 0.9, 0.9])?;
// Remove a vector
index.remove_vector("doc2")?;
```
### Query Optimizer
Automatically select the best search strategy based on dataset size and requirements:
```rust
use oxify_vector::optimizer::{QueryOptimizer, OptimizerConfig, SearchStrategy};
let optimizer = QueryOptimizer::new(OptimizerConfig::default());
// Recommend strategy based on dataset size and required recall
let num_vectors = 100_000;
let required_recall = 0.95;
let strategy = optimizer.recommend_strategy(num_vectors, required_recall);
match strategy {
    SearchStrategy::BruteForce => println!("Use exact search (< 10K vectors)"),
    SearchStrategy::Hnsw => println!("Use HNSW (10K - 1M vectors)"),
    SearchStrategy::IvfPq => println!("Use IVF-PQ (> 1M vectors)"),
    SearchStrategy::Distributed => println!("Use distributed search (> 10M vectors)"),
}
// Optimize pre/post-filtering
let filter_selectivity = 0.05; // 5% of vectors match filter
let use_prefilter = optimizer.recommend_prefiltering(num_vectors, filter_selectivity);
// Optimize batch size
let batch_size = optimizer.recommend_batch_size(1000, num_vectors);
```
**Presets for common scenarios:**
```rust
use oxify_vector::optimizer::OptimizerConfig;
// High accuracy (use exact search longer, higher recall threshold)
let config = OptimizerConfig::high_accuracy();
// High speed (switch to ANN earlier, lower recall threshold)
let config = OptimizerConfig::high_speed();
// Memory efficient (use quantization earlier, disable caching)
let config = OptimizerConfig::memory_efficient();
```
### Scalar Quantization
Reduce memory usage by 75% with minimal accuracy loss:
```rust
use oxify_vector::quantization::{QuantizedVectorIndex, QuantizationConfig};
// Generate dataset
let vectors: Vec<(String, Vec<f32>)> = (0..10000)
    .map(|i| (format!("doc_{}", i), vec![/* ... */]))
    .collect();
// Build quantized index
let mut index = QuantizedVectorIndex::new(QuantizationConfig::default());
index.build(&vectors)?;
// Get statistics
let stats = index.stats();
println!("Original size: {} bytes", stats.original_bytes);
println!("Quantized size: {} bytes", stats.quantized_bytes);
println!("Compression: {:.2}x", stats.compression_ratio);
println!("Memory saved: {:.1}%", stats.memory_savings * 100.0);
// Search (automatically uses quantized distance)
let query = vec![0.5, 0.5, 0.5];
let results = index.search(&query, 10)?;
```
**Benefits:**
- **Memory**: 4x reduction (float32 → uint8)
- **Speed**: Faster distance computations with integer math
- **Accuracy**: ~1-2% recall degradation for most datasets
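The underlying transform is simple min/max scaling: each float maps to a byte via q = round(255 · (x − min) / (max − min)), and dequantization inverts it. A standalone sketch of the idea (not the crate's implementation, which may choose its scaling ranges differently):

```rust
/// Quantize a vector to u8 using per-vector min/max scaling.
/// Returns the bytes plus the (min, max) needed to dequantize.
fn quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let bytes = v.iter().map(|&x| ((x - min) * scale).round() as u8).collect();
    (bytes, min, max)
}

/// Recover an approximate float from its quantized byte.
fn dequantize(q: u8, min: f32, max: f32) -> f32 {
    min + (q as f32 / 255.0) * (max - min)
}
```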
### Multi-Index Search
Search across multiple indexes in parallel and merge results:
```rust
use oxify_vector::{
    MultiIndexSearch, MultiIndexConfig, ScoreMergeStrategy,
    VectorSearchIndex, SearchConfig,
};

// Create multiple indexes (e.g., different data shards or time periods)
let mut index1 = VectorSearchIndex::new(SearchConfig::default());
let mut index2 = VectorSearchIndex::new(SearchConfig::default());

// Build indexes...
index1.build(&embeddings1)?;
index2.build(&embeddings2)?;

// Configure multi-index search
let config = MultiIndexConfig {
    parallel: true,    // Search indexes in parallel
    deduplicate: true, // Remove duplicate entity_ids
    merge_strategy: ScoreMergeStrategy::Max, // Take max score for duplicates
};
let multi_search = MultiIndexSearch::with_config(config);
// Search across both indexes
let query = vec![0.5, 0.5, 0.5];
let results = multi_search.search(&[&index1, &index2], &query, 10)?;
```
**Score merge strategies:**
- `ScoreMergeStrategy::Max` - Take highest score (recommended)
- `ScoreMergeStrategy::Min` - Take lowest score
- `ScoreMergeStrategy::Average` - Average scores
- `ScoreMergeStrategy::First` - Take first occurrence
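Conceptually, deduplication groups hits by `entity_id` and resolves each group with the configured strategy. A hypothetical sketch of the `Max` behavior:

```rust
use std::collections::HashMap;

/// Merge (entity_id, score) results from several indexes,
/// keeping the highest score per entity_id.
fn merge_max(result_sets: Vec<Vec<(String, f32)>>) -> Vec<(String, f32)> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for results in result_sets {
        for (entity_id, score) in results {
            let entry = best.entry(entity_id).or_insert(f32::NEG_INFINITY);
            *entry = entry.max(score);
        }
    }
    let mut merged: Vec<_> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}
```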
## Parallel Search
Enable parallel processing for large datasets:
```toml
[dependencies]
oxify-vector = { path = "../crates/engine/oxify-vector", features = ["parallel"] }
```
```rust
let mut config = SearchConfig::default();
config.parallel = true; // Use Rayon for parallel search
let mut index = VectorSearchIndex::new(config);
index.build(&embeddings)?;
// Search uses all CPU cores
let results = index.search(&query, 10)?;
```
**Performance**:
- Single-threaded: ~1ms for 1k vectors
- Parallel (8 cores): ~0.2ms for 1k vectors
- Speedup scales with core count
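Conceptually, parallel brute-force search just distributes per-vector scoring across cores; a simplified sketch with Rayon (an illustration, not the crate's internals):

```rust
use rayon::prelude::*;

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Brute-force top-k: score every vector in parallel, then sort.
fn parallel_search(
    vectors: &[(String, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = vectors
        .par_iter() // Rayon distributes the scoring across all CPU cores
        .map(|(id, v)| (id.clone(), cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```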
## Integration with LLM Workflows
### Axum API Endpoint
```rust
use oxify_vector::{VectorSearchIndex, SearchConfig, SearchResult};
use axum::{extract::State, http::StatusCode, Json};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Clone)]
struct AppState {
    vector_index: Arc<RwLock<VectorSearchIndex>>,
}

#[derive(Deserialize)]
struct SearchRequest {
    query: Vec<f32>,
    k: usize,
}

#[derive(Serialize)]
struct SearchResponse {
    // Assumes SearchResult implements Serialize (or wrap it in a local DTO).
    results: Vec<SearchResult>,
}

async fn search_handler(
    State(state): State<AppState>,
    Json(req): Json<SearchRequest>,
) -> Result<Json<SearchResponse>, StatusCode> {
    let index = state.vector_index.read().await;
    let results = index
        .search(&req.query, req.k)
        .map_err(|_| StatusCode::BAD_REQUEST)?;
    Ok(Json(SearchResponse { results }))
}
```
### Dynamic Index Updates
```rust
use oxify_vector::VectorSearchIndex;
use std::sync::Arc;
use tokio::sync::RwLock;

async fn add_document(
    index: Arc<RwLock<VectorSearchIndex>>,
    doc_id: String,
    embedding: Vec<f32>,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut index = index.write().await;
    index.add(doc_id, embedding)?;
    Ok(())
}

async fn remove_document(
    index: Arc<RwLock<VectorSearchIndex>>,
    doc_id: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut index = index.write().await;
    index.remove(doc_id)?;
    Ok(())
}
```
## Limitations
These notes apply to the basic brute-force `VectorSearchIndex`; the HNSW, IVF-PQ, and distributed indexes described above lift several of them.
### When to Use
- **Small to medium datasets**: <100k vectors
- **Development and prototyping**: Fast iteration without external dependencies
- **Exact search required**: Guaranteed best results
- **Low latency**: Sub-millisecond search on small datasets
### When NOT to Use
- **Large datasets**: >100k vectors (use the built-in HNSW/IVF-PQ indexes, or an external store such as Qdrant, Pinecone, or Weaviate)
- **Approximate search suffices**: If ~99% recall is acceptable, the HNSW or IVF-PQ indexes are much faster
- **Persistence-first workloads**: The basic index is in-memory only (use the crate's save/load support, or serialize manually)
- **Distributed search**: The basic index has no multi-node support (see the distributed index for sharding)
## Scaling to Production
For production RAG at scale, integrate with external vector databases:
```rust
// oxify-vector for development
#[cfg(debug_assertions)]
use oxify_vector::VectorSearchIndex;
// Qdrant for production
#[cfg(not(debug_assertions))]
use oxify_connect_vector::QdrantClient;
```
Or use `oxify-connect-vector` for Qdrant/pgvector:
```rust
use oxify_connect_vector::{VectorProvider, QdrantProvider};
let provider = QdrantProvider::new("http://localhost:6334").await?;
provider.search("collection_name", &query, 10).await?;
```
## Performance Benchmarks
**Hardware**: Intel i7-12700K (12 cores), 32GB RAM
| Vectors | Dimensions | Metric | Parallel | Latency |
|---------|------------|--------|----------|---------|
| 1k      | 768        | Cosine | No       | 1.2ms   |
| 1k      | 768        | Cosine | Yes      | 0.3ms   |
| 10k     | 768        | Cosine | No       | 12ms    |
| 10k     | 768        | Cosine | Yes      | 2.5ms   |
| 100k    | 768        | Cosine | No       | 120ms   |
| 100k    | 768        | Cosine | Yes      | 25ms    |
**Memory usage**: ~4 bytes/dimension/vector (a 768D vector ≈ 3 KB; 100k such vectors ≈ 300 MB)
## Testing
Run the test suite:
```bash
cd crates/engine/oxify-vector
cargo test
# Run with parallel feature
cargo test --features parallel
```
All tests pass with zero warnings.
## Dependencies
Core dependencies:
- `rayon` - Parallel processing (optional)
No external vector database required.
## Migration from Other Libraries
### From FAISS
```python
# FAISS (Python)
import faiss
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
distances, indices = index.search(query, k)
```
```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig {
    metric: DistanceMetric::Euclidean,
    ..Default::default()
});
index.build(&embeddings)?;
let results = index.search(&query, k)?;
```
### From ChromaDB
```python
# ChromaDB (Python)
collection = client.create_collection("docs")
collection.add(documents=docs, embeddings=embeddings, ids=ids)
results = collection.query(query_embeddings=[query], n_results=k)
```
```rust
// oxify-vector (Rust)
let mut index = VectorSearchIndex::new(SearchConfig::default());
index.build(&embeddings)?;
let results = index.search(&query, k)?;
```
## Implemented Features
All core features are production-ready:
- ✅ **Adaptive Index**: Automatic performance optimization (NEW)
- ✅ Approximate search algorithms (HNSW, IVF-PQ)
- ✅ Serialization/deserialization for persistence
- ✅ Filtered search (metadata filtering)
- ✅ Hybrid search (vector + BM25 keyword)
- ✅ Batch search optimization
- ✅ Incremental index updates
- ✅ Query optimizer for automatic strategy selection
- ✅ Scalar quantization for memory efficiency
- ✅ Multi-index search
- ✅ Distributed search with sharding
- ✅ ColBERT multi-vector search
- ✅ SIMD optimizations (AVX2)
- ✅ OpenTelemetry tracing support
## Future Enhancements
- [ ] GPU acceleration (CUDA/ROCm)
- [ ] Product Quantization (PQ) for extreme compression
- [ ] Learned indexes (AI-optimized data structures)
- [ ] Streaming index updates (real-time ingestion)
## License
Apache-2.0
## Attribution
Ported from [OxiRS](https://github.com/cool-japan/oxirs) with permission. Original implementation by the OxiLabs team.