# OxiRS Vec - Production Deployment Guide
**Version**: v0.2.3
**Last Updated**: 2026-03-05
## Table of Contents
1. [Introduction](#introduction)
2. [Architecture Overview](#architecture-overview)
3. [System Requirements](#system-requirements)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Deployment Patterns](#deployment-patterns)
7. [Monitoring & Observability](#monitoring--observability)
8. [Performance Optimization](#performance-optimization)
9. [Security Best Practices](#security-best-practices)
10. [Disaster Recovery](#disaster-recovery)
11. [Troubleshooting](#troubleshooting)
12. [Migration from Other Systems](#migration-from-other-systems)
---
## Introduction
OxiRS Vec is a production-ready vector search engine designed for semantic similarity and AI-augmented querying. This guide covers everything you need to deploy and operate OxiRS Vec in production environments.
### Key Features
- **667 tests passing** with 100% pass rate
- **Production-grade features**: Real-time updates, crash recovery, multi-tenancy
- **Advanced indexing**: HNSW, IVF, PQ/OPQ, LSH, DiskANN, Learned Indexes
- **20+ distance metrics**: Cosine, Euclidean, KL-divergence, Pearson, and more
- **GPU acceleration**: CUDA kernels for 16 distance metrics
- **Hybrid search**: Keyword + semantic search with BM25
- **SPARQL integration**: Custom functions and federated queries
---
## Architecture Overview
### Component Architecture
```
┌─────────────────────────────────────────────────────┐
│ Application Layer │
│ (SPARQL Queries, REST API, GraphQL, Python Bindings)│
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ OxiRS Vec Core │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │Query Planning│ │Index Selection│ │Result Fusion│ │
│ └─────────────┘ └──────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ Index Layer │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────────┐ │
│ │ HNSW │ │ IVF │ │ LSH │ │ PQ │ │ DiskANN │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────────┘ │
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ Storage Layer │
│ ┌──────────┐ ┌──────────┐ ┌─────────────────┐ │
│ │ WAL │ │Persistence│ │Multi-Tenancy │ │
│ │(Crash │ │(Zstd │ │(Isolation & │ │
│ │Recovery) │ │Compression)│ │Quotas) │ │
│ └──────────┘ └──────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────┘
```
### Data Flow
1. **Ingestion**: Documents → Embeddings → Index
2. **Query**: Query → Query Planning → Index Selection → Search → Re-ranking → Results
3. **Updates**: Real-time updates with priority queues
4. **Persistence**: WAL → Checkpoint → Compression → Storage
---
## System Requirements
### Minimum Requirements
| CPU | 4 cores (x86_64 or ARM64) |
| RAM | 8 GB |
| Disk | 50 GB SSD |
| OS | Linux (Ubuntu 20.04+), macOS (10.15+), Windows Server 2019+ |
### Recommended Requirements
| CPU | 16+ cores (x86_64 with AVX2/AVX-512) |
| RAM | 64 GB+ |
| Disk | 500 GB NVMe SSD |
| GPU | NVIDIA GPU with CUDA 12.0+ (optional, for acceleration) |
| Network | 10 Gbps+ |
| OS | Linux (Ubuntu 22.04+) |
### Capacity Planning
**Vector Storage (per million vectors)**:
- Dimensions: 384 (common for sentence embeddings)
- Raw storage: ~1.5 GB (384 × 4 bytes × 1M)
- With compression: ~600 MB - 1 GB
- HNSW index overhead: ~2-3× raw storage
- Total: ~3-5 GB per million vectors
**Memory Requirements**:
- Base: 2 GB
- Per million vectors: 4-6 GB (HNSW in-memory)
- Query cache: 1-2 GB
- Working memory: 2-4 GB
- Total (10M vectors): 40-60 GB
---
## Installation
### From Source
```bash
# Clone repository
git clone https://github.com/cool-japan/oxirs.git
cd oxirs/engine/oxirs-vec
# Build with default features
cargo build --release
# Build with all features (requires CUDA for GPU support)
cargo build --release --features hnsw,simd,parallel,gpu,blas
# Build with GPU acceleration (requires CUDA toolkit)
cargo build --release --features gpu-full
```
### Feature Flags
| `hnsw` | HNSW index support | ✅ Yes |
| `simd` | SIMD acceleration | ✅ Yes |
| `parallel` | Parallel processing | ✅ Yes |
| `gpu` | GPU acceleration (basic) | ✅ Yes |
| `cuda` | CUDA acceleration | ⚠️ Requires CUDA |
| `blas` | BLAS support | ✅ Yes |
| `candle-gpu` | Candle GPU backend | ✅ Yes |
| `content-processing` | PDF/Office document processing | ✅ Yes |
| `python` | Python bindings | ✅ Yes |
| `huggingface` | HuggingFace integration | ✅ Yes |
---
## Configuration
### Basic Configuration
```rust
use oxirs_vec::{VectorStore, VectorStoreConfig, index::{IndexConfig, IndexType}};
// Create configuration
let config = VectorStoreConfig {
auto_embed: true,
cache_embeddings: true,
similarity_threshold: 0.7,
max_results: 100,
};
// Initialize store
let mut store = VectorStore::new().with_config(config);
```
### HNSW Configuration
```rust
use oxirs_vec::hnsw::{HnswConfig, HnswIndex};
let config = HnswConfig {
m: 16, // Number of connections per layer
ef_construction: 200, // Size of dynamic candidate list (construction)
ef_search: 100, // Size of dynamic candidate list (search)
max_elements: 1_000_000, // Maximum number of elements
dimensions: 384, // Vector dimensions
};
let index = HnswIndex::new(config)?;
```
### Performance Tuning Parameters
#### HNSW Parameters
| `m` | Connections per layer | 16 | Higher = more accurate, more memory (8-64) |
| `ef_construction` | Construction quality | 200 | Higher = better index quality (100-500) |
| `ef_search` | Search quality | 100 | Higher = more accurate, slower (50-500) |
**Tuning Rules**:
- High recall (>0.95): `m=32`, `ef_construction=400`, `ef_search=200`
- Balanced: `m=16`, `ef_construction=200`, `ef_search=100` (default)
- Fast search: `m=8`, `ef_construction=100`, `ef_search=50`
#### IVF Parameters
```rust
use oxirs_vec::ivf::{IvfConfig, IvfIndex};
let config = IvfConfig {
num_clusters: 256, // Number of clusters (sqrt(N) to N/100)
num_probes: 10, // Clusters to search (1-100)
dimensions: 384,
max_elements: 1_000_000,
};
```
---
## Deployment Patterns
### Pattern 1: Single-Node Deployment
**Use Case**: Small to medium workloads (< 10M vectors)
```rust
use oxirs_vec::{VectorStore, hnsw::HnswConfig};
fn create_single_node_store() -> anyhow::Result<VectorStore> {
let hnsw_config = HnswConfig {
m: 16,
ef_construction: 200,
ef_search: 100,
max_elements: 10_000_000,
dimensions: 384,
};
let index = Box::new(oxirs_vec::hnsw::HnswIndex::new(hnsw_config)?);
Ok(VectorStore::with_index(index))
}
```
**Architecture**:
```
┌─────────────────────────────────────┐
│ Application Server │
│ ┌──────────────────────────────┐ │
│ │ OxiRS Vec Instance │ │
│ │ - HNSW Index │ │
│ │ - WAL │ │
│ │ - Persistence Layer │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────┘
```
**Pros**: Simple, low latency, easy maintenance
**Cons**: Limited scalability, single point of failure
### Pattern 2: Multi-Tenant Deployment
**Use Case**: SaaS applications with tenant isolation
```rust
use oxirs_vec::multi_tenancy::{
MultiTenantManager, TenantManagerConfig, TenantConfig, ResourceQuota
};
fn setup_multi_tenant() -> anyhow::Result<MultiTenantManager> {
let config = TenantManagerConfig {
max_tenants: 1000,
default_quota: ResourceQuota {
max_vectors: 100_000,
max_storage_bytes: 1_000_000_000, // 1 GB
max_queries_per_second: 100.0,
max_index_size_bytes: 500_000_000, // 500 MB
},
isolation_level: oxirs_vec::multi_tenancy::IsolationLevel::Strong,
};
MultiTenantManager::new(config)
}
```
**Architecture**:
```
┌─────────────────────────────────────────────────┐
│ Multi-Tenant Manager │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tenant A │ │ Tenant B │ │ Tenant C │ ... │
│ │ - Index │ │ - Index │ │ - Index │ │
│ │ - Quota │ │ - Quota │ │ - Quota │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
```
**Pros**: Strong isolation, quota management, billing support
**Cons**: Higher memory usage, more complex
### Pattern 3: Distributed Deployment
**Use Case**: Large-scale workloads (> 100M vectors)
```rust
use oxirs_vec::distributed_vector_search::{
DistributedVectorSearch, DistributedNodeConfig, PartitioningStrategy
};
fn setup_distributed() -> anyhow::Result<DistributedVectorSearch> {
let config = DistributedNodeConfig {
node_id: "node-1".to_string(),
listen_addr: "0.0.0.0:8080".to_string(),
peer_nodes: vec![
"node-2:8080".to_string(),
"node-3:8080".to_string(),
],
partitioning_strategy: PartitioningStrategy::HashBased,
};
DistributedVectorSearch::new(config)
}
```
**Architecture**:
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ Vectors │ │ Vectors │ │ Vectors │
│ 0-33M │ │ 34M-66M │ │ 67M-100M │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└─────────────────┴─────────────────┘
│
┌─────────────────────┐
│ Load Balancer │
│ Query Coordinator │
└─────────────────────┘
```
**Pros**: Horizontal scalability, fault tolerance
**Cons**: Network latency, consistency challenges
---
## Monitoring & Observability
### Metrics Collection
```rust
use oxirs_vec::enhanced_performance_monitoring::{
EnhancedPerformanceMonitor, MonitoringConfig, AlertThresholds
};
fn setup_monitoring() -> anyhow::Result<EnhancedPerformanceMonitor> {
let config = MonitoringConfig {
enable_metrics: true,
enable_tracing: true,
alert_thresholds: AlertThresholds {
max_query_latency_ms: 100.0,
max_memory_usage_percent: 85.0,
min_recall: 0.90,
},
};
EnhancedPerformanceMonitor::new(config)
}
```
### Key Metrics
| Query Latency (p50) | Median query time | < 10 ms |
| Query Latency (p95) | 95th percentile | < 50 ms |
| Query Latency (p99) | 99th percentile | < 100 ms |
| Throughput | Queries per second | 1000+ |
| Recall@10 | Top-10 recall | > 0.95 |
| Memory Usage | RAM utilization | < 80% |
| Index Build Time | Time to build index | < 1 hr per 10M |
### Health Checks
```rust
use oxirs_vec::real_time_analytics::PerformanceMonitor;
fn health_check(monitor: &PerformanceMonitor) -> bool {
let stats = monitor.get_statistics();
// Check query latency
if stats.avg_query_latency_ms > 100.0 {
return false;
}
// Check memory usage
if stats.memory_usage_percent > 90.0 {
return false;
}
// Check error rate
if stats.error_rate > 0.01 {
return false;
}
true
}
```
---
## Performance Optimization
### Index Selection Strategy
```rust
use oxirs_vec::query_planning::{QueryPlanner, QueryCharacteristics};
fn optimize_query(planner: &mut QueryPlanner, query: &Vector) -> anyhow::Result<()> {
let characteristics = QueryCharacteristics {
vector_dimensions: query.len(),
expected_result_size: 10,
query_vector_sparsity: calculate_sparsity(query),
use_filters: false,
};
let plan = planner.plan_query(&characteristics)?;
// Execute with optimal strategy
Ok(())
}
fn calculate_sparsity(vector: &Vector) -> f32 {
let values = vector.as_f32();
let zeros = values.iter().filter(|&&x| x == 0.0).count();
zeros as f32 / values.len() as f32
}
```
### Cache Configuration
```rust
use oxirs_vec::advanced_caching::{CacheConfig, MultiLevelCache};
fn setup_caching() -> anyhow::Result<MultiLevelCache> {
let config = CacheConfig {
l1_capacity: 1000, // Hot queries
l2_capacity: 10000, // Warm queries
l3_capacity: 100000, // Cold queries
ttl_seconds: 3600, // 1 hour
};
MultiLevelCache::new(config)
}
```
### Memory Optimization
**Technique 1: Quantization**
```rust
use oxirs_vec::sq::{SqConfig, SqIndex, QuantizationMode};
// Reduce memory by 4x with Scalar Quantization
let sq_config = SqConfig {
dimensions: 384,
quantization_mode: QuantizationMode::Uniform,
bits: 8, // 8-bit quantization
};
let sq_index = SqIndex::new(sq_config)?;
// Memory: 384 bytes per vector → 96 bytes per vector
```
**Technique 2: Product Quantization**
```rust
use oxirs_vec::pq::{PQConfig, PQIndex};
// Reduce memory by 16x with Product Quantization
let pq_config = PQConfig {
dimensions: 384,
num_subvectors: 48, // 384 / 8 = 48
num_centroids: 256, // 8-bit per subvector
max_elements: 1_000_000,
};
let pq_index = PQIndex::new(pq_config)?;
// Memory: 384 bytes per vector → 48 bytes per vector
```
**Technique 3: Disk-backed Storage**
```rust
use oxirs_vec::diskann::{DiskAnnConfig, DiskAnnIndex};
// Billion-scale vectors with minimal memory
let diskann_config = DiskAnnConfig {
dimensions: 384,
max_degree: 64,
search_list_size: 100,
index_path: "/mnt/nvme/vectors".to_string(),
};
let diskann_index = DiskAnnIndex::new(diskann_config)?;
// Memory: Only active pages loaded (~100 MB for 1B vectors)
```
---
## Security Best Practices
### 1. Authentication & Authorization
```rust
use oxirs_vec::multi_tenancy::{AccessControl, Role, Permission};
fn setup_access_control() -> anyhow::Result<AccessControl> {
let mut acl = AccessControl::new();
// Define roles
acl.add_role(Role {
name: "admin".to_string(),
permissions: vec![
Permission::Read,
Permission::Write,
Permission::Delete,
Permission::Admin,
],
})?;
acl.add_role(Role {
name: "read_only".to_string(),
permissions: vec![Permission::Read],
})?;
Ok(acl)
}
```
### 2. Rate Limiting
```rust
use oxirs_vec::multi_tenancy::RateLimiter;
fn setup_rate_limiting() -> anyhow::Result<RateLimiter> {
RateLimiter::new(
100.0, // 100 queries per second
1000, // Burst capacity
)
}
```
### 3. Input Validation
```rust
use oxirs_vec::validation::{VectorValidator, ValidationConfig};
fn validate_input(vector: &Vector) -> anyhow::Result<()> {
let validator = VectorValidator::new(ValidationConfig {
min_dimensions: 1,
max_dimensions: 2048,
allow_nan: false,
allow_inf: false,
max_magnitude: 1000.0,
});
validator.validate(vector)?;
Ok(())
}
```
### 4. Encryption at Rest
```rust
use oxirs_vec::persistence::{PersistenceConfig, EncryptionConfig};
fn setup_encryption() -> PersistenceConfig {
PersistenceConfig {
encryption: Some(EncryptionConfig {
algorithm: "AES-256-GCM".to_string(),
key_path: "/etc/oxirs/encryption.key".to_string(),
}),
..Default::default()
}
}
```
---
## Disaster Recovery
### Write-Ahead Logging (WAL)
```rust
use oxirs_vec::wal::{WalConfig, WalManager};
fn setup_wal() -> anyhow::Result<WalManager> {
let config = WalConfig {
wal_dir: "/var/lib/oxirs/wal".to_string(),
max_wal_size: 1_000_000_000, // 1 GB
sync_interval_ms: 1000, // Sync every 1 second
checkpoint_interval: 10000, // Checkpoint every 10k ops
};
WalManager::new(config)
}
```
### Backup Strategy
**Incremental Backups**:
```rust
use oxirs_vec::persistence::{PersistenceManager, BackupConfig};
fn create_backup(manager: &PersistenceManager) -> anyhow::Result<()> {
let config = BackupConfig {
backup_dir: "/backups/oxirs".to_string(),
compression: true,
incremental: true,
};
manager.create_backup(&config)?;
Ok(())
}
```
**Recovery Procedure**:
```rust
use oxirs_vec::crash_recovery::{CrashRecoveryManager, RecoveryConfig};
fn recover_from_crash() -> anyhow::Result<VectorStore> {
let recovery_config = RecoveryConfig {
wal_dir: "/var/lib/oxirs/wal".to_string(),
data_dir: "/var/lib/oxirs/data".to_string(),
verify_checksums: true,
};
let manager = CrashRecoveryManager::new(recovery_config)?;
let store = manager.recover()?;
Ok(store)
}
```
---
## Troubleshooting
### Common Issues
#### Issue 1: High Query Latency
**Symptoms**: Queries taking > 100ms
**Diagnosis**:
```rust
let stats = monitor.get_query_statistics();
println!("Average latency: {} ms", stats.avg_latency);
println!("P95 latency: {} ms", stats.p95_latency);
println!("P99 latency: {} ms", stats.p99_latency);
```
**Solutions**:
1. Increase `ef_search` for better recall
2. Use query result caching
3. Enable GPU acceleration
4. Consider index quantization
#### Issue 2: High Memory Usage
**Symptoms**: Memory usage > 90%
**Diagnosis**:
```rust
let stats = monitor.get_system_statistics();
println!("Memory usage: {} %", stats.memory_usage_percent);
println!("Index size: {} GB", stats.index_size_gb);
```
**Solutions**:
1. Use Product Quantization (PQ)
2. Enable DiskANN for disk-backed storage
3. Reduce cache sizes
4. Implement tiering (hot/warm/cold)
#### Issue 3: Low Recall
**Symptoms**: Recall < 0.90
**Diagnosis**:
```rust
let quality_stats = monitor.get_quality_statistics();
println!("Recall@10: {}", quality_stats.recall_at_10);
println!("Recall@100: {}", quality_stats.recall_at_100);
```
**Solutions**:
1. Increase HNSW `m` parameter
2. Increase `ef_construction` during index build
3. Use higher `ef_search` during queries
4. Consider exact search for critical queries
---
## Migration from Other Systems
### From FAISS
```rust
use oxirs_vec::faiss_migration_tools::{FaissConverter, ConversionConfig};
fn migrate_from_faiss(faiss_index_path: &str) -> anyhow::Result<VectorStore> {
let config = ConversionConfig {
source_path: faiss_index_path.to_string(),
target_index_type: oxirs_vec::index::IndexType::HNSW,
preserve_ids: true,
};
let converter = FaissConverter::new(config)?;
let store = converter.convert()?;
Ok(store)
}
```
### From Annoy
```rust
fn migrate_from_annoy(
annoy_vectors: Vec<(String, Vec<f32>)>
) -> anyhow::Result<VectorStore> {
let mut store = VectorStore::new();
for (id, vector) in annoy_vectors {
store.index_vector(id, Vector::new(vector))?;
}
Ok(store)
}
```
---
## Performance Benchmarks
### Single-Node Performance
| 1M vectors | 2 min | 5 ms | 6 GB | 0.98 |
| 10M vectors | 20 min | 8 ms | 60 GB | 0.96 |
| 100M vectors | 3 hr | 12 ms | 600 GB | 0.95 |
### GPU Acceleration (CUDA)
| Cosine Distance | 100k q/s | 5M q/s | 50× |
| Euclidean Distance | 120k q/s | 4M q/s | 33× |
| Dot Product | 150k q/s | 8M q/s | 53× |
---
## Next Steps
1. **Production Checklist**: Review the production readiness checklist
2. **Performance Tuning**: See the performance tuning guide
3. **Best Practices**: Read the best practices guide
4. **WAL Configuration**: Configure crash recovery with WAL
5. **GPU Setup**: Set up GPU acceleration for maximum performance
---
## Support & Resources
- **Documentation**: https://docs.rs/oxirs-vec
- **Repository**: https://github.com/cool-japan/oxirs
- **Issues**: https://github.com/cool-japan/oxirs/issues
- **Discussions**: https://github.com/cool-japan/oxirs/discussions
---
**Document Version**: 1.0
**OxiRS Vec Version**: v0.2.3
**Last Updated**: 2026-03-05