# oxify-connect-vector

Vector database connections and abstractions for the OxiFY workflow engine.

## Overview

**Status:** Production-ready ✅

Provides a unified interface for vector databases with advanced features like hybrid search, caching, reranking, and filtering.
## Features

### Core Capabilities
- 6 Vector Database Providers: Qdrant, pgvector, ChromaDB, Pinecone, Weaviate, Milvus
- Batch Operations: Efficient batch insert with native APIs for all providers
- Parallel Batch Operations: High-throughput parallel insert and search operations
- Update Operations: Update vectors and/or payloads in-place
- Collection Management: Create, check existence, and get statistics
- Unified Filtering: Expression-based filter language that works across all providers
### Advanced Search
- Hybrid Search: Combine semantic vector search with BM25 keyword search using Reciprocal Rank Fusion
- ColBERT-style Multi-Vector Search: Store multiple vectors per document with MaxSim scoring
- Reranking: Multiple strategies (Cohere API, MMR, keyword boost, custom scorers)
- Advanced Caching: LRU caching for embeddings and search results with TTL
### Data Management
- Migration Tools: Export/import collections between providers with verification
- Batch Processing: Optimized batch operations for all providers
- Data Portability: VectorSnapshot for backup/restore with JSON serialization
### Observability & Quality
- Metrics Collection: Comprehensive operation tracking (latency, errors, throughput)
- Health Monitoring: Provider health checks with status tracking
- CI/CD Integration: Automated testing and performance regression detection
- Mock Provider: In-memory provider for testing without a database
- Zero Warnings: Production-ready code with 120 comprehensive tests (91 unit + 25 doc + 4 integration)
## Quick Start

### Basic Vector Search

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{QdrantProvider, VectorProvider};
use serde_json::json;

async fn example() -> oxify_connect_vector::Result<()> {
    let provider = QdrantProvider::connect("http://localhost:6334").await?;

    // Insert a vector with a JSON payload
    provider
        .insert("docs", "doc-1", vec![0.1, 0.2, 0.3], json!({"title": "Hello"}))
        .await?;

    // Search for the 5 nearest neighbors
    let results = provider.search("docs", vec![0.1, 0.2, 0.3], 5).await?;
    for r in results {
        println!("{} (score {:.3})", r.id, r.score);
    }
    Ok(())
}
```
## Hybrid Search

Combine semantic vector search with BM25 keyword search for better accuracy:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{Bm25Index, HybridSearchEngine};

// Create BM25 index
let mut bm25 = Bm25Index::new();
bm25.add_document("doc-1", "Rust is a systems programming language");
bm25.add_document("doc-2", "Vector databases store embeddings");

// Create hybrid search engine
let engine = HybridSearchEngine::new(provider, bm25);

// Search with both a vector and a text query; rankings are fused with
// Reciprocal Rank Fusion
let results = engine.search("docs", query_vector, "vector embeddings", 10).await?;
```
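Reciprocal Rank Fusion itself is compact enough to sketch outside the crate. The `rrf_fuse` helper below is illustrative (not part of the crate's API): each document scores the sum of 1/(k + rank) over every ranked list it appears in, with k (commonly 60) damping the influence of top ranks.

```rust
use std::collections::HashMap;

/// Fuse ranked result lists with Reciprocal Rank Fusion.
/// Each document receives sum(1 / (k + rank)), with 1-based ranks.
fn rrf_fuse(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; scores are finite, so unwrap is safe.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let vector_hits = vec!["a", "b", "c"]; // ranking from vector search
    let keyword_hits = vec!["a", "d", "b"]; // ranking from BM25
    let fused = rrf_fuse(&[vector_hits, keyword_hits], 60.0);
    // "a" and "b" appear in both lists, so they outrank "c" and "d".
    println!("{:?}", fused);
}
```

Documents found by both retrievers accumulate score from each list, which is why hybrid search tends to surface them first even when neither ranker alone put them on top.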
## Caching for Performance

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{EmbeddingCache, SearchResultCache};
use std::time::Duration;

// Cache embeddings to avoid redundant API calls (capacity 1000, 1 hour TTL)
let embedding_cache = EmbeddingCache::new(1000, Duration::from_secs(3600));

// Check the cache before generating a new embedding
if let Some(embedding) = embedding_cache.get("some text") {
    // Cache hit: reuse the stored embedding
} else {
    // Cache miss: generate the embedding, then store it
}

// Cache search results with a shorter TTL
let search_cache = SearchResultCache::new(500, Duration::from_secs(60));

// Get cache statistics
let stats = embedding_cache.stats();
println!("Hit rate: {:.1}%", stats.hit_rate * 100.0);
println!("Entries: {}", stats.entries);
```
## Advanced Filtering

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::filter::{Condition, Filter};

// Build complex filter expressions
let filter = Filter::And(vec![
    Condition::eq("category", "tutorial"),
    Condition::gte("year", 2023),
]);

// Works across all providers (automatically converted to each backend's syntax)
let results = provider.search_filtered("docs", query_vector, 10, filter).await?;
```
## Reranking

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::rerank::{KeywordBoostReranker, MmrReranker, RerankerChain};

// MMR (Maximal Marginal Relevance) for diversity
let mmr = MmrReranker::new(0.7); // lambda = 0.7 (balance relevance vs diversity)
let reranked = mmr.rerank(&query_vector, results).await?;

// Boost results matching keywords
let keyword_boost = KeywordBoostReranker::new(vec!["rust", "async"], 1.5);
let boosted = keyword_boost.rerank(&query_vector, reranked).await?;

// Chain multiple rerankers
let chain = RerankerChain::new(vec![
    Box::new(MmrReranker::new(0.7)),
    Box::new(KeywordBoostReranker::new(vec!["rust"], 1.5)),
]);
let final_results = chain.rerank(&query_vector, boosted).await?;
```
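The MMR selection rule is worth seeing concretely. This standalone helper is not the crate's reranker; it sketches the greedy loop: at each step, pick the candidate maximizing λ·sim(query, doc) − (1 − λ)·max sim(doc, already selected).

```rust
// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Greedy MMR: return indices of `k` docs balancing relevance to the query
/// (weight lambda) against redundancy with already-picked docs (weight 1 - lambda).
fn mmr_select(query: &[f64], docs: &[Vec<f64>], lambda: f64, k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..docs.len()).collect();
    while selected.len() < k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f64::NEG_INFINITY;
        for (pos, &idx) in remaining.iter().enumerate() {
            let relevance = cosine(query, &docs[idx]);
            // Redundancy: highest similarity to anything already selected
            // (0.0 on the first pick; similarities here are non-negative).
            let redundancy = selected
                .iter()
                .map(|&s| cosine(&docs[idx], &docs[s]))
                .fold(0.0_f64, f64::max);
            let score = lambda * relevance - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}

fn main() {
    let query = vec![1.0, 0.0];
    // docs[1] is a near-duplicate of docs[0]; docs[2] is less relevant but diverse.
    let docs = vec![vec![1.0, 0.0], vec![0.999, 0.01], vec![0.6, 0.8]];
    // Low lambda favors diversity, high lambda favors raw relevance.
    println!("{:?}", mmr_select(&query, &docs, 0.3, 2)); // picks the diverse doc second
    println!("{:?}", mmr_select(&query, &docs, 0.9, 2)); // picks the near-duplicate second
}
```

A low lambda makes the near-duplicate lose badly (its redundancy term is ~1.0), which is the intuition behind using MMR for diversity.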
## Parallel Batch Operations

For high-throughput workloads, use parallel batch operations:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{parallel_batch_insert, InsertRequest, ParallelConfig};
use std::sync::Arc;

// Wrap the provider in an Arc so it can be shared across tasks
let provider = Arc::new(provider);

// Prepare bulk insert requests
let mut requests = Vec::new();
for i in 0..10_000 {
    requests.push(InsertRequest {
        collection: "docs".into(),
        id: format!("doc-{i}"),
        vector: embed(i), // your embedding source
        payload: serde_json::json!({ "index": i }),
    });
}

// Configure parallelism
let config = ParallelConfig { concurrency: 16, ..Default::default() };

// Insert in parallel (significantly faster than sequential)
let inserted = parallel_batch_insert(provider, requests, config).await?;
println!("Inserted {inserted} vectors");
```
## Batch Operations

Efficiently insert multiple vectors at once:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::BatchInsertRequest;

let vectors = vec![
    ("doc-1".to_string(), vec![0.1, 0.2, 0.3]),
    ("doc-2".to_string(), vec![0.4, 0.5, 0.6]),
];
let count = provider.batch_insert(BatchInsertRequest::new("docs", vectors)).await?;
println!("Inserted {count} vectors");
```
## Update Operations

Update existing vectors and/or their metadata:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::UpdateRequest;

// Update the vector only
provider
    .update(UpdateRequest::vector("docs", "doc-1", vec![0.7, 0.8, 0.9]))
    .await?;

// Update the payload only
provider
    .update(UpdateRequest::payload("docs", "doc-1", serde_json::json!({"status": "reviewed"})))
    .await?;
```
## Collection Statistics

Get information about your collections:

```rust
// Field names are illustrative; consult the crate docs for exact names.
let info = provider.collection_info("docs").await?;
println!("Vectors: {}", info.vector_count);
println!("Dimensions: {}", info.dimensions);
println!("Status: {:?}", info.status);
```
## ColBERT-style Multi-Vector Search

Store multiple vectors per document for token-level embeddings:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::colbert::ColbertIndex;

let colbert = ColbertIndex::new(provider, "docs");

// Insert a document with multiple vectors (e.g., token embeddings)
colbert.insert_multi_vector("doc-1", token_embeddings).await?;

// Search with multiple query vectors; documents are scored with MaxSim
let results = colbert.search_multi_vector(query_token_embeddings, 10).await?;
```
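The MaxSim score itself is simple: for each query token vector, take the best similarity against all document token vectors, then sum those maxima. A standalone sketch (not the crate's implementation), using dot products as the similarity:

```rust
// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT-style MaxSim: every query token embedding is matched to its
/// best-scoring document token embedding, and the maxima are summed.
fn max_sim(query_tokens: &[Vec<f64>], doc_tokens: &[Vec<f64>]) -> f64 {
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| dot(q, d))
                .fold(f64::NEG_INFINITY, f64::max)
        })
        .sum()
}

fn main() {
    // Two query tokens, each matching a different document token best.
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.9, 0.1], vec![0.2, 0.8]];
    // First token's best match scores 0.9, second's scores 0.8.
    println!("MaxSim = {}", max_sim(&query, &doc)); // approximately 1.7
}
```

Because each query token is matched independently, MaxSim rewards documents that cover every aspect of the query, not just its average direction.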
## Data Migration

Migrate data between different vector database providers:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::migration::*;

// Export a collection to a snapshot
let snapshot = export_collection(&source_provider, "docs").await?;

// Save to a file
snapshot.save_to_file("docs_snapshot.json")?;

// Import into a different provider
let snapshot = VectorSnapshot::load_from_file("docs_snapshot.json")?;
import_snapshot(&target_provider, snapshot).await?;

// Or migrate directly
migrate_collection(&source_provider, &target_provider, "docs").await?;
```
## Metrics and Monitoring

Track performance and health of vector operations:

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{HealthCheckProvider, MetricsProvider};

// Wrap a provider with metrics collection
let metrics_provider = MetricsProvider::new(provider);

// Perform operations...
metrics_provider.search("docs", query_vector, 10).await?;
metrics_provider.insert("docs", "doc-1", vector, payload).await?;

// Get metrics
let search_stats = metrics_provider.metrics().search_stats();
println!("p95 latency: {:?}", search_stats.p95_latency);

// Health monitoring
let health_provider = HealthCheckProvider::new(metrics_provider);
let health = health_provider.check_health().await?;
println!("Status: {:?}", health.status);
```
## Utility Functions

Use built-in vector math utilities:

```rust
// Function names are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::math::*;

let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];

// Compute similarity and distances
let similarity = cosine_similarity(&vec1, &vec2);
let euclidean = euclidean_distance(&vec1, &vec2);
let manhattan = manhattan_distance(&vec1, &vec2);

// Validate and normalize
assert!(is_valid_vector(&vec1)); // checks for NaN/Inf
let normalized = normalize_vector(&vec1);

// Batch operations
let vectors = vec![vec1, vec2];
let normalized_batch = batch_normalize(&vectors);
```
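If you want to sanity-check what such helpers compute, the underlying formulas are small. These standalone versions (not the crate's code) implement cosine similarity, Euclidean distance, and unit-length normalization:

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f64]| v.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm(a) * norm(b))
}

// Euclidean (L2) distance: sqrt(sum((a_i - b_i)^2)).
fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

// Scale a vector to unit length (assumes a non-zero vector).
fn normalize_vector(v: &[f64]) -> Vec<f64> {
    let n = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter().map(|x| x / n).collect()
}

fn main() {
    let a = vec![3.0, 4.0];
    // A vector compared with itself has cosine similarity 1 and distance 0.
    assert!((cosine_similarity(&a, &a) - 1.0).abs() < 1e-12);
    assert_eq!(euclidean_distance(&a, &a), 0.0);
    // [3, 4] has length 5, so the unit vector is [0.6, 0.8].
    let n = normalize_vector(&a);
    assert!((n[0] - 0.6).abs() < 1e-12 && (n[1] - 0.8).abs() < 1e-12);
    println!("ok");
}
```

Normalizing stored vectors up front is a common optimization: once everything is unit length, cosine similarity reduces to a plain dot product.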
## Mock Provider for Testing

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_vector::{MockProvider, VectorProvider};
use serde_json::json;

#[tokio::test]
async fn insert_and_search() {
    let provider = MockProvider::new();

    provider
        .insert("docs", "doc-1", vec![0.1, 0.2, 0.3], json!({"tag": "test"}))
        .await
        .unwrap();

    let results = provider.search("docs", vec![0.1, 0.2, 0.3], 5).await.unwrap();
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].id, "doc-1");
}
```
## Supported Vector Databases
| Provider | Status | Features | Best For |
|---|---|---|---|
| Qdrant | ✅ | Full CRUD, filters, gRPC | High performance production |
| pgvector | ✅ | PostgreSQL extension, SQL | Existing PostgreSQL setups |
| ChromaDB | ✅ | Simple HTTP API, metadata | Quick prototyping |
| Pinecone | ✅ | Managed service, namespaces | Serverless deployments |
| Weaviate | ✅ | GraphQL, schema | Rich queries, multi-tenancy |
| Milvus | ✅ | Distributed, scalable | Large-scale deployments |
## Architecture
## Benchmarks & Performance Testing

Run benchmarks to measure performance:

```bash
# Run all benchmarks
cargo bench

# Run a specific benchmark suite (substitute the suite name)
cargo bench --bench <name>

# Performance regression testing (compare with main branch)
# via perf_regression.sh; see docs/PERFORMANCE_TESTING.md for usage
```
Available benchmarks:
- Search latency: 100, 1k, 10k vector collections
- Throughput: Queries/second and inserts/second
- Accuracy: Recall@k for k=1,5,10,20
- Dimension scaling: 64-1024 dimensions
- Hybrid search: BM25, fusion weights, RRF parameters
- Cache performance: Hit/miss, eviction, different sizes
- ColBERT: Multi-vector search, MaxSim computation
## CI/CD Integration
Automated testing runs on every push:
- Unit tests (60+ tests)
- Integration tests with Docker services (Qdrant, PostgreSQL, ChromaDB)
- Clippy linting (zero warnings enforced)
- rustfmt checks
- Performance regression detection on PRs
See .github/workflows/ for workflow definitions.
## Testing

```bash
# Run unit tests (no database required)
cargo test

# Run integration tests (requires running databases; flag illustrative, see CI config)
cargo test -- --ignored

# Run tests with output
cargo test -- --nocapture

# Run clippy (zero warnings enforced)
cargo clippy --all-targets -- -D warnings
```

All tests pass with zero warnings enforced. Total: 68 tests (60 unit + 8 doc tests).
## Integration Testing

A Docker Compose configuration is included for integration tests:

```bash
# Start all vector databases
docker compose up -d

# Run integration tests (flag illustrative, see CI config)
cargo test -- --ignored

# Stop the databases
docker compose down
```
Supported databases in integration tests:
- Qdrant (ports 6333, 6334)
- PostgreSQL with pgvector (port 5432)
- ChromaDB (port 8000)
- Milvus (ports 19530, 9091)
## Error Handling

All fallible operations return the crate's `Result` alias:

```rust
// The error type name is illustrative; consult the crate docs for the exact name.
pub type Result<T> = std::result::Result<T, VectorError>;
```
## Integration with oxify-connect-llm

Enable the `embeddings` feature for automatic embedding generation:

```toml
[dependencies]
oxify-connect-vector = { version = "0.1", features = ["embeddings"] }
```

```rust
// Identifiers are illustrative; consult the crate docs for exact names.
use oxify_connect_llm::OpenAIEmbedding;
use oxify_connect_vector::EmbeddingVectorStore;

let embedding_provider = OpenAIEmbedding::new(api_key);
let vector_store = EmbeddingVectorStore::new(provider, embedding_provider);

// Insert text (embeddings are generated automatically)
vector_store.insert_text("docs", "doc-1", "Rust is fast and safe").await?;

// Search by text (the query embedding is generated automatically)
let results = vector_store.search_by_text("docs", "fast systems language", 5).await?;
```
## Performance Tips
- Use caching: Cache embeddings and frequently accessed search results
- Hybrid search: Combine vector and keyword search for better accuracy
- Batch operations: Insert multiple vectors at once when possible (use native batch APIs)
- Score threshold: Filter low-quality results early
- Reranking: Use MMR for diversity, keyword boost for precision
- Metrics: Monitor performance with MetricsProvider to identify bottlenecks
- Health checks: Use HealthCheckProvider to detect provider issues early
## Documentation

Comprehensive documentation is available:

- Performance Testing Guide (`docs/PERFORMANCE_TESTING.md`):
  - How to run benchmarks and interpret results
  - Performance regression testing with `perf_regression.sh`
  - Profiling and optimization techniques
  - CI/CD integration for continuous monitoring
- Provider Comparison Guide (`docs/PROVIDER_COMPARISON.md`):
  - Detailed comparison of all 6 providers
  - Cost analysis for cloud providers
  - Feature comparison matrix
  - Decision tree to help choose the right provider
  - Migration strategies
- Integration Testing (`tests/INTEGRATION_TESTING.md`):
  - Docker Compose setup
  - Running integration tests
  - CI/CD integration
## Project Status

✅ Production-ready - All phases complete:
- ✅ Phase 1-10: Core features implemented
- ✅ CI/CD: Automated testing and performance regression
- ✅ Documentation: Comprehensive guides and examples
- ✅ Quality: Zero warnings, 68 tests, all passing
## See Also

- oxify-model: Data model definitions
- oxify-connect-llm: Embedding and LLM providers (OpenAI, Cohere, Ollama)
- oxify-engine: Workflow execution engine
- oxify-storage: Database abstractions
## Contributing
This crate follows strict quality standards:
- Zero warnings policy (enforced by CI)
- Comprehensive tests (unit + integration + doc tests)
- Performance regression testing
- Full API documentation
## License
Apache-2.0