# manifold-vectors

Vector storage optimizations for the Manifold embedded database.

## Overview
`manifold-vectors` provides ergonomic, type-safe wrappers around Manifold's core primitives for storing and retrieving the vector embeddings commonly used in ML/AI applications. It does not implement vector indexing algorithms itself; instead, it focuses on efficient persistent storage and provides integration traits for external index libraries such as `instant-distance`.
## Features

- **Zero-copy access**: Fixed-dimension vectors leverage Manifold's `fixed_width()` trait for direct memory-mapped access without deserialization overhead
- **Type safety**: Compile-time dimension checking via const generics
- **Multiple formats**: Dense, sparse (COO), and multi-vector (ColBERT-style) support
- **High performance**: Batch operations, efficient encoding, WAL group commit
- **Integration ready**: `VectorSource` trait for external index libraries (HNSW, FAISS, etc.)
## Quick Start

### Dense Vectors
A minimal end-to-end sketch (the database path, table name, `doc_id`, and the 384 dimension are placeholders):

```rust
use manifold::ColumnFamilyDatabase;
use manifold_vectors::VectorTable;

// Open database and column family
let db = ColumnFamilyDatabase::open("app.manifold")?;
let cf = db.column_family_or_create("embeddings")?;

// Write vectors
let write_txn = cf.begin_write()?;
let mut vectors: VectorTable<384> = VectorTable::open(&write_txn, "docs")?;
vectors.insert(doc_id, &embedding)?;
drop(vectors);
write_txn.commit()?;

// Read with zero-copy access - no allocations!
let read_txn = cf.begin_read()?;
let vectors: VectorTable<384> = VectorTable::open(&read_txn, "docs")?;
if let Some(guard) = vectors.get(doc_id)? {
    let v: &[f32; 384] = guard.value(); // borrowed straight from the mapped page
}
```
### Batch Operations
For high-throughput vector loading, use batch operations which leverage Manifold's WAL group commit:
```rust
// (id, vector) pairs - ids and dimensions are illustrative
let items: Vec<(u64, [f32; 384])> = vec![(1, emb_a), (2, emb_b), (3, emb_c)];

let write_txn = cf.begin_write()?;
let mut vectors: VectorTable<384> = VectorTable::open(&write_txn, "docs")?;

// Insert all vectors in one batch
vectors.insert_batch(&items)?;
drop(vectors);
write_txn.commit()?;
```
### Sparse Vectors
For high-dimensional sparse vectors (e.g., TF-IDF, one-hot encodings):
A sketch of the sparse API (type and table names here are illustrative):

```rust
use manifold_vectors::{SparseVector, SparseVectorTable};

// Create sparse vector (COO format: coordinate list of (index, value) pairs)
let sparse = SparseVector::new(vec![(0, 1.0), (42, 0.5), (1337, 2.0)]);

// Write
let write_txn = cf.begin_write()?;
let mut sparse_table = SparseVectorTable::open(&write_txn, "tfidf")?;
sparse_table.insert(doc_id, &sparse)?;
drop(sparse_table);
write_txn.commit()?;

// Read
let read_txn = cf.begin_read()?;
let sparse_table = SparseVectorTable::open(&read_txn, "tfidf")?;
let retrieved = sparse_table.get(doc_id)?.unwrap();

// Compute sparse dot product
let other = SparseVector::new(vec![(42, 2.0)]);
let dot = retrieved.dot(&other);
println!("dot product: {dot}");
```
### Multi-Vectors (ColBERT-style)
For storing multiple vectors per document (e.g., token embeddings):
A sketch of per-document multi-vector storage (type and table names are illustrative):

```rust
use manifold_vectors::MultiVectorTable;

// Each document has multiple token embeddings
let token_embeddings: Vec<[f32; 128]> = vec![tok_0, tok_1, tok_2];

let write_txn = cf.begin_write()?;
let mut multi = MultiVectorTable::open(&write_txn, "tokens")?;
multi.insert(doc_id, &token_embeddings)?;
drop(multi);
write_txn.commit()?;

let read_txn = cf.begin_read()?;
let multi = MultiVectorTable::open(&read_txn, "tokens")?;
let tokens = multi.get(doc_id)?.unwrap();
println!("{} token vectors", tokens.len());
```
## Distance Functions

The crate includes common distance and similarity metrics that work directly with zero-copy `VectorGuard` types:
```rust
use manifold_vectors::distance::{cosine, dot_product, euclidean, euclidean_squared, manhattan};

// (argument convention shown here - plain slices - is illustrative)
let vec_a = [1.0_f32, 0.0, 0.0];
let vec_b = [0.0_f32, 1.0, 0.0];

let cosine_sim = cosine(&vec_a, &vec_b);              // 0.0 (orthogonal)
let euclidean_dist = euclidean(&vec_a, &vec_b);       // sqrt(2)
let euclidean_sq = euclidean_squared(&vec_a, &vec_b); // 2.0 (faster: skips sqrt)
let manhattan_dist = manhattan(&vec_a, &vec_b);       // 2.0
let dot = dot_product(&vec_a, &vec_b);                // 0.0
```
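For grounding, and independent of the crate's own `distance` module, cosine similarity reduces to a dot product divided by the two norms. A self-contained sketch (`cosine_similarity` is an illustrative standalone function, not a crate API):

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    assert_eq!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]), 0.0); // orthogonal
    assert_eq!(cosine_similarity(&[2.0, 0.0], &[1.0, 0.0]), 1.0); // parallel
}
```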
## Architecture

### Zero-Copy Design
Dense vectors implement Manifold's `Value` trait with fixed-width serialization. This enables true zero-copy reads: vectors are read directly from memory-mapped pages without deserialization.
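The idea can be illustrated outside the database: a fixed-width `f32` buffer can be viewed in place without decoding or allocating. This is a conceptual sketch, not Manifold's internal code; `as_f32_slice` is a hypothetical helper:

```rust
// View a byte buffer (standing in for a memory-mapped page) as f32s in place.
fn as_f32_slice(bytes: &[u8]) -> &[f32] {
    // Sound here because f32 has no invalid bit patterns; alignment is checked.
    let (prefix, floats, suffix) = unsafe { bytes.align_to::<f32>() };
    assert!(prefix.is_empty() && suffix.is_empty(), "buffer must be 4-byte aligned");
    floats
}

fn main() {
    let stored = [1.0_f32, 2.0, 3.0, 4.0];
    // Reinterpret the vector's memory as raw bytes, standing in for a page.
    let page: &[u8] = unsafe {
        std::slice::from_raw_parts(stored.as_ptr() as *const u8, 4 * std::mem::size_of::<f32>())
    };
    let view = as_f32_slice(page); // borrowed view: no copy, no allocation
    assert_eq!(view, &stored);
}
```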
### Performance Characteristics

- **Write (dense)**: O(log n) B-tree insert; benefits from WAL group commit
- **Read (dense)**: O(log n) lookup + zero-copy mmap access (no allocation)
- **Write (sparse)**: O(log n) + O(k log k) sorting, where k = non-zero count
- **Read (sparse)**: O(log n) + one allocation for the coordinate list
- **Storage (dense)**: DIM × 4 bytes per vector (fixed width)
- **Storage (sparse)**: k × 8 bytes per vector (4-byte index + 4-byte value)
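The two storage formulas imply a simple break-even rule: the sparse encoding is smaller whenever the non-zero count k is below DIM / 2. A quick check (`dense_bytes` and `sparse_bytes` are illustrative helpers, not crate APIs):

```rust
// Per-vector storage cost from the formulas above.
fn dense_bytes(dim: usize) -> usize {
    dim * 4 // DIM f32 values at 4 bytes each
}

fn sparse_bytes(nonzeros: usize) -> usize {
    nonzeros * 8 // each entry: 4-byte index + 4-byte value
}

fn main() {
    // A 768-dimensional embedding costs 3072 bytes dense...
    assert_eq!(dense_bytes(768), 3072);
    // ...but only 800 bytes sparse if just 100 entries are non-zero.
    assert_eq!(sparse_bytes(100), 800);
    // Break-even: sparse wins while k < DIM / 2.
    assert!(sparse_bytes(383) < dense_bytes(768));
    assert_eq!(sparse_bytes(384), dense_bytes(768));
}
```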
## Integration with Vector Index Libraries

The `VectorSource` trait enables integration with external vector index libraries:
A sketch using `instant-distance` (`Pt` stands for a small newtype over the embedding array that implements `instant_distance::Point`; table names are illustrative):

```rust
use instant_distance::{Builder, Search};
use manifold_vectors::VectorSource;

let read_txn = cf.begin_read()?;
let vectors: VectorTable<384> = VectorTable::open(&read_txn, "docs")?;

// Build HNSW index from stored vectors
let mut points = Vec::new();
let mut ids = Vec::new();
for result in vectors.all_vectors()? {
    let (id, guard) = result?;
    points.push(Pt(*guard.value()));
    ids.push(id);
}
let hnsw = Builder::default().build(points, ids);

// Search for nearest neighbors
let query = Pt(query_embedding);
let mut search = Search::default();
let results = hnsw.search(&query, &mut search);
for item in results.take(10) {
    println!("id = {}, distance = {}", item.value, item.distance);
}
```
## Examples

The crate includes comprehensive examples demonstrating real-world usage:
### 1. Dense Semantic Search (`examples/dense_semantic_search.rs`)

Full RAG pipeline with:
- Document embeddings using tessera-embeddings
- Cosine similarity search
- Result ranking
- Zero-copy performance
### 2. Sparse Hybrid Search (`examples/sparse_hybrid_search.rs`)

Combines dense and sparse vectors for hybrid search:
- Dense semantic embeddings
- Sparse TF-IDF vectors
- Weighted fusion of results
- BM25-style ranking
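The weighted-fusion step can be sketched as a convex combination of per-document scores; `fuse` and the weight 0.7 below are illustrative choices, not values the example prescribes:

```rust
/// Fuse normalized dense and sparse scores: w * dense + (1 - w) * sparse.
fn fuse(dense: f32, sparse: f32, w: f32) -> f32 {
    w * dense + (1.0 - w) * sparse
}

fn main() {
    // (doc_id, fused score) with illustrative dense/sparse scores per document
    let mut scored = vec![
        (1u64, fuse(0.90, 0.10, 0.7)),
        (2u64, fuse(0.70, 0.80, 0.7)),
        (3u64, fuse(0.20, 0.95, 0.7)),
    ];
    // Rank documents by fused score, highest first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    assert_eq!(scored[0].0, 2); // the balanced document ranks first at w = 0.7
}
```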
### 3. Multi-Vector ColBERT (`examples/multi_vector_colbert.rs`)

Token-level embeddings for fine-grained matching:
- Multi-vector storage per document
- MaxSim scoring (ColBERT-style)
- Late interaction models
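MaxSim scores a document by taking, for each query token, the best-matching document token and summing those maxima. A minimal sketch with dot-product similarity (`max_sim` is an illustrative helper, not a crate API):

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim: for each query token, find the maximum similarity against any
/// document token, then sum those per-token maxima.
fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    // Query token 1 best-matches doc token 1 (1.0);
    // query token 2 best-matches doc token 2 (0.5).
    assert_eq!(max_sim(&query, &doc), 1.5);
}
```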
### 4. RAG Complete (`examples/rag_complete.rs`)

Production RAG implementation:
- Document chunking and embedding
- Similarity search
- Context retrieval
- Integration patterns
## Use Cases

- **Semantic search**: Document and query embeddings for retrieval
- **Recommendation systems**: User and item embeddings
- **RAG (Retrieval-Augmented Generation)**: Document chunk embeddings
- **Image similarity**: Vision model embeddings
- **Anomaly detection**: Embedding-based outlier detection
- **Clustering**: High-dimensional data points
## Combining with Other Domain Layers

`manifold-vectors` works seamlessly with other Manifold domain layers in the same database:
```rust
use manifold::ColumnFamilyDatabase;
use manifold_vectors::VectorTable;
use manifold_graph::GraphTable;
use manifold_timeseries::TimeSeriesTable;

// (paths, table names, ids, and argument shapes are illustrative)
let db = ColumnFamilyDatabase::open("app.manifold")?;

// Different column families for different access patterns
let vectors_cf = db.column_family_or_create("vectors")?;
let graph_cf = db.column_family_or_create("graph")?;
let metrics_cf = db.column_family_or_create("metrics")?;

// Store user embeddings
let txn = vectors_cf.begin_write()?;
let mut vectors: VectorTable<384> = VectorTable::open(&txn, "users")?;
vectors.insert(user_id, &user_embedding)?;
drop(vectors);
txn.commit()?;

// Store user relationships
let txn = graph_cf.begin_write()?;
let mut graph = GraphTable::open(&txn, "follows")?;
graph.add_edge(user_id, followed_id)?;
drop(graph);
txn.commit()?;

// Store user activity metrics
let txn = metrics_cf.begin_write()?;
let mut ts = TimeSeriesTable::open(&txn, "logins")?;
ts.write(user_id, timestamp, 1.0)?;
drop(ts);
txn.commit()?;
```
## Requirements

- Rust 1.70+ (for const generics)
- `manifold` version 3.1+
## Performance Tips

- Use batch operations for bulk loading: 2-3x faster than individual inserts
- Pre-sort data when possible and set `sorted: true` to skip sorting overhead
- Use zero-copy guards for read operations; avoid calling `.value().to_vec()` unless an owned copy is needed
- Choose the right format:
  - Dense: most data is non-zero, fewer than ~10,000 dimensions
  - Sparse: more than ~90% zeros, or very high dimensional
  - Multi-vector: token-level or chunk-level embeddings
## License

Licensed under either of:

- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)

at your option.
## Contributing

Contributions are welcome! This crate follows the patterns established in the Manifold domain-layer architecture.
## Related Crates

- `manifold` - Core embedded database
- `manifold-graph` - Graph storage for relationships
- `manifold-timeseries` - Time-series storage for metrics
- `instant-distance` - HNSW vector index