Vector Engine - k-NN similarity search with HNSW support
This crate provides embedding storage and similarity search for the Neumann database system.
§Features
- Dense and sparse vectors: Automatic sparse detection for memory efficiency
- Multiple distance metrics: Cosine, Euclidean, dot product, and extended metrics
- HNSW indexing: Hierarchical Navigable Small World graphs for fast k-NN search
- Batch operations: Parallel processing for large embedding batches
- Entity embeddings: Associate embeddings with existing entities in TensorStore
- Memory bounds: Configurable limits for dimension and scan operations
§Quick Start
use vector_engine::{VectorEngine, VectorEngineConfig};
let engine = VectorEngine::new();
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3]).unwrap();
engine.store_embedding("doc2", vec![0.2, 0.3, 0.4]).unwrap();
let results = engine.search_similar(&[0.15, 0.25, 0.35], 10).unwrap();
§Configuration
Use VectorEngineConfig to tune behavior:
- max_dimension: Limit embedding dimensions for memory safety
- max_keys_per_scan: Bound memory usage on unbounded operations
- sparse_threshold: Control sparse vector detection (0.0-1.0)
- parallel_threshold: Control when to use parallel processing
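As a rough illustration of what two of these settings could govern, the sketch below uses a hypothetical `SketchConfig` struct rather than the real `VectorEngineConfig`. The field names come from the list above, but the field types, the rule that a vector counts as sparse once its fraction of zero components reaches `sparse_threshold`, and the helper functions are all assumptions made for the example, not the crate's documented behavior.

```rust
// Hypothetical stand-in for VectorEngineConfig. Field names follow the list
// above; the types and semantics are assumptions for illustration only.
#[derive(Debug, Clone)]
struct SketchConfig {
    max_dimension: Option<usize>,
    sparse_threshold: f32,
}

/// Fraction of components that are exactly zero (0.0 for an empty vector).
fn zero_fraction(v: &[f32]) -> f32 {
    if v.is_empty() {
        return 0.0;
    }
    let zeros = v.iter().filter(|x| **x == 0.0).count();
    zeros as f32 / v.len() as f32
}

/// Assumed rule: store a vector sparsely once its zero fraction
/// reaches the configured threshold.
fn should_store_sparse(v: &[f32], cfg: &SketchConfig) -> bool {
    zero_fraction(v) >= cfg.sparse_threshold
}

/// Rejects vectors whose length exceeds the configured dimension limit.
fn check_dimension(v: &[f32], cfg: &SketchConfig) -> Result<(), String> {
    match cfg.max_dimension {
        Some(max) if v.len() > max => {
            Err(format!("dimension {} exceeds limit {}", v.len(), max))
        }
        _ => Ok(()),
    }
}
```

Under these assumptions, a threshold of 0.5 would send `[0.0, 0.0, 1.0, 0.0]` (75% zeros) to sparse storage and keep `[1.0, 2.0, 0.0, 3.0]` (25% zeros) dense.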
§Distance Metrics
Two metric enums are available:
§DistanceMetric (Simple API)
Use for basic similarity search via search_similar_with_metric():
- Cosine - Best for normalized embeddings (text, images). Range: [-1, 1]
- Euclidean - Best for absolute distances. Range: [0, inf)
- DotProduct - Best for recommendation systems. Range: (-inf, inf)
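For reference, the three metrics follow their standard textbook definitions, sketched below in plain Rust. These free functions are illustrative only and are not the crate's implementation. Returning 0.0 for a zero-length vector in `cosine` is a convention chosen for the example.

```rust
/// Dot product of two vectors. `zip` stops at the shorter input, so callers
/// are expected to pass vectors of equal length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: square root of the summed squared differences.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: dot product divided by the product of the norms.
/// Returns 0.0 if either vector has zero norm (a convention for this sketch).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot(a, b) / (norm_a * norm_b)
    }
}
```

Note that cosine and dot product are similarities (larger means closer), while Euclidean is a distance (smaller means closer), which is why their ranges differ.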
§ExtendedDistanceMetric (HNSW API)
Use for HNSW index construction via search_with_hnsw_and_metric():
- All DistanceMetric variants, plus:
- Angular - Cosine converted to angular distance
- Manhattan - L1 norm, robust to outliers
- Chebyshev - L-infinity norm, max absolute difference
- Jaccard - Set similarity for binary/sparse vectors
- Overlap - Minimum overlap coefficient
- Geodesic - Spherical distance for geographic data
- Composite - Weighted combination of metrics
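To make a few of the extended metrics concrete, here is a minimal sketch using common textbook definitions. The crate's exact formulas are not shown in this page, so the normalization of angular distance to [0, 1] and the treatment of any nonzero component as set membership for Jaccard are assumptions.

```rust
use std::f32::consts::PI;

/// Manhattan (L1) distance: sum of absolute component differences.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Chebyshev (L-infinity) distance: largest absolute component difference.
fn chebyshev(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, f32::max)
}

/// Angular distance from a cosine similarity, scaled to [0, 1]
/// (assumed normalization: arccos(cos) / pi).
fn angular_from_cosine(cos_sim: f32) -> f32 {
    cos_sim.clamp(-1.0, 1.0).acos() / PI
}

/// Jaccard similarity, treating each nonzero component as a set member.
/// Two all-zero vectors are defined as identical (1.0) in this sketch.
fn jaccard(a: &[f32], b: &[f32]) -> f32 {
    let mut intersection = 0usize;
    let mut union = 0usize;
    for (x, y) in a.iter().zip(b) {
        let (in_a, in_b) = (*x != 0.0, *y != 0.0);
        if in_a && in_b {
            intersection += 1;
        }
        if in_a || in_b {
            union += 1;
        }
    }
    if union == 0 {
        1.0
    } else {
        intersection as f32 / union as f32
    }
}
```

Manhattan sums every component difference, so a single large deviation affects it less than it affects the squared terms in Euclidean distance. Chebyshev, by contrast, is determined entirely by the largest deviation.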
§When to Use Which
| Use Case | Recommended Metric |
|---|---|
| Text embeddings (OpenAI, etc.) | Cosine |
| Image feature vectors | Cosine or Euclidean |
| Sparse vectors (TF-IDF) | Jaccard or Cosine |
| Geographic coordinates | Geodesic |
| Recommendation scores | DotProduct |
| General purpose | Cosine (default) |
Structs§
- BatchResult - Result of a batch operation.
- BinaryVector - Binary quantized vector packed into u64 words.
- EmbeddingInput - Input for batch embedding storage.
- FilteredSearchConfig - Configuration for filtered search behavior.
- GeometricConfig - Configuration for composite geometric scoring.
- HNSWBuildOptions - Options for building an HNSW index.
- HNSWConfig - Configuration for HNSW index.
- HNSWIndex - HNSW index for approximate nearest neighbor search.
- IVFBuildOptions - Options for building an IVF (Inverted File) index.
- IVFConfig - IVF configuration.
- IVFIndex - IVF Index with inverted lists.
- IVFIndexState - Serializable state of an IVF index (excludes vectors).
- PQCodebook - Trained PQ codebook storing M * K centroids.
- PQConfig - Configuration for Product Quantization.
- PQVector - PQ-encoded vector (M bytes for 8-bit codes).
- PagedResult - Result of a paginated query.
- Pagination - Pagination parameters for list and search operations.
- PersistentVectorIndex - Persistent vector index format for saving to disk.
- ScalarQuantizedVector - 8-bit scalar quantized vector with ~4x memory reduction.
- SearchResult - Result of a similarity search.
- VectorCollectionConfig - Configuration for a vector collection.
- VectorEngine - Vector Engine for storing and searching embeddings.
- VectorEngineConfig - Configuration for the Vector Engine.
- VectorEntry - A single vector entry in the persistent index.
- WalConfig - WAL configuration.
Enums§
- BinaryThreshold - Binarization threshold method.
- DistanceMetric - Distance metric for similarity search.
- ExtendedDistanceMetric - Distance metric for vector similarity/distance computation.
- FilterCondition - Filter condition for filtered similarity search.
- FilterStrategy - Strategy for filtered search execution.
- FilterValue - Filter value for comparisons.
- HNSWStorageStrategy - Storage strategy for HNSW index construction.
- IVFStorage - Storage format for vectors within IVF lists.
- MetadataValue - Metadata value for serialization (simplified from TensorValue).
- VectorError - Error types for vector operations.
Type Aliases§
- Result - A specialized Result type for vector operations.