Crate vector_engine

Vector Engine - k-NN similarity search with HNSW support

This crate provides embeddings storage and similarity search functionality for the Neumann database system.

§Features

  • Dense and sparse vectors: Automatic sparse detection for memory efficiency
  • Multiple distance metrics: Cosine, Euclidean, dot product, and extended metrics
  • HNSW indexing: Hierarchical Navigable Small World graphs for fast k-NN search
  • Batch operations: Parallel processing for large embedding batches
  • Entity embeddings: Associate embeddings with existing entities in TensorStore
  • Memory bounds: Configurable limits for dimension and scan operations

§Quick Start

use vector_engine::VectorEngine;

// Create an engine.
let engine = VectorEngine::new();

// Store two 3-dimensional embeddings under string keys.
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3]).unwrap();
engine.store_embedding("doc2", vec![0.2, 0.3, 0.4]).unwrap();

// Retrieve up to 10 stored embeddings most similar to the query vector.
let results = engine.search_similar(&[0.15, 0.25, 0.35], 10).unwrap();
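
As a rough mental model of what a k-NN similarity search returns, the sketch below ranks every stored vector against the query by cosine similarity and keeps the best k. It is a standalone illustration, not the crate's implementation: `cosine_similarity` and `brute_force_top_k` are hypothetical helpers, and the slice-of-tuples store is an assumption made to keep the example self-contained.

```rust
use std::cmp::Ordering;

/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: score every stored vector against the query and
/// return the `k` best (key, score) pairs, highest similarity first.
fn brute_force_top_k<'a>(
    store: &'a [(&'a str, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = store
        .iter()
        .map(|(key, v)| (*key, cosine_similarity(v, query)))
        .collect();
    // Sort descending by score; NaN scores are treated as equal.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored
}
```

A scan like this visits every stored vector, which is the cost an HNSW index is meant to avoid on large collections.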

§Configuration

Use VectorEngineConfig to tune behavior:

  • max_dimension: Limit embedding dimensions for memory safety
  • max_keys_per_scan: Bound memory usage on unbounded operations
  • sparse_threshold: Control sparse vector detection (0.0-1.0)
  • parallel_threshold: Control when to use parallel processing
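
To make the idea behind sparse_threshold concrete, here is a minimal sketch of threshold-based sparse detection. It assumes the threshold is the fraction of zero entries at or above which a vector is stored sparsely; the crate's actual rule may differ. `StoredVector` and `choose_representation` are hypothetical names used only for this illustration.

```rust
/// Hypothetical storage representation, for illustration only.
#[derive(Debug, PartialEq)]
enum StoredVector {
    /// Every component is stored.
    Dense(Vec<f32>),
    /// Only non-zero components are stored as (index, value) pairs.
    Sparse { dim: usize, entries: Vec<(usize, f32)> },
}

/// Assumed rule: store sparsely when the fraction of zero entries is at
/// least `sparse_threshold` (a value in 0.0..=1.0).
fn choose_representation(values: Vec<f32>, sparse_threshold: f32) -> StoredVector {
    if values.is_empty() {
        return StoredVector::Dense(values);
    }
    let zeros = values.iter().filter(|v| **v == 0.0).count();
    let zero_fraction = zeros as f32 / values.len() as f32;
    if zero_fraction >= sparse_threshold {
        let dim = values.len();
        let entries = values
            .into_iter()
            .enumerate()
            .filter(|(_, v)| *v != 0.0)
            .collect();
        StoredVector::Sparse { dim, entries }
    } else {
        StoredVector::Dense(values)
    }
}
```

Under this assumed rule, a higher threshold keeps more vectors dense, while a lower one converts vectors with fewer zeros to the sparse form.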

§Distance Metrics

Two metric enums are available:

§DistanceMetric (Simple API)

Use for basic similarity search via search_similar_with_metric():

  • Cosine - Best for normalized embeddings (text, images). Range: [-1, 1]
  • Euclidean - Best for absolute distances. Range: [0, inf)
  • DotProduct - Best for recommendation systems. Range: (-inf, inf)
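
The three simple metrics and their ranges can be illustrated with plain Rust. These are the textbook definitions written as standalone functions, not the crate's own code; the function names are chosen for this sketch.

```rust
/// Dot product: unbounded in both directions and sensitive to magnitude.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: never negative, 0 for identical vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: the dot product of the two vectors after scaling
/// each to unit length, so the result lies in [-1, 1].
/// Returns 0.0 when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot_product(a, a).sqrt() * dot_product(b, b).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot_product(a, b) / denom
    }
}
```

Scaling a vector changes its dot product and Euclidean distance to other vectors but leaves its cosine similarity unchanged, which is why cosine is a common choice when only direction matters.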

§ExtendedDistanceMetric (HNSW API)

Use for HNSW index construction via search_with_hnsw_and_metric():

  • All DistanceMetric variants plus:
  • Angular - Cosine converted to angular distance
  • Manhattan - L1 norm, robust to outliers
  • Chebyshev - L-infinity norm, max absolute difference
  • Jaccard - Set similarity for binary/sparse vectors
  • Overlap - Overlap coefficient (intersection size relative to the smaller set)
  • Geodesic - Spherical distance for geographic data
  • Composite - Weighted combination of metrics
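
Several of the extended metrics have short textbook definitions, sketched below as standalone functions. This is an illustration under assumptions, not the crate's code: the angular distance is normalized to [0, 1] by dividing by pi, and the set-based metrics treat any non-zero component as set membership. The crate may use different conventions.

```rust
use std::f32::consts::PI;

/// Manhattan (L1) distance: sum of absolute component differences.
fn manhattan_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Chebyshev (L-infinity) distance: largest absolute component difference.
fn chebyshev_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// Angular distance from a cosine similarity, normalized to [0, 1]
/// (assumed convention: angle in radians divided by pi).
fn angular_from_cosine(cosine: f32) -> f32 {
    cosine.clamp(-1.0, 1.0).acos() / PI
}

/// Counts (members of both, members of a, members of b), treating any
/// non-zero component as set membership.
fn membership_counts(a: &[f32], b: &[f32]) -> (usize, usize, usize) {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let (mut both, mut in_a, mut in_b) = (0, 0, 0);
    for (x, y) in a.iter().zip(b) {
        let (ma, mb) = (*x != 0.0, *y != 0.0);
        if ma { in_a += 1; }
        if mb { in_b += 1; }
        if ma && mb { both += 1; }
    }
    (both, in_a, in_b)
}

/// Jaccard similarity: |A and B| / |A or B|; 0.0 when both sets are empty.
fn jaccard_similarity(a: &[f32], b: &[f32]) -> f32 {
    let (both, in_a, in_b) = membership_counts(a, b);
    let either = in_a + in_b - both;
    if either == 0 { 0.0 } else { both as f32 / either as f32 }
}

/// Overlap coefficient: |A and B| / min(|A|, |B|); 0.0 when a set is empty.
fn overlap_coefficient(a: &[f32], b: &[f32]) -> f32 {
    let (both, in_a, in_b) = membership_counts(a, b);
    let smaller = in_a.min(in_b);
    if smaller == 0 { 0.0 } else { both as f32 / smaller as f32 }
}
```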

§When to Use Which

Use Case                          Recommended Metric
Text embeddings (OpenAI, etc.)    Cosine
Image feature vectors             Cosine or Euclidean
Sparse vectors (TF-IDF)           Jaccard or Cosine
Geographic coordinates            Geodesic
Recommendation scores             DotProduct
General purpose                   Cosine (default)

§Structs

  • BatchResult - Result of a batch operation.
  • BinaryVector - Binary quantized vector packed into u64 words.
  • EmbeddingInput - Input for batch embedding storage.
  • FilteredSearchConfig - Configuration for filtered search behavior.
  • GeometricConfig - Configuration for composite geometric scoring.
  • HNSWBuildOptions - Options for building an HNSW index.
  • HNSWConfig - Configuration for HNSW index.
  • HNSWIndex - HNSW index for approximate nearest neighbor search.
  • IVFBuildOptions - Options for building an IVF (Inverted File) index.
  • IVFConfig - IVF configuration.
  • IVFIndex - IVF Index with inverted lists.
  • IVFIndexState - Serializable state of an IVF index (excludes vectors).
  • PQCodebook - Trained PQ codebook storing M * K centroids.
  • PQConfig - Configuration for Product Quantization.
  • PQVector - PQ-encoded vector (M bytes for 8-bit codes).
  • PagedResult - Result of a paginated query.
  • Pagination - Pagination parameters for list and search operations.
  • PersistentVectorIndex - Persistent vector index format for saving to disk.
  • ScalarQuantizedVector - 8-bit scalar quantized vector with ~4x memory reduction.
  • SearchResult - Result of a similarity search.
  • VectorCollectionConfig - Configuration for a vector collection.
  • VectorEngine - Vector Engine for storing and searching embeddings.
  • VectorEngineConfig - Configuration for the Vector Engine.
  • VectorEntry - A single vector entry in the persistent index.
  • WalConfig - WAL configuration.
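
The memory savings behind the quantized vector types listed above can be sketched in a few lines. The code below is an assumption-laden illustration, not the crate's ScalarQuantizedVector or BinaryVector: `Sq8`, `quantize_sq8`, `dequantize_sq8`, `pack_sign_bits`, and `hamming_distance` are hypothetical names, the scalar scheme is simple min/max scaling, and the binary scheme thresholds at zero.

```rust
/// Hypothetical 8-bit scalar quantization: one u8 code per f32 component
/// (1 byte instead of 4), plus the offset and step needed to decode.
struct Sq8 {
    min: f32,
    scale: f32,
    codes: Vec<u8>,
}

/// Map each component linearly from [min, max] onto the codes 0..=255.
fn quantize_sq8(v: &[f32]) -> Sq8 {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // Use a step of 1.0 for constant or empty input to avoid dividing by zero.
    let scale = if range > 0.0 { range / 255.0 } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
    Sq8 { min, scale, codes }
}

/// Approximate reconstruction of the original components.
fn dequantize_sq8(q: &Sq8) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}

/// Hypothetical binary quantization: bit i is set when component i is
/// positive (assumed threshold of zero), packed 64 components per word.
fn pack_sign_bits(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, x) in v.iter().enumerate() {
        if *x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two packed bit vectors of equal length.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Storing one byte per component instead of a four-byte f32 is where a roughly 4x reduction comes from; the decode parameters add a small fixed overhead per vector.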

§Enums

  • BinaryThreshold - Binarization threshold method.
  • DistanceMetric - Distance metric for similarity search.
  • ExtendedDistanceMetric - Distance metric for vector similarity/distance computation.
  • FilterCondition - Filter condition for filtered similarity search.
  • FilterStrategy - Strategy for filtered search execution.
  • FilterValue - Filter value for comparisons.
  • HNSWStorageStrategy - Storage strategy for HNSW index construction.
  • IVFStorage - Storage format for vectors within IVF lists.
  • MetadataValue - Metadata value for serialization (simplified from TensorValue).
  • VectorError - Error types for vector operations.

§Type Aliases

  • Result - A specialized Result type for vector operations.