Crate memvdb

§MemVDB - An In-Memory Vector Database

MemVDB is a fast, lightweight in-memory vector database written in Rust. It supports multiple distance metrics and provides efficient similarity search for machine learning applications, recommendation systems, and semantic search.

§Features

  • Multiple Distance Metrics: Euclidean, Cosine, and Dot Product
  • High Performance: Top-k similarity search that uses a binary heap to track the best candidates
  • Flexible Metadata: Store arbitrary metadata with each embedding
  • Batch Operations: Efficient batch insertion and updates
  • Thread Safety: Safe concurrent access with proper locking
  • Minimal Dependencies: Core functionality relies on only a few external crates

§Quick Start

use memvdb::{CacheDB, Distance, Embedding};
use std::collections::HashMap;

// Create a new in-memory vector database
let mut db = CacheDB::new();

// Create a collection with 128-dimensional vectors using cosine similarity
db.create_collection("documents".to_string(), 128, Distance::Cosine).unwrap();

// Create an embedding with metadata
let mut id = HashMap::new();
id.insert("doc_id".to_string(), "doc_001".to_string());

let mut metadata = HashMap::new();
metadata.insert("title".to_string(), "Sample Document".to_string());
metadata.insert("category".to_string(), "AI".to_string());

let vector = vec![0.1; 128]; // 128-dimensional vector
let embedding = Embedding {
    id,
    vector,
    metadata: Some(metadata),
};

// Insert the embedding
db.insert_into_collection("documents", embedding).unwrap();

// Perform similarity search
let query_vector = vec![0.2; 128];
let collection = db.get_collection("documents").unwrap();
let results = collection.get_similarity(&query_vector, 5);

println!("Found {} similar documents", results.len());

§Distance Metrics

MemVDB supports three distance metrics optimized for different use cases:

  • Euclidean Distance: Best for spatial data and when absolute distances matter
  • Cosine Similarity: Ideal for text embeddings and high-dimensional sparse data
  • Dot Product: Efficient for normalized vectors and neural network outputs
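
To make the differences between the three metrics concrete, here is a minimal standalone sketch of each one. It does not call into MemVDB: the function names `dot`, `euclidean`, and `cosine` are hypothetical, and the crate's own implementations may differ (for example in how they treat zero vectors).

```rust
/// Dot product of two equal-length slices: larger means more similar.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: smaller means closer.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity in [-1, 1]: larger means more similar.
/// Assumption of this sketch: a zero vector yields 0.0 instead of NaN.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}

fn main() {
    // b points in the same direction as a but is twice as long.
    let a = [3.0_f32, 4.0];
    let b = [6.0_f32, 8.0];
    println!("euclidean = {}", euclidean(&a, &b));
    println!("cosine    = {}", cosine(&a, &b));
    println!("dot       = {}", dot(&a, &b));
}
```

The two vectors point in the same direction, so cosine similarity treats them as identical while Euclidean distance still separates them by length. That is the practical reason to prefer cosine when only direction matters and Euclidean when magnitude matters.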

§Architecture

The library is organized into two main modules, with the public API available at the crate root:

  • db: Core database functionality, collections, and embeddings management
  • similarity: Distance calculation and vector operations
  • Crate root: Public API exports for easy integration

§Performance Characteristics

  • Insertion: O(1) average case for single embeddings
  • Similarity Search: O(n) where n is the number of embeddings in the collection
  • Memory Usage: Linear with number of embeddings and vector dimensions
  • Concurrency: Thread-safe operations with mutex-based protection
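
The linear-scan search cost listed above can be illustrated with a bounded max-heap that keeps only the best k candidates while visiting every vector once. This is a sketch under stated assumptions, not MemVDB's implementation: `Scored`, `squared_euclidean`, and `top_k` are hypothetical names, and the fields of the crate's own `ScoreIndex` are not shown in this documentation.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical heap entry: a distance paired with the index of its vector.
struct Scored {
    distance: f32,
    index: usize,
}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives f32 a total order, so a NaN cannot corrupt the heap.
        self.distance
            .total_cmp(&other.distance)
            .then_with(|| self.index.cmp(&other.index))
    }
}

impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Scored {}

/// Squared Euclidean distance; it ranks neighbors the same as the true distance.
fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns up to `k` (index, squared distance) pairs, nearest first.
fn top_k(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Max-heap on distance: the root is the worst candidate kept so far.
    let mut heap: BinaryHeap<Scored> = BinaryHeap::with_capacity(k + 1);
    for (index, v) in vectors.iter().enumerate() {
        heap.push(Scored { distance: squared_euclidean(v, query), index });
        if heap.len() > k {
            heap.pop(); // evict the farthest candidate
        }
    }
    // into_sorted_vec is ascending by distance, i.e. nearest first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|s| (s.index, s.distance))
        .collect()
}

fn main() {
    let vectors: Vec<Vec<f32>> = vec![
        vec![0.0, 0.0],
        vec![1.0, 1.0],
        vec![5.0, 5.0],
        vec![0.9, 1.1],
    ];
    for (index, distance) in top_k(&vectors, &[1.0_f32, 1.0], 2) {
        println!("index {index}: squared distance {distance:.3}");
    }
}
```

Each of the n vectors costs one distance computation plus at most one heap push and pop, so the scan is O(n log k) in heap work. For a fixed k this grows linearly with the collection size.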

Structs§

  • BatchInsertEmbeddingsStruct: Configuration for batch embedding operations.
  • CacheDB: The main in-memory vector database structure.
  • Collection: A collection of embeddings with a specific dimensionality and distance metric.
  • CollectionHandlerStruct: Configuration for collection operations.
  • CreateCollectionStruct: Configuration for creating a new collection.
  • Embedding: An individual embedding (vector) with associated metadata.
  • GetSimilarityStruct: Configuration for similarity search operations.
  • InsertEmbeddingStruct: Configuration for inserting a single embedding.
  • ScoreIndex: A helper structure for k-nearest neighbor search operations.
  • SimilarityResult: Result of a similarity search operation.

Enums§

  • Distance: Supported distance metrics for similarity calculations.
  • Error: Error types for database operations.

Functions§

  • add: Simple addition function for testing purposes.
  • create_database: Convenience function for quick database setup.
  • get_cache_attr: Pre-computes cacheable attributes for distance calculations.
  • get_distance_fn: Returns the appropriate distance function for the specified metric.
  • normalize: Normalizes a vector to unit length (L2 normalization).
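
As an illustration of what L2 normalization means, the sketch below divides each component by the vector's Euclidean norm. The name `l2_normalize` and its signature are hypothetical. The real `normalize` function's signature is not shown in this listing and may differ, for instance by mutating in place or by handling zero vectors differently.

```rust
/// Hypothetical stand-in for an L2 normalization helper.
/// Assumption of this sketch: a zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let unit = l2_normalize(&[3.0_f32, 4.0]);
    // The squared components of a unit vector sum to 1 (up to rounding).
    let squared_length: f32 = unit.iter().map(|x| x * x).sum();
    println!("unit = {unit:?}, squared length = {squared_length}");
}
```

Once both vectors have unit length, their dot product equals their cosine similarity, which is why normalization pairs naturally with the Dot Product metric.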