VectorLite

A high-performance, in-memory vector database optimized for AI agent workloads with HTTP API and thread-safe concurrency.

Overview

VectorLite is designed for single-instance, low-latency vector operations in AI agent environments. It prioritizes sub-millisecond search performance over distributed scalability, making it ideal for:

  • AI Agent Sessions: Session-scoped vector storage with fast retrieval
  • Real-time Search: Sub-millisecond response requirements for pre-computed embeddings
  • Prototype Development: Rapid iteration without infrastructure complexity
  • Single-tenant Applications: No multi-tenancy isolation requirements

Key Features

  • In-memory storage for zero-latency access patterns
  • Native Rust ML models using the Candle framework, with a pluggable architecture: bring your own embedding model (defaults to all-MiniLM-L6-v2)
  • Thread-safe concurrency with a RwLock per collection and atomic ID generation (see the sketch after this list)
  • HNSW indexing for approximate nearest neighbor search with configurable accuracy
  • Collection persistence via the VectorLite Collection (VLC) file format for saving and loading collections
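
A minimal sketch of concurrent use, assuming VectorLiteClient is Send + Sync and its methods take &self (which the per-collection RwLock implies):

use std::sync::Arc;
use std::thread;

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Arc::new(VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?)));
    client.create_collection("docs", IndexType::HNSW)?;

    // Writers and readers run in parallel; the per-collection RwLock
    // serializes writes while allowing concurrent reads.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let client = Arc::clone(&client);
            thread::spawn(move || {
                client.add_text_to_collection("docs", &format!("document {i}")).unwrap();
                client.search_text_in_collection("docs", "document", 3, SimilarityMetric::Cosine).unwrap();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}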

HTTP API

VectorLite exposes a RESTful interface:

# Health check
GET /health

# Collection management
GET /collections
POST /collections {"name": "docs", "index_type": "hnsw"}
DELETE /collections/{name}

# Vector operations
POST /collections/{name}/text {"text": "Hello world"}
POST /collections/{name}/vector {"id": 1, "values": [0.1, 0.2, ...]}
POST /collections/{name}/search/text {"query": "hello", "k": 10}
POST /collections/{name}/search/vector {"query": [0.1, 0.2, ...], "k": 10}
GET /collections/{name}/vectors/{id}
DELETE /collections/{name}/vectors/{id}

# Persistence operations
POST /collections/{name}/save {"file_path": "./collection.vlc"}
POST /collections/load {"file_path": "./collection.vlc", "collection_name": "restored"}
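
For example, with the server running on 127.0.0.1:3000 (as in the HTTP Server Example below), a collection can be created, populated, and queried with curl, mirroring the payloads above:

curl -X POST http://127.0.0.1:3000/collections \
  -H 'Content-Type: application/json' \
  -d '{"name": "docs", "index_type": "hnsw"}'

curl -X POST http://127.0.0.1:3000/collections/docs/text \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello world"}'

curl -X POST http://127.0.0.1:3000/collections/docs/search/text \
  -H 'Content-Type: application/json' \
  -d '{"query": "hello", "k": 10}'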

Index Types

Flat

  • Complexity: O(n) search, O(1) insert
  • Memory: Linear with dataset size
  • Use Case: Small datasets (< 10K vectors) or exact search requirements

HNSW

  • Complexity: O(log n) search, O(log n) insert
  • Memory: ~2-3x vector size due to graph structure
  • Use Case: Large datasets with approximate search tolerance

See the Hierarchical Navigable Small World (HNSW) paper for details.
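
The index type is chosen per collection at creation time. A minimal sketch, assuming a Flat variant alongside the IndexType::HNSW used in Getting Started below:

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    // Exact O(n) search: simplest option for small collections (< 10K vectors).
    client.create_collection("exact_search", IndexType::Flat)?;

    // Approximate O(log n) search: trades ~2-3x memory for speed at scale.
    client.create_collection("fast_search", IndexType::HNSW)?;
    Ok(())
}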

ML Model Integration

Built-in Embedding Models

  • all-MiniLM-L6-v2: Default 384-dimensional model for general-purpose text
  • Candle Framework: Native Rust ML inference with CPU/GPU acceleration
  • Pluggable Architecture: Easy integration of custom embedding models
  • Memory Efficient: Models loaded once and shared across requests
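
The pluggable architecture means any model that can turn text into a fixed-width vector can stand in for the default. A hypothetical sketch; the trait name EmbeddingFunction and its embed signature are illustrative assumptions, not the crate's confirmed API:

// Illustrative only: EmbeddingFunction and embed() are assumed names
// standing in for whatever trait the crate actually defines.
trait EmbeddingFunction {
    fn embed(&self, text: &str) -> Vec<f32>;
}

struct MyCustomModel {
    dim: usize, // must match the collection's dimensionality (384 for all-MiniLM-L6-v2)
}

impl EmbeddingFunction for MyCustomModel {
    fn embed(&self, text: &str) -> Vec<f32> {
        // A real implementation would run inference here,
        // e.g. a Candle forward pass over the tokenized input.
        let _ = text;
        vec![0.0; self.dim]
    }
}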

Similarity Metrics

  • Cosine: Default for normalized embeddings, scale-invariant
  • Euclidean: Geometric distance, sensitive to vector magnitude
  • Manhattan: L1 norm, robust to outliers
  • Dot Product: Raw similarity, requires consistent vector scaling
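
These metrics differ only in how two vectors are compared. A standalone sketch of the underlying math, for illustration (not the crate's internals):

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    // Scale-invariant: dividing by the norms discards magnitude.
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot(a, b) / (norm(a) * norm(b))
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    // Geometric (L2) distance: sensitive to vector magnitude.
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    // L1 norm: deviations are not squared, so single-coordinate
    // outliers dominate less than in Euclidean distance.
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}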

Configuration Profiles

# Balanced (default)
cargo build

# Memory-constrained environments
cargo build --features memory-optimized

# High-precision search
cargo build --features high-accuracy

Getting Started

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with embedding function
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    // Create collection
    client.create_collection("documents", IndexType::HNSW)?;

    // Add text (auto-generates embedding and ID)
    let id = client.add_text_to_collection("documents", "Hello world")?;

    // Search for the k=5 nearest neighbors of the query text
    let results = client.search_text_in_collection(
        "documents",
        "hello",
        5,
        SimilarityMetric::Cosine,
    )?;
    Ok(())
}

HTTP Server Example

use vectorlite::{VectorLiteClient, EmbeddingGenerator, start_server};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));
    start_server(client, "127.0.0.1", 3000).await?;
    Ok(())
}

CLI Usage

Start the server with optional collection loading:

# Start empty server
cargo run --bin vectorlite -- --port 3002

# Start with pre-loaded collection
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3002
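
Combining this with the persistence endpoints above, a running collection can be saved to disk and then pre-loaded on the next start:

# Save a running collection to disk over HTTP...
curl -X POST http://127.0.0.1:3002/collections/docs/save \
  -H 'Content-Type: application/json' \
  -d '{"file_path": "./my_collection.vlc"}'

# ...then restart the server with it pre-loaded
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3002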

Testing

Run tests with mock embeddings (CI-friendly, no model files required):

cargo test --features mock-embeddings

Run tests with real ML models (requires downloaded models):

cargo test

Download ML Model

The following command downloads the BERT-based embedding model files needed for real embedding generation:

huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir models/all-MiniLM-L6-v2

License

Apache 2.0 License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.