# VectorLite
A tiny, in-process Rust vector store with built-in embeddings for sub-millisecond semantic search.
VectorLite is a high-performance, in-memory vector database optimized for AI agent and edge workloads.
It co-locates model inference (via Candle) with a low-latency vector index, making it ideal for session-scoped, single-instance, or privacy-sensitive environments.
## Why VectorLite?
| Feature | Description |
|---|---|
| Sub-millisecond search | In-memory HNSW or flat search tuned for real-time agent loops. |
| Built-in embeddings | Runs all-MiniLM-L6-v2 locally using Candle, or any other model of your choice. No external API calls. |
| Single-binary simplicity | No runtime dependencies, no servers to orchestrate. Start instantly via CLI or Docker. |
| Session-scoped collections | Perfect for ephemeral agent sessions or sidecars. |
| Thread-safe concurrency | RwLock-based access and atomic ID generation for multi-threaded workloads. |
| Instant persistence | Save or restore collection snapshots in one call. |
VectorLite trades distributed scalability for deterministic performance, making it a fit for use cases where latency matters more than scaling to millions of vectors.
## When to Use It
| Scenario | Why VectorLite fits |
|---|---|
| AI agent sessions | Keep short-lived embeddings per conversation. No network latency. |
| Edge or embedded AI | Run fully offline with model + index in one binary. |
| Realtime search / personalization | Sub-ms search for pre-computed embeddings. |
| Local prototyping & CI | Rust-native, no external services. |
| Single-tenant microservices | Lightweight sidecar for semantic capabilities. |
## Quick Start

### Run from Source
Build and run with Cargo. The `--load` flag shown below for preloading a collection snapshot is hypothetical; check `--help` for the actual options:
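```bash
# Build and start the server
cargo run --release

# Start with a preloaded collection (hypothetical --load flag)
cargo run --release -- --load ./collection.vlc
```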
### Run with Docker
With default settings (the image tag `vectorlite` and port 8080 below are assumptions):
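```bash
docker build -t vectorlite .
docker run --rm -p 8080:8080 vectorlite
```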
With a different embedding model and memory-optimized HNSW (the environment variable names here are hypothetical):
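```bash
# Environment variable names are hypothetical; check the image docs
docker run --rm -p 8080:8080 \
  -e EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  -e HNSW_PROFILE="memory-optimized" \
  vectorlite
```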
## HTTP API Overview
| Operation | Method & Endpoint | Body |
|---|---|---|
| Health | `GET /health` | – |
| List collections | `GET /collections` | – |
| Create collection | `POST /collections` | `{"name": "docs", "index_type": "hnsw"}` |
| Delete collection | `DELETE /collections/{name}` | – |
| Add text | `POST /collections/{name}/text` | `{"text": "Hello world", "metadata": {...}}` |
| Search (text) | `POST /collections/{name}/search/text` | `{"query": "hello", "k": 5}` |
| Get vector | `GET /collections/{name}/vectors/{id}` | – |
| Delete vector | `DELETE /collections/{name}/vectors/{id}` | – |
| Save collection | `POST /collections/{name}/save` | `{"file_path": "./collection.vlc"}` |
| Load collection | `POST /collections/load` | `{"file_path": "./collection.vlc", "collection_name": "restored"}` |
## Index Types
| Index | Search Complexity | Insert | Use Case |
|---|---|---|---|
| Flat | O(n) | O(1) | Small datasets (<10K vectors) or exact search |
| HNSW | O(log n) | O(log n) | Larger datasets or approximate search |
See [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) for background on the HNSW algorithm.
### Configuration profiles for HNSW
| Profile | Features | Use Case |
|---|---|---|
| default | balanced | general workloads |
| memory-optimized | reduced precision, smaller graph | constrained devices |
| high-accuracy | higher recall, more memory | offline re-ranking or research |
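If profiles are selectable at collection creation, the request might look like this (the `hnsw_profile` field is hypothetical; the endpoint itself is documented above):

```bash
curl -X POST http://localhost:8080/collections \
  -H 'Content-Type: application/json' \
  -d '{"name": "docs", "index_type": "hnsw", "hnsw_profile": "memory-optimized"}'
```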
## Similarity Metrics
- Cosine: Default for normalized embeddings, scale-invariant
- Euclidean: Geometric distance, sensitive to vector magnitude
- Manhattan: L1 norm, robust to outliers
- Dot Product: Raw similarity, requires consistent vector scaling
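For reference, cosine similarity is the dot product of two vectors divided by the product of their magnitudes. A minimal standalone computation (illustrative, not the library's internal code):

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|)
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Same direction -> 1.0, orthogonal -> 0.0
    println!("{:.3}", cosine(&[1.0, 0.0], &[0.707, 0.707])); // ~0.707
}
```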
## Rust SDK Example

The crate can also be embedded directly in a Rust program. The sketch below assumes hypothetical type and method names (`VectorLite::new`, `add_text`, `search_text`); check the crate docs for the actual API. Metadata uses `serde_json::json!`.
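```rust
// Hypothetical API sketch: every VectorLite name below is an assumption,
// not the crate's documented surface.
use serde_json::json;
use vectorlite::{IndexType, SimilarityMetric, VectorLite};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-process store: HNSW index with cosine similarity (assumed constructor)
    let mut store = VectorLite::new(IndexType::Hnsw, SimilarityMetric::Cosine);

    // Embed and insert a document with JSON metadata
    store.add_text("Hello world", json!({"source": "readme"}))?;

    // Top-5 semantic search over the stored vectors
    for hit in store.search_text("hello", 5)? {
        println!("id={} score={:.3}", hit.id, hit.score);
    }
    Ok(())
}
```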
## Testing
Run tests with mock embeddings (CI-friendly, no model files required; plain `cargo test` is assumed to be the mock path):
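```bash
# Mock embeddings are assumed to be the default test configuration
cargo test
```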
Run tests with local models (the `local-models` feature name below is an assumption):
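```bash
# Hypothetical feature name; check Cargo.toml for the actual gate
cargo test --features local-models
```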
## Download ML Model
Real embedding generation requires the BERT-based model files locally. One way to fetch them is with the Hugging Face CLI (a general-purpose tool, not a project-specific script):
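```bash
# Fetch the three required files into ./models/all-MiniLM-L6-v2/
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
  config.json pytorch_model.bin tokenizer.json \
  --local-dir ./models/all-MiniLM-L6-v2
```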
The model files must be present in the `./models/{model-name}/` directory with the required files:

- `config.json`
- `pytorch_model.bin`
- `tokenizer.json`
### Using a different model

You can override the default embedding model at compile time using the `custom-model` feature:
```bash
# Compile-time override of the embedding model
DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  cargo build --features custom-model
```
## License
Apache 2.0 License - see LICENSE for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.