vector_xlite 1.2.0

VectorXLite: A fast and lightweight SQLite extension for vector search with payload support.

Overview

VectorXLite is a high-performance, embeddable vector database built on SQLite. It combines the power of HNSW-based approximate nearest neighbor search with the flexibility of SQL for metadata filtering, making it ideal for AI/ML applications, semantic search, and recommendation systems.

Why VectorXLite?

Feature                Benefit
Embedded Architecture  No separate server required - runs in-process
SQLite Foundation      Battle-tested storage with ACID guarantees
HNSW Index             Sub-millisecond similarity search on millions of vectors
SQL Filtering          Full SQL support for complex payload queries
Atomic Operations      Transaction support for data consistency
Zero Configuration     Works out of the box with sensible defaults

Features

  • Multiple Distance Functions: Cosine similarity, L2 (Euclidean), and Inner Product
  • Flexible Dimensions: Support for vectors of any dimension
  • Rich Payload Support: Store and query arbitrary metadata alongside vectors
  • Hybrid Search: Combine vector similarity with SQL WHERE clauses
  • Connection Pooling: Built-in r2d2 pool support for concurrent access
  • Persistent Storage: File-backed or in-memory operation modes
  • Type-Safe API: Builder pattern with compile-time validation

Installation

Add VectorXLite to your Cargo.toml:

[dependencies]
vector_xlite = "1.2"
r2d2 = "0.8"
r2d2_sqlite = "0.24"

Quick Start

use vector_xlite::{VectorXLite, customizer::SqliteConnectionCustomizer, types::*};
use r2d2::Pool;
use r2d2_sqlite::SqliteConnectionManager;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create connection pool
    let manager = SqliteConnectionManager::memory();
    let pool = Pool::builder()
        .max_size(10)
        .connection_customizer(SqliteConnectionCustomizer::new())
        .build(manager)?;

    let db = VectorXLite::new(pool)?;

    // 2. Create a collection
    let config = CollectionConfigBuilder::default()
        .collection_name("products")
        .vector_dimension(384)  // e.g., sentence-transformers output
        .distance(DistanceFunction::Cosine)
        .payload_table_schema(
            "CREATE TABLE products (
                rowid INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                category TEXT,
                price REAL
            )"
        )
        .build()?;

    db.create_collection(config)?;

    // 3. Insert vectors with metadata
    // Placeholder values; in practice this is a real 384-dimensional embedding
    let embedding: Vec<f32> = vec![0.1; 384];

    let point = InsertPoint::builder()
        .collection_name("products")
        .id(1)
        .vector(embedding)
        .payload_insert_query(
            "INSERT INTO products(rowid, name, category, price)
             VALUES (?1, 'Wireless Headphones', 'Electronics', 99.99)"
        )
        .build()?;

    db.insert(point)?;

    // 4. Search with payload filtering
    // Placeholder query; its length must match the collection's vector_dimension
    let query_vector: Vec<f32> = vec![0.15; 384];

    let search = SearchPoint::builder()
        .collection_name("products")
        .vector(query_vector)
        .top_k(10)
        .payload_search_query(
            "SELECT rowid, name, category, price
             FROM products
             WHERE category = 'Electronics' AND price < 150"
        )
        .build()?;

    let results = db.search(search)?;

    for result in results {
        println!("Found: {} - ${}", result["name"], result["price"]);
    }

    Ok(())
}

API Reference

VectorXLite

The main entry point for all database operations.

// Create from connection pool
let db = VectorXLite::new(pool)?;

// Available operations
db.create_collection(config)?;  // Create a new collection
db.insert(point)?;              // Insert a vector with payload
db.search(search_point)?;       // Perform similarity search

CollectionConfigBuilder

Configure a new vector collection.

Method                Type              Description
collection_name       &str              Unique identifier for the collection
vector_dimension      u16               Number of dimensions (default: 3)
distance              DistanceFunction  Similarity metric (default: Cosine)
max_elements          usize             Maximum vectors (default: 100,000)
payload_table_schema  &str              SQL CREATE TABLE statement
index_file_path       &str              Path for persistent HNSW index

let config = CollectionConfigBuilder::default()
    .collection_name("embeddings")
    .vector_dimension(768)
    .distance(DistanceFunction::Cosine)
    .max_elements(1_000_000)
    .payload_table_schema("CREATE TABLE embeddings (rowid INTEGER PRIMARY KEY, data TEXT)")
    .index_file_path("/data/embeddings.idx")
    .build()?;

InsertPoint

Insert vectors with associated metadata.

Method                Type      Description
collection_name       &str      Target collection
id                    u64       Unique vector identifier
vector                Vec<f32>  The embedding vector
payload_insert_query  &str      SQL INSERT statement (use ?1 for rowid)

let point = InsertPoint::builder()
    .collection_name("documents")
    .id(42)
    .vector(embedding)
    .payload_insert_query("INSERT INTO documents(rowid, title) VALUES (?1, 'My Doc')")
    .build()?;

SearchPoint

Configure similarity search queries.

Method                Type      Description
collection_name       &str      Collection to search
vector                Vec<f32>  Query vector
top_k                 i32       Number of results (default: 10)
payload_search_query  &str      SQL SELECT for payload filtering

let search = SearchPoint::builder()
    .collection_name("documents")
    .vector(query_embedding)
    .top_k(20)
    .payload_search_query("SELECT * FROM documents WHERE status = 'active'")
    .build()?;

Distance Functions

Function  Description                     Best For
Cosine    Cosine similarity (normalized)  Text embeddings, NLP
L2        Euclidean distance              Image features, spatial data
IP        Inner product (dot product)     When vectors are pre-normalized
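
For reference, the three metrics can be written out in plain Rust. This is a minimal sketch of the textbook definitions, not VectorXLite's internal implementation; the function names are illustrative and are not part of the crate's API.

```rust
/// Inner product (dot product) of two equal-length vectors.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: square root of the summed squared differences.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: inner product divided by the product of the norms.
/// Returns 0.0 when either vector has zero length (a choice made for this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    inner_product(a, b) / (norm_a * norm_b)
}
```

When both vectors already have unit length, the denominator in cosine_similarity is 1, so the inner product gives the same ranking without the normalization step. That is why IP is suggested for pre-normalized vectors.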

Storage Modes

In-Memory (Development/Testing)

let manager = SqliteConnectionManager::memory();
let pool = Pool::builder()
    .connection_customizer(SqliteConnectionCustomizer::new())
    .build(manager)?;

File-Backed (Production)

let manager = SqliteConnectionManager::file("vectors.db");
let pool = Pool::builder()
    .connection_customizer(SqliteConnectionCustomizer::new())
    .build(manager)?;

// With persistent HNSW index
let config = CollectionConfigBuilder::default()
    .collection_name("production")
    .index_file_path("/data/production.idx")
    // ... other config
    .build()?;

Advanced Usage

Complex Payload Queries with JOINs

// Create related tables
let author_table = "CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)";
let book_table = "CREATE TABLE books (
    rowid INTEGER PRIMARY KEY,
    author_id INTEGER,
    title TEXT,
    FOREIGN KEY (author_id) REFERENCES authors(id)
)";

// Search with JOIN
let search = SearchPoint::builder()
    .collection_name("books")
    .vector(query)
    .top_k(10)
    .payload_search_query(
        "SELECT b.rowid, b.title, a.name as author
         FROM books b
         JOIN authors a ON a.id = b.author_id
         WHERE a.name LIKE '%Smith%'"
    )
    .build()?;

JSON Payload Support

let config = CollectionConfigBuilder::default()
    .collection_name("products")
    .payload_table_schema(
        "CREATE TABLE products (
            rowid INTEGER PRIMARY KEY,
            metadata JSON
        )"
    )
    .build()?;

// Insert with JSON
let point = InsertPoint::builder()
    .collection_name("products")
    .id(1)
    .vector(embedding)
    .payload_insert_query(
        r#"INSERT INTO products(rowid, metadata)
           VALUES (?1, '{"tags": ["sale", "new"], "stock": 100}')"#
    )
    .build()?;

// Query JSON fields
let search = SearchPoint::builder()
    .collection_name("products")
    .vector(query)
    .payload_search_query(
        "SELECT * FROM products
         WHERE json_extract(metadata, '$.stock') > 0"
    )
    .build()?;

Custom Connection Timeout

use vector_xlite::customizer::SqliteConnectionCustomizer;

// Default timeout: 15 seconds
let customizer = SqliteConnectionCustomizer::new();

// Custom timeout (in milliseconds)
let customizer = SqliteConnectionCustomizer::with_busy_timeout(30000);

let pool = Pool::builder()
    .connection_customizer(customizer)
    .build(manager)?;

Performance Characteristics

Operation       Complexity  Notes
Insert          O(log n)    HNSW index update
Search          O(log n)    Approximate nearest neighbor
Payload Filter  O(m)        SQLite query on matched vectors
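
To put the O(log n) search figure in context, the sketch below shows the exact linear-scan baseline that an approximate index avoids: it scores every item, so its cost grows as O(n) in the collection size. It is an illustration only. The Item struct, its category field, and exact_top_k are hypothetical names, and the sketch does not claim to reproduce the order in which VectorXLite applies the payload filter.

```rust
/// One stored item: id, embedding, and a single payload field (hypothetical layout).
struct Item {
    id: u64,
    vector: Vec<f32>,
    category: &'static str,
}

/// Squared Euclidean distance; the square root is skipped because it does not change ranking.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k search by linear scan: keep items that pass the payload predicate,
/// score each one against the query, sort by distance, and keep the k closest.
fn exact_top_k<F>(items: &[Item], query: &[f32], k: usize, keep: F) -> Vec<(u64, f32)>
where
    F: Fn(&Item) -> bool,
{
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter(|item| keep(*item))
        .map(|item| (item.id, l2_squared(&item.vector, query)))
        .collect();
    // total_cmp gives a total order over f32, so the sort is well defined even with NaN.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

A call such as exact_top_k(&items, &query, 10, |item| item.category == "Electronics") returns the ten closest matching items as (id, squared distance) pairs. A baseline like this is also useful for measuring the recall of an approximate index on a small sample.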

Optimization Tips

  1. Batch Inserts: Group multiple inserts in a single transaction
  2. Index Payload Columns: Create SQLite indexes on frequently filtered columns
  3. Tune max_elements: Set appropriately for your dataset size
  4. Use File Storage: For datasets larger than available RAM
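
When choosing max_elements, a rough lower bound on memory can help. The sketch below computes only the space taken by the raw f32 components (4 bytes each); it deliberately ignores HNSW graph links and SQLite pages, whose size depends on details not stated here. The function names are illustrative and not part of the crate.

```rust
/// Bytes needed to hold the raw f32 components of `max_elements` vectors.
/// This excludes HNSW graph links and SQLite storage, so treat it as a lower bound.
fn raw_vector_bytes(max_elements: u64, dimension: u64) -> u64 {
    max_elements * dimension * std::mem::size_of::<f32>() as u64
}

/// Converts a byte count to GiB (1 GiB = 1024^3 bytes) for easier reading.
fn bytes_to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0 * 1024.0)
}
```

For example, 1,000,000 vectors of 768 dimensions need 1,000,000 × 768 × 4 = 3,072,000,000 bytes for the vectors alone, which is roughly 2.86 GiB.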

Transaction Safety

VectorXLite provides atomic operations for data consistency:

// Both vector and payload are inserted atomically
// If either fails, the entire operation is rolled back
db.insert(point)?;

Guarantees:

  • No orphan vectors (vectors without payload)
  • No orphan payloads (payload without vectors)
  • Failed operations don't affect existing data
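
The all-or-nothing behaviour behind these guarantees can be illustrated with a toy in-memory store. This is a sketch of the idea only, under the assumption that a failed payload write must undo the vector write; it is not how VectorXLite is implemented, and ToyStore and its fields are hypothetical.

```rust
use std::collections::HashMap;

/// Toy store that keeps vectors and payloads in two separate maps.
struct ToyStore {
    dimension: usize,
    vectors: HashMap<u64, Vec<f32>>,
    payloads: HashMap<u64, String>,
}

impl ToyStore {
    fn new(dimension: usize) -> Self {
        ToyStore { dimension, vectors: HashMap::new(), payloads: HashMap::new() }
    }

    /// Inserts a vector and its payload together. A payload of `None` simulates
    /// a failing payload statement. On any error both maps are left unchanged.
    fn insert(&mut self, id: u64, vector: Vec<f32>, payload: Option<String>) -> Result<(), String> {
        // Validate before touching any state, so early failures need no rollback.
        if vector.len() != self.dimension {
            return Err(format!("expected {} dimensions, got {}", self.dimension, vector.len()));
        }
        if self.vectors.contains_key(&id) {
            return Err(format!("id {} already exists", id));
        }
        // Step 1: write the vector.
        self.vectors.insert(id, vector);
        // Step 2: write the payload, undoing step 1 if it fails.
        match payload {
            Some(p) => {
                self.payloads.insert(id, p);
                Ok(())
            }
            None => {
                self.vectors.remove(&id);
                Err("payload insert failed".to_string())
            }
        }
    }
}
```

After a failed insert the two maps still hold the same set of ids, which mirrors the "no orphan vectors, no orphan payloads" guarantees listed above.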

Use Cases

Application             Description
Semantic Search         Find documents by meaning, not just keywords
Recommendation Systems  Similar item suggestions based on embeddings
Image Search            Find visually similar images using CNN features
RAG Applications        Retrieval-Augmented Generation for LLMs
Anomaly Detection       Find outliers in high-dimensional data
Deduplication           Identify near-duplicate content

Examples

The repository includes example applications:

# Run the basic example
cargo run -p example

# Run tests
cargo test

Architecture

┌─────────────────────────────────────────────────────────┐
│                     VectorXLite API                     │
├─────────────────────────────────────────────────────────┤
│  CollectionConfig  │  InsertPoint  │  SearchPoint      │
├─────────────────────────────────────────────────────────┤
│                    Query Planner                        │
├──────────────────────┬──────────────────────────────────┤
│    HNSW Index        │         SQLite                   │
│  (Vector Search)     │    (Payload Storage)             │
├──────────────────────┴──────────────────────────────────┤
│                 Connection Pool (r2d2)                  │
└─────────────────────────────────────────────────────────┘

Requirements

  • Rust: 1.70 or later
  • SQLite: 3.35 or later (with extension loading enabled)
  • Platforms: Linux, macOS, Windows

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

