vector_xlite 1.1.0

Overview

VectorXLite is a high-performance, embeddable vector database built on SQLite. It combines HNSW-based approximate nearest neighbor search with SQL metadata filtering, which suits AI/ML applications, semantic search, and recommendation systems.

Why VectorXLite?

Feature	Benefit
Embedded Architecture	No separate server required - runs in-process
SQLite Foundation	Battle-tested storage with ACID guarantees
HNSW Index	Sub-millisecond similarity search on millions of vectors
SQL Filtering	Full SQL support for complex payload queries
Atomic Operations	Transaction support for data consistency
Zero Configuration	Works out of the box with sensible defaults

Features

Multiple Distance Functions: Cosine similarity, L2 (Euclidean), and Inner Product
Flexible Dimensions: Vector dimension is configurable per collection (stored as a `u16`, so up to 65,535)
Rich Payload Support: Store and query arbitrary metadata alongside vectors
Hybrid Search: Combine vector similarity with SQL WHERE clauses
Connection Pooling: Built-in r2d2 pool support for concurrent access
Persistent Storage: File-backed or in-memory operation modes
Type-Safe API: Builder pattern with compile-time validation
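
To make the hybrid search idea concrete, here is a minimal conceptual sketch in plain Rust. It is not how VectorXLite is implemented: a linear scan stands in for the HNSW index, and a closure stands in for the SQL WHERE clause. The `Item` struct and the `cosine` and `hybrid_search` functions are hypothetical names used only for illustration.

```rust
/// A stored item: an id, its embedding, and one payload column.
struct Item {
    id: u64,
    vector: Vec<f32>,
    category: &'static str,
}

/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force hybrid search: keep the items that pass the payload
/// predicate, rank them by similarity to `query`, return the top_k ids.
fn hybrid_search(
    items: &[Item],
    query: &[f32],
    top_k: usize,
    filter: impl Fn(&Item) -> bool,
) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter(|item| filter(*item))
        .map(|item| (item.id, cosine(&item.vector, query)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(top_k).map(|(id, _)| id).collect()
}
```

Calling `hybrid_search(&items, &query, 10, |i| i.category == "Electronics")` mirrors a search with `WHERE category = 'Electronics'`. This README does not state whether the library applies the filter before or after the nearest-neighbor step, so the ordering used here is an assumption.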

Installation

Add VectorXLite to your Cargo.toml:

[dependencies]
vector_xlite = "1.1"
r2d2 = "0.8"
r2d2_sqlite = "0.24"

Quick Start

use vector_xlite::{VectorXLite, customizer::SqliteConnectionCustomizer, types::*};
use r2d2::Pool;
use r2d2_sqlite::SqliteConnectionManager;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create connection pool
    let manager = SqliteConnectionManager::memory();
    let pool = Pool::builder()
        .max_size(10)
        .connection_customizer(SqliteConnectionCustomizer::new())
        .build(manager)?;

    let db = VectorXLite::new(pool)?;

    // 2. Create a collection
    let config = CollectionConfigBuilder::default()
        .collection_name("products")
        .vector_dimension(384)  // e.g., sentence-transformers output
        .distance(DistanceFunction::Cosine)
        .payload_table_schema(
            "CREATE TABLE products (
                rowid INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                category TEXT,
                price REAL
            )"
        )
        .build()?;

    db.create_collection(config)?;

    // 3. Insert vectors with metadata
    let embedding = vec![0.1_f32; 384]; // placeholder; use your model's 384-dimensional output

    let point = InsertPoint::builder()
        .collection_name("products")
        .id(1)
        .vector(embedding)
        .payload_insert_query(
            "INSERT INTO products(rowid, name, category, price)
             VALUES (?1, 'Wireless Headphones', 'Electronics', 99.99)"
        )
        .build()?;

    db.insert(point)?;

    // 4. Search with payload filtering
    let query_vector = vec![0.15_f32; 384]; // placeholder; must match the collection's dimension

    let search = SearchPoint::builder()
        .collection_name("products")
        .vector(query_vector)
        .top_k(10)
        .payload_search_query(
            "SELECT rowid, name, category, price
             FROM products
             WHERE category = 'Electronics' AND price < 150"
        )
        .build()?;

    let results = db.search(search)?;

    for result in results {
        println!("Found: {} - ${}", result["name"], result["price"]);
    }

    Ok(())
}

API Reference

VectorXLite

The main entry point for all database operations.

// Create from connection pool
let db = VectorXLite::new(pool)?;

// Available operations
db.create_collection(config)?;  // Create a new collection
db.insert(point)?;              // Insert a vector with payload
db.search(search_point)?;       // Perform similarity search

CollectionConfigBuilder

Configure a new vector collection.

Method	Type	Description
`collection_name`	`&str`	Unique identifier for the collection
`vector_dimension`	`u16`	Number of dimensions (default: 3)
`distance`	`DistanceFunction`	Similarity metric (default: Cosine)
`max_elements`	`usize`	Maximum vectors (default: 100,000)
`payload_table_schema`	`&str`	SQL CREATE TABLE statement
`index_file_path`	`&str`	Path for persistent HNSW index

let config = CollectionConfigBuilder::default()
    .collection_name("embeddings")
    .vector_dimension(768)
    .distance(DistanceFunction::Cosine)
    .max_elements(1_000_000)
    .payload_table_schema("CREATE TABLE embeddings (rowid INTEGER PRIMARY KEY, data TEXT)")
    .index_file_path("/data/embeddings.idx")
    .build()?;
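
Because `vector_dimension` is fixed per collection, it can help to validate embeddings before building an `InsertPoint` or `SearchPoint`. The sketch below is a hypothetical helper, not part of vector_xlite; this README does not say how the library itself reports a dimension mismatch.

```rust
/// Returns Ok(()) when `vector` has exactly `expected` components.
/// `check_dimension` is a hypothetical helper, not a vector_xlite API.
fn check_dimension(expected: u16, vector: &[f32]) -> Result<(), String> {
    let actual = vector.len();
    if actual == usize::from(expected) {
        Ok(())
    } else {
        Err(format!("expected {expected} dimensions, got {actual}"))
    }
}
```

A call such as `check_dimension(768, &embedding)?` before the builder gives a clear error message at the call site instead of a failure deeper in the insert path.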

InsertPoint

Insert vectors with associated metadata.

Method	Type	Description
`collection_name`	`&str`	Target collection
`id`	`u64`	Unique vector identifier
`vector`	`Vec<f32>`	The embedding vector
`payload_insert_query`	`&str`	SQL INSERT statement (use `?1` for rowid)

let point = InsertPoint::builder()
    .collection_name("documents")
    .id(42)
    .vector(embedding)
    .payload_insert_query("INSERT INTO documents(rowid, title) VALUES (?1, 'My Doc')")
    .build()?;
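
The example above embeds the payload values directly in the SQL text, so any string containing an apostrophe has to be escaped first. A minimal sketch, assuming values are always passed as SQL literals: `sql_text` and `documents_insert_sql` are hypothetical helpers, and the `documents(rowid, title)` table is the one from the example. If the library offers bound parameters beyond `?1`, those would be preferable, but this README does not document any.

```rust
/// Quote a string as a SQL text literal by doubling embedded single quotes.
fn sql_text(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Build the payload INSERT for the `documents(rowid, title)` table.
/// `?1` is left in place for the vector id, as in the example above.
fn documents_insert_sql(title: &str) -> String {
    format!(
        "INSERT INTO documents(rowid, title) VALUES (?1, {})",
        sql_text(title)
    )
}
```

The returned string can then be passed to `payload_insert_query`, for example `.payload_insert_query(&documents_insert_sql("It's here"))`.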

SearchPoint

Configure similarity search queries.

Method	Type	Description
`collection_name`	`&str`	Collection to search
`vector`	`Vec<f32>`	Query vector
`top_k`	`i32`	Number of results (default: 10)
`payload_search_query`	`&str`	SQL SELECT for payload filtering

let search = SearchPoint::builder()
    .collection_name("documents")
    .vector(query_embedding)
    .top_k(20)
    .payload_search_query("SELECT * FROM documents WHERE status = 'active'")
    .build()?;

Distance Functions

Function	Description	Best For
`Cosine`	Cosine similarity (normalized)	Text embeddings, NLP
`L2`	Euclidean distance	Image features, spatial data
`IP`	Inner product (dot product)	When vectors are pre-normalized
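
For reference, the three metrics can be written in a few lines of plain Rust. This is a sketch of the standard definitions, not the library's internal code; the function names are illustrative. It also shows why `IP` suits pre-normalized vectors: on unit-length vectors the inner product equals the cosine similarity.

```rust
/// Inner product (dot product) of two equal-length vectors.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance between two equal-length vectors.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: inner product divided by the product of the norms.
/// Returns 0.0 if either vector has zero length (a convention chosen here).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    inner_product(a, b) / (norm_a * norm_b)
}

/// Scale a vector to unit length; a zero vector is returned unchanged.
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = inner_product(v, v).sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}
```

If embeddings are normalized once at insert time, `inner_product(&normalize(a), &normalize(b))` gives the same ranking as `cosine_similarity(a, b)` without recomputing norms on every comparison.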

Storage Modes

In-Memory (Development/Testing)

let manager = SqliteConnectionManager::memory();
let pool = Pool::builder()
    .connection_customizer(SqliteConnectionCustomizer::new())
    .build(manager)?;

File-Backed (Production)

let manager = SqliteConnectionManager::file("vectors.db");
let pool = Pool::builder()
    .connection_customizer(SqliteConnectionCustomizer::new())
    .build(manager)?;

// With persistent HNSW index
let config = CollectionConfigBuilder::default()
    .collection_name("production")
    .index_file_path("/data/production.idx")
    // ... other config
    .build()?;

Advanced Usage

Complex Payload Queries with JOINs

// Create related tables
let author_table = "CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)";
let book_table = "CREATE TABLE books (
    rowid INTEGER PRIMARY KEY,
    author_id INTEGER,
    title TEXT,
    FOREIGN KEY (author_id) REFERENCES authors(id)
)";

// Search with JOIN
let search = SearchPoint::builder()
    .collection_name("books")
    .vector(query)
    .top_k(10)
    .payload_search_query(
        "SELECT b.rowid, b.title, a.name as author
         FROM books b
         JOIN authors a ON a.id = b.author_id
         WHERE a.name LIKE '%Smith%'"
    )
    .build()?;

JSON Payload Support

let config = CollectionConfigBuilder::default()
    .collection_name("products")
    .payload_table_schema(
        "CREATE TABLE products (
            rowid INTEGER PRIMARY KEY,
            metadata JSON
        )"
    )
    .build()?;

// Insert with JSON
let point = InsertPoint::builder()
    .collection_name("products")
    .id(1)
    .vector(embedding)
    .payload_insert_query(
        r#"INSERT INTO products(rowid, metadata)
           VALUES (?1, '{"tags": ["sale", "new"], "stock": 100}')"#
    )
    .build()?;

// Query JSON fields
let search = SearchPoint::builder()
    .collection_name("products")
    .vector(query)
    .payload_search_query(
        "SELECT * FROM products
         WHERE json_extract(metadata, '$.stock') > 0"
    )
    .build()?;

Custom Connection Timeout

use vector_xlite::customizer::SqliteConnectionCustomizer;

// Default timeout: 15 seconds
let customizer = SqliteConnectionCustomizer::new();

// Custom timeout (in milliseconds)
let customizer = SqliteConnectionCustomizer::with_busy_timeout(30000);

let pool = Pool::builder()
    .connection_customizer(customizer)
    .build(manager)?;

Performance Characteristics

Operation	Complexity	Notes
Insert	O(log n)	HNSW index update
Search	O(log n)	Approximate nearest neighbor
Payload Filter	O(m)	SQLite query over the m vectors matched by the search

Optimization Tips

Batch Inserts: Group multiple inserts in a single transaction
Index Payload Columns: Create SQLite indexes on frequently filtered columns
Tune max_elements: Set appropriately for your dataset size
Use File Storage: For datasets larger than available RAM
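
When tuning `max_elements` or deciding between in-memory and file storage, a rough size estimate is useful. The sketch below computes only the raw `f32` vector data (4 bytes per component); HNSW graph links and SQLite payload rows add overhead that depends on the implementation and is not counted here. The function names are illustrative.

```rust
/// Bytes needed to hold `count` raw f32 vectors of `dimension` components.
/// Excludes HNSW graph links and SQLite payload storage.
fn raw_vector_bytes(count: u64, dimension: u16) -> u64 {
    count * u64::from(dimension) * std::mem::size_of::<f32>() as u64
}

/// Convert a byte count to GiB for display.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0 * 1024.0)
}
```

For example, one million 768-dimensional vectors need 3,072,000,000 bytes (about 2.86 GiB) for the vector data alone, which gives a lower bound for the memory the index will need.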

Transaction Safety

VectorXLite provides atomic operations for data consistency:

// Both vector and payload are inserted atomically
// If either fails, the entire operation is rolled back
db.insert(point)?;

Guarantees:

No orphan vectors (vectors without payload)
No orphan payloads (payload without vectors)
Failed operations don't affect existing data
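
The guarantees above amount to an all-or-nothing insert across two stores. The following sketch illustrates that idea with two in-memory maps and a manual rollback; it is a conceptual model only, and the mechanism VectorXLite actually uses is not described in this README. The `Store` type is hypothetical, and the `payload` argument stands in for the outcome of the SQL INSERT.

```rust
use std::collections::HashMap;

/// Two stores that must stay in sync: one for vectors, one for payloads.
#[derive(Default)]
struct Store {
    vectors: HashMap<u64, Vec<f32>>,
    payloads: HashMap<u64, String>,
}

impl Store {
    /// Insert a vector and its payload together. `payload` models the
    /// result of the payload write; on Err the vector write is undone.
    fn insert(
        &mut self,
        id: u64,
        vector: Vec<f32>,
        payload: Result<String, String>,
    ) -> Result<(), String> {
        // Reject duplicates before touching either store.
        if self.vectors.contains_key(&id) {
            return Err(format!("id {id} already exists"));
        }
        self.vectors.insert(id, vector);
        match payload {
            Ok(row) => {
                self.payloads.insert(id, row);
                Ok(())
            }
            Err(reason) => {
                // Roll back the vector so no orphan is left behind.
                self.vectors.remove(&id);
                Err(format!("payload insert failed for id {id}: {reason}"))
            }
        }
    }
}
```

After a failed insert both maps hold exactly what they held before the call, which is the property the three guarantees describe.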

Use Cases

Application	Description
Semantic Search	Find documents by meaning, not just keywords
Recommendation Systems	Similar item suggestions based on embeddings
Image Search	Find visually similar images using CNN features
RAG Applications	Retrieval-Augmented Generation for LLMs
Anomaly Detection	Find outliers in high-dimensional data
Deduplication	Identify near-duplicate content

Examples

The repository includes example applications:

# Run the basic example
cargo run -p example

# Run tests
cargo test

Architecture

┌─────────────────────────────────────────────────────────┐
│                     VectorXLite API                     │
├─────────────────────────────────────────────────────────┤
│  CollectionConfig  │  InsertPoint  │  SearchPoint      │
├─────────────────────────────────────────────────────────┤
│                    Query Planner                        │
├──────────────────────┬──────────────────────────────────┤
│    HNSW Index        │         SQLite                   │
│  (Vector Search)     │    (Payload Storage)             │
├──────────────────────┴──────────────────────────────────┤
│                 Connection Pool (r2d2)                  │
└─────────────────────────────────────────────────────────┘

Requirements

Rust: 1.70 or later
SQLite: 3.35 or later (with extension loading enabled)
Platforms: Linux, macOS, Windows

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.