Overview
VectorXLite is a high-performance, embeddable vector database built on SQLite. It combines HNSW-based approximate nearest neighbor search with SQL filtering on metadata, which makes it a good fit for AI/ML applications, semantic search, and recommendation systems.
Why VectorXLite?
| Feature | Benefit |
|---|---|
| Embedded Architecture | No separate server required - runs in-process |
| SQLite Foundation | Battle-tested storage with ACID guarantees |
| HNSW Index | Sub-millisecond similarity search on millions of vectors |
| SQL Filtering | Full SQL support for complex payload queries |
| Atomic Operations | Transaction support for data consistency |
| Zero Configuration | Works out of the box with sensible defaults |
Features
- Multiple Distance Functions: Cosine similarity, L2 (Euclidean), and Inner Product
- Flexible Dimensions: Support for vectors of any dimension
- Rich Payload Support: Store and query arbitrary metadata alongside vectors
- Hybrid Search: Combine vector similarity with SQL WHERE clauses
- Connection Pooling: Built-in r2d2 pool support for concurrent access
- Persistent Storage: File-backed or in-memory operation modes
- Type-Safe API: Builder pattern with compile-time validation
Installation
Add VectorXLite to your Cargo.toml:
[dependencies]
= "0.1"
r2d2 = "0.8"
r2d2_sqlite = "0.24"
Quick Start
use ;
use r2d2::Pool;
use r2d2_sqlite::SqliteConnectionManager;
API Reference
VectorXLite
The main entry point for all database operations.
// Create from connection pool
let db = VectorXLite::new(pool)?;
// Available operations
db.create_collection(config)?; // Create a new collection
db.insert(point)?;             // Insert a vector with payload
db.search(search)?;            // Perform similarity search
CollectionConfigBuilder
Configure a new vector collection.
| Method | Type | Description |
|---|---|---|
| `collection_name` | `&str` | Unique identifier for the collection |
| `vector_dimension` | `u16` | Number of dimensions (default: 3) |
| `distance` | `DistanceFunction` | Similarity metric (default: Cosine) |
| `max_elements` | `usize` | Maximum vectors (default: 100,000) |
| `payload_table_schema` | `&str` | SQL CREATE TABLE statement |
| `index_file_path` | `&str` | Path for persistent HNSW index |
let config = CollectionConfigBuilder::default()
    .collection_name(/* ... */)
    .vector_dimension(/* ... */)
    .distance(/* ... */)
    .max_elements(/* ... */)
    .payload_table_schema(/* ... */)
    .index_file_path(/* ... */)
    .build()?;
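To make the defaults in the table concrete, here is a minimal, std-only sketch of a builder with the same documented defaults (dimension 3, Cosine, 100,000 elements). The names `SketchConfig`, `SketchConfigBuilder`, and `Distance` are hypothetical and are not VectorXLite types. Treating `collection_name` as required and checking it at run time in `build` are assumptions made for illustration. The real crate may validate differently.

```rust
// Hypothetical stand-in types; not part of the VectorXLite API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Distance {
    Cosine,
    L2,
    IP,
}

#[derive(Debug)]
struct SketchConfig {
    collection_name: String,
    vector_dimension: u16,
    distance: Distance,
    max_elements: usize,
}

struct SketchConfigBuilder {
    collection_name: Option<String>,
    vector_dimension: u16,
    distance: Distance,
    max_elements: usize,
}

impl Default for SketchConfigBuilder {
    // Defaults mirror the values listed in the table above.
    fn default() -> Self {
        Self {
            collection_name: None,
            vector_dimension: 3,
            distance: Distance::Cosine,
            max_elements: 100_000,
        }
    }
}

impl SketchConfigBuilder {
    fn collection_name(mut self, name: &str) -> Self {
        self.collection_name = Some(name.to_string());
        self
    }

    fn vector_dimension(mut self, dim: u16) -> Self {
        self.vector_dimension = dim;
        self
    }

    fn distance(mut self, distance: Distance) -> Self {
        self.distance = distance;
        self
    }

    fn max_elements(mut self, max: usize) -> Self {
        self.max_elements = max;
        self
    }

    // Assumption: a missing name or a zero dimension is rejected here.
    fn build(self) -> Result<SketchConfig, String> {
        let collection_name = self
            .collection_name
            .ok_or_else(|| "collection_name is required".to_string())?;
        if self.vector_dimension == 0 {
            return Err("vector_dimension must be greater than zero".to_string());
        }
        Ok(SketchConfig {
            collection_name,
            vector_dimension: self.vector_dimension,
            distance: self.distance,
            max_elements: self.max_elements,
        })
    }
}

fn main() -> Result<(), String> {
    // Override some fields; the rest keep their defaults.
    let config = SketchConfigBuilder::default()
        .collection_name("books")
        .vector_dimension(384)
        .distance(Distance::L2)
        .max_elements(50_000)
        .build()?;
    println!("{config:?}");

    // Inner product is the third documented metric.
    let ip = SketchConfigBuilder::default()
        .collection_name("items")
        .distance(Distance::IP)
        .build()?;
    println!("{} uses {:?}", ip.collection_name, ip.distance);
    Ok(())
}
```

Each setter takes `self` by value and returns it, which is what allows the chained call style shown in this section.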
InsertPoint
Insert vectors with associated metadata.
| Method | Type | Description |
|---|---|---|
| `collection_name` | `&str` | Target collection |
| `id` | `u64` | Unique vector identifier |
| `vector` | `Vec<f32>` | The embedding vector |
| `payload_insert_query` | `&str` | SQL INSERT statement (use `?1` for rowid) |
let point = InsertPoint::builder()
    .collection_name(/* ... */)
    .id(/* ... */)
    .vector(/* ... */)
    .payload_insert_query(/* ... */)
    .build()?;
SearchPoint
Configure similarity search queries.
| Method | Type | Description |
|---|---|---|
| `collection_name` | `&str` | Collection to search |
| `vector` | `Vec<f32>` | Query vector |
| `top_k` | `i32` | Number of results (default: 10) |
| `payload_search_query` | `&str` | SQL SELECT for payload filtering |
let search = SearchPoint::builder()
    .collection_name(/* ... */)
    .vector(/* ... */)
    .top_k(/* ... */)
    .payload_search_query(/* ... */)
    .build()?;
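The way `top_k` and a payload filter interact can be illustrated with a small, std-only sketch. It deliberately swaps in a brute-force scan for the HNSW index and a Rust closure for the SQL query, so it shows only the semantics of a filtered top-k search, not how VectorXLite executes it. `Item`, `hybrid_search`, and `sample_items` are hypothetical names.

```rust
/// One stored item: an id, its embedding, and a single payload field.
struct Item {
    id: u64,
    vector: Vec<f32>,
    category: &'static str,
}

/// Cosine similarity; returns 0.0 for zero-length vectors (a convention chosen here).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force stand-in for a filtered search: keep the items that pass the
/// payload predicate, score them against the query, return the best `top_k`.
fn hybrid_search(
    items: &[Item],
    query: &[f32],
    top_k: usize,
    keep: impl Fn(&Item) -> bool,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter(|item| keep(*item))
        .map(|item| (item.id, cosine(&item.vector, query)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}

fn sample_items() -> Vec<Item> {
    vec![
        Item { id: 1, vector: vec![1.0, 0.0], category: "sci-fi" },
        Item { id: 2, vector: vec![0.9, 0.1], category: "history" },
        Item { id: 3, vector: vec![0.0, 1.0], category: "sci-fi" },
    ]
}

fn main() {
    let items = sample_items();
    // The closure plays the role of a WHERE clause on the payload.
    let hits = hybrid_search(&items, &[1.0, 0.0], 2, |item| item.category == "sci-fi");
    for (id, score) in hits {
        println!("id={id} score={score:.3}");
    }
}
```

In this sketch item 2 is closer to the query than item 3, but the filter excludes it, so the two results are items 1 and 3.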
Distance Functions
| Function | Description | Best For |
|---|---|---|
| `Cosine` | Cosine similarity (normalized) | Text embeddings, NLP |
| `L2` | Euclidean distance | Image features, spatial data |
| `IP` | Inner product (dot product) | When vectors are pre-normalized |
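For reference, the three metrics can be written out in a few lines of plain Rust. This is a sketch of the standard definitions, not the crate's implementation. The function names are illustrative, and returning 0.0 for a zero-length vector in `cosine_similarity` is a convention assumed here.

```rust
/// Inner product (dot product) of two equal-length vectors.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance between two equal-length vectors.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: the inner product divided by the product of the norms.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    inner_product(a, b) / (norm_a * norm_b)
}

fn main() {
    let a = [3.0_f32, 4.0];
    let b = [4.0_f32, 3.0];
    println!("inner product: {}", inner_product(&a, &b));
    println!("L2 distance:   {}", l2_distance(&a, &b));
    println!("cosine:        {}", cosine_similarity(&a, &b));

    // For unit-length vectors the norms are 1, so cosine similarity and
    // inner product agree (up to floating-point rounding).
    let u = [0.6_f32, 0.8];
    let v = [1.0_f32, 0.0];
    println!("unit cosine:   {}", cosine_similarity(&u, &v));
    println!("unit inner:    {}", inner_product(&u, &v));
}
```

The last two lines of `main` show why inner product is a reasonable choice for pre-normalized vectors: it produces the same ranking as cosine similarity while skipping the norm computation.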
Storage Modes
In-Memory (Development/Testing)
let manager = SqliteConnectionManager::memory();
let pool = Pool::builder()
    .connection_customizer(/* ... */)
    .build(manager)?;
File-Backed (Production)
let manager = SqliteConnectionManager::file(/* ... */);
let pool = Pool::builder()
    .connection_customizer(/* ... */)
    .build(manager)?;
// With persistent HNSW index
let config = default
.collection_name
.index_file_path
// ... other config
.build?;
Advanced Usage
Complex Payload Queries with JOINs
// Create related tables
let author_table = "CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)";
let book_table = "CREATE TABLE books (
rowid INTEGER PRIMARY KEY,
author_id INTEGER,
title TEXT,
FOREIGN KEY (author_id) REFERENCES authors(id)
)";
// Search with JOIN
let search = builder
.collection_name
.vector
.top_k
.payload_search_query
.build?;
JSON Payload Support
let config = default
.collection_name
.payload_table_schema
.build?;
// Insert with JSON
let point = builder
.collection_name
.id
.vector
.payload_insert_query
.build?;
// Query JSON fields
let search = builder
.collection_name
.vector
.payload_search_query
.build?;
Custom Connection Timeout
use SqliteConnectionCustomizer;
// Default timeout: 15 seconds
let customizer = new;
// Custom timeout (in milliseconds)
let customizer = with_busy_timeout;
let pool = builder
.connection_customizer
.build?;
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Insert | O(log n) | HNSW index update |
| Search | O(log n) | Approximate nearest neighbor |
| Payload Filter | O(m) | SQLite query on matched vectors |
Optimization Tips
- Batch Inserts: Group multiple inserts in a single transaction
- Index Payload Columns: Create SQLite indexes on frequently filtered columns
- Tune `max_elements`: Set appropriately for your dataset size
- Use File Storage: For datasets larger than available RAM
Transaction Safety
VectorXLite provides atomic operations for data consistency:
// Both vector and payload are inserted atomically
// If either fails, the entire operation is rolled back
db.insert?;
Guarantees:
- No orphan vectors (vectors without payload)
- No orphan payloads (payload without vectors)
- Failed operations don't affect existing data
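The guarantees above amount to an all-or-nothing write across two stores. The following std-only sketch illustrates that idea with two in-memory maps and a manual rollback. It is an illustration of the guarantee, not VectorXLite's mechanism, which this document does not describe in detail. `Store` and `write_payload` are hypothetical names, and the empty-payload failure is a simulated error.

```rust
use std::collections::HashMap;

/// Hypothetical two-part store: vectors on one side, payloads on the other.
#[derive(Default)]
struct Store {
    vectors: HashMap<u64, Vec<f32>>,
    payloads: HashMap<u64, String>,
}

impl Store {
    /// Insert a vector and its payload together, or neither.
    fn insert(&mut self, id: u64, vector: Vec<f32>, payload: &str) -> Result<(), String> {
        // Reject duplicates up front so existing data is never overwritten.
        if self.vectors.contains_key(&id) {
            return Err(format!("id {id} already exists"));
        }
        self.vectors.insert(id, vector);
        if let Err(err) = self.write_payload(id, payload) {
            // Roll back the vector so no orphan is left behind.
            self.vectors.remove(&id);
            return Err(err);
        }
        Ok(())
    }

    /// Simulated payload write that fails on an empty payload.
    fn write_payload(&mut self, id: u64, payload: &str) -> Result<(), String> {
        if payload.is_empty() {
            return Err("payload must not be empty".to_string());
        }
        self.payloads.insert(id, payload.to_string());
        Ok(())
    }
}

fn main() {
    let mut store = Store::default();
    println!("{:?}", store.insert(1, vec![0.1, 0.2], "first"));
    // This payload write fails, so the vector is rolled back as well.
    println!("{:?}", store.insert(2, vec![0.3, 0.4], ""));
    println!(
        "vectors={} payloads={}",
        store.vectors.len(),
        store.payloads.len()
    );
}
```

After the failed second insert both maps still hold exactly one entry, which corresponds to the "no orphan vectors" and "failed operations don't affect existing data" guarantees.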
Use Cases
| Application | Description |
|---|---|
| Semantic Search | Find documents by meaning, not just keywords |
| Recommendation Systems | Similar item suggestions based on embeddings |
| Image Search | Find visually similar images using CNN features |
| RAG Applications | Retrieval-Augmented Generation for LLMs |
| Anomaly Detection | Find outliers in high-dimensional data |
| Deduplication | Identify near-duplicate content |
Examples
The repository includes example applications:
# Run the basic example
# Run tests
Architecture
┌─────────────────────────────────────────────────────────┐
│ VectorXLite API │
├─────────────────────────────────────────────────────────┤
│ CollectionConfig │ InsertPoint │ SearchPoint │
├─────────────────────────────────────────────────────────┤
│ Query Planner │
├──────────────────────┬──────────────────────────────────┤
│ HNSW Index │ SQLite │
│ (Vector Search) │ (Payload Storage) │
├──────────────────────┴──────────────────────────────────┤
│ Connection Pool (r2d2) │
└─────────────────────────────────────────────────────────┘
Requirements
- Rust: 1.70 or later
- SQLite: 3.35 or later (with extension loading enabled)
- Platforms: Linux, macOS, Windows
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.