vector_xlite 1.4.0

VectorXLite: A fast and lightweight SQLite extension for vector search with payload support.
Documentation
<p align="center">
  <img src="https://i.imgur.com/S3PJvXm.png" alt="VectorXLite Logo" width="80"/>
</p>

<h1 align="center">VectorXLite - Embedded Mode</h1>

<p align="center">
  <strong>In-process vector database library for Rust applications</strong>
</p>

<p align="center">
  <a href="https://crates.io/crates/vector_xlite"><img src="https://img.shields.io/crates/v/vector_xlite.svg" alt="Crates.io"></a>
  <a href="https://docs.rs/vector_xlite"><img src="https://docs.rs/vector_xlite/badge.svg" alt="Documentation"></a>
  <a href="https://github.com/uttom-akash/vector-xlite/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
</p>

---

The embedded mode allows you to use VectorXLite as a direct dependency in your Rust application, providing high-performance vector search with zero network overhead.

## Overview

The embedded library combines SQLite's reliability with HNSW-based vector search, enabling:
- **Sub-millisecond similarity search** on millions of vectors
- **SQL-powered payload filtering** for hybrid queries
- **ACID transactions** with atomic operations
- **In-memory or file-backed** storage options

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
vector_xlite = { path = "../embedded/core" }  # Or from crates.io: "1.2"
r2d2 = "0.8"
r2d2_sqlite = "0.31"
rusqlite = { version = "0.37", features = ["load_extension"] }
```

## Quick Start

```rust
use vector_xlite::{VectorXLite, customizer::SqliteConnectionCustomizer, types::*};
use r2d2::Pool;
use r2d2_sqlite::SqliteConnectionManager;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create connection pool
    let manager = SqliteConnectionManager::memory();
    let pool = Pool::builder()
        .max_size(10)
        .connection_customizer(SqliteConnectionCustomizer::new())
        .build(manager)?;

    let db = VectorXLite::new(pool)?;

    // 2. Create a collection
    let config = CollectionConfigBuilder::default()
        .collection_name("products")
        .vector_dimension(384)
        .distance(DistanceFunction::Cosine)
        .payload_table_schema(
            "CREATE TABLE products (
                rowid INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                category TEXT,
                price REAL
            )"
        )
        .build()?;

    db.create_collection(config)?;

    // 2.5. Check if collection exists
    if db.collection_exists("products")? {
        println!("Collection 'products' already exists!");
    }

    // 3. Insert vectors
    let embedding = vec![0.1; 384];
    let point = InsertPoint::builder()
        .collection_name("products")
        .id(1)
        .vector(embedding)
        .payload_insert_query(
            "INSERT INTO products(rowid, name, category, price)
             VALUES (?1, 'Wireless Headphones', 'Electronics', 99.99)"
        )
        .build()?;

    db.insert(point)?;

    // 4. Search with filtering
    let query = vec![0.15; 384];
    let search = SearchPoint::builder()
        .collection_name("products")
        .vector(query)
        .top_k(10)
        .payload_search_query(
            "SELECT rowid, name, category, price
             FROM products
             WHERE category = 'Electronics' AND price < 150"
        )
        .build()?;

    let results = db.search(search)?;

    for result in results {
        println!("Found: {}", result["name"]);
    }

    Ok(())
}
```

## Core Concepts

### Collections

A collection combines a vector index with a payload table.

#### Creating Collections

```rust
let config = CollectionConfigBuilder::default()
    .collection_name("documents")
    .vector_dimension(768)           // Dimension of your embeddings
    .distance(DistanceFunction::Cosine)  // Similarity metric
    .max_elements(1_000_000)         // Capacity
    .payload_table_schema(           // SQL schema for metadata
        "CREATE TABLE documents (
            rowid INTEGER PRIMARY KEY,
            title TEXT,
            content TEXT,
            tags JSON,
            created_at TIMESTAMP
        )"
    )
    .index_file_path("/data/docs.idx")  // Optional: persist HNSW index
    .build()?;
```

#### Checking Collection Existence

Before creating or using a collection, you can check if it exists:

```rust
// Check if a collection exists
if db.collection_exists("documents")? {
    println!("Collection exists!");
} else {
    println!("Collection does not exist, creating...");
    db.create_collection(config)?;
}

// Prevent duplicate collection creation
let collection_name = "products";
if !db.collection_exists(collection_name)? {
    let config = CollectionConfigBuilder::default()
        .collection_name(collection_name)
        .vector_dimension(384)
        .build()?;

    db.create_collection(config)?;
    println!("Collection created successfully");
} else {
    println!("Collection already exists, skipping creation");
}
```

**Important Notes:**
- Collection names are case-sensitive
- Empty collection names return an error
- This check is atomic and thread-safe
- Useful for idempotent initialization code

### Distance Functions

| Function | Description | Use Case |
|----------|-------------|----------|
| `Cosine` | Cosine similarity (normalized) | Text embeddings, NLP models |
| `L2` | Euclidean distance | Image features, spatial data |
| `IP` | Inner product (dot product) | Pre-normalized vectors |

### Storage Modes

**In-Memory (Development/Testing)**
```rust
let manager = SqliteConnectionManager::memory();
```

**File-Backed (Production)**
```rust
let manager = SqliteConnectionManager::file("vectors.db");
let config = CollectionConfigBuilder::default()
    .collection_name("prod")
    .index_file_path("/data/prod.idx")  // Persistent HNSW index
    .build()?;
```

## Advanced Usage

### JSON Payloads

```rust
let config = CollectionConfigBuilder::default()
    .collection_name("products")
    .payload_table_schema(
        "CREATE TABLE products (
            rowid INTEGER PRIMARY KEY,
            metadata JSON
        )"
    )
    .build()?;

// Insert
let point = InsertPoint::builder()
    .collection_name("products")
    .id(1)
    .vector(embedding)
    .payload_insert_query(
        r#"INSERT INTO products(rowid, metadata)
           VALUES (?1, '{"tags": ["sale"], "stock": 100}')"#
    )
    .build()?;

// Query JSON fields
let search = SearchPoint::builder()
    .collection_name("products")
    .vector(query)
    .payload_search_query(
        "SELECT * FROM products
         WHERE json_extract(metadata, '$.stock') > 0"
    )
    .build()?;
```

### Atomic Operations

```rust
// Atomic batch insert using transactions
let manager = SqliteConnectionManager::file("vectors.db");
let pool = Pool::builder()
    .connection_customizer(SqliteConnectionCustomizer::new())
    .build(manager)?;

let db = VectorXLite::new(pool.clone())?;

// All inserts are atomic - either all succeed or all fail
for (id, vector) in vectors.iter().enumerate() {
    db.insert(InsertPoint::builder()
        .collection_name("batch")
        .id(id as u64)
        .vector(vector.clone())
        .payload_insert_query(&format!(
            "INSERT INTO batch(rowid, data) VALUES (?1, 'Item {}')", id
        ))
        .build()?
    )?;
}
```

### Connection Pooling

```rust
use vector_xlite::customizer::SqliteConnectionCustomizer;

// Default timeout: 15 seconds
let customizer = SqliteConnectionCustomizer::new();

// Custom timeout: 30 seconds
let customizer = SqliteConnectionCustomizer::with_busy_timeout(30000);

let pool = Pool::builder()
    .max_size(15)  // Max concurrent connections
    .connection_customizer(customizer)
    .build(manager)?;
```

## Examples

Run the included examples:

```bash
# Run all examples
cargo run -p embedded-examples --release

# Run specific example
cd embedded/examples/rust
cargo run --release
```

## Performance Tips

1. **Use file-backed storage** for datasets larger than RAM
2. **Tune `max_elements`** to your expected dataset size
3. **Index payload columns** that you frequently filter on:
   ```sql
   CREATE INDEX idx_category ON products(category);
   ```
4. **Batch operations** for better throughput
5. **Persistent HNSW index** with `index_file_path` for faster restarts

## API Reference

### Main Types

- **`VectorXLite`** - Main database interface
  - `create_collection(config)` - Create a new collection
  - `collection_exists(name)` - Check if a collection exists
  - `insert(point)` - Insert a vector with payload
  - `delete(point)` - Delete a vector from a collection
  - `delete_collection(collection)` - Delete an entire collection
  - `search(query)` - Search for similar vectors
- **`CollectionConfigBuilder`** - Configure collections
- **`InsertPoint`** - Insert operations
- **`SearchPoint`** - Search operations
- **`DistanceFunction`** - Similarity metrics (Cosine, L2, IP)

### Full documentation

See the [main README](../README.md) for comprehensive API documentation.

## Architecture

```
┌─────────────────────────────────────────┐
│          Your Application               │
├─────────────────────────────────────────┤
│         VectorXLite API                 │
├──────────────────┬──────────────────────┤
│   HNSW Index     │      SQLite          │
│ (Vector Search)  │  (Payload Storage)   │
├──────────────────┴──────────────────────┤
│         Connection Pool (r2d2)          │
└─────────────────────────────────────────┘
```

## Use Cases

- **RAG Applications** - Embed documents for LLM context retrieval
- **Semantic Search** - Find similar content by meaning
- **Recommendation Systems** - Similar item suggestions
- **Image Search** - Visual similarity using CNN embeddings
- **Anomaly Detection** - Outlier detection in high-dimensional data
- **Deduplication** - Find near-duplicate content

## Testing

Run the comprehensive test suite:

```bash
# Run all tests
cargo test -p vector_xlite_tests --release

# Run specific test file
cargo test -p vector_xlite_tests --test snapshot_tests --release
```

## Next Steps

- **Standalone Mode**: Need multi-language clients? See [../standalone/]../standalone/
- **Distributed Mode**: Need high availability? See [../distributed/]../distributed/
- **Documentation**: Full API docs at [docs.rs/vector_xlite]https://docs.rs/vector_xlite

## License

MIT OR Apache-2.0