sqlite-vector-rs 0.2.0

SQLite extension providing PGVector-like native vector types with HNSW indexing
sqlite-vector-rs

A Rust SQLite extension providing PGVector-style typed vector columns with HNSW approximate nearest-neighbor search, scalar SQL functions, and Arrow IPC bulk I/O.

Vectors are stored as raw typed byte blobs inside SQLite tables, so no external service is required. The HNSW index (powered by usearch) is serialized into a shadow table, so it persists across connections.
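
To make "raw typed byte blobs" concrete, here is a minimal sketch of how a float4 vector could be packed into a blob and unpacked again. The function names are hypothetical and not part of the crate, and the little-endian, tightly packed layout is an assumption; the extension's actual byte order is not stated in this README.

```rust
/// Encode an f32 vector as a contiguous byte blob.
/// ASSUMPTION: little-endian, no header, no padding.
fn encode_f32(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode a blob produced by `encode_f32`.
/// Returns None if the length is not a multiple of 4 bytes.
fn decode_f32(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Number of dimensions implied by a blob whose elements are
/// `elem_size` bytes wide (2 for float2/int2, 4 for float4/int4,
/// 8 for float8, 1 for int1).
fn dims(blob: &[u8], elem_size: usize) -> Option<usize> {
    if elem_size == 0 || blob.len() % elem_size != 0 {
        None
    } else {
        Some(blob.len() / elem_size)
    }
}
```

Under these assumptions a 384-dimensional float4 vector occupies 1536 bytes, and the dimension count can be recovered from the blob length and element width alone.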

Features

  • 6 vector types: float2 (f16), float4 (f32), float8 (f64), int1 (i8), int2 (i16), int4 (i32)
  • 3 distance metrics: L2 (squared Euclidean), cosine, inner product
  • HNSW approximate nearest-neighbor search via usearch with configurable parameters
  • Arrow IPC bulk import/export for efficient batch operations
  • Full virtual table with INSERT, UPDATE, DELETE, and transaction rollback
  • Optional metadata columns alongside vectors (TEXT, INTEGER, REAL, BLOB)
  • Works three ways: as a loadable SQLite extension, a Rust library, or a standalone CLI

Quick Start

Build

cargo build --release

This produces target/release/libsqlite_vector_rs.so (Linux), target/release/libsqlite_vector_rs.dylib (macOS), or target/release/sqlite_vector_rs.dll (Windows; Cargo omits the lib prefix there).

Load into any SQLite client

.load target/release/libsqlite_vector_rs

CREATE VIRTUAL TABLE embeddings USING vector(
    dim=384,
    type=float4,
    metric=cosine
);

INSERT INTO embeddings(vector)
VALUES (vector_from_json('[0.1, 0.2, 0.3, ...]', 'float4'));

SELECT rowid, distance
FROM embeddings
WHERE knn_match(distance, vector_from_json('[0.15, 0.25, 0.35, ...]', 'float4'))
LIMIT 10;

Use from Rust

Add to your Cargo.toml:

[dependencies]
sqlite-vector-rs = { path = "../sqlite-vector-rs", features = ["library"] }
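rusqlite = { version = "0.39", features = ["bundled"] }

use rusqlite::Connection;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::open("vectors.db")?;
    // Register the vector module and scalar functions on this connection.
    sqlite_vector_rs::register(&conn)?;

    conn.execute_batch("
        CREATE VIRTUAL TABLE embeddings USING vector(
            dim=3, type=float4, metric=cosine
        );
    ")?;

    // The JSON text is bound as a parameter and parsed by vector_from_json.
    conn.execute(
        "INSERT INTO embeddings(vector) VALUES (vector_from_json(?, 'float4'))",
        ["[1.0, 0.0, 0.0]"],
    )?;

    Ok(())
}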

Standalone CLI

cargo build --features library --bin sqlite3
./target/debug/sqlite3 my_vectors.db
sqlite3-vector v0.1.0 (SQLite 3.49.1)
Enter ".help" for usage hints.
sqlite3-vector> CREATE VIRTUAL TABLE docs USING vector(dim=3, type=float4, metric=cosine);
sqlite3-vector> INSERT INTO docs(vector) VALUES (vector_from_json('[1,0,0]', 'float4'));
sqlite3-vector> SELECT rowid, distance FROM docs
          ...> WHERE knn_match(distance, vector_from_json('[0.9,0.1,0]', 'float4'))
          ...> LIMIT 5;
rowid  distance
-----  --------
1      0.006116

SQL Reference

CREATE VIRTUAL TABLE

CREATE VIRTUAL TABLE <name> USING vector(
    dim=<integer>,               -- vector dimension (required)
    type=<vector_type>,          -- element type (required)
    metric=<distance_metric>,    -- distance metric (required)
    m=<integer>,                 -- HNSW M parameter (default: 16)
    ef_construction=<integer>,   -- HNSW build quality (default: 128)
    ef_search=<integer>,         -- HNSW query quality (default: 64)
    metadata='col1 TYPE, ...'    -- optional metadata columns
);

Vector types: float2, float4, float8, int1, int2, int4

Distance metrics: l2, cosine, inner_product
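
When tables are created from application code, it can help to assemble the option list programmatically. The sketch below is a hypothetical helper, not part of the crate: it fills in the defaults documented above (m=16, ef_construction=128, ef_search=64) and renders the statement as a string. It does no quoting or validation, so it assumes the table name and metadata string are trusted.

```rust
/// Hypothetical helper (not part of sqlite-vector-rs): options for the
/// `vector` virtual table module.
struct VectorTable<'a> {
    name: &'a str,
    dim: usize,
    vtype: &'a str,  // float2, float4, float8, int1, int2, int4
    metric: &'a str, // l2, cosine, inner_product
    m: u32,
    ef_construction: u32,
    ef_search: u32,
    metadata: Option<&'a str>,
}

impl<'a> VectorTable<'a> {
    /// Required options plus the documented HNSW defaults.
    fn new(name: &'a str, dim: usize, vtype: &'a str, metric: &'a str) -> Self {
        Self {
            name,
            dim,
            vtype,
            metric,
            m: 16,
            ef_construction: 128,
            ef_search: 64,
            metadata: None,
        }
    }

    /// Render the CREATE VIRTUAL TABLE statement.
    /// No escaping is performed: inputs are assumed to be trusted.
    fn to_sql(&self) -> String {
        let mut opts = vec![
            format!("dim={}", self.dim),
            format!("type={}", self.vtype),
            format!("metric={}", self.metric),
            format!("m={}", self.m),
            format!("ef_construction={}", self.ef_construction),
            format!("ef_search={}", self.ef_search),
        ];
        if let Some(cols) = self.metadata {
            opts.push(format!("metadata='{}'", cols));
        }
        format!(
            "CREATE VIRTUAL TABLE {} USING vector({});",
            self.name,
            opts.join(", ")
        )
    }
}
```

As a general property of HNSW, larger values of m, ef_construction, and ef_search tend to improve recall at the cost of memory, build time, and query latency respectively, so a helper like this gives one place to tune them per table.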

KNN Search

SELECT rowid, distance [, metadata_cols...]
FROM <table>
WHERE knn_match(distance, <query_vector_blob>)
LIMIT <k>;

The distance column is a hidden virtual column that returns the distance between each stored vector and the query. knn_match activates the HNSW index for efficient approximate search.
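
Because HNSW search is approximate, it can be useful to compare its results against an exact linear scan on a small sample. The following is a minimal sketch of such a baseline; `exact_knn` is a hypothetical name, and the distance function is passed in by the caller rather than taken from the extension.

```rust
/// Exact k-nearest-neighbor search by linear scan.
/// Returns up to `k` (rowid, distance) pairs in ascending distance order.
/// `dist` is any function where a smaller value means "closer".
fn exact_knn<F>(
    rows: &[(i64, Vec<f32>)],
    query: &[f32],
    k: usize,
    dist: F,
) -> Vec<(i64, f32)>
where
    F: Fn(&[f32], &[f32]) -> f32,
{
    // Score every stored vector against the query.
    let mut scored: Vec<(i64, f32)> = rows
        .iter()
        .map(|(id, v)| (*id, dist(v.as_slice(), query)))
        .collect();
    // total_cmp gives a total order over f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The fraction of rowids that appear in both this list and the knn_match result for the same query and LIMIT is the recall of the index on that query.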

Scalar Functions

Function                                       Description
vector_from_json(json, type)                   Parse a JSON array into a vector blob
vector_to_json(blob, type)                     Convert a vector blob to a JSON array string
vector_distance(blob_a, blob_b, metric, type)  Compute distance between two vectors
vector_dims(blob, type)                        Return the number of dimensions
vector_rebuild_index(table, type, metric)      Rebuild the HNSW index from shadow data
vector_export_arrow(table, type)               Export all vectors as an Arrow IPC blob
vector_insert_arrow(table, type, ipc_blob)     Import vectors from an Arrow IPC blob
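
As a reference for what the three metrics compute, here is a small sketch over f32 slices. These are standalone illustrations, not the crate's implementation. The cosine definition (1 minus cosine similarity) is consistent with the CLI session above, where [1,0,0] against [0.9,0.1,0] yields 0.006116. Two points are assumptions: how zero-length vectors are handled, and whether the extension reports the inner product raw, negated, or offset, so `dot` below returns the plain dot product.

```rust
/// Plain dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance (no square root taken).
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Cosine distance, defined as 1 - cosine similarity.
/// ASSUMPTION: a zero-norm input is reported as distance 1.0.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        return 1.0;
    }
    1.0 - dot(a, b) / denom
}
```

Squared L2 preserves the same nearest-neighbor ordering as true Euclidean distance while skipping the square root, which is why it is a common choice for indexing.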

Metadata Columns

CREATE VIRTUAL TABLE docs USING vector(
    dim=384,
    type=float4,
    metric=cosine,
    metadata='title TEXT, source TEXT, page INTEGER'
);

INSERT INTO docs(vector, title, source, page)
VALUES (vector_from_json('[...]', 'float4'), 'Chapter 1', 'book.pdf', 42);

SELECT rowid, distance, title, page
FROM docs
WHERE knn_match(distance, vector_from_json('[...]', 'float4'))
LIMIT 5;

Arrow IPC Bulk Operations

Export all vectors to an Arrow IPC stream, then re-import into another table:

-- Export
SELECT vector_export_arrow('source_table', 'float4');

-- Import (returns row count)
SELECT vector_insert_arrow('dest_table', 'float4', <ipc_blob>);

-- Rebuild the HNSW index after bulk import
SELECT vector_rebuild_index('dest_table', 'float4', 'cosine');

Architecture

┌─────────────────────────────────────────────────┐
│  SQLite                                          │
│  ┌────────────────────────────────────────────┐  │
│  │  vector virtual table module               │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐ │  │
│  │  │ Config   │  │ Cursor   │  │ Txn      │ │  │
│  │  │ parsing  │  │ scan/KNN │  │ rollback │ │  │
│  │  └──────────┘  └──────────┘  └──────────┘ │  │
│  └──────────────────┬─────────────────────────┘  │
│                     │                             │
│  ┌──────────────────▼─────────────────────────┐  │
│  │  Shadow tables                              │  │
│  │  {name}_data  → rowid, vector, metadata     │  │
│  │  {name}_index → serialized HNSW graph       │  │
│  └─────────────────────────────────────────────┘  │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │  Scalar functions                            │  │
│  │  vector_from_json, vector_distance, etc.     │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
         │
         ▼
┌────────────────┐     ┌────────────────┐
│  usearch HNSW  │     │  Arrow IPC     │
│  (in-memory)   │     │  (bulk I/O)    │
└────────────────┘     └────────────────┘

Dependencies

Crate          Purpose
sqlite3_ext    SQLite extension + virtual table API
usearch        HNSW approximate nearest-neighbor index
arrow-* (v58)  Arrow IPC stream encoding for bulk I/O
half           IEEE 754 half-precision (f16) support
bytemuck       Zero-copy byte casting
serde_json     JSON vector parsing
rusqlite       Library-mode API (optional, library feature)

Testing

# Build the extension first (required for integration tests)
cargo build

# Run all 271 tests
cargo test

The test suite includes unit tests for every module, integration tests for all SQL interfaces, a Shakespeare PDF ingestion pipeline, and GGUF embedding tests (which download a small model on first run).

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.