ruve-db 0.1.1

A hybrid vector and full-text search database with HNSW approximate nearest-neighbour indexing and BM25

RuVe

A hybrid vector + full-text search database written in Rust.

RuVe combines an HNSW approximate nearest-neighbour graph with a BM25 keyword index.


Features

  • HNSW index — sub-linear approximate nearest-neighbour search with configurable exploration factor
  • BM25 index — IDF-weighted full-text ranking with tokenisation and stop-word filtering
  • Append-only binary storage — fast sequential writes, O(1) random-access reads via stored offsets
  • Fully persistent — all data, graph edges, and indices are written to disk; no in-memory-only state
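
To make the append-only storage bullet concrete, here is a minimal sketch of the general technique: records are appended as length-prefixed blobs, and a map from id to byte offset makes each read a single seek. This is an illustration under stated assumptions, not RuVe's actual on-disk format. The `AppendLog` type, the 4-byte little-endian length prefix, and the in-memory `HashMap` are all hypothetical (RuVe persists its offset map to disk).

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical append-only store: each record is a 4-byte little-endian
/// length followed by the payload; `offsets` remembers where records start.
struct AppendLog {
    file: File,
    offsets: HashMap<String, u64>,
}

impl AppendLog {
    fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        // Append mode: every write lands at the end of the file.
        let file = OpenOptions::new().read(true).append(true).create(true).open(path)?;
        Ok(Self { file, offsets: HashMap::new() })
    }

    /// Append one record and return the offset it was written at.
    fn append(&mut self, id: &str, payload: &[u8]) -> io::Result<u64> {
        let offset = self.file.seek(SeekFrom::End(0))?;
        self.file.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.file.write_all(payload)?;
        self.offsets.insert(id.to_string(), offset);
        Ok(offset)
    }

    /// One map lookup, one seek, one bounded read: cost does not grow with file size.
    fn get(&mut self, id: &str) -> io::Result<Option<Vec<u8>>> {
        let Some(&offset) = self.offsets.get(id) else {
            return Ok(None);
        };
        self.file.seek(SeekFrom::Start(offset))?;
        let mut len = [0u8; 4];
        self.file.read_exact(&mut len)?;
        let mut payload = vec![0u8; u32::from_le_bytes(len) as usize];
        self.file.read_exact(&mut payload)?;
        Ok(Some(payload))
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("append_log_sketch.bin");
    let _ = std::fs::remove_file(&path); // start from an empty file
    let mut log = AppendLog::open(&path)?;
    let first = log.append("a", b"The quick brown fox")?;
    let second = log.append("b", b"Jumps over the lazy dog")?;
    println!("record offsets: {first}, {second}");
    if let Some(bytes) = log.get("b")? {
        println!("{}", String::from_utf8_lossy(&bytes));
    }
    Ok(())
}
```

Deletes in a layout like this are typically handled by dropping the id from the offset map and leaving the old bytes in place. Whether RuVe does the same is not stated here.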

Installation

Add RuVe to your Cargo.toml:

[dependencies]
ruve-db = "0.1"

The default feature set includes the embedder (OpenAI / Ollama). If you only need the core library and will supply your own vectors, disable the default features:

[dependencies]
ruve-db = { version = "0.1", default-features = false }

Library Usage

Insert and search with your own vectors

use ruve::database::Database;

let mut db = Database::new(
    "data/data.bin",     // binary record store
    "data/index.json",   // UUID → file-offset map
    "data/bm25.json",    // BM25 term statistics
    "data/hnsw.json",    // HNSW graph metadata
    "data/graph.bin",    // HNSW edge data
);

// Insert with an auto-generated UUID
let vector: Vec<f32> = vec![0.1, 0.2, 0.3, /* ... */];
db.insert_raw(vector, "The quick brown fox", None);

// Insert with a custom key
let vector2: Vec<f32> = vec![0.4, 0.5, 0.6, /* ... */];
db.insert_raw(vector2, "Jumps over the lazy dog", Some("my-doc-id"));

// HNSW approximate nearest-neighbour search
// ef is the exploration factor — higher = better recall, slower
let query = vec![0.15, 0.25, 0.35];
let records = db.search_hnsw(&query, 20);
for record in &records {
    println!("{}: {:?}", record.id, record.metadata);
}

// BM25 full-text search
let results = db.text_search("quick fox", 5);
for record in &results {
    println!("{}: {:?}", record.id, record.metadata);
}

// Delete by id
db.delete("my-doc-id");

// Wipe everything
db.wipe();
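
For readers curious what an IDF-weighted BM25 ranking like the one behind text_search computes, the following is a minimal, self-contained sketch of the standard Okapi BM25 formula. It does not use RuVe's API and is not RuVe's implementation. The parameters `K1 = 1.2` and `B = 0.75`, the tokeniser, and the four-word stop list are assumptions chosen for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Assumed BM25 parameters (common defaults, not taken from RuVe).
const K1: f32 = 1.2;
const B: f32 = 0.75;
// Tiny illustrative stop-word list.
const STOP_WORDS: [&str; 4] = ["the", "a", "of", "over"];

/// Lowercase, split on non-alphanumeric characters, drop stop words.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !STOP_WORDS.contains(&t.as_str()))
        .collect()
}

/// Return one BM25 score per document for the given query.
fn bm25_scores(docs: &[&str], query: &str) -> Vec<f32> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f32;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f32 / n.max(1.0);

    // Document frequency: in how many documents does each term occur?
    let mut df: HashMap<&str, f32> = HashMap::new();
    for doc in &tokenized {
        let unique: HashSet<&str> = doc.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            query_terms
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f32;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let n_t = df.get(term.as_str()).copied().unwrap_or(0.0);
                    // Rare terms get a larger inverse-document-frequency weight.
                    let idf = (1.0 + (n - n_t + 0.5) / (n_t + 0.5)).ln();
                    // Long documents are penalised relative to the average length.
                    let norm = 1.0 - B + B * doc.len() as f32 / avg_len;
                    idf * tf * (K1 + 1.0) / (tf + K1 * norm)
                })
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    let docs = [
        "The quick brown fox",
        "Jumps over the lazy dog",
        "A quick summary of fox habitats",
    ];
    let scores = bm25_scores(&docs, "quick fox");
    for (doc, score) in docs.iter().zip(&scores) {
        println!("{score:.3}  {doc}");
    }
}
```

Documents that share no term with the query score exactly zero, which is why a keyword index complements vector search: it gives a sharp signal for exact term matches.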

Embedder (feature = "embedder")

Use RuVe's built-in embedding backends to turn text into vectors automatically.

OpenAI (requires OPENAI_API_KEY in your environment or a .env file):

use ruve::embedder::Embedder;

let embedder = Embedder::openai();
let vector = embedder.embed("The quick brown fox");
db.insert_raw(vector, "The quick brown fox", None);

Ollama (requires a local Ollama server with the nomic-embed-text model pulled):

use ruve::embedder::Embedder;

let embedder = Embedder::ollama();
let vector = embedder.embed("The quick brown fox");
db.insert_raw(vector, "The quick brown fox", None);

CLI

An interactive REPL for exploring your database directly from the terminal.

# default features already include embedder + cli
cargo run --bin ruve

# or explicitly
cargo run --bin ruve --features cli

RuVe v0.1.0 — type help for available commands, quit to exit
ruve> insert The quick brown fox jumps over the lazy dog
inserted
ruve> search text quick fox 3
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> search vec The quick brown fox 3
query vector dim: 3072
0.9821 | dim=3072 | 01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> list
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> delete 01980...
deleted
ruve> wipe
wiped

Commands

Command                              Description
insert <text>                        Embed the text and insert a record
insert raw [1.0, 2.0, ...] <text>    Insert with a pre-computed vector
search vec <query> <k>               Embed the query and run HNSW vector search
search text <query> <k>              BM25 full-text search
delete <id>                          Delete a record by UUID
wipe                                 Delete all records and indices
load <filename>                      Batch-embed and index every line from books/<filename>
list                                 List all stored records
help                                 Show this help text
quit / exit                          Exit the REPL

Benchmark

Measure insert throughput, vector search latency, text search latency, and Recall@k against brute-force ground truth.

# run the two smallest scenarios by default
cargo run --bin benchmark

# pick specific scenarios
cargo run --bin benchmark -- xs small medium large highdim

Scenario   Nodes    Dims
xxs        200      128
xs         1 K      128
small      10 K     128
medium     50 K     128
large      100 K    128
highdim    10 K     768
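
Recall@k against brute-force ground truth can be sketched as follows: an exhaustive scan produces the exact k nearest neighbours, and recall is the fraction of those that the approximate search also returned. This is a minimal illustration, not the benchmark's code. The squared Euclidean distance and the function names are assumptions, and the `approx` list in `main` is a hand-written stand-in for an HNSW result.

```rust
use std::collections::HashSet;

/// Squared Euclidean distance (assumed metric, for illustration only).
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k nearest neighbours: score every vector, sort, keep the best k.
fn brute_force_knn(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by(|&i, &j| {
        squared_l2(&data[i], query).total_cmp(&squared_l2(&data[j], query))
    });
    ids.truncate(k);
    ids
}

/// Fraction of the exact neighbours that also appear in the approximate result.
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f32 {
    if exact.is_empty() {
        return 0.0;
    }
    let truth: HashSet<usize> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|id| truth.contains(*id)).count();
    hits as f32 / exact.len() as f32
}

fn main() {
    let data = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 1.0],
        vec![5.0, 5.0],
        vec![6.0, 5.0],
    ];
    let query = [0.2, 0.1];
    let exact = brute_force_knn(&data, &query, 3);
    // Hypothetical approximate result that missed one true neighbour.
    let approx = [0, 1, 3];
    println!("exact = {:?}, recall@3 = {:.2}", exact, recall_at_k(&approx, &exact));
}
```

The brute-force pass is linear in the number of stored vectors per query, so it is useful as ground truth but too slow to serve queries at scale.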

Results — small (10 K × 128d, k=10)

Operation                   Throughput   p50        p95        p99
Insert                      54 ops/s
HNSW vector search          125 qps      7.89 ms    9.27 ms    9.73 ms
Brute-force vector search   13 qps       79.20 ms   83.07 ms   85.80 ms
BM25 text search            1 011 qps    1.05 ms    1.28 ms    2.03 ms
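
As a reading aid for the p50/p95/p99 columns, here is one common way to derive such figures from per-query timings, using the nearest-rank percentile definition. How the RuVe benchmark actually computes them is not stated here, so treat this as a sketch. The `percentile` helper and the stand-in workload are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile over a slice that is already sorted ascending.
fn percentile(sorted: &[Duration], p: f64) -> Duration {
    assert!(!sorted.is_empty() && (0.0..=100.0).contains(&p));
    // Rank is 1-based: the smallest index covering at least p% of the samples.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    sorted[rank.max(1) - 1]
}

fn main() {
    // Time a stand-in workload 200 times; a real run would time search calls.
    let mut samples: Vec<Duration> = Vec::with_capacity(200);
    for i in 0..200u64 {
        let start = Instant::now();
        std::hint::black_box((0..10_000u64).map(|x| x ^ i).sum::<u64>());
        samples.push(start.elapsed());
    }
    samples.sort();

    let total: Duration = samples.iter().sum();
    let qps = samples.len() as f64 / total.as_secs_f64();
    println!(
        "{:.0} qps | p50 {:?} | p95 {:?} | p99 {:?}",
        qps,
        percentile(&samples, 50.0),
        percentile(&samples, 95.0),
        percentile(&samples, 99.0),
    );
}
```

Taken at face value, the throughput column above puts HNSW search at roughly 125 / 13 ≈ 9.6 times the brute-force rate for this scenario.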

Visualizer

An interactive 3-D viewer for the HNSW graph. Click any node to inspect it.


# populate a small graph with the benchmark, then open the viewer
cargo run --release --bin benchmark -- xxs
cargo run --release --bin visualize

The viewer opens as a self-contained HTML file in your browser. You can also pass a specific scenario or point it at any CLI database directory:

# different benchmark scenario
cargo run --release --bin visualize -- xs

# a database you built through the CLI (stored in ./data)
cargo run --release --bin visualize -- ./data

Running tests

cargo test

License

MIT