RuVe

A hybrid vector + full-text search database written in Rust.

RuVe combines an HNSW approximate nearest-neighbour graph with a BM25 keyword index.

Features

HNSW index — sub-linear approximate nearest-neighbour search with configurable exploration factor
BM25 index — IDF-weighted full-text ranking with tokenisation and stop-word filtering
Append-only binary storage — fast sequential writes, O(1) random-access reads via stored offsets
Fully persistent — all data, graph edges, and indices are written to disk; no in-memory-only state
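
To illustrate the append-only storage idea from the list above, here is a minimal sketch using only the standard library. It assumes a length-prefixed record layout and an in-memory map of offsets; `AppendLog`, its methods, and the layout are hypothetical and chosen for illustration. They are not RuVe's actual on-disk format or API.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical append-only record store (illustration only, not RuVe's format).
pub struct AppendLog {
    file: File,
    /// key -> byte offset of the record's 4-byte length prefix
    offsets: HashMap<String, u64>,
}

impl AppendLog {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .append(true) // every write lands at the end of the file
            .create(true)
            .open(path)?;
        Ok(Self { file, offsets: HashMap::new() })
    }

    /// Append a length-prefixed payload and remember where it starts.
    pub fn append(&mut self, key: &str, payload: &[u8]) -> io::Result<u64> {
        let offset = self.file.seek(SeekFrom::End(0))?;
        self.file.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.file.write_all(payload)?;
        self.offsets.insert(key.to_string(), offset);
        Ok(offset)
    }

    /// One map lookup, one seek, one record read: no scan of the file.
    pub fn get(&mut self, key: &str) -> io::Result<Option<Vec<u8>>> {
        let offset = match self.offsets.get(key) {
            Some(&offset) => offset,
            None => return Ok(None),
        };
        self.file.seek(SeekFrom::Start(offset))?;
        let mut len_buf = [0u8; 4];
        self.file.read_exact(&mut len_buf)?;
        let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        self.file.read_exact(&mut payload)?;
        Ok(Some(payload))
    }
}
```

Because records are only ever appended, an offset stays valid once it has been handed out, which is what makes the offset map sufficient for random access. A real store would also persist the map and handle deletions, which this sketch leaves out.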

Installation

Add RuVe to your Cargo.toml:

[dependencies]
ruve-db = "0.1"

The default feature set includes the embedder (OpenAI / Ollama). If you only need the core library and will supply your own vectors, disable the default features:

[dependencies]
ruve-db = { version = "0.1", default-features = false }

Library Usage

Insert and search with your own vectors

use ruve::database::Database;

let mut db = Database::new(
    "data/data.bin",     // binary record store
    "data/index.json",   // UUID → file-offset map
    "data/bm25.json",    // BM25 term statistics
    "data/hnsw.json",    // HNSW graph metadata
    "data/graph.bin",    // HNSW edge data
);

// Insert with an auto-generated UUID
let vector: Vec<f32> = vec![0.1, 0.2, 0.3, /* ... */];
db.insert_raw(vector, "The quick brown fox", None);

// Insert with a custom key
let vector2: Vec<f32> = vec![0.4, 0.5, 0.6, /* ... */];
db.insert_raw(vector2, "Jumps over the lazy dog", Some("my-doc-id"));

// HNSW approximate nearest-neighbour search
// ef is the exploration factor: higher = better recall, slower
// the query should have the same dimensionality as the stored vectors
let query = vec![0.15, 0.25, 0.35];
let records = db.search_hnsw(&query, 20);
for record in &records {
    println!("{} — {:?}", record.id, record.metadata);
}

// BM25 full-text search
let results = db.text_search("quick fox", 5);
for record in &results {
    println!("{} — {:?}", record.id, record.metadata);
}

// Delete by id
db.delete("my-doc-id");

// Wipe everything
db.wipe();
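
For readers curious about what a BM25 ranking like the one behind `text_search` involves, the following is a rough, self-contained sketch. The parameters `K1 = 1.2` and `B = 0.75` are common defaults and are assumptions here, as are the tokeniser, the stop-word handling, and the function names; none of this is taken from RuVe's source.

```rust
use std::collections::{HashMap, HashSet};

const K1: f64 = 1.2; // term-frequency saturation (assumed default)
const B: f64 = 0.75; // document-length normalisation (assumed default)

/// Lowercase, split on non-alphanumeric characters, and drop stop words.
fn tokenize(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !stop_words.contains(t.as_str()))
        .collect()
}

/// Score every document against the query.
/// Returns (document index, score) pairs, best first, omitting zero scores.
fn bm25_rank(docs: &[&str], query: &str, stop_words: &HashSet<&str>) -> Vec<(usize, f64)> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d, stop_words)).collect();
    let n = tokenized.len() as f64;
    let total_len: usize = tokenized.iter().map(|d| d.len()).sum();
    let avg_len = total_len as f64 / n.max(1.0);

    // Document frequency: in how many documents each term appears.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let unique: HashSet<&str> = doc.iter().map(|s| s.as_str()).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query, stop_words);
    let mut scored: Vec<(usize, f64)> = Vec::new();
    for (i, doc) in tokenized.iter().enumerate() {
        let mut score = 0.0;
        for term in &query_terms {
            let tf = doc.iter().filter(|t| *t == term).count() as f64;
            if tf == 0.0 {
                continue;
            }
            let n_t = *df.get(term.as_str()).unwrap_or(&0) as f64;
            // Rare terms get a larger IDF weight than common ones.
            let idf = ((n - n_t + 0.5) / (n_t + 0.5) + 1.0).ln();
            let norm = tf + K1 * (1.0 - B + B * doc.len() as f64 / avg_len);
            score += idf * tf * (K1 + 1.0) / norm;
        }
        if score > 0.0 {
            scored.push((i, score));
        }
    }
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Calling `bm25_rank` with the two example documents above and the query `"quick fox"` would rank only the first one, since the second shares no terms with the query.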

Embedder (feature = `"embedder"`)

Use RuVe's built-in embedding backends to turn text into vectors automatically.

OpenAI (requires OPENAI_API_KEY in your environment or a .env file):

use ruve::embedder::Embedder;

let embedder = Embedder::openai();
let vector = embedder.embed("The quick brown fox");
db.insert_raw(vector, "The quick brown fox", None);

Ollama (requires Ollama running locally with the nomic-embed-text model pulled):

use ruve::embedder::Embedder;

let embedder = Embedder::ollama();
let vector = embedder.embed("The quick brown fox");
db.insert_raw(vector, "The quick brown fox", None);

CLI

An interactive REPL for exploring your database directly from the terminal.

# default features already include embedder + cli
cargo run --bin ruve

# or explicitly
cargo run --bin ruve --features cli

RuVe v0.1.0 — type help for available commands, quit to exit
ruve> insert The quick brown fox jumps over the lazy dog
inserted
ruve> search text quick fox 3
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> search vec The quick brown fox 3
query vector dim: 3072
0.9821 | dim=3072 | 01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> list
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> delete 01980...
deleted
ruve> wipe
wiped

Commands

Command	Description
`insert <text>`	Embed the text and insert a record
`insert raw [1.0, 2.0, ...] <text>`	Insert with a pre-computed vector
`search vec <query> <k>`	Embed the query and run HNSW vector search
`search text <query> <k>`	BM25 full-text search
`delete <id>`	Delete a record by UUID
`wipe`	Delete all records and indices
`load <filename>`	Batch-embed and index every line from `books/<filename>`
`list`	List all stored records
`help`	Show this help text
`quit` / `exit`	Exit the REPL
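
As an illustration of the `insert raw [1.0, 2.0, ...] <text>` argument shape, here is a small sketch of how such a line could be split into a vector and its text. The function name `parse_raw_insert` and the error messages are hypothetical, and the REPL's real parser may behave differently.

```rust
/// Split "[1.0, 2.0, 3.0] some text" into the vector and the trailing text.
/// Hypothetical helper, not the REPL's actual parser.
fn parse_raw_insert(args: &str) -> Result<(Vec<f32>, String), String> {
    let rest = args
        .trim_start()
        .strip_prefix('[')
        .ok_or("expected '[' to open the vector")?;
    let (inside, text) = rest
        .split_once(']')
        .ok_or("expected ']' to close the vector")?;

    // Parse each comma-separated component, failing on the first bad number.
    let vector = inside
        .split(',')
        .map(|n| {
            n.trim()
                .parse::<f32>()
                .map_err(|e| format!("bad number {:?}: {}", n.trim(), e))
        })
        .collect::<Result<Vec<f32>, String>>()?;

    let text = text.trim();
    if text.is_empty() {
        return Err("expected text after the vector".to_string());
    }
    Ok((vector, text.to_string()))
}
```

Under these assumptions, `parse_raw_insert("[1.0, 2.0] The quick brown fox")` yields the vector `[1.0, 2.0]` and the text `The quick brown fox`.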

Benchmark

Measure insert throughput, vector search latency, text search latency, and Recall@k against brute-force ground truth.
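
For context, Recall@k against a brute-force ground truth can be computed along the following lines. This sketch assumes cosine similarity as the metric and identifies records by index; the function names are illustrative and the benchmark binary may be implemented differently.

```rust
use std::collections::HashSet;

/// Cosine similarity (assumed metric); returns 0.0 for zero-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exact top-k: score every vector against the query, keep the k best indices.
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// Fraction of the exact top-k that the approximate search also returned.
fn recall_at_k(approx: &[usize], exact: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

A recall of 1.0 means the approximate search returned exactly the same k neighbours as the exhaustive scan; lower values show how much accuracy was traded for speed.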

# run the two smallest scenarios by default
cargo run --bin benchmark

# pick specific scenarios
cargo run --bin benchmark -- xs small medium large highdim

Scenario	Nodes	Dims
`xxs`	200	128
`xs`	1 K	128
`small`	10 K	128
`medium`	50 K	128
`large`	100 K	128
`highdim`	10 K	768

Results — `small` (10 K × 128d, k=10)

Operation	Throughput	p50	p95	p99
Insert	54 ops/s	—	—	—
HNSW vector search	125 qps	7.89 ms	9.27 ms	9.73 ms
Brute-force vector search	13 qps	79.20 ms	83.07 ms	85.80 ms
BM25 text search	1 011 qps	1.05 ms	1.28 ms	2.03 ms
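
The p50, p95, and p99 columns are latency percentiles. One common way to derive them from raw timings is the nearest-rank method, sketched below; the function names are hypothetical, and the benchmark may use a different percentile definition.

```rust
use std::time::Duration;

/// Nearest-rank percentile over an ascending-sorted slice; `p` must be in 1..=100.
fn percentile(sorted: &[Duration], p: usize) -> Option<Duration> {
    if sorted.is_empty() || p == 0 || p > 100 {
        return None;
    }
    // Smallest rank with at least p% of the samples at or below it
    // (integer ceiling of p * n / 100).
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Sort the raw samples and report (p50, p95, p99).
fn latency_summary(mut samples: Vec<Duration>) -> Option<(Duration, Duration, Duration)> {
    samples.sort();
    Some((
        percentile(&samples, 50)?,
        percentile(&samples, 95)?,
        percentile(&samples, 99)?,
    ))
}
```

Samples could be collected by wrapping each query in `std::time::Instant::now()` and `elapsed()`, then passing the resulting durations to `latency_summary`.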

Visualizer

An interactive 3-D viewer for the HNSW graph. Click any node to inspect it.

[Figure: HNSW graph visualizer]

# populate a small graph with the benchmark, then open the viewer
cargo run --release --bin benchmark -- xxs
cargo run --release --bin visualize

The viewer opens as a self-contained HTML file in your browser. You can also pass a specific scenario or point it at any CLI database directory:

# different benchmark scenario
cargo run --release --bin visualize -- xs

# a database you built through the CLI (stored in ./data)
cargo run --release --bin visualize -- ./data

Running tests

cargo test

License

MIT

ruve-db 0.1.1