# RuVe
A hybrid vector + full-text search database written in Rust.
RuVe combines an **HNSW** approximate nearest-neighbour graph with a **BM25** keyword index.
---
## Features
- **HNSW index** — sub-linear approximate nearest-neighbour search with configurable exploration factor
- **BM25 index** — IDF-weighted full-text ranking with tokenisation and stop-word filtering
- **Append-only binary storage** — fast sequential writes, O(1) random-access reads via stored offsets
- **Fully persistent** — all data, graph edges, and indices are written to disk; no in-memory-only state
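
To make the BM25 bullet concrete, here is a minimal sketch of IDF-weighted scoring with tokenisation and stop-word filtering. It is not RuVe's implementation: the `tokenize` and `bm25_scores` functions, the stop-word list, and the parameters `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration (they are common Okapi BM25 defaults).

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stop-word list; RuVe's actual list may differ.
const STOP_WORDS: &[&str] = &["a", "an", "and", "of", "over", "the"];

/// Lowercase, split on non-alphanumeric characters, drop stop words.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !STOP_WORDS.contains(&t.as_str()))
        .collect()
}

/// Score every document against the query with Okapi BM25.
fn bm25_scores(docs: &[&str], query: &str) -> Vec<f64> {
    let (k1, b) = (1.2_f64, 0.75_f64); // assumed defaults
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let total_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64;
    // Guard against division by zero when the corpus is empty.
    let avg_len = (total_len / n.max(1.0)).max(f64::EPSILON);

    // Document frequency: number of documents containing each term.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for doc in &tokenized {
        let unique: HashSet<&str> = doc.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    let n_q = df.get(q.as_str()).copied().unwrap_or(0.0);
                    // Rare terms get a larger IDF weight.
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let docs = [
        "The quick brown fox",
        "Jumps over the lazy dog",
        "A quick survey of foxes",
    ];
    for (doc, score) in docs.iter().zip(bm25_scores(&docs, "quick fox")) {
        println!("{score:.3}  {doc}");
    }
}
```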
---
## Installation
Add RuVe to your `Cargo.toml`:
```toml
[dependencies]
ruve-db = "0.1"
```
The default feature set includes the embedder (OpenAI / Ollama). If you only need the core library and will supply your own vectors, disable the default features:
```toml
[dependencies]
ruve-db = { version = "0.1", default-features = false }
```
---
## Library Usage
### Insert and search with your own vectors
```rust
use ruve::database::Database;
let mut db = Database::new(
    "data/data.bin",   // binary record store
    "data/index.json", // UUID → file-offset map
    "data/bm25.json",  // BM25 term statistics
    "data/hnsw.json",  // HNSW graph metadata
    "data/graph.bin",  // HNSW edge data
);
// Insert with an auto-generated UUID
let vector: Vec<f32> = vec![0.1, 0.2, 0.3, /* ... */];
db.insert_raw(vector, "The quick brown fox", None);
// Insert with a custom key
let vector2: Vec<f32> = vec![0.4, 0.5, 0.6, /* ... */];
db.insert_raw(vector2, "Jumps over the lazy dog", Some("my-doc-id"));
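// HNSW approximate nearest-neighbour search
// The second argument is ef, the exploration factor: higher values improve recall but slow the search
// The query must have the same dimensionality as the stored vectors
let query = vec![0.15, 0.25, 0.35];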
let records = db.search_hnsw(&query, 20);
for record in &records {
    println!("{} — {:?}", record.id, record.metadata);
}
// BM25 full-text search
let results = db.text_search("quick fox", 5);
for record in &results {
    println!("{} — {:?}", record.id, record.metadata);
}
// Delete by id
db.delete("my-doc-id");
// Wipe everything
db.wipe();
```
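
The file comments above pair a binary record store with a UUID → file-offset map. The sketch below shows how such an append-only layout can give sequential writes and single-seek reads. It is an illustration under assumptions, not RuVe's on-disk format: the `AppendLog` type, the little-endian `u32` length prefix, and the in-memory `Cursor` standing in for `data.bin` are all hypothetical.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Append-only record store. Each record is a little-endian u32 length
/// prefix followed by the payload bytes (an assumed layout for this sketch).
struct AppendLog<S> {
    store: S,
    offsets: HashMap<String, u64>, // id -> byte offset where the record starts
}

impl<S: Read + Write + Seek> AppendLog<S> {
    fn new(store: S) -> Self {
        Self { store, offsets: HashMap::new() }
    }

    /// Append a record at the end of the store and remember where it starts.
    fn append(&mut self, id: &str, payload: &[u8]) -> io::Result<u64> {
        let offset = self.store.seek(SeekFrom::End(0))?;
        self.store.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.store.write_all(payload)?;
        self.offsets.insert(id.to_string(), offset);
        Ok(offset)
    }

    /// Look up the stored offset, seek once, and read exactly one record.
    fn get(&mut self, id: &str) -> io::Result<Option<Vec<u8>>> {
        let Some(&offset) = self.offsets.get(id) else {
            return Ok(None);
        };
        self.store.seek(SeekFrom::Start(offset))?;
        let mut len = [0u8; 4];
        self.store.read_exact(&mut len)?;
        let mut payload = vec![0u8; u32::from_le_bytes(len) as usize];
        self.store.read_exact(&mut payload)?;
        Ok(Some(payload))
    }
}

fn main() -> io::Result<()> {
    // An in-memory cursor stands in for a file such as data/data.bin.
    let mut log = AppendLog::new(Cursor::new(Vec::<u8>::new()));
    log.append("auto-id", b"The quick brown fox")?;
    log.append("my-doc-id", b"Jumps over the lazy dog")?;
    let record = log.get("my-doc-id")?.expect("record was just appended");
    println!("{}", String::from_utf8_lossy(&record));
    Ok(())
}
```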
### Embedder (feature = `"embedder"`)
Use RuVe's built-in embedding backends to turn text into vectors automatically.
**OpenAI** (requires `OPENAI_API_KEY` in your environment or a `.env` file):
```rust
use ruve::embedder::Embedder;
let embedder = Embedder::openai();
let vector = embedder.embed("The quick brown fox");
// `db` is the `Database` opened in the previous example
db.insert_raw(vector, "The quick brown fox", None);
```
**Ollama** (requires `ollama` running locally with `nomic-embed-text` pulled):
```rust
use ruve::embedder::Embedder;

let embedder = Embedder::ollama();
let vector = embedder.embed("The quick brown fox");
db.insert_raw(vector, "The quick brown fox", None);
```
---
## CLI
An interactive REPL for exploring your database directly from the terminal.
```bash
# default features already include embedder + cli
cargo run --bin ruve
# or explicitly
cargo run --bin ruve --features cli
```
```
RuVe v0.1.0 — type help for available commands, quit to exit
ruve> insert The quick brown fox jumps over the lazy dog
inserted
ruve> search text quick fox 3
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> search vec The quick brown fox 3
query vector dim: 3072
01980... — Some("The quick brown fox jumps over the lazy dog")
ruve> delete 01980...
deleted
ruve> wipe
wiped
```
### Commands
| Command | Description |
|---|---|
| `insert <text>` | Embed the text and insert a record |
| `insert raw [1.0, 2.0, ...] <text>` | Insert with a pre-computed vector |
| `search vec <query> <k>` | Embed the query and run HNSW vector search |
| `search text <query> <k>` | BM25 full-text search |
| `delete <id>` | Delete a record by UUID |
| `wipe` | Delete all records and indices |
| `load <filename>` | Batch-embed and index every line from `books/<filename>` |
| `list` | List all stored records |
| `help` | Show this help text |
| `quit` / `exit` | Exit the REPL |
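
For readers scripting against the REPL, the `insert raw [1.0, 2.0, ...] <text>` form can be split into a vector and a text payload roughly as follows. This is a sketch of one plausible parsing approach, not the CLI's actual parser: `parse_insert_raw` is a hypothetical helper, and it assumes the vector is a bracketed, comma-separated list of floats followed by the text.

```rust
/// Parse the arguments that follow `insert raw` (hypothetical helper):
/// a bracketed, comma-separated vector and then the record text.
fn parse_insert_raw(args: &str) -> Result<(Vec<f32>, String), String> {
    let rest = args
        .trim()
        .strip_prefix('[')
        .ok_or("expected '[' before the vector")?;
    let (inside, text) = rest
        .split_once(']')
        .ok_or("expected ']' after the vector")?;

    // Every component must parse as an f32; the first failure aborts.
    let vector = inside
        .split(',')
        .map(|v| {
            v.trim()
                .parse::<f32>()
                .map_err(|e| format!("bad vector component {v:?}: {e}"))
        })
        .collect::<Result<Vec<f32>, String>>()?;

    let text = text.trim();
    if text.is_empty() {
        return Err("expected text after the vector".to_string());
    }
    Ok((vector, text.to_string()))
}

fn main() {
    match parse_insert_raw("[1.0, 2.0, 3.5] The quick brown fox") {
        Ok((vector, text)) => println!("{} dims, text = {text:?}", vector.len()),
        Err(e) => eprintln!("error: {e}"),
    }
}
```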
---
## Benchmark
Measure insert throughput, vector search latency, text search latency, and Recall@k against brute-force ground truth.
```bash
# run the two smallest scenarios by default
cargo run --bin benchmark
# pick specific scenarios
cargo run --bin benchmark -- xs small medium large highdim
```
| Scenario | Vectors | Dimensions |
|---|---|---|
| `xxs` | 200 | 128 |
| `xs` | 1 K | 128 |
| `small` | 10 K | 128 |
| `medium` | 50 K | 128 |
| `large` | 100 K | 128 |
| `highdim` | 10 K | 768 |
### Results — `small` (10 K × 128d, k=10)
| Operation | Throughput | Latency | | |
|---|---|---|---|---|
| Insert | 54 ops/s | — | — | — |
| HNSW vector search | 125 qps | 7.89 ms | 9.27 ms | 9.73 ms |
| Brute-force vector search | 13 qps | 79.20 ms | 83.07 ms | 85.80 ms |
| BM25 text search | 1 011 qps | 1.05 ms | 1.28 ms | 2.03 ms |
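
Recall@k against brute-force ground truth can be computed along these lines. This is a minimal sketch rather than the benchmark's code: the function names are hypothetical, squared Euclidean distance is an assumed metric, and the "approximate" result in `main` is a hard-coded stand-in for what an HNSW search might return.

```rust
use std::collections::HashSet;

/// Squared Euclidean distance (assumed metric for this sketch).
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k by scanning every vector: the ground truth.
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by(|&i, &j| dist2(&data[i], query).total_cmp(&dist2(&data[j], query)));
    ids.truncate(k);
    ids
}

/// Fraction of the exact top-k that the approximate result also contains.
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let truth: HashSet<usize> = exact.iter().copied().collect();
    let got: HashSet<usize> = approx.iter().copied().collect();
    truth.intersection(&got).count() as f64 / exact.len() as f64
}

fn main() {
    let data = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 2.0],
        vec![5.0, 5.0],
    ];
    let query = [0.1, 0.0];
    let exact = brute_force_top_k(&data, &query, 2);
    // Stand-in for an approximate search result that missed one neighbour.
    let approx = [0, 2];
    println!("exact = {exact:?}, recall@2 = {}", recall_at_k(&approx, &exact));
}
```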
---
## Visualizer
An interactive 3-D viewer for the HNSW graph. Click any node to inspect it.

```bash
# populate a small graph with the benchmark, then open the viewer
cargo run --release --bin benchmark -- xxs
cargo run --release --bin visualize
```
The viewer opens as a self-contained HTML file in your browser. You can also pass a specific scenario or point it at any CLI database directory:
```bash
# different benchmark scenario
cargo run --release --bin visualize -- xs
# a database you built through the CLI (stored in ./data)
cargo run --release --bin visualize -- ./data
```
## Running tests
```bash
cargo test
```
## License
MIT