nidus
A small, pure-Rust embeddable vector store. Brute-force cosine search over a single append-only directory, with typed metadata filters and many logical collections sharing one embedding space. No FFI, no C, no SQL, no query engine.
nidus (Latin, "nest") — a small place where things are kept safe.
Why it exists
nidus is the local storage leg for semantic-search and indexing tools: chunk some source → embed each chunk → store the vectors + metadata → ask for nearest neighbours. The obvious off-the-shelf options fail the build-and-ship test, not the functionality test:
- DuckDB (via libduckdb-sys) bundles a large C++ source tree and compiles it from scratch: multi-minute cold builds, a required C++ toolchain, a bloated binary, and FFI that can't run under Miri. A vector workload uses ~1% of it.
- LanceDB is "written in Rust" yet still takes ~10 minutes to compile, because it drags in Arrow + DataFusion (a full SQL engine) + a columnar format: hundreds of crates to do ORDER BY distance LIMIT k.
The workload is a vector store, not a database. nidus is that store and nothing more, so it compiles in seconds and embeds as a normal Rust dependency.
The constraints are the product
- Pure-Rust dependencies only: never a crate that compiles C or links a native library (no *-sys, no bundled C/C++).
- Zero FFI, zero unsafe in our code (#![forbid(unsafe_code)]).
- No C to compile: cargo build is just rustc.
- Fully Miri-checkable, including the file IO.
Quick start
[dependencies]
nidus = "0.1"
use std::collections::BTreeMap;
use ;
// Open (or create) a store. The directory is always the caller's choice;
// the dimension is pinned for the life of the store.
let mut db = open?;
db.create_collection?;
// Index some records: id + embedding + arbitrary typed metadata.
let mut attrs = BTreeMap::new();
attrs.insert;
db.upsert?;
// Nearest neighbours (cosine), top-k.
let hits = db.search?;
for h in &hits
// Search the whole store at once, with a metadata filter + score floor.
let opts = SearchOpts ;
let hits = db.search?;
# Ok
See examples/demo.rs for an end-to-end run (cargo run --example demo).
What it does
- Exact brute-force cosine: 100% recall, fast at the target scale (≤ a few million vectors, comfortably in RAM). Vectors are unit-normalized on insert, so a score is plain cosine similarity in [-1, 1].
- Scoped search: query one collection, a subset, or the whole store in one call, merged into a single ranking. Sound because every collection shares one embedding space (one pinned dimension).
- Typed metadata + filters: attach Str/Int/Bool/List/Null attributes and filter with Eq/Glob/In predicates before scoring.
- Idempotent upserts by caller-supplied id; delete, delete_where, per-collection metadata.
- Crash-safe & durable: an append-only flat-f32 data segment plus a framed, CRC-checked op log (the commit record). A crash loses at most the in-flight batch; a torn tail is recovered on open. Cross-process readers get a consistent, lock-free snapshot (OpenMode::ReadOnly).
- Synchronous, runtime-agnostic: the hot path is CPU-bound, so there's no async core to lock you into a runtime. Arc<RwLock<Nidus>> gives concurrent searchers + one writer; async callers bridge with spawn_blocking.
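To make the first and third bullets concrete, here is a minimal standalone sketch of filter-then-score brute-force search over unit-normalized vectors. It is not nidus's code or API: the Row struct, its single lang field (a stand-in for typed metadata), and the normalize, dot, and top_k helpers are all hypothetical names chosen for illustration.

```rust
/// Hypothetical in-memory row: a unit-normalized vector plus one string tag.
struct Row {
    id: String,
    vector: Vec<f32>,
    lang: String, // stand-in for typed metadata (assumption, not nidus's model)
}

/// Scale a vector to unit length; a zero vector is returned unchanged.
fn normalize(mut v: Vec<f32>) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Dot product; equals cosine similarity when both inputs are unit length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Filter first, score the survivors, and return the k best (id, score) pairs.
fn top_k<'a>(
    rows: &'a [Row],
    query: &[f32],
    k: usize,
    keep: impl Fn(&Row) -> bool,
) -> Vec<(&'a str, f32)> {
    let q = normalize(query.to_vec());
    let mut scored: Vec<(&str, f32)> = rows
        .iter()
        .filter(|r| keep(*r))
        .map(|r| (r.id.as_str(), dot(&r.vector, &q)))
        .collect();
    // Highest score first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Vectors are normalized once, on insert.
    let rows = vec![
        Row { id: "a".into(), vector: normalize(vec![1.0, 0.0]), lang: "rust".into() },
        Row { id: "b".into(), vector: normalize(vec![1.0, 1.0]), lang: "rust".into() },
        Row { id: "c".into(), vector: normalize(vec![0.0, 2.0]), lang: "go".into() },
    ];
    let hits = top_k(&rows, &[3.0, 0.0], 2, |r| r.lang == "rust");
    for (id, score) in &hits {
        println!("{id}: {score:.3}");
    }
}
```

Because every stored vector is already unit length, the per-row work is a single dot product, and rows rejected by the predicate are never scored.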
On-disk layout
A store is a directory:
<dir>/
data append-only, fixed-stride, row-major f32 matrix (header pins dimension)
log append-only framed op stream: [len][bincode(Op)][crc32] — the commit record
lock O_EXCL writer-exclusion lock file
open reads data into RAM and replays log into an in-RAM index
(collection → { id → (row, attrs) }). Search never touches disk.
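The [len][payload][crc32] framing and torn-tail recovery can be sketched as follows. This is an illustration under stated assumptions, not the actual format: the 4-byte little-endian length and CRC fields, the choice of the IEEE CRC-32 polynomial, and the use of raw bytes in place of a bincode-encoded Op are all guesses, and crc32, append_frame, read_u32, and replay are hypothetical helper names. An in-memory Vec<u8> stands in for the log file.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= u32::from(b);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Append one frame: [len: u32 LE][payload][crc32(payload): u32 LE].
fn append_frame(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(payload);
    log.extend_from_slice(&crc32(payload).to_le_bytes());
}

/// Read a little-endian u32 at `at`, or None if fewer than 4 bytes remain.
fn read_u32(log: &[u8], at: usize) -> Option<u32> {
    let b = log.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Replay frames from the start, stopping at the first short or corrupt frame.
/// Returns the intact payloads and the byte length of the valid prefix.
fn replay(log: &[u8]) -> (Vec<&[u8]>, usize) {
    let mut frames = Vec::new();
    let mut pos = 0;
    while let Some(len) = read_u32(log, pos) {
        let start = pos + 4;
        let end = start + len as usize;
        let (Some(payload), Some(crc)) = (log.get(start..end), read_u32(log, end)) else {
            break; // torn tail: the frame was only partly written
        };
        if crc != crc32(payload) {
            break; // checksum mismatch: treat as the end of the valid log
        }
        frames.push(payload);
        pos = end + 4;
    }
    (frames, pos)
}

fn main() {
    let mut log = Vec::new();
    append_frame(&mut log, b"create docs");
    append_frame(&mut log, b"upsert a");
    let committed = log.len();

    // Simulate a crash mid-write: only 6 bytes of a third frame reach disk.
    append_frame(&mut log, b"upsert b");
    log.truncate(committed + 6);

    let (frames, valid_len) = replay(&log);
    assert_eq!(frames.len(), 2);
    assert_eq!(valid_len, committed);
    println!("recovered {} frames, valid prefix {} bytes", frames.len(), valid_len);
}
```

In this sketch, recovery amounts to truncating the log to the returned prefix length, so an incomplete trailing frame is never replayed into the index.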
Configuration
use std::time::Duration;
use ;
let cfg = new
.fsync // durability granularity (default)
.open_mode // ReadOnly = no lock, search-only
.auto_compact // compact on open above this dead-row ratio
.lock_ttl;
The store location is always the caller's choice — nidus contributes no path defaults, env vars, or hidden directories.
Development
Rust 1.95+ (pinned via rust-toolchain.toml), edition 2024.
Design
The full design — data model, on-disk format, durability/concurrency model, and the
deferred seams (mmap, ANN/HNSW, scalar quantization, a lightweight server) — lives in
SPEC.md. Each module also carries its own contract in src/<module>/SPEC.md.