nidus 0.2.0

A small, pure-Rust embeddable vector store: brute-force cosine search over a single append-only file. No FFI, no C, no SQL; anyhow its only dependency.
Documentation
# nidus

A small, pure-Rust **embeddable vector store**. Brute-force cosine search over a
single append-only directory, with typed metadata filters and many logical
collections sharing one embedding space. No FFI, no C, no SQL, no query engine.

> _nidus_ (Latin, "nest") — a small place where things are kept safe.

## Why it exists

nidus is the local storage leg for semantic-search and indexing tools: chunk some
source → embed each chunk → store the vectors + metadata → ask for nearest
neighbours. The obvious off-the-shelf options fail the **build-and-ship** test, not
the functionality test:

- **DuckDB** (via `libduckdb-sys`) bundles a large C++ source tree and compiles it
  from scratch — multi-minute cold builds, a required C++ toolchain, a bloated
  binary, and FFI that can't run under Miri. A vector workload uses ~1% of it.
- **LanceDB** is "written in Rust" yet still takes ~10 minutes to compile, because
  it drags in Arrow + DataFusion (a full SQL engine) + a columnar format — hundreds
  of crates to do `ORDER BY distance LIMIT k`.

The workload is a *vector store, not a database*. nidus is that store and nothing
more, so it **compiles in seconds** and embeds as a normal Rust dependency.

### The constraints are the product

- **Pure-Rust dependencies only** — never a crate that compiles C or links a native
  library (no `*-sys`, no bundled C/C++).
- **Zero FFI, zero `unsafe`** in our code (`#![forbid(unsafe_code)]`).
- **No C to compile**`cargo build` is just rustc.
- **Fully Miri-checkable** — including the file IO.

## Quick start

```toml
[dependencies]
nidus = "0.1"
```

```rust
use std::collections::BTreeMap;
use nidus::{Nidus, Config, Record, SearchOpts, Scope, Value, Filter, Predicate};

// Open (or create) a store. The directory is always the caller's choice;
// the dimension is pinned for the life of the store.
let mut db = Nidus::open(Config::new("/path/to/store", 4))?;
db.create_collection("code")?;

// Index some records: id + embedding + arbitrary typed metadata.
let mut attrs = BTreeMap::new();
attrs.insert("path".into(), Value::Str("src/auth/login.rs".into()));
db.upsert("code", &[Record { id: "a".into(), vector: vec![1.0, 0.0, 0.0, 0.0], attrs }])?;

// Nearest neighbours (cosine), top-k.
let hits = db.search("code", &[1.0, 0.0, 0.0, 0.0], &SearchOpts { top_k: 5, ..Default::default() })?;
for h in &hits {
    println!("{:.3}  [{}] {}", h.score, h.collection, h.id);
}

// Search the whole store at once, with a metadata filter + score floor.
let opts = SearchOpts {
    top_k: 10,
    filter: Filter(vec![Predicate::Glob("path".into(), "src/auth/*".into())]),
    min_score: Some(0.5),
};
let hits = db.search(Scope::All, &[1.0, 0.0, 0.0, 0.0], &opts)?;
# anyhow::Ok(())
```

See [`examples/demo.rs`](examples/demo.rs) for an end-to-end run (`cargo run
--example demo`).

## What it does

- **Exact brute-force cosine** — 100% recall, fast at the target scale (≤ a few
  million vectors, comfortably in RAM). Vectors are unit-normalized on insert, so a
  score is plain cosine similarity in `[-1, 1]`.
- **Scoped search** — query one collection, a subset, or the **whole store** in one
  call, merged into a single ranking. Sound because every collection shares one
  embedding space (one pinned dimension).
- **Typed metadata + filters** — attach `Str`/`Int`/`Bool`/`List`/`Null` attributes
  and filter with `Eq` / `Glob` / `In` predicates before scoring.
- **Idempotent upserts** by caller-supplied id; `delete`, `delete_where`, per-
  collection metadata.
- **Crash-safe & durable** — an append-only flat-`f32` `data` segment plus a framed,
  CRC-checked op `log` (the commit record). A crash loses at most the in-flight
  batch; a torn tail is recovered on open. Cross-process readers get a consistent,
  lock-free snapshot (`OpenMode::ReadOnly`).
- **Synchronous, runtime-agnostic** — the hot path is CPU-bound, so there's no async
  core to lock you into a runtime. `Arc<RwLock<Nidus>>` gives concurrent searchers +
  one writer; async callers bridge with `spawn_blocking`.

## On-disk layout

A store is a directory:

```
<dir>/
  data    append-only, fixed-stride, row-major f32 matrix (header pins dimension)
  log     append-only framed op stream: [len][bincode(Op)][crc32] — the commit record
  lock    O_EXCL writer-exclusion lock file
```

`open` reads `data` into RAM and replays `log` into an in-RAM index
(`collection → { id → (row, attrs) }`). Search never touches disk.

## Configuration

```rust
use std::time::Duration;
use nidus::{Config, Fsync, OpenMode};

let cfg = Config::new("/path/to/store", 768)
    .fsync(Fsync::PerBatch)          // durability granularity (default)
    .open_mode(OpenMode::ReadWrite)  // ReadOnly = no lock, search-only
    .auto_compact(Some(0.5))         // compact on open above this dead-row ratio
    .lock_ttl(Duration::from_secs(60));
```

The store **location is always the caller's choice** — nidus contributes no path
defaults, env vars, or hidden directories.

## Development

```bash
just test    # all tests
just ci       # fmt-check + clippy (-D warnings) + test
just miri     # undefined-behavior check (nightly; pure-Rust ⇒ runs end to end)
just demo     # the end-to-end example
just deps     # the dependency tree (stays short, all pure Rust)
```

Rust 1.96+ (pinned via `rust-toolchain.toml`), edition 2024.

## Design

The full design — data model, on-disk format, durability/concurrency model, and the
deferred seams (mmap, ANN/HNSW, scalar quantization, a lightweight server) — lives in
[`SPEC.md`](SPEC.md). Each module also carries its own contract in `src/<module>/SPEC.md`.

## License

[MIT](LICENSE)