iqdb 1.0.0

Embedded vector database for Rust. Exact and approximate (HNSW/IVF) similarity search with durable storage, over the iqdb crate family.
# iqdb v0.2.0 — Vector Primitives

**The first load-bearing surface.** v0.2.0 builds on the v0.1.0 scaffolding with the typed primitives every embedded vector database needs: validated `f32` embeddings, a three-variant distance metric, typed payload metadata, a `(RecordId, Vector, Option<Payload>)` aggregate, and a thread-safe in-memory store wired through `Iqdb::upsert` / `get` / `delete`. The optional `serde` feature derives `Serialize` / `Deserialize` on every public data type without pulling a runtime cost when disabled. Search (`v0.3.0`) and the durable backend (`v0.4.0`) are next.

## What is iqdb?

An embedded vector database for Rust — a single-process, in-application similarity-search engine designed for high-dimensional workloads where every microsecond on the query path matters. It targets the same operational shape as `sqlite` or `redb`: no daemon, no network hop, no separate runtime. Open a handle, write vectors, query nearest neighbours — all from inside your binary. The engine is being built against a lock-free hot path, an allocation-free steady state, and a cache-aware on-disk layout, with pluggable indices and pluggable storage so workloads can trade recall for latency without rewriting the surrounding application.

## What's new in 0.2.0

### `Vector` — validated `f32` embedding

`Vector` is the unit of value stored by `iqdb`. The storage layout is a contiguous, owned `Box<[f32]>` — equivalent to a `Vec<f32>` without the spare-capacity overhead — so distance kernels can iterate over dense memory with no indirection.

Validation happens at construction: empty inputs are rejected (`Error::InvalidVector { reason: "vector is empty" }`), and any non-finite component (`NaN`, `+∞`, `−∞`) is rejected (`Error::InvalidVector { reason: "vector contains a non-finite value" }`). Downstream code can therefore treat every constructed `Vector` as known-good — no internal path needs to re-check finiteness or emptiness.

```rust
use iqdb::Vector;

let v = Vector::new(vec![0.1, 0.2, 0.3])?;
assert_eq!(v.dim(), 3);
assert_eq!(v.as_slice(), &[0.1, 0.2, 0.3]);

// `from_slice` copies; `new` consumes a `Vec` without copy.
let copied = Vector::from_slice(&[1.0, 2.0])?;
assert_eq!(copied.dim(), 2);
# Ok::<(), iqdb::Error>(())
```

`TryFrom<Vec<f32>>` and `TryFrom<&[f32]>` are provided for ergonomic `try_into()` call sites. `AsRef<[f32]>` lets `Vector` flow through APIs that accept any slice-like.
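The validate-once-at-construction idea can be sketched with the standard library alone. `ValidatedVec` and `InvalidVector` below are hypothetical stand-ins written for illustration, not the crate's actual definitions; only the two rejection reasons are taken from the text above.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a validated embedding (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
pub struct ValidatedVec(Box<[f32]>);

/// Hypothetical error carrying a static reason string.
#[derive(Debug, PartialEq)]
pub struct InvalidVector {
    pub reason: &'static str,
}

impl ValidatedVec {
    /// Checks the input once, then takes ownership of the buffer.
    pub fn new(values: Vec<f32>) -> Result<Self, InvalidVector> {
        if values.is_empty() {
            return Err(InvalidVector { reason: "vector is empty" });
        }
        if values.iter().any(|x| !x.is_finite()) {
            return Err(InvalidVector { reason: "vector contains a non-finite value" });
        }
        // `into_boxed_slice` drops any spare capacity the `Vec` carried.
        Ok(ValidatedVec(values.into_boxed_slice()))
    }

    pub fn dim(&self) -> usize {
        self.0.len()
    }

    pub fn as_slice(&self) -> &[f32] {
        &self.0
    }
}

impl TryFrom<Vec<f32>> for ValidatedVec {
    type Error = InvalidVector;

    fn try_from(values: Vec<f32>) -> Result<Self, Self::Error> {
        ValidatedVec::new(values)
    }
}
```

Because the only way to obtain a `ValidatedVec` is through `new`, code that receives one can skip its own emptiness and finiteness checks.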

### `DistanceMetric` — `L2` / `Cosine` / `Dot`

Three distance metrics under a single smaller-is-closer convention:

| Variant   | Returns                                       | Range            |
|-----------|-----------------------------------------------|------------------|
| `L2`      | Euclidean distance `‖a − b‖₂`                 | `[0, +∞)`        |
| `Cosine`  | `1 − cos(θ)` where `θ` is the angle           | `[0, 2]`         |
| `Dot`     | `−(a · b)` — negative dot product             | `(−∞, +∞)`       |

`Dot` returns the **negative** dot product so that search engines built on top of `DistanceMetric` can use the same ordering rule across all three metrics — no special-casing for inner-product similarity.

```rust
use iqdb::{DistanceMetric, Vector};

let a = Vector::new(vec![1.0, 0.0])?;
let b = Vector::new(vec![0.0, 1.0])?;

// Orthogonal unit vectors: L2 distance √2, cosine distance 1, dot distance 0.
let l2 = DistanceMetric::L2.distance(&a, &b)?;
let cos = DistanceMetric::Cosine.distance(&a, &b)?;
let dot = DistanceMetric::Dot.distance(&a, &b)?;

assert!((l2 - std::f32::consts::SQRT_2).abs() < 1e-6);
assert!((cos - 1.0).abs() < 1e-6);
assert!(dot.abs() < 1e-6);
# Ok::<(), iqdb::Error>(())
```

Dimensional homogeneity is enforced: `metric.distance(a, b)` where `a.dim() != b.dim()` returns `Error::DimensionMismatch { left, right }`, which carries both observed dimensions so the caller can report an actionable error.
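To make the shared smaller-is-closer ordering concrete, here is a minimal sketch over plain slices. `Metric`, `DimensionMismatch`, and `nearest` are hypothetical names for illustration; the formulas follow the table above, and nothing here is claimed to match the crate's kernels.

```rust
/// Hypothetical metric enum; variant names follow the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    L2,
    Cosine,
    Dot,
}

#[derive(Debug, PartialEq)]
pub struct DimensionMismatch {
    pub left: usize,
    pub right: usize,
}

fn norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

impl Metric {
    /// Returns a distance where a smaller value always means "closer".
    pub fn distance(self, a: &[f32], b: &[f32]) -> Result<f32, DimensionMismatch> {
        if a.len() != b.len() {
            return Err(DimensionMismatch { left: a.len(), right: b.len() });
        }
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        Ok(match self {
            Metric::L2 => a
                .iter()
                .zip(b)
                .map(|(x, y)| (x - y) * (x - y))
                .sum::<f32>()
                .sqrt(),
            // A zero vector yields NaN here; how the crate treats that case
            // is not stated in these notes.
            Metric::Cosine => 1.0 - dot / (norm(a) * norm(b)),
            // Negated so that a larger inner product sorts first.
            Metric::Dot => -dot,
        })
    }
}

/// Index of the closest candidate; the comparison is the same for every metric.
pub fn nearest(
    metric: Metric,
    query: &[f32],
    candidates: &[Vec<f32>],
) -> Result<Option<usize>, DimensionMismatch> {
    let mut best: Option<(usize, f32)> = None;
    for (i, c) in candidates.iter().enumerate() {
        let d = metric.distance(query, c)?;
        if best.map_or(true, |(_, bd)| d < bd) {
            best = Some((i, d));
        }
    }
    Ok(best.map(|(i, _)| i))
}
```

`nearest` never inspects which metric it was given. That is the practical payoff of negating the dot product.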

### `Payload` / `PayloadValue` — typed K/V metadata

A `Payload` is an ordered, typed key-value bag that travels alongside each vector. The shape is intentionally JSON-like but **typed**: every value carries its native Rust representation rather than going through `Cow<str>` or an untagged dynamic encoding. This lets the v0.3.0 filter-by-payload predicates compare values without a parse step on the hot path.

`PayloadValue` covers `Null` / `Bool` / `Int(i64)` / `Float(f64)` / `Text(String)` / `Bytes(Vec<u8>)` / `Array(Vec<PayloadValue>)` / `Object(BTreeMap<String, PayloadValue>)`. The enum is `#[non_exhaustive]` so new variants can land additively. `From<T>` conversions cover the common scalar and container types (`bool`, `i32`, `i64`, `f32`, `f64`, `String`, `&str`, `Vec<u8>`, `Vec<PayloadValue>`, `BTreeMap<String, PayloadValue>`); `is_*` predicates and `as_*` accessors avoid match clutter at the call site.

```rust
use iqdb::{Payload, PayloadValue};

let mut meta = Payload::new();
meta.insert("source", "wikipedia");
meta.insert("year", 2026_i64);
meta.insert("verified", true);

assert_eq!(meta.get("source").and_then(PayloadValue::as_text), Some("wikipedia"));
assert_eq!(meta.get("year").and_then(PayloadValue::as_int), Some(2026));
assert_eq!(meta.get("verified").and_then(PayloadValue::as_bool), Some(true));
```

Storage is a `BTreeMap<String, PayloadValue>` rather than `HashMap` — the deterministic iteration order means payload hashes, `serde` round-trips, and test assertions are stable across runs and across machines.
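The ordering guarantee is easy to demonstrate in isolation. The sketch below uses a trimmed-down, hypothetical `Value` enum and `Bag` wrapper (the crate's `PayloadValue` has more variants) to show that insertion order does not affect iteration order or equality.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down value enum for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Text(String),
}

impl From<bool> for Value {
    fn from(v: bool) -> Self {
        Value::Bool(v)
    }
}

impl From<i64> for Value {
    fn from(v: i64) -> Self {
        Value::Int(v)
    }
}

impl<'a> From<&'a str> for Value {
    fn from(v: &'a str) -> Self {
        Value::Text(v.to_string())
    }
}

/// Hypothetical key-value bag backed by a `BTreeMap`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Bag(BTreeMap<String, Value>);

impl Bag {
    /// Inserts a value, returning the previous one for that key, if any.
    pub fn insert(&mut self, key: &str, value: impl Into<Value>) -> Option<Value> {
        self.0.insert(key.to_string(), value.into())
    }

    /// Keys in iteration order, which for a `BTreeMap` is sorted order.
    pub fn keys(&self) -> Vec<&str> {
        self.0.keys().map(String::as_str).collect()
    }
}
```

Two bags filled with the same entries in different orders produce identical `keys()` output. With a `HashMap`, the iteration order would be unspecified and could differ between runs.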

### `RecordId` / `Record` — the read/write unit

`RecordId` is a transparent newtype around `u64`. Cheap to copy (8 bytes), cheap to hash, and stable across the wire when the optional `serde` feature is on. `From<u64>` / `Into<u64>` bidirectional conversions let callers pull ids from any source — hashed strings, autoincrement counters, snowflake ids — without an adapter layer.

`Record` aggregates `(RecordId, Vector, Option<Payload>)`. Two constructors (`Record::new` for vector-only, `Record::with_payload` for vector + metadata) avoid the `Option`-typed argument pattern that degrades readability at call sites. `into_parts` decomposes without a clone for callers that need owned access downstream.

```rust
use iqdb::{Payload, Record, RecordId, Vector};

let mut meta = Payload::new();
meta.insert("kind", "doc");

let r = Record::with_payload(
    RecordId::new(7),
    Vector::new(vec![0.1, 0.2, 0.3])?,
    meta,
);

assert_eq!(r.id().get(), 7);
assert!(r.payload().is_some());
# Ok::<(), iqdb::Error>(())
```
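As one illustration of the "hashed strings" id source mentioned above, the sketch below derives a `u64` id from a string key with 64-bit FNV-1a. `Id` and `id_from_key` are hypothetical names, and the choice of FNV-1a is an assumption made for this example; the crate does not prescribe a hash. A fixed algorithm is used rather than the standard library's default hasher because that hasher's algorithm is unspecified and may change between Rust releases.

```rust
/// Hypothetical id newtype mirroring the `u64` wrapper described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Id(u64);

impl From<u64> for Id {
    fn from(raw: u64) -> Self {
        Id(raw)
    }
}

impl From<Id> for u64 {
    fn from(id: Id) -> Self {
        id.0
    }
}

/// Derives an id from a string key using 64-bit FNV-1a.
/// The result depends only on the bytes of `key`.
pub fn id_from_key(key: &str) -> Id {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = OFFSET_BASIS;
    for byte in key.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    Id(hash)
}
```

Any hash can collide, so a pipeline that derives ids this way should decide up front how it wants to treat two keys that map to the same id.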

### In-memory store — `Iqdb::upsert` / `get` / `delete`

The handle is now load-bearing. Internally it owns a crate-private `MemoryStore` backed by a `RwLock<HashMap<RecordId, Record>>`. Writers take the write lock; readers take the read lock and clone the matched record out, releasing the lock before handing the clone back. The clone-on-read pattern avoids the common deadlock vector of carrying a `RwLockReadGuard<'_, …>` across `?` boundaries into other locked structures.

```rust
use iqdb::{Iqdb, Record, RecordId, Vector};

let db = Iqdb::open_in_memory();

db.upsert(Record::new(
    RecordId::new(1),
    Vector::new(vec![0.1, 0.2, 0.3])?,
))?;

let hit = db.get(RecordId::new(1))?.expect("record present");
assert_eq!(hit.vector().as_slice(), &[0.1, 0.2, 0.3]);

let removed = db.delete(RecordId::new(1))?;
assert!(removed);
assert!(db.is_empty());
# Ok::<(), iqdb::Error>(())
```

Upsert semantics: replacing an existing id leaves `len()` unchanged. `delete` returns whether the id was present, so idempotent retries don't need a prior `get`. For an absent id, `get` returns `Ok(None)` and `delete` returns `Ok(false)`; no separate "not found" variant pollutes the error type.

`Iqdb` is `Send + Sync`; sharing across threads is a matter of `Arc<Iqdb>`. The crate-internal `MemoryStore` recovers from a poisoned lock (writer panic) so a panic in one operation does not propagate to unrelated readers — per the REPS *Failure Modes & Degradation* rule.
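The locking discipline described in this section (clone-on-read, plus recovery from a poisoned lock) can be sketched with the standard library. `Store` below is a hypothetical simplification with `u64` keys and `Vec<f32>` values; it is not the crate-private `MemoryStore`, whose internals these notes do not show.

```rust
use std::collections::HashMap;
use std::sync::{PoisonError, RwLock};

/// Hypothetical stand-in for an in-memory store guarded by an `RwLock`.
#[derive(Default)]
pub struct Store {
    inner: RwLock<HashMap<u64, Vec<f32>>>,
}

impl Store {
    /// Inserts or replaces; replacing an existing id leaves `len` unchanged.
    pub fn upsert(&self, id: u64, vector: Vec<f32>) {
        // If a previous writer panicked, take the guard anyway instead of
        // propagating the poison to this caller.
        let mut map = self.inner.write().unwrap_or_else(PoisonError::into_inner);
        map.insert(id, vector);
    }

    /// Clones the value out so the read guard is released before returning.
    pub fn get(&self, id: u64) -> Option<Vec<f32>> {
        let map = self.inner.read().unwrap_or_else(PoisonError::into_inner);
        map.get(&id).cloned()
    }

    /// Returns whether the id was present.
    pub fn delete(&self, id: u64) -> bool {
        let mut map = self.inner.write().unwrap_or_else(PoisonError::into_inner);
        map.remove(&id).is_some()
    }

    pub fn len(&self) -> usize {
        let map = self.inner.read().unwrap_or_else(PoisonError::into_inner);
        map.len()
    }
}
```

Recovering with `PoisonError::into_inner` is reasonable here because each operation is a single `HashMap` call, so a panic while the lock is held is unlikely to leave the map half-updated. A store with multi-step invariants would need a more careful recovery policy.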

`Iqdb::open(path)` and `Iqdb::flush` still return `Error::NotImplemented` — they light up in `v0.4.0` with the durable storage substrate. Wiring them in advance and branching on the variant means call sites can adopt the final shape today and get real behaviour automatically when `v0.4.0` ships.
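A call site that adopts the final shape early might look like the following sketch. `Db`, `Error`, and `open_preferring_disk` are hypothetical stand-ins so the example compiles on its own; only the idea of branching on a `NotImplemented` variant comes from the text above.

```rust
use std::path::Path;

/// Hypothetical error enum with the one variant this pattern branches on.
#[derive(Debug)]
pub enum Error {
    NotImplemented,
    InvalidConfig,
}

/// Hypothetical handle; `durable` records which backend was opened.
pub struct Db {
    durable: bool,
}

impl Db {
    pub fn open_in_memory() -> Db {
        Db { durable: false }
    }

    /// Stand-in for a durable open that is declared but not wired up yet.
    pub fn open(_path: &Path) -> Result<Db, Error> {
        Err(Error::NotImplemented)
    }

    pub fn is_durable(&self) -> bool {
        self.durable
    }
}

/// Tries the durable path first and falls back to memory only when the
/// durable backend is not implemented; every other error is propagated.
pub fn open_preferring_disk(path: &Path) -> Result<Db, Error> {
    match Db::open(path) {
        Ok(db) => Ok(db),
        Err(Error::NotImplemented) => Ok(Db::open_in_memory()),
        Err(other) => Err(other),
    }
}
```

Once `open` starts returning `Ok`, the first arm takes over and the fallback arm simply stops being reached, with no change at the call site.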

### Optional `serde` feature

Enable `feature = "serde"` to derive `Serialize` / `Deserialize` on every public data type. `Vector` and `RecordId` use `#[serde(transparent)]` so they round-trip as their underlying representation (a JSON array of floats, a JSON integer). `PayloadValue` is `#[serde(untagged)]` so JSON payloads look natural rather than carrying an enum-discriminant key. `Payload` is transparent over its `BTreeMap<String, PayloadValue>`.

```rust,ignore
use iqdb::{Record, RecordId, Vector};

// Compile-time: features = ["serde"], dev-dep: serde_json = "1".
let record = Record::new(RecordId::new(1), Vector::new(vec![0.1, 0.2, 0.3])?);
let json = serde_json::to_string(&record)?;
let decoded: Record = serde_json::from_str(&json)?;
assert_eq!(decoded.vector().as_slice(), record.vector().as_slice());
# Ok::<(), Box<dyn std::error::Error>>(())
```

The default build pulls **zero** runtime dependencies. Enabling `serde` pulls only the `serde` crate itself; `serde_json` (used in the integration test round-trips) stays a dev-dependency.

### Error type extensions

Two new variants on `Error`:

- `InvalidVector { reason: &'static str }` — surfaced by `Vector` construction. Static reason string keeps the validation path allocation-free even on the error path.
- `DimensionMismatch { left: usize, right: usize }` — surfaced by `DistanceMetric::distance`. Both observed dimensions are reported so the caller can produce an actionable message (e.g. "ingest pipeline produced 384-dim, store expects 768-dim").

`Error::source()` now exposes the wrapped `std::io::Error` on the `Io` variant, so `anyhow` / `tracing` / report formatters can walk to the root cause.
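What "walking to the root cause" means in practice is shown below with a simplified, hypothetical `Error` enum (two variants only, not the crate's full type). The `source` override and the `error_chain` helper use only `std::error::Error`.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::io;

/// Hypothetical, simplified error enum for illustration.
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    NotImplemented,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(_) => write!(f, "storage I/O failed"),
            Error::NotImplemented => write!(f, "not implemented"),
        }
    }
}

impl StdError for Error {
    /// Exposes the wrapped I/O error so report formatters can reach it.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Io(inner) => Some(inner),
            Error::NotImplemented => None,
        }
    }
}

/// Collects the message of `err` and of every error reachable via `source`.
pub fn error_chain(err: &(dyn StdError + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}
```

Report formatters follow the same `source` links, which is why returning the inner error there matters more than repeating its text in `Display`.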

### Benchmark suite — `benches/vector_ops.rs`

The v0.1.0 no-op `benches/scaffold.rs` is replaced with three Criterion groups that measure real work:

- **`vector_new`**: construction-time validation cost at dim 32 / 128 / 1024. Throughput is reported per element so that results at different dimensions can be compared directly.
- **`distance`** — single-shot distance under each `DistanceMetric` variant at dim 128.
- **`store`** — `get_hit_1k_dim128` (lookup against a 1 000-record populated store) and `upsert_fresh_dim128` (single-record write throughput against a fresh store, so the bench measures the write path without the rehash cost of a growing `HashMap`).

Run with `cargo bench --bench vector_ops`. Reports land in `target/criterion/`. ANN-search benches arrive with v0.3.0.

### CI matrix expanded

The build-test matrix grew a `serde` row, so every push runs `cargo build && cargo test` six times: `{ubuntu, macos, windows}` × `{default-features, --features serde}`. The lint and docs jobs each run twice — once at default features, once at `--all-features` — so the gated `serde` impls are also linted and documented.

### Documentation

Crate-level rustdoc in `src/lib.rs` now ships three runnable examples — in-memory CRUD round-trip, payload + distance walkthrough, and the staged-surface error pattern — all of which execute under `cargo test --doc`. Every public type, method, conversion, and error variant carries a `///` doc comment per the REPS Documentation mandate; `#![deny(missing_docs)]` is enforced at the crate root and `--all-features` rustdoc passes with `-D warnings`.

## Breaking changes

**Pre-1.0 API churn.** The crate is on a 0.x line and the documented API stability promise is "expect refactors until v1.0." That said, v0.2.0's surface is purely additive on top of v0.1.0:

- The existing v0.1.0 surface (`Iqdb::open_in_memory`, `Iqdb::open`, `Iqdb::flush`, `Iqdb::close`, `Error::Io`, `Error::InvalidConfig`, `Error::NotImplemented`, `Result<T>`) keeps its v0.1.0 semantics.
- New types (`Vector`, `DistanceMetric`, `Payload`, `PayloadValue`, `RecordId`, `Record`) and new `Iqdb` methods (`upsert`, `get`, `delete`, `len`, `is_empty`) are additions.
- New `Error` variants (`InvalidVector`, `DimensionMismatch`) are additions, allowed by the existing `#[non_exhaustive]` annotation. Callers with a wildcard `_` arm in their `match` (which is the documented pattern) need no change.

No code that compiled against v0.1.0 should stop compiling against v0.2.0.
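The wildcard-arm pattern referred to above looks roughly like this. The enum is a hypothetical, simplified copy of the variants named in these notes (the `Io` payload is omitted), and `user_message` is an illustrative helper. Note that `#[non_exhaustive]` only constrains downstream crates; inside the defining crate, as in this single-file sketch, the attribute has no effect.

```rust
/// Hypothetical, simplified mirror of the error variants named in these notes.
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    InvalidConfig,
    NotImplemented,
    InvalidVector { reason: &'static str },
    DimensionMismatch { left: usize, right: usize },
}

/// Handles the variants the caller cares about and routes everything else,
/// including variants added in later releases, through the wildcard arm.
pub fn user_message(err: &Error) -> String {
    match err {
        Error::NotImplemented => "this operation is not available yet".to_string(),
        Error::DimensionMismatch { left, right } => {
            format!("dimension mismatch: left={}, right={}", left, right)
        }
        _ => "unexpected error".to_string(),
    }
}
```

A `match` written this way keeps compiling when new variants are added, which is the property the additive-change claim above relies on.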

## Verification

Run on Windows x86_64 and on WSL2 Ubuntu (Rust stable 1.95.0). The same commands run in the configured CI matrix on Linux, macOS, and Windows:

```bash
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo test --all-features
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
cargo deny check
cargo audit --deny warnings
```

All green on both hosts. Test counts at this tag:

- **Default features:** 67 unit + 1 + 8 integration + 34 doctests.
- **`--all-features`:** 67 unit + 1 + 11 integration (the +3 are the `serde` round-trips for `Vector`, `Record`, `DistanceMetric`) + 34 doctests.

`cargo deny check` reports `advisories ok, bans ok, licenses ok, sources ok`. `cargo audit` scans 48 transitive dependencies (default features) or 56 (with `serde` enabled) with zero advisories on either path.

## What's next

- **v0.3.0 — Search.** `search(query, k)` over a flat (exact) index. Batch search. Filter-by-payload predicates. Result ranking with score + id. Property-based testing via `proptest` for distance and ranking invariants. `docs/API.md` populated with the search-surface reference.
- **v0.4.0 — Durable storage.** File-backed storage substrate. Write-ahead log, atomic-replace snapshots, crash recovery. `Iqdb::open(path)` becomes load-bearing and the `Error::NotImplemented` branches on `open` / `flush` disappear.
- **v0.5.0 — Approximate indices.** IVF and HNSW behind the same trait the flat index implements. Build-time index selection via a builder.

## Installation

```toml
[dependencies]
iqdb = "0.2"

# Or, to enable the optional `serde` feature, use this line instead:
# iqdb = { version = "0.2", features = ["serde"] }
```

MSRV: Rust 1.87.

## Documentation

- [README](https://github.com/jamesgober/iqdb/blob/main/README.md)
- [Standards (REPS)](https://github.com/jamesgober/iqdb/blob/main/REPS.md)
- [CHANGELOG](https://github.com/jamesgober/iqdb/blob/main/CHANGELOG.md)
- [docs.rs/iqdb](https://docs.rs/iqdb)

---

**Full diff:** [`v0.1.0...v0.2.0`](https://github.com/jamesgober/iqdb/compare/v0.1.0...v0.2.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/iqdb/blob/main/CHANGELOG.md#020--2026-05-30).