# iqdb v0.3.0 — Top-`k` Search

**The library is now a vector database.** v0.3.0 wires exact top-`k` similarity search through the v0.2.0 primitives: `Iqdb::search`, `Iqdb::search_with` (predicate-filtered), `Iqdb::search_batch`, and `Iqdb::search_batch_with` all return ordered `SearchResult { id, score, payload }` lists from a bounded-heap brute-force kernel. Filter predicates are monomorphised into the scan loop — no per-record dynamic dispatch. A `proptest` harness pins eight ranking and distance-metric invariants. The full API surface is now documented under `docs/API.md`.

## What is iqdb?

An embedded vector database for Rust: a single-process, in-application similarity-search engine designed for high-dimensional workloads where every microsecond on the query path matters. It targets the same operational shape as `sqlite` or `redb`: no daemon, no network hop, no separate runtime. Open a handle, write vectors, and query nearest neighbours, all from inside your binary. The engine is designed around a lock-free hot path, an allocation-free steady state, and a cache-aware on-disk layout, with pluggable indices and pluggable storage so workloads can trade recall for latency without rewriting the surrounding application.

## What's new in 0.3.0

### `Iqdb::search` — exact top-`k` flat scan

The simplest entry point. Pick a distance metric, choose `k`, hand over a probe vector, get back a ranked list:

```rust
use iqdb::{DistanceMetric, Iqdb, Record, RecordId, Vector};

let db = Iqdb::open_in_memory();
db.upsert(Record::new(RecordId::new(1), Vector::new(vec![1.0, 0.0, 0.0])?))?;
db.upsert(Record::new(RecordId::new(2), Vector::new(vec![0.99, 0.10, 0.0])?))?;
db.upsert(Record::new(RecordId::new(3), Vector::new(vec![0.0, 1.0, 0.0])?))?;

let probe = Vector::new(vec![1.0, 0.0, 0.0])?;
let hits = db.search(&probe, 2, DistanceMetric::Cosine)?;
assert_eq!(hits.len(), 2);
assert_eq!(hits[0].id, RecordId::new(1)); // perfect match
# Ok::<(), iqdb::Error>(())
```

Results are sorted ascending by `score` under the **smaller-is-closer** convention. `Dot` returns `−(a · b)` so the same ordering rule covers L2, Cosine, and Dot — search engines built on top do not need to special-case maximum-inner-product semantics.
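
To make the smaller-is-closer convention concrete, here is a minimal std-only sketch of the three scores. The `Metric` enum and `score` function are hypothetical stand-ins, not the crate's own types, and whether iqdb's L2 takes the square root is an assumption of this sketch.

```rust
/// Hypothetical stand-in for the crate's metric enum (names assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Metric {
    L2,
    Cosine,
    Dot,
}

/// Returns a score where smaller always means closer.
/// Assumes `a` and `b` have the same length.
pub fn score(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    match metric {
        // Euclidean distance (square root kept; an assumption of this sketch).
        Metric::L2 => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // 1 - cosine similarity. A zero norm gives 0.0 / 0.0, hence NaN.
        Metric::Cosine => {
            let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            1.0 - dot / (norm_a * norm_b)
        }
        // Negated inner product: a larger dot product yields a smaller score.
        Metric::Dot => -dot,
    }
}
```

Because `Dot` is negated, one ascending sort ranks results for all three metrics; a caller never has to flip the comparison for inner-product search.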

The kernel is `O(N · D + N · log k)` for `N` records of dimensionality `D` and a top-`k` heap of size `k`. The heap admits at most `k + 1` allocations during the scan; payloads are cloned only for the `k` survivors, not for every transient candidate. Approximate indices (IVF, HNSW) land in v0.5.0 and will sit alongside the flat kernel rather than replacing it — exact search remains the ground truth against which approximate recall is measured.
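
The bounded-heap idea behind that complexity can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's kernel: `Candidate` and `top_k` are hypothetical names, ids are plain `u64`, and `f32::total_cmp` stands in for whatever total order the crate uses internally.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// One scored candidate (hypothetical shape for this sketch).
#[derive(Debug, Clone, Copy)]
struct Candidate {
    score: f32,
    id: u64,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        // `total_cmp` gives f32 a total order; equal scores fall back to the id.
        self.score
            .total_cmp(&other.score)
            .then(self.id.cmp(&other.id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Candidate {}

/// Keeps the `k` smallest-scoring candidates, returned in ascending order.
pub fn top_k(scored: impl IntoIterator<Item = (u64, f32)>, k: usize) -> Vec<(u64, f32)> {
    if k == 0 {
        return Vec::new();
    }
    // Max-heap: the root is always the worst candidate currently retained.
    let mut heap = BinaryHeap::new();
    for (id, score) in scored {
        heap.push(Candidate { score, id });
        if heap.len() > k {
            // The heap briefly holds k + 1 entries; evict the worst one.
            heap.pop();
        }
    }
    heap.into_sorted_vec()
        .into_iter()
        .map(|c| (c.id, c.score))
        .collect()
}
```

Each push and pop costs `O(log k)`, which is where the `N · log k` term comes from; the `N · D` term is the distance computation that produces each score before it reaches the heap.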

### `Iqdb::search_with` — generic filter, no dynamic dispatch

```rust
use iqdb::{DistanceMetric, Iqdb, Payload, PayloadValue, Record, RecordId, Vector};

let db = Iqdb::open_in_memory();
let mut doc = Payload::new();
doc.insert("kind", "doc");
db.upsert(Record::with_payload(
    RecordId::new(1),
    Vector::new(vec![1.0, 0.0])?,
    doc,
))?;

let probe = Vector::new(vec![1.0, 0.0])?;
let docs_only = db.search_with(&probe, 5, DistanceMetric::Cosine, |rec| {
    rec.payload()
        .and_then(|p| p.get("kind"))
        .and_then(PayloadValue::as_text)
        == Some("doc")
})?;
# Ok::<(), iqdb::Error>(())
```

The filter signature is `F: Fn(&Record) -> bool`, a generic parameter rather than a trait object. Each distinct closure produces a distinct `search_with` instantiation, so the predicate can inline into the inner loop and pays no dynamic-dispatch cost per record. With `N = 10 000` records, a payload filter that prunes ~50% adds roughly the cost of the filter call itself; the bench numbers under `cargo bench --bench vector_ops -- search` show the gap.

The filter runs **after** distance computation but **before** the heap admit decision — so a record that fails the filter is never considered for the result set, regardless of how good its score is. The filter executes while the store's read lock is held; calling back into the same `Iqdb` handle from inside the filter is a re-entrant-lock hazard and is documented as such on every entry point.
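
The ordering of those steps, and the difference between a generic predicate and a trait object, can be sketched as follows. `Rec`, `scan_with`, and `scan_with_dyn` are hypothetical names for illustration; a sort plus truncate stands in for the bounded heap, and no locking is modelled.

```rust
/// Hypothetical record shape for this sketch.
pub struct Rec {
    pub id: u64,
    pub vector: Vec<f32>,
    pub kind: &'static str,
}

/// Euclidean distance; assumes equal lengths.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Generic over `F`: every closure type gets its own compiled copy of the
/// loop, so the compiler is free to inline the predicate into it.
pub fn scan_with<F>(records: &[Rec], probe: &[f32], k: usize, keep: F) -> Vec<(u64, f32)>
where
    F: Fn(&Rec) -> bool,
{
    let mut hits = Vec::new();
    for rec in records {
        // Step 1: distance is computed first.
        let score = l2(&rec.vector, probe);
        // Step 2: the filter runs before the record can be admitted.
        if !keep(rec) {
            continue;
        }
        // Step 3: admission (sort + truncate stands in for the heap).
        hits.push((rec.id, score));
    }
    hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    hits.truncate(k);
    hits
}

/// Contrast: a trait-object predicate is called through a vtable per record.
pub fn scan_with_dyn(
    records: &[Rec],
    probe: &[f32],
    k: usize,
    keep: &dyn Fn(&Rec) -> bool,
) -> Vec<(u64, f32)> {
    scan_with(records, probe, k, keep)
}
```

Both functions return the same hits; the generic form only changes how the predicate is compiled, which is the property the `search_with` signature relies on.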

### `Iqdb::search_batch` / `search_batch_with`

Run multiple probes through the store in one API call. Input order is preserved — `output[i]` is the top-`k` for `queries[i]`:

```rust
use iqdb::{DistanceMetric, Iqdb, Vector};

let db = Iqdb::open_in_memory();
# db.upsert(iqdb::Record::new(
#     iqdb::RecordId::new(1),
#     Vector::new(vec![1.0, 0.0, 0.0])?,
# ))?;
let probes = vec![
    Vector::new(vec![1.0, 0.0, 0.0])?,
    Vector::new(vec![0.0, 1.0, 0.0])?,
    Vector::new(vec![-1.0, 0.0, 0.0])?,
];
let batches = db.search_batch(&probes, 5, DistanceMetric::Cosine)?;
assert_eq!(batches.len(), 3);
# Ok::<(), iqdb::Error>(())
```

The v0.3.0 implementation is sequential — each batch element acquires the store's read lock independently. Parallel batch execution is reserved for v0.5.0, where the per-shard scan that batch parallelism rides on will already exist for the IVF index.
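
As a rough picture of what "sequential, one read lock per probe" means, the sketch below uses `std::sync::RwLock` around a plain `Vec`. The `Store` type, its fields, and the squared-L2 scoring are assumptions made for illustration; the crate's actual store and locking primitive may differ.

```rust
use std::sync::RwLock;

/// Hypothetical store: ids and vectors behind a read-write lock.
pub struct Store {
    records: RwLock<Vec<(u64, Vec<f32>)>>,
}

impl Store {
    pub fn new(records: Vec<(u64, Vec<f32>)>) -> Self {
        Store {
            records: RwLock::new(records),
        }
    }

    /// Single-probe search: takes the read lock for the duration of one scan.
    pub fn search(&self, probe: &[f32], k: usize) -> Vec<(u64, f32)> {
        let guard = self.records.read().expect("lock poisoned");
        let mut hits: Vec<(u64, f32)> = guard
            .iter()
            .map(|(id, v)| {
                // Squared L2 distance; assumes equal lengths.
                let d: f32 = v.iter().zip(probe).map(|(x, y)| (x - y) * (x - y)).sum();
                (*id, d)
            })
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
        hits.truncate(k);
        hits
    }

    /// Batch search: one `search` call per probe, in input order, so the
    /// read lock is acquired and released once per batch element.
    pub fn search_batch(&self, probes: &[Vec<f32>], k: usize) -> Vec<Vec<(u64, f32)>> {
        probes.iter().map(|p| self.search(p, k)).collect()
    }
}
```

Because the lock is released between probes, a writer can interleave with a batch in this model; each element of the output is consistent on its own, but the batch as a whole is not a single snapshot.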

### `SearchResult` — what every hit carries

```rust
pub struct SearchResult {
    pub id: RecordId,
    pub score: f32,
    pub payload: Option<Payload>,
}
```

The payload is cloned at search time so callers do not need a follow-up `get` for metadata. Records without a payload yield `payload: None`. Under the `serde` feature, `SearchResult` derives `Serialize` / `Deserialize` — so search results can be marshalled into JSON, snapshotted, or sent over a wire without an adapter.

### NaN-aware total ordering with id tie-break

`f32` implements only `PartialOrd`, not `Ord`. The top-`k` heap and the final result sort both need a total order; without one, `BinaryHeap` cannot maintain its ordering invariant and the result order is unspecified.

Two complications:

- **Tied scores.** Two records at the same distance must produce a deterministic id ordering; otherwise test assertions and reproducible benchmarks are at the mercy of `HashMap` iteration order. The total order breaks ties on `id` ascending.
- **NaN scores.** `Cosine` distance against a zero vector produces `NaN`. `Vector::new` rejects non-finite components but accepts zero vectors, because they are valid sparse embeddings. The total order treats `NaN` as worst, so NaN-scored records sort to the tail of the result list rather than corrupting the comparison chain. The integration test `cosine_against_zero_vector_yields_nan_at_tail` pins this behaviour.
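
A comparator with both properties can be sketched in a few lines. `cmp_hits` is a hypothetical helper operating on `(id, score)` pairs with `u64` ids; it illustrates the rule described above and is not the crate's internal implementation.

```rust
use std::cmp::Ordering;

/// Orders `(id, score)` hits ascending by score, with NaN treated as worst
/// and equal scores broken by ascending id.
pub fn cmp_hits(a: (u64, f32), b: (u64, f32)) -> Ordering {
    match (a.1.is_nan(), b.1.is_nan()) {
        // Two NaN scores are tied; only the id decides.
        (true, true) => a.0.cmp(&b.0),
        // NaN is worse than any real score, so it sorts after it.
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither is NaN, so `partial_cmp` always returns `Some` here.
        (false, false) => a
            .1
            .partial_cmp(&b.1)
            .unwrap_or(Ordering::Equal)
            .then(a.0.cmp(&b.0)),
    }
}
```

Used with `sort_by`, for example `hits.sort_by(|a, b| cmp_hits(*a, *b))`, this yields the same order on every run regardless of the order in which records were scanned.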

### `proptest` harness — `tests/properties.rs`

Eight properties pinned over the search and distance surfaces:

- **`identity_yields_minimal_distance`** — `d(a, a) ≈ 0` under L2 and Cosine; `Dot` returns `−‖a‖²` (non-positive).
- **`distance_is_symmetric`** — `d(a, b) == d(b, a)` for every metric, with NaN agreement on the zero-vector edge cases.
- **`l2_distance_is_non_negative`** — L2 is finite and non-negative for every finite input pair.
- **`cosine_distance_is_in_zero_two_range`** — `[0, 2]` (with a small rounding slack) whenever both norms are non-zero.
- **`search_length_is_bounded`** — `|hits| ≤ min(k, store.len())` for arbitrary store + k combinations.
- **`search_results_are_sorted_ascending`** — every adjacent pair `(prev, next)` in the result list satisfies `prev ≤ next` under the NaN-aware order.
- **`perfect_match_is_always_in_top_k`** — a record identical to the probe always appears in the top-`k` when `k ≥ 1`.
- **`unfiltered_matches_always_true_filter`** — `search(...)` and `search_with(..., |_| true)` produce identical id sequences.

`proptest` is pinned to the 1.x line and declared with `default-features = false, features = ["std"]` to keep the dev-dependency footprint small. The full suite runs in under a second at the default 256-case budget.

### Full API reference — `docs/API.md`

The complete public surface is now documented under [`docs/API.md`](../API.md). Every public type, method, error variant, and feature flag is recorded with parameter descriptions and at least one runnable example, following the metrics-lib API.md format. The README links to it as the canonical reference; the rustdoc on [docs.rs/iqdb](https://docs.rs/iqdb) carries the same information in browseable form.

### New benchmark group — `search`

Three new Criterion benches at 1 000 and 10 000 records, dim 128:

- **`search/flat_k10_dim128`** — unfiltered top-10.
- **`search/flat_k10_dim128_filter_half`** — payload filter that prunes ~50% of the corpus before heap admission.
- **`search/batch4_k10_dim128`** — four probes through `search_batch`.

`cargo bench --bench vector_ops -- search` produces the side-by-side numbers. The 1k vs 10k pair shows how the kernel scales linearly with `N` (the dominant cost is distance computation, not heap maintenance).

## Breaking changes

**None.** v0.3.0 is purely additive on top of v0.2.0:

- The four new `Iqdb::search*` methods are additions.
- `SearchResult` is a new public type.
- No existing v0.2.0 surface (types, methods, error variants) changes shape or semantics.

Callers that compiled against v0.2.0 should compile against v0.3.0 without source changes.

## Verification

Run on Windows x86_64 and on WSL2 Ubuntu (Rust stable 1.95.0). The same commands run in the configured CI matrix on Linux, macOS, and Windows:

```bash
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo test --all-features
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
cargo deny check
cargo audit --deny warnings
```

All green on both hosts. Test counts at this tag:

- **Default features:** 77 unit + 1 + 8 + 14 + 8 integration (35 in `tests/`) + 37 doctests.
- **`--all-features`:** 77 unit + 1 + 11 + 14 + 8 integration (38 in `tests/`; the +3 are `serde` JSON round-trips for `Vector`, `Record`, `DistanceMetric`) + 37 doctests.

`cargo deny check` reports `advisories ok, bans ok, licenses ok, sources ok`. `cargo audit` scans 60 transitive dependencies (default features) with zero advisories.

## What's next

- **v0.4.0 — Durable storage.** File-backed storage substrate. Write-ahead log, atomic-replace snapshots, crash recovery. `Iqdb::open(path)` becomes load-bearing and the `Error::NotImplemented` branches on `open` / `flush` disappear. Platform-specific durability paths (`fsync` / `F_FULLFSYNC` / `FlushFileBuffers`) wired through.
- **v0.5.0 — Approximate indices.** IVF and HNSW behind the same trait the flat index implements. Build-time index selection via a builder. Approximate recall measured against the v0.3.0 flat-search ground truth.
- **v0.6.0 — Async surface.** Tokio-driven async mirror of the public API. Cancellation-safe.

## Installation

```toml
[dependencies]
iqdb = "0.3"

# Or, to enable the optional `serde` feature, use this line instead of the one above
# (a manifest may declare `iqdb` only once):
iqdb = { version = "0.3", features = ["serde"] }
```

MSRV: Rust 1.87.

## Documentation

- [README](https://github.com/jamesgober/iqdb/blob/main/README.md)
- [API Reference](https://github.com/jamesgober/iqdb/blob/main/docs/API.md)
- [Standards (REPS)](https://github.com/jamesgober/iqdb/blob/main/REPS.md)
- [CHANGELOG](https://github.com/jamesgober/iqdb/blob/main/CHANGELOG.md)
- [docs.rs/iqdb](https://docs.rs/iqdb)

---

**Full diff:** [`v0.2.0...v0.3.0`](https://github.com/jamesgober/iqdb/compare/v0.2.0...v0.3.0).
**Changelog:** [`CHANGELOG.md`](https://github.com/jamesgober/iqdb/blob/main/CHANGELOG.md#030--2026-05-30).