# `issundb-vector` Agent Guide
This file covers crate-specific guidance for contributors working inside `crates/issundb-vector`.
Read the root `AGENTS.md` first; the rules there apply everywhere and are not repeated here.
## `VectorIndex` Lifecycle
`VectorIndex` starts in the `Inner::Empty` state and is lazily initialized on the first call to `upsert`:
1. **Empty**: no usearch index exists yet; the dimension count is unknown.
2. **Ready**: the index is live with a fixed dimension count; `upsert` and `search` both operate against it.
State transitions are guarded by an internal `parking_lot::Mutex<Inner>`.
Initialization happens inside the mutex: create an `IndexOptions`, call `Index::new`, call `index.reserve(64)`, then insert the first vector.
Once `Ready`, the dimension count is immutable for the lifetime of the index.
## Dimension Contract
All vectors added to a given `VectorIndex` must have the same number of dimensions.
This is enforced at the API boundary:
- In `upsert`, if `v.len() != dims` for a `Ready` index, return `Err(Error::Vector(...))` immediately. Never silently truncate or pad the vector.
- In `search`, if `q.len() != dims`, return `Err(Error::Vector(...))`.
- An empty vector (`v.len() == 0`) is rejected by `upsert` before the state check.
Do not add any path that changes `dims` after initialization.
## `VectorIndexOptions` Fields
`VectorIndexOptions` (in `src/index.rs`) controls index construction:
- `metric: VectorMetric` (default: `Cosine`): the distance function used for all ANN queries on this index. Options:
- `Cosine`: angular similarity; suitable for normalized text embeddings.
- `L2`: Euclidean distance; suitable for spatial or non-normalized vectors.
- `Dot`: inner product; use when vectors are already normalized to unit length and maximum dot product is the goal.
- `quantization: VectorQuantization` (default: `F32`): scalar precision for stored vectors. Trade-offs:
- `Float32`: full precision, no recall loss.
- `Float16`: 2x memory reduction, minor recall loss (typically < 1 %).
- `Int8`: 4x memory reduction, moderate recall loss; suitable for large corpora where approximate results are acceptable.
The metric and quantization are fixed at index construction time and cannot be changed without rebuilding the index from scratch.
## `usearch` API Notes
The usearch `Index` does not auto-grow its internal capacity. Follow these rules:
- Call `index.reserve(n)` before calling `index.add`. The initial reservation on first `upsert` is `64`.
- Before each subsequent `upsert` in the `Ready` branch, check `index.size() >= index.capacity()`. If true, call
`index.reserve((index.capacity() * 2).max(64))` before adding.
- `index.add(node_id, vector)` does not replace an existing entry; call `index.remove(node_id)` first if the node already exists in the index (
`index.contains(node_id)`).
- usearch `search` returns at most `min(k, index.size())` results. Clamp `k` to `index.size()` before searching to avoid requesting more results than
the index holds.
## The Cold-Start Pattern in `get_or_init_cache`
`get_or_init_cache` builds the in-memory HNSW index from LMDB on first use:
1. Call `graph.get_extension::<VectorIndexCache>()` under no lock. If present, return it immediately.
2. Call `graph.vector_bytes()` **before** acquiring the `extensions` lock. This avoids holding both the LMDB read lock and the extensions lock
simultaneously.
3. Build a fresh `VectorIndex` and populate it from the loaded bytes.
4. Acquire the `extensions` lock and do a second existence check (double-check idiom) before inserting, to prevent overwriting an index that was
concurrently initialized by another thread.
Never call `graph.vector_bytes()` or any `Graph` method while holding the `extensions` mutex.
## `VectorSearchOptions.label` Filter
When `opts.label` is `Some(label)`:
1. Over-fetch from the index: request `(opts.k * 4).max(opts.k + 64)` candidates.
2. For each candidate, call `graph.get_node(hit.node)` and `graph.label_name(record.label)` to verify the label.
3. Collect the first `opts.k` survivors.
This over-fetch factor compensates for label distribution skew. Fewer than `opts.k` results may be returned when the index contains fewer matching
nodes. Do not error in this case; return whatever survivors were found.
## Testing Rules
Every test that touches vector behavior must cover all three of the following scenarios, each in its own test function:
1. **Persist and reload**: `upsert → search` in one `Graph` instance; then reopen the same path and `search` again. The same nearest neighbor must
appear after reload.
2. **Dimension mismatch**: after the first `upsert` fixes dimensions, a second `upsert` with a different dimension count must return
`Err(Error::Vector(...))`.
3. **Empty index**: `vector_search` on a graph with no vectors must return an empty `Vec`, not an error.
Each test must open its own `TempDir` and must not share a `Graph` instance with other tests.