
Crate quiver_embed


The embeddable, in-process Quiver database handle.

Database composes the storage engine (quiver_core::Store) with a per-collection vector index and payload filtering (quiver_query::Filter) into one handle. It exposes the same logical operations the server speaks (docs/api/wire-protocol.md), so library mode and server mode exercise identical engine semantics — the server is a thin transport/policy shell.

Index lifecycle

The store is the source of truth. Each collection chooses its index via the descriptor’s IndexSpec (default in-memory HNSW); the index is built from the store on open. HNSW applies new-id inserts incrementally; once an IVF index is built it applies inserts, in-place updates, and deletes incrementally with LIRE rebalancing (ADR-0023). The Vamana / disk graph family is maintained the FreshDiskANN way (ADR-0033): the batch-built graph is a read-only base, recent inserts land in an in-memory delta graph, and deletes are tombstoned, so writes are size-independent; when the pending work grows past a fixed fraction of the base the next access consolidates by rebuilding from the store. All indexes stay derived (rebuilt from the store on open), so the crash gate never sees an index write.
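The consolidation trigger for the base-plus-delta graph described above can be sketched as simple bookkeeping. This is a minimal illustration, not the crate's implementation: the `PendingWork` type, its fields, and the 1/10 threshold are all hypothetical, since the text only says the fraction is fixed.

```rust
/// Hypothetical bookkeeping for a base-plus-delta graph index.
/// None of these names come from quiver_embed.
#[derive(Debug, Default)]
struct PendingWork {
    /// Points in the read-only, batch-built base graph.
    base_len: usize,
    /// Recent inserts held in the in-memory delta graph.
    delta_len: usize,
    /// Deletes recorded as tombstones, not yet applied to the base.
    tombstones: usize,
}

/// Assumed threshold: consolidate once pending work exceeds 1/10 of the base.
const CONSOLIDATE_NUM: usize = 1;
const CONSOLIDATE_DEN: usize = 10;

impl PendingWork {
    /// An insert only touches the delta, so its cost does not depend on base size.
    fn insert(&mut self) {
        self.delta_len += 1;
    }

    /// A delete only records a tombstone.
    fn delete(&mut self) {
        self.tombstones += 1;
    }

    /// True when pending work has grown past the fixed fraction of the base.
    fn needs_consolidation(&self) -> bool {
        let pending = self.delta_len + self.tombstones;
        pending * CONSOLIDATE_DEN > self.base_len * CONSOLIDATE_NUM
    }

    /// Stand-in for "rebuild from the store": the new base covers every live row.
    fn consolidate(&mut self, live_rows: usize) {
        self.base_len = live_rows;
        self.delta_len = 0;
        self.tombstones = 0;
    }
}
```

In this sketch the check would run on the next access after a write, so writes themselves never pay for a rebuild.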

A search may carry a quiver_query::Filter over the payload. The planner decomposes it into the predicates the collection’s secondary indexes can answer; when those narrow the query to a small candidate set it scans that set exactly (perfect recall, no filtered-ANN cliff), and otherwise it over-fetches from the ANN index and post-filters. Both arms re-check the full filter, so results are exact regardless of which path runs.
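The two planner arms can be illustrated with a toy model. Everything here is an assumption made for the example: the `Point` type, the single `tag` predicate, the `EXACT_SCAN_LIMIT` cutoff, the 4x over-fetch factor, and the brute-force sort standing in for the ANN index. The real planner works over quiver_query::Filter and the collection's secondary indexes.

```rust
/// Toy point with a one-field payload (hypothetical, for illustration only).
#[derive(Debug, Clone)]
struct Point {
    id: u64,
    vector: Vec<f32>,
    tag: &'static str,
}

/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

#[derive(Debug, PartialEq)]
enum Plan {
    ExactScan,
    OverFetch,
}

/// Assumed cutoff below which the candidate set is scanned exactly.
const EXACT_SCAN_LIMIT: usize = 2;

fn filtered_search(points: &[Point], query: &[f32], tag: &str, k: usize) -> (Plan, Vec<u64>) {
    // Stand-in for a secondary-index lookup: the points matching the predicate.
    let candidates: Vec<&Point> = points.iter().filter(|p| p.tag == tag).collect();

    let (plan, mut hits): (Plan, Vec<&Point>) = if candidates.len() <= EXACT_SCAN_LIMIT {
        // Small candidate set: scan it exactly.
        (Plan::ExactScan, candidates)
    } else {
        // Otherwise over-fetch from the (here: brute-force) index, ignoring the filter.
        let mut all: Vec<&Point> = points.iter().collect();
        all.sort_by(|a, b| sq_dist(&a.vector, query).total_cmp(&sq_dist(&b.vector, query)));
        all.truncate(k * 4);
        (Plan::OverFetch, all)
    };

    // Both arms re-check the full filter before ranking.
    hits.retain(|p| p.tag == tag);
    hits.sort_by(|a, b| sq_dist(&a.vector, query).total_cmp(&sq_dist(&b.vector, query)));
    hits.truncate(k);
    (plan, hits.iter().map(|p| p.id).collect())
}
```

Because the filter is re-applied in both arms, no returned id can violate the predicate. In this toy version the over-fetch arm may still return fewer than `k` hits when the filter is selective.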

Concurrency (ADR-0057 / ADR-0062)

Single-writer. Writes take &mut self. Reads come in two flavors: the &mut self convenience methods (search, hybrid_search, search_multi_vector) rebuild a stale index in place and so give embedded, single-threaded callers read-your-writes; the &self *_snapshot methods read the current immutable snapshot and run concurrently, serving the prior snapshot when a write deferred a rebuild (snapshot-isolated, slightly stale). A server therefore serves concurrent reads behind a reader–writer lock, and rebuilds off the exclusive lock (ADR-0062): it captures the rebuild inputs under the shared lock (Database::snapshot_rebuild_inputs), builds the new index with no lock held (RebuildInputs::build), and installs it under a brief write lock (Database::commit_rebuild) — so a rebuild never stalls concurrent readers.
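The three-step off-lock rebuild can be sketched with std::sync::RwLock and toy types. The method names mirror the ones mentioned above, but the signatures, the `Toy*` types, and the sorted-vector "index" are invented for illustration. The generation check in `commit_rebuild` is also an assumption of this sketch, not documented behavior.

```rust
use std::sync::{Arc, RwLock};

/// Toy database: `rows` is the source of truth, `index` is derived from it.
struct ToyDb {
    rows: Vec<u32>,
    generation: u64,
    /// The "index" here is just a sorted copy of the rows.
    index: Arc<Vec<u32>>,
}

/// Owned copy of everything the rebuild needs, captured under the shared lock.
struct ToyRebuildInputs {
    rows: Vec<u32>,
    generation: u64,
}

/// A freshly built index waiting to be installed.
struct ToyRebuiltIndex {
    index: Vec<u32>,
    generation: u64,
}

impl ToyDb {
    fn insert(&mut self, row: u32) {
        self.rows.push(row);
        self.generation += 1;
    }

    /// Step 1: needs only `&self`, so it can run under a read lock.
    fn snapshot_rebuild_inputs(&self) -> ToyRebuildInputs {
        ToyRebuildInputs { rows: self.rows.clone(), generation: self.generation }
    }

    /// Step 3: install the new index. Assumption: a rebuild captured before a
    /// later write is rejected as stale.
    fn commit_rebuild(&mut self, rebuilt: ToyRebuiltIndex) -> bool {
        if rebuilt.generation != self.generation {
            return false;
        }
        self.index = Arc::new(rebuilt.index);
        true
    }
}

impl ToyRebuildInputs {
    /// Step 2: the expensive part, which touches no shared state.
    fn build(self) -> ToyRebuiltIndex {
        let mut index = self.rows;
        index.sort_unstable();
        ToyRebuiltIndex { index, generation: self.generation }
    }
}

fn rebuild_off_lock(db: &RwLock<ToyDb>) -> bool {
    // Shared lock held only for this statement.
    let inputs = db.read().unwrap().snapshot_rebuild_inputs();
    // No lock held while building.
    let rebuilt = inputs.build();
    // Brief exclusive lock to swap the index in.
    let installed = db.write().unwrap().commit_rebuild(rebuilt);
    installed
}
```

Readers that take `db.read()` during the build step are never blocked by it. They only wait for the short install at the end.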

Structs

CollectionId
A collection identifier, assigned monotonically by the catalog and stable for the life of the collection.
CollectionSnapshot
An immutable, lock-free-readable view of a single-vector collection (ADR-0064): the base index as of the last rebuild, the base id map, and the overlay of writes since. Obtained via Database::collection_snapshot and read with CollectionSnapshot::search; a read is snapshot-isolated — it sees one consistent (base, overlay) pair, and a write that lands mid-read is simply the next snapshot.
Database
An in-process Quiver database over one data directory.
Descriptor
The immutable schema of a collection, fixed at creation.
DocumentMatch
A multi-vector (late-interaction / ColBERT) document result: a document id, its MaxSim relevance, the payload, and — if requested — the document’s token vectors (ADR-0028).
FilterableField
A payload field declared filterable at collection creation: its dot-path and type. Declared fields are extracted into the per-segment secondary index at flush time (ADR-0022), enabling pre-filtered (hybrid) search.
IndexSpec
Which index a collection uses and how its vectors are compressed (ADR-0007, ADR-0008). Defaults to in-memory HNSW with no quantization (exact search).
Match
A single search or fetch result.
RebuildInputs
A captured, owned snapshot of everything an off-lock rebuild needs (ADR-0062): the scanned rows, the collection’s descriptor, and the write generation at capture time. Produced under the shared read lock by Database::snapshot_rebuild_inputs; RebuildInputs::build then constructs the new index with no lock held.
RebuiltIndex
A new index built off-lock from a RebuildInputs, ready for Database::commit_rebuild to install under the brief write lock (ADR-0062).
SearchParams
Parameters for a Database::search.
SingleCodecKeyRing
A KeyRing that seals everything — catalog and every collection — with one shared codec.
SnapshotInfo
What a Database::snapshot captured (ADR-0050): the catalog generation and the number of files / bytes copied.
SparseInvertedIndex
An in-memory inverted index over sparse vectors (ADR-0045).
SparseVector
A sparse vector: parallel indices and values. Indices are dimension ids into a (possibly very large) sparse vocabulary; values are their weights.
WalEntry
A WAL record: a monotonic LSN paired with the operation it commits.
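
The parallel indices/values layout described for SparseVector lends itself to a merge-style dot product. The sketch below uses a hypothetical `ToySparseVector` rather than the crate's type, and assumes indices are sorted ascending and unique, which the documentation above does not state.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a sparse vector: `indices[i]` is the dimension id
/// of `values[i]`. Assumed invariant: indices are sorted ascending and unique.
struct ToySparseVector {
    indices: Vec<u32>,
    values: Vec<f32>,
}

/// Dot product that walks both index lists in lockstep, like a merge join.
fn sparse_dot(a: &ToySparseVector, b: &ToySparseVector) -> f32 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < a.indices.len() && j < b.indices.len() {
        match a.indices[i].cmp(&b.indices[j]) {
            // Dimension present only in `a`: skip it.
            Ordering::Less => i += 1,
            // Dimension present only in `b`: skip it.
            Ordering::Greater => j += 1,
            // Shared dimension: accumulate the product of the two weights.
            Ordering::Equal => {
                sum += a.values[i] * b.values[j];
                i += 1;
                j += 1;
            }
        }
    }
    sum
}
```

The cost is proportional to the number of non-zero entries, not to the size of the vocabulary.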

Enums

DistanceMetric
The distance / similarity function a collection is searched with.
Dtype
The element type of stored vectors. Phase 1 ships f32; lower-precision and quantized dtypes arrive with the memory-frugality work in Phase 2.
Error
Errors returned by the embeddable database.
FieldType
The type of a filterable payload field, which fixes how its values are keyed in the secondary index (.sec) — and therefore which predicates it answers.
Filter
A predicate over a point’s JSON payload.
IndexKind
The index structure a collection is served by (ADR-0007). The default is the in-memory HNSW graph; the others are the Phase 2 memory-frugal options.
VectorEncryption
How a collection’s vectors are encrypted (ADR-0031, ADR-0032). Encryption is always client-side: the server never holds the key. Defaults to VectorEncryption::None. The variants sit on Quiver’s encrypted-search spectrum, ordered from fastest to most confidential.
WalOp
A single logical mutation recorded in the WAL.

Constants

BM25_B
The conventional BM25 length-normalization parameter.
BM25_K1
The conventional BM25 term-frequency saturation parameter (Robertson et al.).
DEFAULT_RRF_K0
The conventional RRF rank-bias constant (Cormack et al., 2009).
SPARSE_KEY
The reserved payload key carrying a point’s sparse vector (ADR-0043).
TEXT_KEY
The reserved payload key carrying a point’s full-text field (ADR-0046). When a point has no explicit __quiver_sparse__ vector but carries a string under this key, the engine tokenizes it into a term-frequency sparse vector at ingest, so the point is searchable by BM25 over text alone.
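
For orientation, the roles of the two BM25 parameters can be seen in a per-term scoring sketch. The values 1.2 and 0.75 are the commonly used defaults and are assumed here, because this page does not state what BM25_K1 and BM25_B are set to. The IDF form is one common variant, and the function names are hypothetical.

```rust
/// Assumed term-frequency saturation parameter (a common default).
const K1: f32 = 1.2;
/// Assumed length-normalization parameter (a common default).
const B: f32 = 0.75;

/// One common smoothed IDF variant: rarer terms get a larger weight.
fn idf(n_docs: usize, doc_freq: usize) -> f32 {
    let n = n_docs as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of a single query term to one document.
/// `tf` is the term's count in the document; `doc_len` and `avg_doc_len`
/// are measured in tokens.
fn term_score(tf: f32, doc_len: f32, avg_doc_len: f32, idf: f32) -> f32 {
    // B controls how strongly longer-than-average documents are penalized.
    let norm = 1.0 - B + B * doc_len / avg_doc_len;
    // K1 controls how quickly repeated occurrences stop adding score.
    idf * tf * (K1 + 1.0) / (tf + K1 * norm)
}
```

A document's score for a query is the sum of `term_score` over the de-duplicated query terms.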

Traits

KeyRing
Supplies the page codecs the storage engine seals data with, and manages the per-collection key lifecycle that crypto-shredding relies on.
PageCodec
Transforms Quiver’s durable bytes — fixed-size pages and variable-length records — to and from their on-disk representation.

Functions

query_term_ids
Tokenize text into the de-duplicated query term ids BM25 scores against (a repeated query term counts once). The query side of the BM25 path (ADR-0046).
restore_snapshot
Restore a snapshot directory src (produced by Database::snapshot) into a fresh dest directory, leaving it ready for the caller to open with the same keyring/codec the snapshot was written under (ADR-0050).
rrf_fuse
Fuse several ranked id lists by Reciprocal Rank Fusion and return the top top_k ids with their fused scores, highest first.
text_to_sparse
Tokenize text into a term-frequency SparseVector: dimension ids are token ids (term_id) and values are within-text term counts. The ingest side of the BM25 path (ADR-0046).
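
Reciprocal Rank Fusion, as used by rrf_fuse above, can be sketched in a few lines. This is not the crate's function: the name `rrf_fuse_sketch`, its signature, and the tie-breaking rule are assumptions. The rank-bias constant is passed in by the caller, and 60 is the value proposed in Cormack et al. (2009).

```rust
use std::collections::HashMap;

/// Fuse ranked id lists: each list contributes 1 / (k0 + rank) to an id's
/// score, where rank is 1-based. Returns the top `top_k` ids, highest first.
fn rrf_fuse_sketch(lists: &[Vec<u64>], k0: f32, top_k: usize) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k0 + (i + 1) as f32);
        }
    }

    let mut fused: Vec<(u64, f32)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic
    // (the tie-break is a choice made for this sketch).
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

Only ranks enter the formula, so lists whose raw scores are on different scales (for example dense similarity and BM25) can be fused without normalization.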

Type Aliases

CommitObserver
A synchronous hook invoked with each committed WalEntry, in commit order. Leader-follower replication (ADR-0030) installs one to publish each op to its replication stream. A plain Fn keeps the engine runtime-agnostic — no async dependency leaks into quiver-core.
Result
Result alias for database operations.
SnapshotCell
Per-collection lock-free serving snapshot pointer: the single writer stores a new CollectionSnapshot; readers load one without a lock. (ArcSwap<T> stores an Arc<T> internally, so this is one Arc per load.)