Opt-in, provider-agnostic embedding & reranking adapters (ADR-0047/0058).
The Quiver engine is deliberately model-agnostic: it stores and searches
float vectors and knows nothing about embedding models. This crate is the
edge adapter that lets an operator accept plain text and turn it into a
stored or searched vector without the client running an embedding model,
which removes the single biggest source of RAG friction. It lives in its own
lean crate (no axum/tonic) so that both the network server (quiver-server)
and the in-process MCP server (quiver-mcp) can share it without either
pulling in the other's dependency tree (ADR-0058). Neither quiver-core nor
the quiver-embed engine crate depends on it, so library-mode users pay
nothing for it.
Design (ADR-0047)
- Provider-agnostic. [EmbeddingProvider] / [RerankProvider] are traits.
  OpenAI-compatible servers (OpenAI, Ollama's `/v1` endpoint, vLLM, LM Studio,
  llama.cpp, …) share one HTTP adapter parameterized by base URL and auth;
  Cohere has its own request/response shape; and a deterministic
  [FakeEmbedder] / [FakeReranker] pair backs the tests and the acceptance
  script. No vendor is hard-coded; the provider is selected by configuration.
- Opt-in, per collection, default off. Configured in the server config
  (`[embedding.<collection>]` / `[rerank.<collection>]`), not in the on-disk
  descriptor, so neither the engine nor the crash gate is touched.
- No secrets on disk. The config stores the name of an environment variable
  ([EmbeddingConfig::api_key_env]); its value is resolved when the registry is
  built and is never persisted.
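The trait-plus-fake design described above might look roughly like the following minimal sketch. The method names and signatures of `EmbeddingProvider` / `RerankProvider`, the `ProviderError` type, and the internals of `FakeEmbedder` / `FakeReranker` (FNV-1a plus SplitMix64 hashing, term-overlap scoring) are assumptions for illustration, not the crate's actual API.

```rust
// Sketch only: trait shapes and fake internals are assumptions, not quiver's API.
use std::collections::HashSet;

/// Hypothetical error type; the crate's real error type is not shown in these docs.
#[derive(Debug)]
pub struct ProviderError(pub String);

/// Hypothetical embedding trait: one vector per input text, in input order.
pub trait EmbeddingProvider {
    fn dimension(&self) -> usize;
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, ProviderError>;
}

/// Hypothetical rerank trait: (input index, score) pairs, best first.
pub trait RerankProvider {
    fn rerank(&self, query: &str, docs: &[&str]) -> Result<Vec<(usize, f32)>, ProviderError>;
}

/// FNV-1a: a stable, dependency-free hash, so the same text seeds the same
/// vector on every run and platform.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x0100_0000_01b3);
    }
    h
}

/// One SplitMix64 step: expands a 64-bit seed into a pseudo-random stream.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Deterministic fake embedder: same text, same unit-length vector; no network.
pub struct FakeEmbedder {
    pub dim: usize,
}

impl EmbeddingProvider for FakeEmbedder {
    fn dimension(&self) -> usize {
        self.dim
    }

    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, ProviderError> {
        Ok(texts
            .iter()
            .map(|t| {
                let mut state = fnv1a(t.as_bytes());
                // Map the top 24 bits of each pseudo-random word into [-1, 1).
                let mut v: Vec<f32> = (0..self.dim)
                    .map(|_| (splitmix64(&mut state) >> 40) as f32 / (1u64 << 24) as f32 * 2.0 - 1.0)
                    .collect();
                // L2-normalize so cosine and dot-product rankings agree.
                let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
                if norm > 0.0 {
                    for x in &mut v {
                        *x /= norm;
                    }
                }
                v
            })
            .collect())
    }
}

/// Deterministic fake reranker: score = fraction of query words found in the doc.
pub struct FakeReranker;

impl RerankProvider for FakeReranker {
    fn rerank(&self, query: &str, docs: &[&str]) -> Result<Vec<(usize, f32)>, ProviderError> {
        let q: Vec<String> = query.split_whitespace().map(|w| w.to_lowercase()).collect();
        let mut scored: Vec<(usize, f32)> = docs
            .iter()
            .enumerate()
            .map(|(i, d)| {
                let terms: HashSet<String> = d.split_whitespace().map(|w| w.to_lowercase()).collect();
                let hits = q.iter().filter(|w| terms.contains(*w)).count();
                let score = if q.is_empty() { 0.0 } else { hits as f32 / q.len() as f32 };
                (i, score)
            })
            .collect();
        // Stable sort: best score first; ties keep their original input order.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        Ok(scored)
    }
}
```

Because every fake vector is derived from a stable hash of its text, tests and an acceptance script can assert on exact results without downloading or calling a model.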
Testing honesty
The pure request-build and response-parse functions are unit-tested, and the
fake provider exercises the full text-in/text-out path. The methods that make
a live HTTP call ([OpenAiCompatEmbedder::embed], [CohereEmbedder::embed],
[CohereReranker::rerank]) are thin shells around those tested helpers plus a
ureq call. Live network calls are not run in CI; that gap is stated here
rather than covered up with faked tests.
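The split between pure, unit-testable helpers and a thin network shell can be sketched for the OpenAI-compatible case. `POST {base}/embeddings` with a JSON body carrying `model` and `input` is the standard OpenAI embeddings request shape. The `HttpRequest` struct, the `build_embed_request` and `resolve_api_key` names, the hand-rolled JSON escaping (a real adapter would more likely use a JSON library), and the error strings are all hypothetical. Everything below can be asserted on without a socket; only the ureq call that would send the request stays outside the tests.

```rust
// Sketch only: names and types here are hypothetical, not quiver's API.
use std::env;

/// Plain-data request that a thin shell would hand to an HTTP client.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpRequest {
    pub url: String,
    pub headers: Vec<(String, String)>,
    pub body: String,
}

/// Minimal JSON string encoder: escapes quotes, backslashes and control chars.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Pure helper: build an OpenAI-compatible `POST {base}/embeddings` request.
/// `api_key` is None for servers without auth (e.g. a local endpoint).
pub fn build_embed_request(
    base_url: &str,
    model: &str,
    api_key: Option<&str>,
    inputs: &[&str],
) -> HttpRequest {
    let url = format!("{}/embeddings", base_url.trim_end_matches('/'));
    let mut headers = vec![("Content-Type".to_string(), "application/json".to_string())];
    if let Some(key) = api_key {
        headers.push(("Authorization".to_string(), format!("Bearer {key}")));
    }
    let input = inputs.iter().map(|s| json_string(s)).collect::<Vec<_>>().join(",");
    let body = format!("{{\"model\":{},\"input\":[{}]}}", json_string(model), input);
    HttpRequest { url, headers, body }
}

/// Resolve the key once, at registry-build time, from the environment
/// variable *named* in config; only the name is ever stored.
pub fn resolve_api_key(api_key_env: Option<&str>) -> Result<Option<String>, String> {
    match api_key_env {
        None => Ok(None),
        Some(name) => env::var(name)
            .map(Some)
            .map_err(|_| format!("environment variable {name} is not set or not valid Unicode")),
    }
}
```

Keeping the request as plain data means the shell that actually performs the HTTP call has almost no logic of its own, which is what makes leaving it out of CI an acceptable, clearly bounded gap.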