The Quiver daemon: gRPC and REST over the embeddable Database, with
API-key auth and secure-by-default configuration.
Both transports are thin shells over the same shared engine operations; the
engine is synchronous and CPU/fsync-bound, so every call is offloaded with
spawn_blocking. Access is guarded by a reader–writer lock (ADR-0057): reads
take the shared lock and run concurrently, writes take the exclusive lock,
and the single-writer model is unchanged (ADR-0002/0006). A read that finds a
collection’s index stale (a prior write deferred its rebuild) serves the prior
snapshot and schedules an off-lock rebuild (ADR-0062): the index is rebuilt
under the shared lock and swapped in under a brief write lock, so concurrent
readers wait at most for the swap, never for the rebuild itself.
Auth is by scoped API key (Bearer / gRPC authorization metadata) with
default-deny RBAC: each key carries a role (read ⊆ write ⊆ admin) and a
collection scope, enforced on every operation at the shared op layer
(ADR-0011/0013, the auth module). Encryption-at-rest is on by default
(ADR-0010): unless insecure is set, an encryption_key is required and the
engine is opened through quiver-crypto’s AEAD codec; payloads may also be
client-side-encrypted (ADR-0012). TLS-in-transit uses rustls over the
audited ring provider (REST via axum-server, gRPC via tonic’s tls-ring
feature), and a non-loopback bind requires it; setting a client CA additionally
enforces mutual TLS. Mutating and administrative operations, and every
access-control denial, are recorded to an append-only audit log (ADR-0011,
the audit module) when audit_log is set. Per-request cost limits bound
the work any single authenticated request can demand (ADR-0040, the
Limits type), and an opt-in per-key token-bucket rate limiter bounds the
request rate (ADR-0049, the RateLimiter type); per-tenant engine
partitioning is a later phase. Design: docs/api/rest-grpc.md.
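The secure-by-default rules above can be read as startup checks that refuse to run rather than silently weaken security. The sketch below is hypothetical. SecuritySettings, StartupError, validate and tls_cert_path are illustrative names, not Quiver's Config fields, and port 7070 is arbitrary. It encodes only two stated rules. First, unless insecure is set, an encryption_key is required. Second, a non-loopback bind requires TLS. The prose does not say whether insecure also relaxes the TLS rule, so this sketch does not assume it does.

```rust
use std::net::SocketAddr;

/// Hypothetical subset of the server config. The names echo the prose
/// (insecure, encryption_key, TLS) but are not Quiver's actual Config fields.
struct SecuritySettings {
    bind: SocketAddr,
    insecure: bool,
    encryption_key: Option<String>,
    tls_cert_path: Option<String>, // hypothetical: a configured server certificate
}

#[derive(Debug, PartialEq)]
enum StartupError {
    MissingEncryptionKey,
    TlsRequiredForNonLoopbackBind,
}

/// Secure-by-default checks: refuse to start rather than run in a weaker mode.
fn validate(s: &SecuritySettings) -> Result<(), StartupError> {
    // Encryption-at-rest is on unless `insecure` is explicitly set.
    if !s.insecure && s.encryption_key.is_none() {
        return Err(StartupError::MissingEncryptionKey);
    }
    // Any bind other than loopback must be served over TLS.
    if !s.bind.ip().is_loopback() && s.tls_cert_path.is_none() {
        return Err(StartupError::TlsRequiredForNonLoopbackBind);
    }
    Ok(())
}

fn main() {
    let public_without_tls = SecuritySettings {
        bind: "0.0.0.0:7070".parse().unwrap(),
        insecure: false,
        encryption_key: Some("from-secret-store".to_string()),
        tls_cert_path: None,
    };
    println!("{:?}", validate(&public_without_tls));
}
```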
Structs
- ApiKey
- A configured API key: a bearer secret, the role it grants, and the collections it is scoped to.
- AutoscaleConfig
- Opt-in automatic scale-out policy for the coordinator (ADR-0065 increment 5). When enabled, the coordinator samples each shard’s point count and, when the busiest crosses high_water_points, grows the cluster by joining one of the standby_urls, driving the same safe online migration as a manual POST /cluster/shards/grow. An explicit policy, not magic: nothing scales without a configured threshold and a standby to grow into, a cooldown bounds the rate, and a migration in flight is never interrupted. Scale-in is deliberately not automated here (a safe online drain is a separate increment); shrink stays a manual, drained DELETE /cluster/shards/{id}.
- Config
- Server configuration, layered as defaults → quiver.toml → QUIVER_* env and validated at startup (ADR-0013).
- EmbedRegistry
- Per-collection embedding/rerank providers, built once at startup from config.
- EmbeddingConfig
- A collection’s embedding configuration (server config table [embedding.<collection>]). Secrets are referenced by env-var name only.
- Limits
- Per-request cost limits (ADR-0040). Each cap bounds the work a single authenticated request can demand, so one oversized request cannot exhaust the node under the single-writer model (ADR-0006). Over-limit requests are rejected with HTTP 400 / gRPC InvalidArgument rather than silently clamped: a truncated k or ef_search would return surprising, lower-quality results with no signal. Defaults are generous; raise a cap with a [limits] table in quiver.toml or the matching QUIVER_MAX_* environment variable.
- OtlpConfig
- OpenTelemetry traces export configuration ([otlp] in quiver.toml, or the QUIVER_OTLP_* environment variables). Disabled unless an endpoint is set.
- RateLimitConfig
- Configuration for the per-key limiter. requests_per_second == 0 (the default) disables it entirely; like every other guard, it is opt-in.
- RateLimitSnapshot
- A successful consume, surfaced as RateLimit-* response headers.
- RateLimiter
- An opt-in, in-memory, per-key token-bucket rate limiter.
- RerankConfig
- A collection’s rerank configuration (server config table [rerank.<collection>]).
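A per-key token bucket like the one RateLimiter describes can be sketched in plain std Rust. This is an illustration of the technique, not Quiver's implementation. TokenBucketLimiter, LimiterSettings, Decision and the burst capacity are hypothetical. Only "requests_per_second == 0 disables it" comes from the docs above. Each key gets its own bucket. The bucket refills at the configured rate up to its capacity, and each request spends one token or is told how long to wait. The caller passes `now` explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical limiter settings. Only "requests_per_second == 0 disables
/// the limiter" comes from the docs; `burst` is an assumed bucket capacity.
struct LimiterSettings {
    requests_per_second: u32,
    burst: u32,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Unlimited,                         // limiter disabled
    Allowed { remaining: u32 },        // could feed a RateLimit-* style header
    Limited { retry_after: Duration }, // wait this long for the next token
}

struct Bucket {
    tokens: f64,
    last: Instant,
}

/// In-memory, per-key token buckets (state is lost on restart).
struct TokenBucketLimiter {
    settings: LimiterSettings,
    buckets: HashMap<String, Bucket>,
}

impl TokenBucketLimiter {
    fn new(settings: LimiterSettings) -> Self {
        Self { settings, buckets: HashMap::new() }
    }

    /// Refill the key's bucket for the elapsed time, then try to take one token.
    fn check(&mut self, key: &str, now: Instant) -> Decision {
        let rate = f64::from(self.settings.requests_per_second);
        if rate == 0.0 {
            return Decision::Unlimited;
        }
        let capacity = f64::from(self.settings.burst.max(1));
        let bucket = self
            .buckets
            .entry(key.to_string())
            .or_insert(Bucket { tokens: capacity, last: now }); // new keys start full
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(capacity);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Decision::Allowed { remaining: bucket.tokens.floor() as u32 }
        } else {
            let wait = (1.0 - bucket.tokens) / rate;
            Decision::Limited { retry_after: Duration::from_secs_f64(wait) }
        }
    }
}

fn main() {
    let mut limiter =
        TokenBucketLimiter::new(LimiterSettings { requests_per_second: 2, burst: 2 });
    let t0 = Instant::now();
    for i in 0..3 {
        println!("request {i}: {:?}", limiter.check("key-a", t0));
    }
}
```

A token bucket permits short bursts up to the capacity while holding the long-run rate to requests_per_second. Keeping it in memory means limits reset on restart and are not shared across nodes.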
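The AutoscaleConfig guards can be expressed as a pure decision function that either names a standby to join or declines. This is a hypothetical sketch. high_water_points and standby_urls come from the description above. The enabled and cooldown field names, the next_grow function and the standby URL are illustrative assumptions. "Crosses" is read here as strictly greater than the threshold. Scale-in is intentionally absent, matching the note that shrink stays manual.

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy mirroring AutoscaleConfig's documented knobs
/// (high_water_points, standby_urls); `enabled` and `cooldown` names are assumed.
struct AutoscalePolicy {
    enabled: bool,
    high_water_points: u64,
    standby_urls: Vec<String>, // standbys not yet joined
    cooldown: Duration,
}

/// Returns the standby to join, or None when any guard says "don't scale".
fn next_grow<'a>(
    policy: &'a AutoscalePolicy,
    shard_points: &[u64],
    migration_in_flight: bool,
    last_grow: Option<Instant>,
    now: Instant,
) -> Option<&'a str> {
    if !policy.enabled || migration_in_flight {
        return None; // opt-in only; never interrupt a migration
    }
    if let Some(t) = last_grow {
        if now.saturating_duration_since(t) < policy.cooldown {
            return None; // cooldown bounds the scale-out rate
        }
    }
    let busiest = shard_points.iter().copied().max().unwrap_or(0);
    if busiest <= policy.high_water_points {
        return None; // threshold not crossed
    }
    policy.standby_urls.first().map(String::as_str) // None if no standby is left
}

fn main() {
    let policy = AutoscalePolicy {
        enabled: true,
        high_water_points: 1_000,
        standby_urls: vec!["http://standby-a:7070".to_string()],
        cooldown: Duration::from_secs(300),
    };
    let now = Instant::now();
    println!("{:?}", next_grow(&policy, &[400, 1_200], false, None, now));
}
```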
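The Limits rule "reject rather than clamp" amounts to returning an error instead of quietly lowering the request. The sketch below is hypothetical. SearchLimits, LimitError, check_search and the cap values are illustrative and are not Quiver's actual fields or defaults. Only k and ef_search are named in the description above. The transport layer would then map the error to HTTP 400 or gRPC InvalidArgument.

```rust
/// Hypothetical subset of per-request caps; Quiver's real Limits fields and
/// defaults are not shown in this summary.
struct SearchLimits {
    max_k: usize,
    max_ef_search: usize,
}

#[derive(Debug, PartialEq)]
enum LimitError {
    KTooLarge { requested: usize, max: usize },
    EfSearchTooLarge { requested: usize, max: usize },
}

/// Reject, never clamp: an over-limit request becomes an error the transport
/// can map to HTTP 400 / gRPC InvalidArgument, not a silently truncated search.
fn check_search(limits: &SearchLimits, k: usize, ef_search: usize) -> Result<(), LimitError> {
    if k > limits.max_k {
        return Err(LimitError::KTooLarge { requested: k, max: limits.max_k });
    }
    if ef_search > limits.max_ef_search {
        return Err(LimitError::EfSearchTooLarge {
            requested: ef_search,
            max: limits.max_ef_search,
        });
    }
    Ok(())
}

fn main() {
    let limits = SearchLimits { max_k: 1_000, max_ef_search: 4_096 };
    println!("{:?}", check_search(&limits, 10, 64));
    println!("{:?}", check_search(&limits, 5_000, 64));
}
```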
Enums
- Action
- An action a caller may be permitted to perform, ordered by privilege so that a higher role implies the lower ones (Read < Write < Admin).
- CollectionScope
- Which collections a key may touch.
- Error
- An error from the server or the engine beneath it.
- ProviderError
- An error from a provider call or its configuration.
- ProviderKind
- Which provider backs a collection’s embedding (or rerank). The OpenAI-compatible trio (openai, ollama, http) share one adapter; cohere is its own; fake is deterministic and for tests/acceptance only.
- RateDecision
- The outcome of a rate-limit check.
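The privilege ordering on Action and the per-key collection scope combine into a default-deny check. This is a hypothetical sketch. Key, Scope and authorize are illustrative, and the real CollectionScope variants may differ. It does rely on one real Rust fact: deriving PartialOrd/Ord on a fieldless enum orders variants by declaration order. That makes "role implies the lower ones" a single comparison.

```rust
use std::collections::HashSet;

/// Mirrors the documented ordering Read < Write < Admin. Deriving PartialOrd/Ord
/// on a fieldless enum orders variants by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Action {
    Read,
    Write,
    Admin,
}

/// Hypothetical scope shape; the real CollectionScope variants may differ.
enum Scope {
    All,
    Only(HashSet<String>),
}

/// Hypothetical key record: the role it grants and the collections it covers.
struct Key {
    role: Action,
    scope: Scope,
}

/// Default-deny: allow only when a key is present, its role covers the action,
/// and its scope covers the collection.
fn authorize(key: Option<&Key>, action: Action, collection: &str) -> bool {
    let key = match key {
        Some(k) => k,
        None => return false, // missing or unknown key
    };
    let in_scope = match &key.scope {
        Scope::All => true,
        Scope::Only(names) => names.contains(collection),
    };
    action <= key.role && in_scope
}

fn main() {
    let writer = Key {
        role: Action::Write,
        scope: Scope::Only(HashSet::from(["docs".to_string()])),
    };
    println!("read docs: {}", authorize(Some(&writer), Action::Read, "docs"));
    println!("admin docs: {}", authorize(Some(&writer), Action::Admin, "docs"));
}
```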
Traits
- EmbeddingProvider
- Embeds a batch of texts into dense vectors (one per input).
- RerankProvider
- Scores (query, document) pairs for relevance; higher is more relevant.
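A deterministic fake embedder, in the spirit of the fake provider kind, shows the "one vector per input" contract without a network call. The trait below is a hypothetical, synchronous stand-in. The real EmbeddingProvider's method names, async-ness and ProviderError type are not shown here. FakeEmbedder is illustrative. It seeds an xorshift64 generator from an FNV-1a hash of the text, so the same text always yields the same vector.

```rust
/// Hypothetical synchronous signature; the real trait's method names, async-ness,
/// and error type (ProviderError) are not shown in this summary.
trait EmbeddingProvider {
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Deterministic fake: the same text always maps to the same vector.
struct FakeEmbedder {
    dim: usize,
}

/// 64-bit FNV-1a, used instead of std's DefaultHasher, whose algorithm is not
/// guaranteed to stay the same across Rust releases.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl EmbeddingProvider for FakeEmbedder {
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        let vectors: Vec<Vec<f32>> = texts
            .iter()
            .map(|text| {
                let mut state = fnv1a(text.as_bytes()) | 1; // xorshift needs a nonzero seed
                (0..self.dim)
                    .map(|_| {
                        // xorshift64 step, then map the top 24 bits into [-1, 1).
                        state ^= state << 13;
                        state ^= state >> 7;
                        state ^= state << 17;
                        (state >> 40) as f32 / 16_777_216.0 * 2.0 - 1.0
                    })
                    .collect::<Vec<f32>>()
            })
            .collect();
        Ok(vectors)
    }
}

fn main() {
    let fake = FakeEmbedder { dim: 4 };
    let vectors = fake.embed(&["hello", "world"]).unwrap();
    println!("{vectors:?}");
}
```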
Functions
- init_observability
- Install the global tracing subscriber: a RUST_LOG-driven fmt layer plus, when the otlp feature is built and an OTLP endpoint is configured (ADR-0059), an OpenTelemetry traces export layer. Safe to call once at startup; a second call is a no-op. A failure to build the OTLP exporter logs a warning and falls back to fmt-only rather than taking the server down.
- init_tracing
- Initialize structured logging from RUST_LOG (defaulting to info). Safe to call once at startup; a second call is ignored.
- run
- Run the server from config until a shutdown signal (Ctrl-C).
- serve
- Serve REST and gRPC on the given (already-bound) listeners until a transport error. Exposed so tests can bind ephemeral ports.
- shutdown_observability
- Flush and shut down the OpenTelemetry exporter, if one was installed. A no-op without the otlp feature or when no endpoint was configured. Call once on server shutdown so batched spans are not lost.
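The "second call is a no-op" and "fall back to fmt-only" behavior of init_observability can be modeled with std's OnceLock. This is a hypothetical sketch, not the crate's code. The real function installs a global tracing subscriber, while this one uses a struct instance so the idempotence can be exercised in isolation. Observability, build_otlp_exporter and the layer strings are illustrative. The exporter "fails" here simply when the endpoint is not an http(s) URL.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the global subscriber. A struct instead of a
/// static so the idempotence can be exercised in isolation.
struct Observability {
    layers: OnceLock<String>,
}

/// Hypothetical exporter builder: accepts only http(s) endpoints.
fn build_otlp_exporter(endpoint: &str) -> Result<String, String> {
    if endpoint.starts_with("http://") || endpoint.starts_with("https://") {
        Ok(format!("otlp({endpoint})"))
    } else {
        Err(format!("invalid OTLP endpoint {endpoint:?}"))
    }
}

impl Observability {
    fn new() -> Self {
        Self { layers: OnceLock::new() }
    }

    /// Installs layers on the first call; later calls are no-ops and return false.
    fn init(&self, otlp_endpoint: Option<&str>) -> bool {
        let mut installed_now = false;
        self.layers.get_or_init(|| {
            installed_now = true;
            match otlp_endpoint.map(build_otlp_exporter) {
                Some(Ok(exporter)) => format!("fmt+{exporter}"),
                Some(Err(err)) => {
                    // Degrade to fmt-only instead of failing startup.
                    eprintln!("warning: {err}; falling back to fmt-only");
                    "fmt".to_string()
                }
                None => "fmt".to_string(),
            }
        });
        installed_now
    }

    fn active_layers(&self) -> Option<&str> {
        self.layers.get().map(String::as_str)
    }
}

fn main() {
    let obs = Observability::new();
    obs.init(Some("http://localhost:4317"));
    obs.init(None); // second call: ignored
    println!("{:?}", obs.active_layers());
}
```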