Crate quiver_server

The Quiver daemon: gRPC and REST over the embeddable Database, with API-key auth and secure-by-default configuration.

Both transports are thin shells over the same shared engine operations; the engine is synchronous and CPU/fsync-bound, so every call is offloaded with spawn_blocking. Access is guarded by a reader–writer lock (ADR-0057): reads take the shared lock and run concurrently, writes take the exclusive lock, and the single-writer model is unchanged (ADR-0002/0006). A read that finds a collection’s index stale (a prior write deferred its rebuild) serves the prior snapshot and schedules an off-lock rebuild (ADR-0062): the index is rebuilt under the shared lock and swapped in under a brief write lock, so a rebuild never stalls concurrent readers.
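The locking discipline described above can be sketched with the standard library alone. This is a minimal illustration, not the daemon's code: `Collection`, `insert`, `snapshot`, and `rebuild` are hypothetical names, the "index" is just a sorted copy of the points, and the `spawn_blocking` offload is left out.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a collection: raw points plus a derived index.
struct Collection {
    points: Vec<u32>,
    /// Sorted copy of `points`; plays the role of the index in this sketch.
    index: Arc<Vec<u32>>,
    /// Set when a write deferred the index rebuild.
    stale: bool,
}

/// Write path: take the exclusive lock, mutate, and defer the rebuild.
fn insert(db: &RwLock<Collection>, value: u32) {
    let mut guard = db.write().unwrap();
    guard.points.push(value);
    guard.stale = true;
}

/// Read path: take the shared lock and hand back the current snapshot,
/// together with a flag telling the caller a rebuild should be scheduled.
fn snapshot(db: &RwLock<Collection>) -> (Arc<Vec<u32>>, bool) {
    let guard = db.read().unwrap();
    (Arc::clone(&guard.index), guard.stale)
}

/// Rebuild: do the expensive work under the shared lock, then swap the
/// result in under a brief exclusive lock.
fn rebuild(db: &RwLock<Collection>) {
    let (fresh, built_from) = {
        let guard = db.read().unwrap();
        let mut sorted = guard.points.clone();
        sorted.sort_unstable();
        (Arc::new(sorted), guard.points.len())
    };
    let mut guard = db.write().unwrap();
    guard.index = fresh;
    // Points only grow in this sketch, so an unchanged length means no
    // write landed while the index was being built.
    if guard.points.len() == built_from {
        guard.stale = false;
    }
}
```

Readers that already hold an `Arc` to the old index keep using it after the swap, which is why the exclusive section can stay short.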

Auth is by scoped API key (Bearer / gRPC authorization metadata) with default-deny RBAC: each key carries a role (read ⊆ write ⊆ admin) and a collection scope, enforced on every operation at the shared op layer (ADR-0011/0013, the auth module). Encryption-at-rest is on by default (ADR-0010): unless insecure is set, an encryption_key is required and the engine is opened through quiver-crypto’s AEAD codec; payloads may also be client-side-encrypted (ADR-0012). TLS-in-transit uses rustls over the audited ring provider — REST via axum-server, gRPC via tonic’s tls-ring — and a non-loopback bind requires it; setting a client CA additionally requires mutual TLS. Mutating and administrative operations, and every access-control denial, are recorded to an append-only audit log (ADR-0011, the audit module) when audit_log is set. Per-request cost limits bound the work any single authenticated request can demand (ADR-0040, the Limits type), and an opt-in per-key token-bucket rate limiter bounds the request rate (ADR-0049, the RateLimiter type); per-tenant engine partitioning is a later phase. Design: docs/api/rest-grpc.md.
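As an illustration of the default-deny check described above, here is a minimal sketch under stated assumptions. `Role`, `Scope`, `Key`, `Denied`, and `authorize` are hypothetical names chosen for this example; they are not the crate's `Action`, `CollectionScope`, or `ApiKey` types, whose fields are not shown on this page.

```rust
/// Roles ordered by privilege. The derived ordering follows declaration
/// order, so Read < Write < Admin and a higher role implies the lower ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Role {
    Read,
    Write,
    Admin,
}

/// Hypothetical collection scope: every collection, or a named subset.
#[derive(Debug, Clone)]
enum Scope {
    All,
    Only(Vec<String>),
}

/// Hypothetical key record: a bearer secret, a role, and a scope.
struct Key {
    secret: String,
    role: Role,
    scope: Scope,
}

#[derive(Debug, PartialEq, Eq)]
enum Denied {
    UnknownKey,
    InsufficientRole,
    OutOfScope,
}

/// Allow the operation only if the key is known, its role covers the
/// needed one, and the collection is in scope. Everything else is denied.
fn authorize(keys: &[Key], bearer: &str, needed: Role, collection: &str) -> Result<(), Denied> {
    // Plain `==` is used for brevity; a real server would compare
    // secrets in constant time.
    let key = keys
        .iter()
        .find(|k| k.secret == bearer)
        .ok_or(Denied::UnknownKey)?;
    if key.role < needed {
        return Err(Denied::InsufficientRole);
    }
    let in_scope = match &key.scope {
        Scope::All => true,
        Scope::Only(names) => names.iter().any(|n| n == collection),
    };
    if in_scope {
        Ok(())
    } else {
        Err(Denied::OutOfScope)
    }
}
```

Running one function like this at a shared operation layer, rather than once per transport, is what lets REST and gRPC enforce identical rules.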

Structs

ApiKey: A configured API key: a bearer secret, the role it grants, and the collections it is scoped to.
AutoscaleConfig: Opt-in automatic scale-out policy for the coordinator (ADR-0065 increment 5). When enabled, the coordinator samples each shard’s point count and, when the busiest crosses high_water_points, grows the cluster by joining one of the standby_urls — driving the same safe online migration as a manual POST /cluster/shards/grow. An explicit policy, not magic: nothing scales without a configured threshold and a standby to grow into, a cooldown bounds the rate, and a migration in flight is never interrupted. Scale-in is deliberately not automated here (a safe online drain is a separate increment); shrink stays a manual, drained DELETE /cluster/shards/{id}.
Config: Server configuration, layered defaults → quiver.toml → QUIVER_* env and validated at startup (ADR-0013).
EmbedRegistry: Per-collection embedding/rerank providers, built once at startup from config.
EmbeddingConfig: A collection’s embedding configuration (server config table [embedding.<collection>]). Secrets are referenced by env-var name only.
Limits: Per-request cost limits (ADR-0040). Each cap bounds the work a single authenticated request can demand, so one oversized request cannot exhaust the node under the single-writer model (ADR-0006). Over-limit requests are rejected with HTTP 400 / gRPC InvalidArgument rather than silently clamped — a truncated k or ef_search would return surprising, lower-quality results with no signal. Defaults are generous; raise a cap with a [limits] table in quiver.toml or the matching QUIVER_MAX_* environment variable.
OtlpConfig: OpenTelemetry traces export configuration ([otlp] in quiver.toml, or the QUIVER_OTLP_* environment variables). Disabled unless an endpoint is set.
RateLimitConfig: Configuration for the per-key limiter. requests_per_second == 0 (the default) disables it entirely; like every other guard, it is opt-in.
RateLimitSnapshot: A successful consume, surfaced as RateLimit-* response headers.
RateLimiter: An opt-in, in-memory, per-key token-bucket rate limiter.
RerankConfig: A collection’s rerank configuration (server config table [rerank.<collection>]).
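
The per-key token-bucket idea behind RateLimiter, RateLimitConfig, and RateLimitSnapshot can be sketched as follows. This is a generic token bucket under stated assumptions, not the crate's implementation: `TokenBuckets`, `Bucket`, `Decision`, and the `burst` parameter are hypothetical, and the clock is passed in so the behaviour is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One bucket per API key (hypothetical layout).
struct Bucket {
    tokens: f64,
    last: Instant,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// The limiter is disabled (rate of zero).
    Unlimited,
    /// A token was consumed; `remaining` whole tokens are left.
    Allowed { remaining: u32 },
    /// No token available; retry after this long.
    Limited { retry_after: Duration },
}

struct TokenBuckets {
    rate: f64,  // tokens refilled per second; 0 disables the limiter
    burst: f64, // bucket capacity (an assumed parameter)
    buckets: HashMap<String, Bucket>,
}

impl TokenBuckets {
    fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, buckets: HashMap::new() }
    }

    /// Refill the key's bucket for the elapsed time, then try to take one token.
    fn check(&mut self, key: &str, now: Instant) -> Decision {
        if self.rate <= 0.0 {
            return Decision::Unlimited;
        }
        let (rate, burst) = (self.rate, self.burst);
        let bucket = self
            .buckets
            .entry(key.to_string())
            .or_insert(Bucket { tokens: burst, last: now });
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(burst);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Decision::Allowed { remaining: bucket.tokens as u32 }
        } else {
            let wait = (1.0 - bucket.tokens) / rate;
            Decision::Limited { retry_after: Duration::from_secs_f64(wait) }
        }
    }
}
```

An `Allowed` result carries what a server would need to fill RateLimit-* response headers, and a `Limited` result carries a retry delay.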

Enums

Action: An action a caller may be permitted to perform, ordered by privilege so that a higher role implies the lower ones (Read < Write < Admin).
CollectionScope: Which collections a key may touch.
Error: An error from the server or the engine beneath it.
ProviderError: An error from a provider call or its configuration.
ProviderKind: Which provider backs a collection’s embedding (or rerank). The OpenAI-compatible trio (openai, ollama, http) share one adapter; cohere is its own; fake is deterministic and for tests/acceptance only.
RateDecision: The outcome of a rate-limit check.

Traits

EmbeddingProvider: Embeds a batch of texts into dense vectors (one per input).
RerankProvider: Scores (query, document) pairs for relevance; higher is more relevant.

Functions

init_observability: Install the global tracing subscriber: a RUST_LOG-driven fmt layer plus, when the otlp feature is built and an OTLP endpoint is configured (ADR-0059), an OpenTelemetry traces export layer. Safe to call once at startup; a second call is a no-op. A failure to build the OTLP exporter logs a warning and falls back to fmt-only rather than taking the server down.
init_tracing: Initialize structured logging from RUST_LOG (defaulting to info). Safe to call once at startup; a second call is ignored.
run: Run the server from config until a shutdown signal (Ctrl-C).
serve: Serve REST and gRPC on the given (already-bound) listeners until a transport error. Exposed so tests can bind ephemeral ports.
shutdown_observability: Flush and shut down the OpenTelemetry exporter, if one was installed. A no-op without the otlp feature or when no endpoint was configured. Call once on server shutdown so batched spans are not lost.