Bounded admission gate.
Per docs/protocol-v1.md §“Admission semantics”: at most
active_permits + queue_depth outstanding requests across the
whole daemon at any time. The (active_permits + 1)th request is
still admitted (it just queues behind the active ones); the
(active_permits + queue_depth + 1)th is rejected immediately
with Response::Error{code: queue_full}.
Implementation: a single tokio::sync::Semaphore whose total
permit count is active_permits + queue_depth. Per-request
flow:
- try_acquire_owned: non-blocking. If no permit is available, return QueueFull. The wire layer translates that into a terminal Response::Error frame.
- Hold the permit for the duration of the generation (the OwnedSemaphorePermit guard goes onto the request future’s stack).
- Drop on completion / cancellation. The permit returns to the pool, freeing a slot for the next admit.
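The flow above can be sketched without any async runtime. The block below is a minimal std-only model of the counting behaviour, not the daemon's code: it swaps the tokio semaphore for an atomic counter so it compiles on its own. Gate, Permit, and admitted_out_of are hypothetical names, and the shape of SubmitError (a single QueueFull variant) is an assumption based on the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Assumed error shape; the real SubmitError may carry more variants.
#[derive(Debug, PartialEq)]
pub enum SubmitError {
    QueueFull,
}

/// Hypothetical stand-in for the admission gate. Cloning only bumps the Arc.
#[derive(Clone)]
pub struct Gate {
    free: Arc<AtomicUsize>,
}

/// RAII guard: gives its slot back to the pool when dropped,
/// whether the request completed or was cancelled.
pub struct Permit {
    free: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.free.fetch_add(1, Ordering::Release);
    }
}

impl Gate {
    /// Total slots = active_permits + queue_depth, as in the text above.
    pub fn new(active_permits: usize, queue_depth: usize) -> Self {
        Gate {
            free: Arc::new(AtomicUsize::new(active_permits + queue_depth)),
        }
    }

    /// Non-blocking admit: take a slot if one is free, otherwise QueueFull.
    pub fn try_admit(&self) -> Result<Permit, SubmitError> {
        self.free
            // checked_sub returns None at zero, which aborts the update.
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .map(|_| Permit { free: Arc::clone(&self.free) })
            .map_err(|_| SubmitError::QueueFull)
    }

    /// Number of slots currently free.
    pub fn available(&self) -> usize {
        self.free.load(Ordering::Acquire)
    }
}

/// Issues `attempts` admits while holding every granted permit,
/// and reports how many were admitted.
pub fn admitted_out_of(gate: &Gate, attempts: usize) -> usize {
    let mut held = Vec::new();
    for _ in 0..attempts {
        if let Ok(permit) = gate.try_admit() {
            held.push(permit);
        }
    }
    held.len()
    // `held` is dropped here, so every slot returns to the pool.
}
```

With active_permits = 1 and queue_depth = 2, five simultaneous attempts in this model admit three requests and reject two; once the held permits are dropped, all three slots are free again.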
For v0.1 the daemon’s only backend (llamacpp) is single-
threaded internally — concurrent generates serialise on its
inner mutex. Setting active_permits=1 matches that reality
without bottlenecking the wire layer (which is happy to read
and queue many requests). v0.2’s continuous-batching backends
will raise active_permits above 1.
Structs
- Admission: Shared admission gate handed to every per-connection task via lifecycle::AcceptContext. Cheap to clone (just an Arc bump).
Enums
- SubmitError: Errors returned by Admission::try_admit.