Module queue

Bounded admission gate.

Per docs/protocol-v1.md §“Admission semantics”: at most active_permits + queue_depth requests may be outstanding across the whole daemon at any time. The (active_permits + 1)-th concurrent request is still admitted (it simply queues behind the active ones); the (active_permits + queue_depth + 1)-th is rejected immediately with Response::Error{code: queue_full}.
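The counting rule can be restated as a small pure function. This is a minimal sketch: classify and Outcome are hypothetical names that are not part of this module, and queue_depth = 4 is an arbitrary example value. It only models which bucket the k-th simultaneously outstanding request falls into.

```rust
/// Where the k-th simultaneously outstanding request (1-based) ends up.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Within active_permits: runs immediately.
    Active,
    /// Admitted, but waits behind the active request(s).
    Queued,
    /// Beyond active_permits + queue_depth: rejected with queue_full.
    Rejected,
}

/// Hypothetical helper restating the admission arithmetic.
fn classify(k: usize, active_permits: usize, queue_depth: usize) -> Outcome {
    if k <= active_permits {
        Outcome::Active
    } else if k <= active_permits + queue_depth {
        Outcome::Queued
    } else {
        Outcome::Rejected
    }
}

fn main() {
    // active_permits = 1 as in v0.1; queue_depth = 4 is an arbitrary example value.
    let (active_permits, queue_depth) = (1, 4);
    for k in 1..=6 {
        println!("request {k}: {:?}", classify(k, active_permits, queue_depth));
    }
}
```

With these settings request 1 is Active, requests 2 through 5 are Queued, and request 6 is Rejected.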

Implementation: a single tokio::sync::Semaphore whose total permit count is active_permits + queue_depth. Per-request flow:

  1. try_acquire_owned, which is non-blocking. If no permit is available, return QueueFull; the wire layer translates that into a terminal Response::Error frame.
  2. Hold the permit for the duration of the generation (the OwnedSemaphorePermit guard goes onto the request future’s stack).
  3. Drop the permit on completion or cancellation. It returns to the pool, freeing a slot for the next admission.
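The three steps above can be sketched without tokio. This is a std-only stand-in under stated assumptions: Gate, Permit and QueueFull are hypothetical names, and an AtomicUsize replaces tokio::sync::Semaphore so the example runs with no dependencies. As in the single-semaphore design described here, the gate only caps the total number outstanding; it does not itself decide which admitted request runs first.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical std-only stand-in for tokio::sync::Semaphore: a shared
/// count of free permits, sized active_permits + queue_depth.
struct Gate {
    free: AtomicUsize,
}

/// Plays the role of OwnedSemaphorePermit: dropping it (on completion or
/// cancellation of the owning request) returns the permit to the pool.
struct Permit {
    gate: Arc<Gate>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Step 3: give the slot back.
        self.gate.free.fetch_add(1, Ordering::AcqRel);
    }
}

#[derive(Debug, PartialEq, Eq)]
struct QueueFull;

impl Gate {
    fn new(active_permits: usize, queue_depth: usize) -> Arc<Gate> {
        Arc::new(Gate {
            free: AtomicUsize::new(active_permits + queue_depth),
        })
    }

    /// Step 1: non-blocking. Takes a permit if one is free, otherwise
    /// fails immediately instead of waiting.
    fn try_acquire_owned(self: Arc<Self>) -> Result<Permit, QueueFull> {
        let taken = self
            .free
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok();
        if taken {
            Ok(Permit { gate: self })
        } else {
            Err(QueueFull)
        }
    }

    fn available(&self) -> usize {
        self.free.load(Ordering::Acquire)
    }
}

fn main() {
    // active_permits = 1, queue_depth = 1: two requests may be outstanding.
    let gate = Gate::new(1, 1);
    // Step 2: each admitted request holds its Permit while it runs.
    let first = gate.clone().try_acquire_owned().expect("admitted");
    let second = gate.clone().try_acquire_owned().expect("admitted, queued");
    // The third request is rejected outright, not parked.
    assert_eq!(gate.clone().try_acquire_owned().err(), Some(QueueFull));

    // Step 3: the first request finishes and its permit goes back to the pool.
    drop(first);
    assert_eq!(gate.available(), 1);
    let third = gate.clone().try_acquire_owned().expect("slot freed");

    drop(second);
    drop(third);
    assert_eq!(gate.available(), 2);
}
```

Returning the permit from Drop is what makes cancellation safe: if the request future is dropped mid-generation, the slot is released without any explicit cleanup path.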

For v0.1 the daemon’s only backend (llamacpp) is single-threaded internally: concurrent generate calls serialise on its inner mutex. Setting active_permits=1 matches that reality without bottlenecking the wire layer, which is happy to read and queue many requests. v0.2’s continuous-batching backends will raise active_permits above 1.
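The serialisation argument can be illustrated with a hypothetical model. Backend, generate and peak_concurrency are illustrative names, OS threads stand in for tokio tasks, and a sleep stands in for decoding. Several requests are admitted at once, yet because every generate call takes the same inner mutex, at most one is ever inside the backend.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical single-threaded backend: every generate call takes the
/// same inner mutex, so admitted requests run strictly one at a time.
struct Backend {
    inner: Mutex<()>,
    running: AtomicUsize,     // generate calls currently holding the lock
    max_running: AtomicUsize, // high-water mark of `running`
}

impl Backend {
    fn new() -> Self {
        Backend {
            inner: Mutex::new(()),
            running: AtomicUsize::new(0),
            max_running: AtomicUsize::new(0),
        }
    }

    fn generate(&self) {
        let _guard = self.inner.lock().unwrap();
        let now = self.running.fetch_add(1, Ordering::SeqCst) + 1;
        self.max_running.fetch_max(now, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(5)); // stand-in for decoding
        self.running.fetch_sub(1, Ordering::SeqCst);
    } // guard released here
}

/// Run `admitted` requests at once and report the peak concurrency
/// observed inside the backend.
fn peak_concurrency(admitted: usize) -> usize {
    let backend = Arc::new(Backend::new());
    let handles: Vec<_> = (0..admitted)
        .map(|_| {
            let backend = Arc::clone(&backend);
            thread::spawn(move || backend.generate())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    backend.max_running.load(Ordering::SeqCst)
}

fn main() {
    // E.g. active_permits = 1, queue_depth = 3: four requests admitted at once.
    println!("peak concurrent generates: {}", peak_concurrency(4));
}
```

Since the backend already imposes this ordering, allowing more than one active permit would not add throughput in v0.1; it would only move the wait from the admission gate to the mutex.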

Structs

Admission
Shared admission gate handed to every per-connection task via lifecycle::AcceptContext. Cheap to clone (just an Arc bump).
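A cheap-to-clone gate typically keeps all shared state behind a single Arc, so a derived Clone is just a reference-count increment and every clone sees the same pool. The sketch below assumes that shape; Shared, take_slot and admit_from_connections are hypothetical, take_slot deliberately omits a release guard, and this is not the real Admission::try_admit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state; in the real module the semaphore lives here.
struct Shared {
    free: AtomicUsize,
}

/// Hypothetical shape of a cheap-to-clone gate: the only field is an Arc,
/// so the derived Clone just bumps a reference count.
#[derive(Clone)]
struct Admission {
    shared: Arc<Shared>,
}

impl Admission {
    fn new(total_permits: usize) -> Self {
        Admission {
            shared: Arc::new(Shared {
                free: AtomicUsize::new(total_permits),
            }),
        }
    }

    /// Illustrative only: takes a slot without returning a release guard.
    fn take_slot(&self) -> bool {
        self.shared
            .free
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }
}

/// One clone per "connection"; returns how many connections got a slot.
fn admit_from_connections(total_permits: usize, connections: usize) -> usize {
    let admission = Admission::new(total_permits);
    let handles: Vec<_> = (0..connections)
        .map(|_| {
            let admission = admission.clone(); // refcount bump, same pool
            thread::spawn(move || admission.take_slot())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&admitted| admitted)
        .count()
}

fn main() {
    let admission = Admission::new(3);
    let per_connection = admission.clone();
    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&admission.shared, &per_connection.shared));
    assert_eq!(Arc::strong_count(&admission.shared), 2);
    println!("admitted: {}", admit_from_connections(3, 5));
}
```

Because the limit lives in the shared allocation rather than in each clone, the cap holds daemon-wide no matter how many connection tasks hold a handle.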

Enums

SubmitError
Errors returned by Admission::try_admit.
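The translation from an admission error to the terminal wire frame might look like the following. This is a hypothetical mirror: only the QueueFull variant and the queue_full code are named in this module's docs, the real SubmitError may have other variants, and Response is reduced to a single Error variant whose code is modelled as a &'static str purely for illustration.

```rust
use std::fmt;

/// Hypothetical mirror of SubmitError; only QueueFull is documented here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubmitError {
    QueueFull,
}

impl fmt::Display for SubmitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SubmitError::QueueFull => write!(f, "admission queue is full"),
        }
    }
}

impl std::error::Error for SubmitError {}

/// Hypothetical, simplified wire response: Response::Error{code: queue_full}.
#[derive(Debug, PartialEq, Eq)]
enum Response {
    Error { code: &'static str },
}

/// Wire-layer mapping: an admission failure becomes a terminal error frame.
fn to_wire(err: SubmitError) -> Response {
    match err {
        SubmitError::QueueFull => Response::Error { code: "queue_full" },
    }
}

fn main() {
    let frame = to_wire(SubmitError::QueueFull);
    println!("{frame:?}");
}
```

Keeping the mapping an exhaustive match means any future SubmitError variant forces a decision about its wire code at compile time.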