Bounded request queue with backpressure for the inference pipeline.
BoundedQueue is a generic FIFO queue with a fixed capacity. When the
queue is full, BoundedQueue::try_push returns false immediately,
allowing the caller to issue an HTTP 503 response rather than blocking
indefinitely. BoundedQueue::push_timeout blocks for up to a given
Duration waiting for a slot to become available.
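The two push modes can be illustrated with a minimal sketch. This is not the crate's implementation: `SketchQueue`, its field names, the `pop` method, the `bool` return type of `push_timeout`, and the `status_for` helper (including the 202 success code) are all assumptions made for illustration. Only the `try_push` and `push_timeout` names and the 503-on-full behaviour come from the description above.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Illustrative stand-in for `BoundedQueue`; the internals are assumed.
pub struct SketchQueue<T> {
    items: Mutex<VecDeque<T>>,
    not_full: Condvar,
    capacity: usize,
}

impl<T> SketchQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            items: Mutex::new(VecDeque::with_capacity(capacity)),
            not_full: Condvar::new(),
            capacity,
        }
    }

    /// Non-blocking push: returns `false` at once when the queue is full.
    pub fn try_push(&self, item: T) -> bool {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            return false;
        }
        items.push_back(item);
        true
    }

    /// Blocking push: waits up to `timeout` for a slot, then gives up.
    pub fn push_timeout(&self, item: T, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        let mut items = self.items.lock().unwrap();
        // Loop to tolerate spurious wakeups and lost races for the free slot.
        while items.len() >= self.capacity {
            let remaining = deadline.saturating_duration_since(Instant::now());
            if remaining.is_zero() {
                return false;
            }
            let (guard, _) = self.not_full.wait_timeout(items, remaining).unwrap();
            items = guard;
        }
        items.push_back(item);
        true
    }

    /// Removes the oldest item and wakes one producer waiting for a slot.
    pub fn pop(&self) -> Option<T> {
        let item = self.items.lock().unwrap().pop_front();
        if item.is_some() {
            self.not_full.notify_one();
        }
        item
    }
}

/// Hypothetical helper: maps a push outcome to an HTTP status code.
pub fn status_for(accepted: bool) -> u16 {
    if accepted { 202 } else { 503 }
}
```

A request handler would typically call `try_push` on the hot path and translate `false` into a 503, reserving `push_timeout` for callers that can tolerate a short, bounded wait.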
InferenceQueue builds on top of BoundedQueue and wraps every
submitted work item with a one-shot std::sync::mpsc channel so that
callers can await the inference result asynchronously.
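The reply-channel pattern described above might look roughly like the following sketch. Every name here (`WorkItem`, `SketchInferenceQueue`, `submit`, `next_item`, `process_one`) and the use of `String` for both input and result are hypothetical, and a plain mutex-guarded deque stands in for `BoundedQueue`. The real `InferenceWorkItem` fields and `InferenceQueue` methods are not shown in this page.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical work item: an input plus the sending half of a reply channel.
pub struct WorkItem {
    pub prompt: String,
    pub reply: Sender<String>,
}

/// Illustrative stand-in for `InferenceQueue`; a mutex-guarded deque
/// takes the place of `BoundedQueue` to keep the sketch short.
pub struct SketchInferenceQueue {
    items: Mutex<VecDeque<WorkItem>>,
    capacity: usize,
}

impl SketchInferenceQueue {
    pub fn new(capacity: usize) -> Self {
        Self { items: Mutex::new(VecDeque::new()), capacity }
    }

    /// Enqueues a request and returns the receiving half of its reply
    /// channel, or `None` when the queue is full (backpressure).
    pub fn submit(&self, prompt: &str) -> Option<Receiver<String>> {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            return None;
        }
        // Each channel carries exactly one message: the result for this item.
        let (tx, rx) = mpsc::channel();
        items.push_back(WorkItem { prompt: prompt.to_owned(), reply: tx });
        Some(rx)
    }

    /// Worker side: takes the oldest pending item, if any.
    pub fn next_item(&self) -> Option<WorkItem> {
        self.items.lock().unwrap().pop_front()
    }
}

/// Processes one queued item; returns `false` when the queue was empty.
pub fn process_one(queue: &SketchInferenceQueue) -> bool {
    match queue.next_item() {
        Some(item) => {
            // Placeholder for real inference.
            let result = item.prompt.to_uppercase();
            // `send` fails only if the caller dropped its receiver; ignore that.
            let _ = item.reply.send(result);
            true
        }
        None => false,
    }
}
```

The caller keeps the `Receiver` and calls `recv` (or `recv_timeout`) when it needs the result, so submission returns immediately while the worker thread does the processing.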
Structs
- BoundedQueue - Thread-safe bounded FIFO queue with condvar-based blocking and backpressure.
- InferenceQueue - High-level inference request queue wrapping BoundedQueue<InferenceWorkItem>.
- InferenceWorkItem - A single unit of work to be processed by the inference engine.
- QueueStats - A serialisable snapshot of queue utilisation.