
Module request_queue

Bounded request queue with backpressure for the inference pipeline.

BoundedQueue is a generic FIFO queue with a fixed capacity. When the queue is full, BoundedQueue::try_push returns false immediately, allowing the caller to issue an HTTP 503 response rather than blocking indefinitely. BoundedQueue::push_timeout blocks for up to a given Duration waiting for a slot to become available.
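The backpressure behaviour described above can be sketched with a `Mutex<VecDeque<T>>` and two `Condvar`s. This is a minimal illustration, not the module's actual source. It assumes `try_push` and `push_timeout` both take the item by value and return `bool`, and that workers drain the queue through a blocking `pop`. The constructor and `pop` are hypothetical names chosen for the example.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Sketch of a bounded FIFO queue with backpressure (assumed API shape).
pub struct BoundedQueue<T> {
    items: Mutex<VecDeque<T>>,
    capacity: usize,
    not_full: Condvar,  // signalled when a slot frees up
    not_empty: Condvar, // signalled when an item arrives
}

impl<T> BoundedQueue<T> {
    /// Hypothetical constructor: capacity is fixed for the queue's lifetime.
    pub fn new(capacity: usize) -> Self {
        Self {
            items: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
            not_full: Condvar::new(),
            not_empty: Condvar::new(),
        }
    }

    /// Non-blocking push: returns false immediately when full,
    /// so an HTTP handler can translate it into a 503.
    pub fn try_push(&self, item: T) -> bool {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            return false;
        }
        items.push_back(item);
        self.not_empty.notify_one();
        true
    }

    /// Blocks for at most `timeout` waiting for a free slot.
    pub fn push_timeout(&self, item: T, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        let mut items = self.items.lock().unwrap();
        // Loop guards against spurious wake-ups and lost races for the slot.
        while items.len() >= self.capacity {
            let now = Instant::now();
            if now >= deadline {
                return false;
            }
            let (guard, _) = self.not_full.wait_timeout(items, deadline - now).unwrap();
            items = guard;
        }
        items.push_back(item);
        self.not_empty.notify_one();
        true
    }

    /// Hypothetical blocking pop used by worker threads.
    pub fn pop(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop_front() {
                // Wake one producer that may be waiting in push_timeout.
                self.not_full.notify_one();
                return item;
            }
            items = self.not_empty.wait(items).unwrap();
        }
    }
}
```

Re-checking the length inside a `while` loop is the standard way to use a condition variable. A wake-up does not guarantee that a slot is still free once the mutex is reacquired.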

InferenceQueue builds on BoundedQueue and pairs every submitted work item with a one-shot std::sync::mpsc channel: the worker sends the inference result on the channel's Sender, and the caller waits for it on the matching Receiver.
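The one-shot reply-channel pattern can be illustrated roughly as follows. This is a hypothetical sketch. The fields `prompt` and `reply`, the methods `submit` and `take`, and the `process_one` worker step are all invented for illustration. To keep the example self-contained, a capacity-checked `Mutex<VecDeque<_>>` stands in for `BoundedQueue`.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical work item: the real InferenceWorkItem's fields are not shown.
pub struct InferenceWorkItem {
    pub prompt: String,
    pub reply: Sender<String>, // used exactly once by the worker
}

/// Simplified stand-in for InferenceQueue (BoundedQueue replaced by a VecDeque).
pub struct InferenceQueue {
    items: Mutex<VecDeque<InferenceWorkItem>>,
    capacity: usize,
}

impl InferenceQueue {
    pub fn new(capacity: usize) -> Self {
        Self { items: Mutex::new(VecDeque::new()), capacity }
    }

    /// Enqueue a request; the caller keeps the Receiver to collect the result.
    /// Returns None when the queue is full (the caller would answer with 503).
    pub fn submit(&self, prompt: String) -> Option<Receiver<String>> {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            return None;
        }
        let (tx, rx) = mpsc::channel();
        items.push_back(InferenceWorkItem { prompt, reply: tx });
        Some(rx)
    }

    /// Worker side: take the oldest pending item, if any.
    pub fn take(&self) -> Option<InferenceWorkItem> {
        self.items.lock().unwrap().pop_front()
    }
}

/// Hypothetical worker step; "inference" here is just upper-casing the prompt.
pub fn process_one(queue: &InferenceQueue) -> bool {
    match queue.take() {
        Some(item) => {
            let result = item.prompt.to_uppercase();
            // Ignore the send error if the caller already dropped its Receiver.
            let _ = item.reply.send(result);
            true
        }
        None => false,
    }
}
```

A caller that holds the `Receiver` can block on `recv()` or poll with `try_recv()`. Because each channel carries exactly one message, the result can never be delivered to the wrong request.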

Structs

BoundedQueue
Thread-safe bounded FIFO queue with condvar-based blocking and backpressure.
InferenceQueue
High-level inference request queue wrapping BoundedQueue<InferenceWorkItem>.
InferenceWorkItem
A single unit of work to be processed by the inference engine.
QueueStats
A serialisable snapshot of queue utilisation.