Batched inference: process multiple prompts efficiently.
Groups prompts into batches for the prefill phase, then decodes each prompt's continuation independently.
Provides a RequestQueue for continuous batching scenarios where
requests arrive over time and are drained in configurable batch sizes.
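A minimal usage sketch of the batch path. Only the item names listed on this page (`BatchConfig`, `BatchResult`, `batch_generate`) come from these docs; the config fields, the `Engine` type, and the call signature are assumptions:

```rust
use std::error::Error;

// `Engine` stands in for whatever inference engine type the crate exposes.
fn run(engine: &mut Engine) -> Result<(), Box<dyn Error>> {
    // Hypothetical fields; the real BatchConfig may differ.
    let config = BatchConfig {
        max_batch_size: 8,
        max_new_tokens: 128,
        ..Default::default()
    };

    let prompts = vec![
        "Explain borrowing.".to_string(),
        "What is a lifetime?".to_string(),
    ];

    // Prompts are prefilled as one batch, then decoded independently.
    let results = batch_generate(engine, &prompts, &config)?;
    for result in results {
        // `text` and `finish_reason` are assumed BatchResult fields.
        println!("{:?}: {}", result.finish_reason, result.text);
    }
    Ok(())
}
```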
Structs§
- BatchConfig - Batch inference configuration.
- BatchRequest - A single queued inference request.
- BatchResult - Result of a single batch element.
- RequestQueue - Request queue for continuous batching (see the sketch below).
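As noted in the RequestQueue entry above, a sketch of a continuous-batching loop. The method names (`new`, `push`, `drain_batch`) and the `prompt` field are assumptions about the API, not confirmed by these docs:

```rust
// Hypothetical server loop; method and field names are assumed.
let mut queue = RequestQueue::new();

// Requests arrive over time, e.g. pushed by connection handlers.
queue.push(BatchRequest::new("first prompt"));
queue.push(BatchRequest::new("second prompt"));

// Drain up to a configurable number of requests and prefill them together.
while let Some(batch) = queue.drain_batch(config.max_batch_size) {
    let prompts: Vec<String> = batch.iter().map(|r| r.prompt.clone()).collect();
    let results = batch_generate(&mut engine, &prompts, &config)?;
    // Results line up index-for-index with the drained requests.
}
```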
Enums§
- FinishReason - Reason why token generation stopped.
Functions§
- batch_generate - Process a batch of prompts sequentially (sharing the engine).
- batch_generate_with_timeout - Process a batch with a per-request timeout (see the sketch below).
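A sketch of the timeout variant from the functions list above. The argument order, the `Duration` parameter, and the `FinishReason` variant names are assumptions, so check the individual item docs:

```rust
use std::time::Duration;

// Hypothetical call shape; the timeout applies to each request.
let results = batch_generate_with_timeout(
    &mut engine,
    &prompts,
    &config,
    Duration::from_secs(30),
)?;

for result in results {
    match result.finish_reason {
        // Variant names here are illustrative, not confirmed.
        FinishReason::Stop => println!("completed: {}", result.text),
        FinishReason::MaxTokens => println!("token limit: {}", result.text),
        other => println!("stopped early ({other:?}): {}", result.text),
    }
}
```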