
Module batch_engine

Batched inference: process multiple prompts efficiently.

Groups prompts into batches for the prefill phase, then generates tokens for each prompt independently. Provides a RequestQueue for continuous-batching scenarios where requests arrive over time and are drained in configurable batch sizes.
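A minimal sketch of the queue-and-drain pattern described above. The field and method names (`push`, `drain_batch`, `pending`) are assumptions for illustration, not the module's actual API:

```rust
use std::collections::VecDeque;

// Hypothetical shapes; the real BatchRequest/RequestQueue may differ.
struct BatchRequest {
    id: u64,
    prompt: String,
}

struct RequestQueue {
    pending: VecDeque<BatchRequest>,
}

impl RequestQueue {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Enqueue a request as it arrives.
    fn push(&mut self, req: BatchRequest) {
        self.pending.push_back(req);
    }

    /// Drain up to `max_batch_size` requests in arrival order,
    /// leaving any overflow queued for the next batch.
    fn drain_batch(&mut self, max_batch_size: usize) -> Vec<BatchRequest> {
        let n = max_batch_size.min(self.pending.len());
        self.pending.drain(..n).collect()
    }
}

fn main() {
    let mut queue = RequestQueue::new();
    for i in 0..5 {
        queue.push(BatchRequest { id: i, prompt: format!("prompt {i}") });
    }
    // Requests arrived over time; drain them in a configurable batch size.
    let batch = queue.drain_batch(3);
    assert_eq!(batch.len(), 3);
    assert_eq!(queue.pending.len(), 2); // overflow stays queued
}
```

Draining a bounded batch (rather than the whole queue) is what lets a serving loop keep admitting new requests between generation rounds.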

Structs§

BatchConfig
Batch inference configuration.
BatchRequest
A single queued inference request.
BatchResult
Result of a single batch element.
RequestQueue
Request queue for continuous batching.

Enums§

FinishReason
Reason why token generation stopped.

Functions§

batch_generate
Process a batch of prompts sequentially (sharing the engine).
batch_generate_with_timeout
Process a batch with timeout per request.
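A sketch of how a per-request timeout can interact with a finish reason, in the spirit of batch_generate_with_timeout and FinishReason above. All signatures, variants, and the stubbed decode step are assumptions, not the module's real implementation:

```rust
use std::time::{Duration, Instant};

// Hypothetical variants; the module's FinishReason may differ.
#[derive(Debug, PartialEq)]
enum FinishReason {
    MaxTokens,
    Timeout,
}

#[allow(dead_code)]
struct BatchResult {
    prompt: String,
    tokens_generated: usize,
    finish_reason: FinishReason,
}

/// Generate for one prompt, stopping at `max_tokens` or when `deadline` passes.
fn generate_one(prompt: &str, max_tokens: usize, deadline: Instant) -> BatchResult {
    let mut tokens = 0;
    while tokens < max_tokens {
        if Instant::now() >= deadline {
            return BatchResult {
                prompt: prompt.to_string(),
                tokens_generated: tokens,
                finish_reason: FinishReason::Timeout,
            };
        }
        tokens += 1; // stand-in for one real decode step
    }
    BatchResult {
        prompt: prompt.to_string(),
        tokens_generated: tokens,
        finish_reason: FinishReason::MaxTokens,
    }
}

/// Process each prompt sequentially, giving each its own timeout budget.
fn batch_generate_with_timeout(
    prompts: &[&str],
    max_tokens: usize,
    timeout: Duration,
) -> Vec<BatchResult> {
    prompts
        .iter()
        .map(|p| generate_one(p, max_tokens, Instant::now() + timeout))
        .collect()
}

fn main() {
    let results = batch_generate_with_timeout(&["a", "b"], 8, Duration::from_secs(1));
    assert_eq!(results.len(), 2);
    // With a trivial decode stub, every request hits the token limit first.
    assert!(results.iter().all(|r| r.finish_reason == FinishReason::MaxTokens));
}
```

Starting a fresh deadline per request (rather than one shared deadline for the batch) keeps one slow prompt from starving the rest, which matches the "timeout per request" wording above.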