Batched inference: process multiple prompts efficiently.
Groups prompts into batches for the prefill phase, then decodes each prompt's continuation independently.
Provides a RequestQueue for continuous batching scenarios where
requests arrive over time and are drained in configurable batch sizes.
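A minimal usage sketch of the batch path. Only the item names listed on this page (`BatchConfig`, `BatchResult`, `batch_generate`) come from these docs; the config fields, the `Engine` type, and the call signature are assumptions:

```rust
use std::error::Error;

// `Engine` stands in for whatever inference engine type the crate exposes.
fn run(engine: &mut Engine) -> Result<(), Box<dyn Error>> {
    // Hypothetical fields; the real BatchConfig may differ.
    let config = BatchConfig {
        max_batch_size: 8,
        max_new_tokens: 128,
        ..Default::default()
    };

    let prompts = vec![
        "Explain borrowing.".to_string(),
        "What is a lifetime?".to_string(),
    ];

    // Prompts are prefilled as one batch, then decoded independently.
    let results = batch_generate(engine, &prompts, &config)?;
    for result in results {
        // `text` and `finish_reason` are assumed BatchResult fields.
        println!("{:?}: {}", result.finish_reason, result.text);
    }
    Ok(())
}
```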
Structs§
- BatchConfig - Batch inference configuration.
- BatchRequest - A single queued inference request.
- BatchResult - Result of a single batch element.
- RequestQueue - Request queue for continuous batching (see the sketch below).
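As noted in the RequestQueue entry above, a sketch of a continuous-batching loop. The method names (`new`, `push`, `drain_batch`) and the `prompt` field are assumptions about the API, not confirmed by these docs:

```rust
// Hypothetical server loop; method and field names are assumed.
let mut queue = RequestQueue::new();

// Requests arrive over time, e.g. pushed by connection handlers.
queue.push(BatchRequest::new("first prompt"));
queue.push(BatchRequest::new("second prompt"));

// Drain up to a configurable number of requests and prefill them together.
while let Some(batch) = queue.drain_batch(config.max_batch_size) {
    let prompts: Vec<String> = batch.iter().map(|r| r.prompt.clone()).collect();
    let results = batch_generate(&mut engine, &prompts, &config)?;
    // Results line up index-for-index with the drained requests.
}
```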
Enums§
- FinishReason - Reason why token generation stopped.
Functions§
- batch_generate - Process a batch of prompts sequentially (sharing the engine).
- batch_generate_with_timeout - Process a batch with a per-request timeout (see the sketch below).
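A sketch of the timeout variant from the functions list above. The argument order, the `Duration` parameter, and the `FinishReason` variant names are assumptions, so check the individual item docs:

```rust
use std::time::Duration;

// Hypothetical call shape; the timeout applies to each request.
let results = batch_generate_with_timeout(
    &mut engine,
    &prompts,
    &config,
    Duration::from_secs(30),
)?;

for result in results {
    match result.finish_reason {
        // Variant names here are illustrative, not confirmed.
        FinishReason::Stop => println!("completed: {}", result.text),
        FinishReason::MaxTokens => println!("token limit: {}", result.text),
        other => println!("stopped early ({other:?}): {}", result.text),
    }
}
```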