Dynamic batching for inference serving.
This module provides dynamic batching capabilities for efficient inference serving:
- Automatic request batching with configurable timeouts
- Priority-based request queuing
- Adaptive batch sizing based on load
- Request deduplication
- Batch splitting for heterogeneous requests
- Latency and throughput optimization
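The first item in the list, batching with a configurable timeout, can be illustrated with a minimal standard-library sketch. This is not the module's implementation: `collect_batch`, `max_batch`, and `max_wait` are hypothetical names chosen for illustration, and a plain `std::sync::mpsc` channel stands in for whatever queue the module actually uses.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of this module's API): gathers up to
/// `max_batch` items from `rx`, waiting at most `max_wait` after the first
/// item has arrived.
pub fn collect_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Vec<T> {
    let mut batch = Vec::with_capacity(max_batch);
    if max_batch == 0 {
        return batch;
    }

    // Block until the first request arrives; an error means every sender
    // has been dropped, so there is nothing left to batch.
    match rx.recv() {
        Ok(item) => batch.push(item),
        Err(_) => return batch,
    }

    // The timeout window starts once the first request is in hand.
    let deadline = Instant::now() + max_wait;
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Window elapsed or channel closed: ship what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}
```

Starting the timer at the first request rather than at the call bounds the extra latency any single request pays for batching to roughly `max_wait`, while an idle server simply blocks without spinning.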
Structs
- AdaptiveBatcher - Adaptive batch size controller.
- BatchRequest - A request to be batched.
- BatchingStats - Statistics for dynamic batching.
- DynamicBatchConfig - Configuration for dynamic batching.
- DynamicBatcher - Dynamic batcher for inference requests.
- RequestMetadata - Request metadata for batching decisions.
- RequestQueue - Request queue with priority support.
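As a rough illustration of what an adaptive batch size controller can look like, the sketch below uses an additive-increase, multiplicative-decrease rule driven by observed latency. The source does not say which policy the module uses; `SketchController`, its fields, and the latency target are all assumptions made for this example.

```rust
/// Hypothetical controller (not this module's type): grows the batch size by
/// one while latency stays within the target and halves it when latency
/// exceeds the target.
pub struct SketchController {
    size: usize,
    min: usize,
    max: usize,
    target_latency_ms: f64,
}

impl SketchController {
    /// Starts at the minimum batch size. Panics if the bounds are invalid.
    pub fn new(min: usize, max: usize, target_latency_ms: f64) -> Self {
        assert!(min >= 1 && min <= max, "require 1 <= min <= max");
        Self { size: min, min, max, target_latency_ms }
    }

    /// Batch size to use for the next batch.
    pub fn batch_size(&self) -> usize {
        self.size
    }

    /// Feeds back the latency measured for the most recent batch.
    pub fn observe(&mut self, latency_ms: f64) {
        if latency_ms > self.target_latency_ms {
            // Over budget: back off quickly, but never below `min`.
            self.size = (self.size / 2).max(self.min);
        } else {
            // Within budget: probe for more throughput, capped at `max`.
            self.size = (self.size + 1).min(self.max);
        }
    }
}
```

Backing off faster than it grows makes such a controller recover quickly from latency spikes, at the cost of some throughput while it climbs back.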
Enums
- BatchingError - Dynamic batching errors.
- Priority - Priority level for requests.
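A priority-aware request queue can be sketched with `std::collections::BinaryHeap`. The code below is an assumption-laden illustration, not the module's `RequestQueue` or `Priority`: the type names `SketchQueue` and `SketchPriority`, the three variants, and the first-in-first-out tie-break within a priority level are all invented here, since the listing above does not show them.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; the real `Priority` variants are not shown
/// in the listing above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SketchPriority {
    Low,
    Normal,
    High,
}

struct Entry {
    priority: SketchPriority,
    seq: u64, // insertion order, used to break ties
    payload: String,
}

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority first; within a priority, the lower sequence
        // number (earlier arrival) compares as greater so it pops first.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Entry {}

/// Hypothetical queue: a max-heap keyed on (priority, arrival order).
#[derive(Default)]
pub struct SketchQueue {
    heap: BinaryHeap<Entry>,
    next_seq: u64,
}

impl SketchQueue {
    pub fn push(&mut self, priority: SketchPriority, payload: impl Into<String>) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Entry { priority, seq, payload: payload.into() });
    }

    pub fn pop(&mut self) -> Option<(SketchPriority, String)> {
        self.heap.pop().map(|e| (e.priority, e.payload))
    }
}
```

The sequence number matters because `BinaryHeap` gives no ordering guarantee among equal elements; without it, requests at the same priority level could be served out of arrival order.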