Module dynamic_batching

Dynamic batching for inference serving.

The module groups incoming inference requests into batches and provides:

  • Automatic request batching with configurable timeouts
  • Priority-based request queuing
  • Adaptive batch sizing based on load
  • Request deduplication
  • Batch splitting for heterogeneous requests
  • Latency and throughput optimization
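The first capability in the list, batching with a configurable timeout, can be illustrated with a minimal standard-library sketch. This is not the module's implementation: `SketchConfig`, its fields `max_batch_size` and `max_wait`, and `collect_batch` are hypothetical names introduced here, and the real `DynamicBatchConfig` and `DynamicBatcher` may be shaped quite differently.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical settings for this sketch; these are NOT the fields of
/// the module's `DynamicBatchConfig`.
#[derive(Debug, Clone, Copy)]
pub struct SketchConfig {
    /// Upper bound on the number of requests in one batch.
    pub max_batch_size: usize,
    /// Longest time to keep a batch open after its first request arrives.
    pub max_wait: Duration,
}

/// Collects one batch from `rx`.
///
/// Blocks until a first item arrives, then keeps accepting items until the
/// batch is full or `max_wait` has elapsed. Returns `None` once the channel
/// is closed and fully drained. A batch always holds at least one item,
/// even if `max_batch_size` is 0.
pub fn collect_batch<T>(rx: &Receiver<T>, cfg: SketchConfig) -> Option<Vec<T>> {
    // Wait for the first request; a closed, empty channel ends batching.
    let first = rx.recv().ok()?;
    let mut batch = Vec::with_capacity(cfg.max_batch_size.max(1));
    batch.push(first);

    // The timeout is measured from the arrival of the first request.
    let deadline = Instant::now() + cfg.max_wait;

    while batch.len() < cfg.max_batch_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Deadline reached or all senders gone: ship what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

A worker thread would typically call `collect_batch` in a loop and hand each returned batch to the model until the function returns `None`. Starting the timer at the first request bounds the extra latency any single request can incur to roughly `max_wait`, at the cost of smaller batches under light load.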

Structs§

AdaptiveBatcher
Adaptive batch size controller.
BatchRequest
A request to be batched.
BatchingStats
Statistics for dynamic batching.
DynamicBatchConfig
Configuration for dynamic batching.
DynamicBatcher
Dynamic batcher for inference requests.
RequestMetadata
Request metadata for batching decisions.
RequestQueue
Request queue with priority support.

Enums§

BatchingError
Dynamic batching errors.
Priority
Priority level for requests.
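
As an illustration of how a priority level and a priority-aware queue can fit together, the following is a minimal sketch built on `std::collections::BinaryHeap`. All names here (`SketchPriority`, its variants, `SketchQueue`, `push`, `pop`) are hypothetical and chosen for this example; the variants of the module's `Priority` enum and the methods of its `RequestQueue` are not shown on this page and may differ.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; derived `Ord` follows declaration order,
/// so `Low < Normal < High`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SketchPriority {
    Low,
    Normal,
    High,
}

/// Heap entry: ordered by priority, then by arrival order.
struct Entry<T> {
    priority: SketchPriority,
    seq: u64,
    payload: T,
}

impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        self.priority == other.priority && self.seq == other.seq
    }
}
impl<T> Eq for Entry<T> {}
impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority wins; within a priority, the earlier arrival
        // (smaller `seq`) compares as greater so it is popped first.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

/// Hypothetical queue: highest priority first, FIFO within a priority.
pub struct SketchQueue<T> {
    heap: BinaryHeap<Entry<T>>,
    next_seq: u64,
}

impl<T> SketchQueue<T> {
    pub fn new() -> Self {
        SketchQueue { heap: BinaryHeap::new(), next_seq: 0 }
    }

    /// Enqueues a payload, stamping it with a monotonically increasing
    /// sequence number to preserve arrival order.
    pub fn push(&mut self, payload: T, priority: SketchPriority) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Entry { priority, seq, payload });
    }

    /// Removes and returns the next payload, or `None` if the queue is empty.
    pub fn pop(&mut self) -> Option<T> {
        self.heap.pop().map(|entry| entry.payload)
    }
}
```

The sequence number matters because `BinaryHeap` makes no ordering guarantee among equal elements; without it, two requests at the same priority could be served out of arrival order.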