
Module tensorlogic_infer::dynamic_batching

Dynamic batching for inference serving.

This module provides dynamic batching capabilities for efficient inference serving:

- Automatic request batching with configurable timeouts
- Priority-based request queuing
- Adaptive batch sizing based on load
- Request deduplication
- Batch splitting for heterogeneous requests
- Latency and throughput optimization
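The size-or-timeout policy behind automatic request batching can be sketched with the standard library alone. This is a minimal illustration, not the `DynamicBatcher` API: `collect_batch`, `max_batch` and `max_wait` are hypothetical names, and requests are modeled as plain values arriving on a `std::sync::mpsc` channel.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of tensorlogic_infer): waits for the first
/// request, then keeps accepting requests until `max_batch` is reached or
/// `max_wait` has elapsed since the first one arrived. Returns an empty Vec
/// once every sender is dropped and the channel is drained.
fn collect_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Vec<T> {
    let mut batch = Vec::with_capacity(max_batch);
    // Block until the first request arrives (or the channel closes).
    match rx.recv() {
        Ok(first) => batch.push(first),
        Err(_) => return batch,
    }
    // The timeout window starts when the first request arrives.
    let deadline = Instant::now() + max_wait;
    while batch.len() < max_batch {
        let now = Instant::now();
        if now >= deadline {
            break; // timeout reached: flush a partial batch
        }
        match rx.recv_timeout(deadline - now) {
            Ok(req) => batch.push(req),
            Err(RecvTimeoutError::Timeout) => break,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for id in 0..5u32 {
        tx.send(id).unwrap();
    }
    drop(tx); // no more requests will arrive
    let wait = Duration::from_millis(10);
    println!("{:?}", collect_batch(&rx, 3, wait)); // [0, 1, 2]
    println!("{:?}", collect_batch(&rx, 3, wait)); // [3, 4]
    println!("{:?}", collect_batch(&rx, 3, wait)); // []
}
```

Starting the timer at the first request bounds the extra queuing delay any single request can incur to roughly `max_wait`, while a full batch is dispatched immediately without waiting for the timeout.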

Structs

AdaptiveBatcher: Adaptive batch size controller.
BatchRequest: A request to be batched.
BatchingStats: Statistics for dynamic batching.
DynamicBatchConfig: Configuration for dynamic batching.
DynamicBatcher: Dynamic batcher for inference requests.
RequestMetadata: Request metadata for batching decisions.
RequestQueue: Request queue with priority support.
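This page does not state which control law `AdaptiveBatcher` uses. As one possible approach, the sketch below uses AIMD (additive increase, multiplicative decrease) against a latency target; the type `AimdBatchSizer` and all of its fields are hypothetical and are not the crate's API.

```rust
/// Hypothetical AIMD batch-size controller (not tensorlogic_infer's
/// AdaptiveBatcher): grows the batch by `step` while latency stays within
/// budget and halves it when the budget is exceeded.
struct AimdBatchSizer {
    current: usize,
    min: usize,
    max: usize,
    step: usize,
    target_latency_ms: f64,
}

impl AimdBatchSizer {
    fn new(min: usize, max: usize, step: usize, target_latency_ms: f64) -> Self {
        assert!(min >= 1 && min <= max, "need 1 <= min <= max");
        Self { current: min, min, max, step, target_latency_ms }
    }

    /// Feed the latency observed for the last batch; returns the next batch size.
    fn observe(&mut self, batch_latency_ms: f64) -> usize {
        if batch_latency_ms > self.target_latency_ms {
            // Over budget: back off quickly, but never below `min`.
            self.current = (self.current / 2).max(self.min);
        } else {
            // Within budget: probe for more throughput, capped at `max`.
            self.current = (self.current + self.step).min(self.max);
        }
        self.current
    }
}

fn main() {
    let mut sizer = AimdBatchSizer::new(1, 64, 4, 20.0);
    for latency in [5.0, 8.0, 12.0, 25.0, 10.0] {
        let next = sizer.observe(latency);
        println!("latency {latency} ms -> next batch size {next}");
    }
}
```

AIMD is chosen here only because it is simple and converges toward the largest batch that keeps latency under the target; a real controller might instead react to queue depth or throughput.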

Enums

BatchingError: Dynamic batching errors.
Priority: Priority level for requests.
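Priority-based queuing, as described for `RequestQueue` and `Priority`, is commonly built on a max-heap. The sketch below assumes three priority levels and a bounded capacity; `Level`, `QueueError` and `PriorityQueue` are hypothetical stand-ins, since this page does not list the actual `Priority` variants or `BatchingError` cases.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; derived Ord ranks later variants higher.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Low,
    Normal,
    High,
}

/// Hypothetical error type for a bounded queue.
#[derive(Debug, PartialEq, Eq)]
enum QueueError {
    Full { capacity: usize },
}

#[derive(Debug, PartialEq, Eq)]
struct Entry {
    level: Level,
    seq: u64, // arrival order, keeps FIFO within a level
    id: u32,
}

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher level first; for equal levels, the earlier (smaller) seq first.
        self.level
            .cmp(&other.level)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

struct PriorityQueue {
    heap: BinaryHeap<Entry>,
    next_seq: u64,
    capacity: usize,
}

impl PriorityQueue {
    fn new(capacity: usize) -> Self {
        Self { heap: BinaryHeap::new(), next_seq: 0, capacity }
    }

    /// Rejects the request instead of growing past `capacity`.
    fn push(&mut self, id: u32, level: Level) -> Result<(), QueueError> {
        if self.heap.len() >= self.capacity {
            return Err(QueueError::Full { capacity: self.capacity });
        }
        self.heap.push(Entry { level, seq: self.next_seq, id });
        self.next_seq += 1;
        Ok(())
    }

    /// Pops up to `max` request ids, highest priority first.
    fn pop_batch(&mut self, max: usize) -> Vec<u32> {
        let mut out = Vec::new();
        while out.len() < max {
            match self.heap.pop() {
                Some(e) => out.push(e.id),
                None => break,
            }
        }
        out
    }
}

fn main() {
    let mut q = PriorityQueue::new(4);
    q.push(1, Level::Low).unwrap();
    q.push(2, Level::High).unwrap();
    q.push(3, Level::Normal).unwrap();
    q.push(4, Level::High).unwrap();
    println!("{:?}", q.push(5, Level::High)); // Err(Full { capacity: 4 })
    println!("{:?}", q.pop_batch(3)); // [2, 4, 3]
    println!("{:?}", q.pop_batch(3)); // [1]
}
```

The sequence number breaks ties so that requests of equal priority leave in arrival order; without it, `BinaryHeap` gives no ordering guarantee among equal elements.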