Request types for the continuous batching serving engine
This module defines the core request structures used throughout the serving system, including inference requests, running requests, and completed requests.
Structs§
- CompletedRequest - Result of a completed request
- InferenceRequest - An incoming inference request
- RequestId - Unique identifier for a request
- RunningRequest - A request that is currently being processed
- TokenOutput - Output from a single token generation step
Enums§
- FinishReason - Reason for request completion
- Priority - Priority level for request scheduling
- RequestState - State of a request in the serving pipeline
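
To illustrate how these types might relate, here is a minimal sketch of a request moving from incoming to running to completed. Only the type names come from the listing above. Every field, enum variant, and method (`prompt_tokens`, `max_new_tokens`, `is_eos`, `step`, `Stop`, `Length`, and so on) is a hypothetical placeholder chosen for illustration and may not match the module's actual definitions.

```rust
// Hypothetical sketch: type names match the listing, everything else is assumed.

/// Unique identifier for a request (assumed to wrap an integer).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RequestId(pub u64);

/// Priority level for scheduling. Variants are assumed; the derived
/// ordering follows declaration order, so Low < Normal < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority { Low, Normal, High }

/// Reason for request completion (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinishReason { Stop, Length }

/// State of a request in the serving pipeline (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestState { Waiting, Running, Finished }

/// An incoming inference request (assumed fields).
#[derive(Debug, Clone)]
pub struct InferenceRequest {
    pub id: RequestId,
    pub prompt_tokens: Vec<u32>,
    pub max_new_tokens: usize,
    pub priority: Priority,
}

/// Output from a single token generation step (assumed fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TokenOutput { pub token_id: u32, pub is_eos: bool }

/// A request that is currently being processed (assumed fields).
#[derive(Debug, Clone)]
pub struct RunningRequest {
    pub request: InferenceRequest,
    pub generated: Vec<u32>,
    pub state: RequestState,
}

/// Result of a completed request (assumed fields).
#[derive(Debug, Clone)]
pub struct CompletedRequest {
    pub id: RequestId,
    pub output_tokens: Vec<u32>,
    pub finish_reason: FinishReason,
}

impl RunningRequest {
    /// Start processing an incoming request.
    pub fn new(request: InferenceRequest) -> Self {
        Self { request, generated: Vec::new(), state: RequestState::Running }
    }

    /// Record one generation step. Returns a completed request once an
    /// end-of-sequence token arrives or the token budget is used up.
    pub fn step(&mut self, out: TokenOutput) -> Option<CompletedRequest> {
        self.generated.push(out.token_id);
        let finish_reason = if out.is_eos {
            FinishReason::Stop
        } else if self.generated.len() >= self.request.max_new_tokens {
            FinishReason::Length
        } else {
            return None; // still running
        };
        self.state = RequestState::Finished;
        Some(CompletedRequest {
            id: self.request.id,
            output_tokens: self.generated.clone(),
            finish_reason,
        })
    }
}
```

In this sketch a scheduler would call `step` once per decode iteration for each running request and remove the request from the batch as soon as `step` returns `Some`, which is the usual reason a continuous batching engine tracks running and completed requests as separate types.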