Module request


Request types for the continuous batching serving engine

This module defines the core request structures used throughout the serving system, including inference requests, running requests, and completed requests.

Structs§

CompletedRequest
Result of a completed request
InferenceRequest
An incoming inference request
RequestId
Unique identifier for a request
RunningRequest
A request that is currently being processed
TokenOutput
Output from a single token generation step
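To make the relationship between these structs concrete, here is a minimal sketch of how a request might move from incoming, to running, to completed. Only the type names and their one-line descriptions come from this page. Every field and method below (`prompt_tokens`, `max_new_tokens`, `logprob`, `push`, `finish`, and the `u64` inside `RequestId`) is a hypothetical stand-in, so consult each struct's own page for the real definition.

```rust
// Hypothetical sketch: all fields and methods are assumptions made for
// illustration, not the crate's actual definitions.

/// Unique identifier for a request (assumed here to wrap a u64).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RequestId(pub u64);

/// An incoming inference request.
#[derive(Debug, Clone)]
pub struct InferenceRequest {
    pub id: RequestId,
    pub prompt_tokens: Vec<u32>,
    pub max_new_tokens: usize,
}

/// Output from a single token generation step.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TokenOutput {
    pub token_id: u32,
    pub logprob: f32,
}

/// A request that is currently being processed.
#[derive(Debug)]
pub struct RunningRequest {
    pub request: InferenceRequest,
    pub generated: Vec<TokenOutput>,
}

/// Result of a completed request.
#[derive(Debug)]
pub struct CompletedRequest {
    pub id: RequestId,
    pub output_tokens: Vec<u32>,
}

impl RunningRequest {
    /// Start processing an incoming request with no tokens generated yet.
    pub fn new(request: InferenceRequest) -> Self {
        Self { request, generated: Vec::new() }
    }

    /// Record one generation step; returns true once the token budget is used up.
    pub fn push(&mut self, out: TokenOutput) -> bool {
        self.generated.push(out);
        self.generated.len() >= self.request.max_new_tokens
    }

    /// Consume the running request and produce its completed form.
    pub fn finish(self) -> CompletedRequest {
        CompletedRequest {
            id: self.request.id,
            output_tokens: self.generated.iter().map(|t| t.token_id).collect(),
        }
    }
}
```

In this sketch `finish` takes `self` by value, so a request cannot be both running and completed at once. That is one possible design choice, not necessarily the one this module makes.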

Enums§

FinishReason
Reason for request completion
Priority
Priority level for request scheduling
RequestState
State of a request in the serving pipeline
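
As a rough illustration of how these enums could interact in a scheduler, the sketch below orders requests by priority and advances a request's state. The enum names come from this page, but every variant (`Low`, `Normal`, `High`, `Queued`, `Running`, `Finished`, `EndOfSequence`, `MaxTokens`, `Cancelled`) and both helpers (`next`, `pick_next`) are assumptions; the actual variants may differ.

```rust
// Hypothetical sketch: variants and helper functions are assumptions
// made for illustration, not the crate's actual definitions.

/// Priority level for request scheduling. Deriving `Ord` orders variants
/// by declaration, so `Low < Normal < High`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Low,
    Normal,
    High,
}

/// State of a request in the serving pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestState {
    Queued,
    Running,
    Finished,
}

/// Reason for request completion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinishReason {
    EndOfSequence,
    MaxTokens,
    Cancelled,
}

impl RequestState {
    /// Advance the state by one scheduler tick. Any finish reason ends the
    /// request; otherwise a queued request starts running and other states
    /// stay where they are.
    pub fn next(self, finished: Option<FinishReason>) -> RequestState {
        match (self, finished) {
            (_, Some(_)) => RequestState::Finished,
            (RequestState::Queued, None) => RequestState::Running,
            (state, None) => state,
        }
    }
}

/// Pick the id of the highest-priority entry from a waiting queue,
/// or `None` if the queue is empty.
pub fn pick_next(queue: &[(u64, Priority)]) -> Option<u64> {
    queue.iter().max_by_key(|(_, p)| *p).map(|(id, _)| *id)
}
```

For example, under these assumed variants, `pick_next(&[(1, Priority::Low), (2, Priority::High)])` selects request `2`, and `RequestState::Queued.next(None)` yields `RequestState::Running`.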