pub struct BatchRequest {
    pub id: u64,
    pub prompt_tokens: Vec<u32>,
    pub params: SamplingParams,
    pub max_tokens: usize,
    pub priority: RequestPriority,
    pub state: RequestState,
    pub generated_tokens: Vec<u32>,
    pub created_at: Instant,
    pub started_at: Option<Instant>,
    pub completed_at: Option<Instant>,
}
A single inference request managed by the continuous-batch scheduler.
Fields
id: u64
Unique request identifier returned by ContinuousBatchScheduler::submit.
prompt_tokens: Vec<u32>
Tokenised prompt.
params: SamplingParams
Sampling parameters for this request.
max_tokens: usize
Maximum number of tokens to generate.
priority: RequestPriority
Scheduling priority.
state: RequestState
Current lifecycle state.
generated_tokens: Vec<u32>
Tokens generated so far (not including the prompt).
created_at: Instant
Wall-clock time at which the request was submitted.
started_at: Option<Instant>
Wall-clock time at which the first token was generated (prefill complete).
completed_at: Option<Instant>
Wall-clock time at which generation finished.
Implementations
impl BatchRequest
pub fn new(id: u64, prompt_tokens: Vec<u32>, params: SamplingParams, max_tokens: usize) -> Self
Create a new request with Normal priority.
pub fn with_priority(self, priority: RequestPriority) -> Self
Override the priority, returning self for builder-style chaining.
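The builder-style chaining described above can be sketched as follows. This is a minimal illustration, not the crate's source: the types are reduced stand-ins, the Low and High variants of RequestPriority and the temperature field of SamplingParams are assumptions, and only the Normal default comes from the documentation of new.

```rust
// Stand-in types for illustration only; the real definitions live in the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestPriority {
    Low,    // hypothetical variant
    Normal, // documented default used by `new`
    High,   // hypothetical variant
}

#[derive(Debug, Clone, Default)]
pub struct SamplingParams {
    pub temperature: f32, // hypothetical field
}

// Reduced version of the struct: state and timing fields are omitted here.
#[derive(Debug, Clone)]
pub struct BatchRequest {
    pub id: u64,
    pub prompt_tokens: Vec<u32>,
    pub params: SamplingParams,
    pub max_tokens: usize,
    pub priority: RequestPriority,
}

impl BatchRequest {
    /// Creates a request with `Normal` priority, as the documentation states.
    pub fn new(
        id: u64,
        prompt_tokens: Vec<u32>,
        params: SamplingParams,
        max_tokens: usize,
    ) -> Self {
        Self { id, prompt_tokens, params, max_tokens, priority: RequestPriority::Normal }
    }

    /// Consumes `self`, overrides the priority, and hands the value back
    /// so that calls can be chained.
    pub fn with_priority(mut self, priority: RequestPriority) -> Self {
        self.priority = priority;
        self
    }
}

/// Builds a request and raises its priority in one expression.
pub fn build_high_priority() -> BatchRequest {
    BatchRequest::new(1, vec![101, 2023, 2003], SamplingParams::default(), 64)
        .with_priority(RequestPriority::High)
}
```

Because with_priority takes self by value, the chained call moves the request rather than cloning it, so no extra allocation is needed for the token vector.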
pub fn time_to_first_token(&self) -> Option<Duration>
Elapsed time from submission to first generated token.
Returns None if the first token has not yet been produced.
pub fn total_latency(&self) -> Option<Duration>
Elapsed time from submission to completion.
Returns None if the request has not yet completed.
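One plausible way to derive both latency figures from the documented timestamp fields is shown below. The Timing struct and the sample_timing helper are hypothetical stand-ins introduced for this sketch; the actual implementation may differ.

```rust
use std::time::{Duration, Instant};

// Reduced stand-in holding only the three timestamp fields of the request.
pub struct Timing {
    pub created_at: Instant,
    pub started_at: Option<Instant>,
    pub completed_at: Option<Instant>,
}

impl Timing {
    /// Submission to first token; `None` until `started_at` is set.
    pub fn time_to_first_token(&self) -> Option<Duration> {
        self.started_at.map(|t| t.duration_since(self.created_at))
    }

    /// Submission to completion; `None` until `completed_at` is set.
    pub fn total_latency(&self) -> Option<Duration> {
        self.completed_at.map(|t| t.duration_since(self.created_at))
    }
}

/// Hypothetical request whose first token arrived 40 ms after submission
/// and which has not completed yet.
pub fn sample_timing() -> Timing {
    let created_at = Instant::now();
    Timing {
        created_at,
        started_at: Some(created_at + Duration::from_millis(40)),
        completed_at: None,
    }
}
```

Mapping over the Option keeps the "not yet available" case explicit, so callers can distinguish a request that is still running from one that finished instantly.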
pub fn tokens_generated(&self) -> usize
Number of tokens generated so far.
pub fn is_finished(&self) -> bool
Returns true when the request is in RequestState::Completed or RequestState::Failed.
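The terminal-state check can be sketched as below. Only Completed and Failed are named in the documentation; the Queued and Running variants, the free-function form, and the still_active helper are assumptions made for illustration.

```rust
// Stand-in enum: only `Completed` and `Failed` are documented;
// `Queued` and `Running` are hypothetical placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestState {
    Queued,
    Running,
    Completed,
    Failed,
}

/// Mirrors the documented contract: a request is finished once it is
/// either `Completed` or `Failed`.
pub fn is_finished(state: RequestState) -> bool {
    matches!(state, RequestState::Completed | RequestState::Failed)
}

/// Keeps only the states that have not reached a terminal state yet.
pub fn still_active(states: &[RequestState]) -> Vec<RequestState> {
    states.iter().copied().filter(|s| !is_finished(*s)).collect()
}
```

A caller that polls requests might use a check like this to decide which entries can be dropped from its working set.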
Auto Trait Implementations
impl Freeze for BatchRequest
impl RefUnwindSafe for BatchRequest
impl Send for BatchRequest
impl Sync for BatchRequest
impl Unpin for BatchRequest
impl UnsafeUnpin for BatchRequest
impl UnwindSafe for BatchRequest
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.