Request

inference_lab::request::request

Struct Request

pub struct Request {
    pub request_id: String,
    pub priority: i32,
    pub arrival_time: f64,
    pub status: RequestStatus,
    pub num_prompt_tokens: u32,
    pub max_output_tokens: u32,
    pub num_computed_tokens: u32,
    pub num_output_tokens: u32,
    pub num_tokens: u32,
    pub num_cached_tokens: u32,
    pub kv_blocks: Vec<BlockId>,
    pub num_preemptions: u32,
    pub first_token_time: Option<f64>,
    pub completion_time: Option<f64>,
    pub token_generation_times: Vec<f64>,
    pub preempted_time: f64,
    pub last_preempted_at: Option<f64>,
}

Request represents a single inference request in the simulation.

Fields§

§request_id: String

Unique request ID

§priority: i32

Client priority (a lower value means higher priority)

§arrival_time: f64

Arrival time (simulated time)

§status: RequestStatus

Request status

§num_prompt_tokens: u32

Number of input tokens

§max_output_tokens: u32

Maximum number of output tokens to generate

§num_computed_tokens: u32

Number of tokens computed so far

§num_output_tokens: u32

Number of output tokens generated so far

§num_tokens: u32

Total tokens (prompt + output)

§num_cached_tokens: u32

Number of prefix-cached tokens

§kv_blocks: Vec<BlockId>

KV cache blocks allocated to this request

§num_preemptions: u32

Number of times this request has been preempted

§first_token_time: Option<f64>

Time when first token was generated (TTFT tracking)

§completion_time: Option<f64>

Time when request completed

§token_generation_times: Vec<f64>

Per-token generation times

§preempted_time: f64

Time spent preempted (not running)

§last_preempted_at: Option<f64>

Last preemption start time
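
The timing fields above can be combined into the usual latency metrics. The following is a minimal sketch, not code from this crate: `Timing` and the three helper functions are hypothetical names, and it assumes that `token_generation_times` stores absolute simulated timestamps (the page does not say whether they are timestamps or durations).

```rust
/// Hypothetical stand-in that holds only the timing fields documented above.
struct Timing {
    arrival_time: f64,
    first_token_time: Option<f64>,
    completion_time: Option<f64>,
    token_generation_times: Vec<f64>,
}

/// Time to first token: first_token_time minus arrival_time.
/// Returns None while no token has been generated.
fn ttft(t: &Timing) -> Option<f64> {
    t.first_token_time.map(|first| first - t.arrival_time)
}

/// End-to-end latency: completion_time minus arrival_time.
fn end_to_end_latency(t: &Timing) -> Option<f64> {
    t.completion_time.map(|done| done - t.arrival_time)
}

/// Mean gap between consecutive tokens.
/// ASSUMPTION: token_generation_times stores absolute simulated timestamps.
fn mean_inter_token_latency(t: &Timing) -> Option<f64> {
    let times = &t.token_generation_times;
    if times.len() < 2 {
        return None;
    }
    let span = times[times.len() - 1] - times[0];
    Some(span / (times.len() - 1) as f64)
}

fn main() {
    let t = Timing {
        arrival_time: 1.0,
        first_token_time: Some(1.5),
        completion_time: Some(3.0),
        token_generation_times: vec![1.5, 2.0, 2.5, 3.0],
    };
    println!("TTFT: {:?}", ttft(&t));
    println!("end-to-end latency: {:?}", end_to_end_latency(&t));
    println!("mean inter-token latency: {:?}", mean_inter_token_latency(&t));
}
```

Returning `Option<f64>` mirrors the `Option` fields: a metric is simply absent until the event it depends on has happened.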

Implementations§

impl Request

pub fn new( request_id: String, priority: i32, arrival_time: f64, num_prompt_tokens: u32, max_output_tokens: u32, ) -> Self

Create a new request
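
To illustrate the argument order of `new`, here is a self-contained sketch. `RequestSketch` is a hypothetical, reduced stand-in and not the crate's `Request`; the initial values it picks (zeroed counters, `num_tokens` starting at the prompt length, no first-token time) are assumptions inferred from the field descriptions.

```rust
/// Hypothetical, reduced stand-in for Request; not the crate's definition.
#[derive(Debug, Clone)]
struct RequestSketch {
    request_id: String,
    priority: i32,
    arrival_time: f64,
    num_prompt_tokens: u32,
    max_output_tokens: u32,
    num_computed_tokens: u32,
    num_output_tokens: u32,
    num_tokens: u32,
    first_token_time: Option<f64>,
}

impl RequestSketch {
    /// Same parameter order as the documented `Request::new`.
    fn new(
        request_id: String,
        priority: i32,
        arrival_time: f64,
        num_prompt_tokens: u32,
        max_output_tokens: u32,
    ) -> Self {
        RequestSketch {
            request_id,
            priority,
            arrival_time,
            num_prompt_tokens,
            max_output_tokens,
            // ASSUMPTION: nothing has been computed or generated yet.
            num_computed_tokens: 0,
            num_output_tokens: 0,
            // ASSUMPTION: total = prompt + output, so it starts at the prompt length.
            num_tokens: num_prompt_tokens,
            first_token_time: None,
        }
    }
}

fn main() {
    // A request with a 128-token prompt that may generate up to 32 tokens.
    let req = RequestSketch::new("req-0".to_string(), 0, 0.0, 128, 32);
    println!("{:?}", req);
}
```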

pub fn is_prefill(&self) -> bool

Check whether this request is in the prefill phase

pub fn tokens_to_process(&self) -> u32

Get the number of tokens that need to be processed

pub fn is_finished(&self) -> bool

Check if request is done
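
The page does not show how `is_prefill`, `tokens_to_process`, and `is_finished` are computed. One plausible reading, sketched below on a hypothetical `RequestSketch` type, derives all three from the token counters. The conditions are assumptions, and the real methods may also consult `status`.

```rust
/// Hypothetical stand-in with only the token counters; not the crate's Request.
struct RequestSketch {
    num_prompt_tokens: u32,
    max_output_tokens: u32,
    num_computed_tokens: u32,
    num_output_tokens: u32,
    num_tokens: u32,
}

impl RequestSketch {
    /// ASSUMPTION: prefill lasts until every prompt token has been computed.
    fn is_prefill(&self) -> bool {
        self.num_computed_tokens < self.num_prompt_tokens
    }

    /// ASSUMPTION: remaining work is the known tokens minus the computed ones.
    fn tokens_to_process(&self) -> u32 {
        self.num_tokens.saturating_sub(self.num_computed_tokens)
    }

    /// ASSUMPTION: the request is done once the output budget is used up.
    fn is_finished(&self) -> bool {
        self.num_output_tokens >= self.max_output_tokens
    }
}

fn main() {
    // Partway through a 100-token prompt: 40 tokens computed, none generated.
    let req = RequestSketch {
        num_prompt_tokens: 100,
        max_output_tokens: 10,
        num_computed_tokens: 40,
        num_output_tokens: 0,
        num_tokens: 100,
    };
    println!(
        "prefill={} remaining={} finished={}",
        req.is_prefill(),
        req.tokens_to_process(),
        req.is_finished()
    );
}
```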

pub fn kv_cache_size(&self, model: &ModelConfig) -> u64

Calculate KV cache requirement for this request
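
As a rough illustration of what a KV cache requirement looks like, the sketch below uses the common per-token estimate for transformer attention: one key and one value vector per layer and per KV head. `ModelSketch` and its fields are hypothetical, since the fields of `ModelConfig` are not shown on this page, and the crate's own formula may differ.

```rust
/// Hypothetical model description; the real ModelConfig fields are not shown here.
struct ModelSketch {
    num_layers: u64,
    num_kv_heads: u64,
    head_dim: u64,
    bytes_per_element: u64,
}

/// Bytes of KV cache per token: 2 (key and value) * layers * KV heads * head dim * element size.
fn kv_bytes_per_token(model: &ModelSketch) -> u64 {
    2 * model.num_layers * model.num_kv_heads * model.head_dim * model.bytes_per_element
}

/// Bytes of KV cache needed to hold `num_tokens` tokens.
fn kv_cache_size(num_tokens: u32, model: &ModelSketch) -> u64 {
    u64::from(num_tokens) * kv_bytes_per_token(model)
}

fn main() {
    // Illustrative values only: 32 layers, 8 KV heads, head dim 128, 2-byte elements.
    let model = ModelSketch {
        num_layers: 32,
        num_kv_heads: 8,
        head_dim: 128,
        bytes_per_element: 2,
    };
    println!("bytes per token: {}", kv_bytes_per_token(&model));
    println!("bytes for 1000 tokens: {}", kv_cache_size(1000, &model));
}
```

Working in `u64` matches the documented return type and avoids overflow for long sequences on large models.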

pub fn record_generated_tokens( &mut self, num_new_tokens: u32, current_time: f64, )

Record that tokens were generated (update output token count and total)
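
A minimal sketch of the bookkeeping this description implies, written against a hypothetical `RequestSketch` type. The count updates follow the description above. Setting `first_token_time` on the first call and pushing one timestamp per token are assumptions about behavior the page does not spell out.

```rust
/// Hypothetical stand-in with only the fields touched here; not the crate's Request.
struct RequestSketch {
    num_output_tokens: u32,
    num_tokens: u32,
    first_token_time: Option<f64>,
    token_generation_times: Vec<f64>,
}

impl RequestSketch {
    fn record_generated_tokens(&mut self, num_new_tokens: u32, current_time: f64) {
        if num_new_tokens == 0 {
            return;
        }
        // Update the output token count and the total, as the description states.
        self.num_output_tokens += num_new_tokens;
        self.num_tokens += num_new_tokens;
        // ASSUMPTION: the first recorded token fixes first_token_time.
        if self.first_token_time.is_none() {
            self.first_token_time = Some(current_time);
        }
        // ASSUMPTION: one timestamp is stored per generated token.
        for _ in 0..num_new_tokens {
            self.token_generation_times.push(current_time);
        }
    }
}

fn main() {
    // Start from a 4-token prompt with no output yet.
    let mut req = RequestSketch {
        num_output_tokens: 0,
        num_tokens: 4,
        first_token_time: None,
        token_generation_times: Vec::new(),
    };
    req.record_generated_tokens(1, 2.0);
    req.record_generated_tokens(2, 3.0);
    println!(
        "output={} total={} first={:?} times={:?}",
        req.num_output_tokens, req.num_tokens, req.first_token_time, req.token_generation_times
    );
}
```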

pub fn mark_preempted(&mut self, current_time: f64)

Mark request as preempted

pub fn resume(&mut self, current_time: f64)

Resume a preempted request
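
The fields `num_preemptions`, `preempted_time`, and `last_preempted_at` suggest how `mark_preempted` and `resume` might cooperate. The sketch below is one possible reading on a hypothetical `RequestSketch` type. The real methods may also update `status`, release `kv_blocks`, or reset `num_computed_tokens`, none of which is modeled here.

```rust
/// Hypothetical stand-in with only the preemption fields; not the crate's Request.
struct RequestSketch {
    num_preemptions: u32,
    preempted_time: f64,
    last_preempted_at: Option<f64>,
}

impl RequestSketch {
    /// ASSUMPTION: preemption bumps the counter and remembers when it started.
    fn mark_preempted(&mut self, current_time: f64) {
        self.num_preemptions += 1;
        self.last_preempted_at = Some(current_time);
    }

    /// ASSUMPTION: resuming adds the elapsed interval to preempted_time
    /// and clears the start marker; resuming a running request is a no-op.
    fn resume(&mut self, current_time: f64) {
        if let Some(start) = self.last_preempted_at.take() {
            self.preempted_time += current_time - start;
        }
    }
}

fn main() {
    let mut req = RequestSketch {
        num_preemptions: 0,
        preempted_time: 0.0,
        last_preempted_at: None,
    };
    req.mark_preempted(2.0);
    req.resume(3.5);
    req.mark_preempted(5.0);
    req.resume(5.25);
    println!(
        "preemptions={} preempted_time={}",
        req.num_preemptions, req.preempted_time
    );
}
```

Using `Option::take` keeps the two fields consistent: `last_preempted_at` is `Some` only while the request is actually preempted.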

Trait Implementations§

impl Clone for Request

fn clone(&self) -> Request

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Request

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for Request

impl RefUnwindSafe for Request

impl Send for Request

impl Sync for Request

impl Unpin for Request

impl UnwindSafe for Request

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V