Struct UnifiedBatchItem


pub struct UnifiedBatchItem {
    pub seq_id: String,
    pub q_tokens: Vec<u32>,
    pub kv_cache: Arc<dyn KvCacheHandle>,
    pub pos_offset: usize,
    pub is_final_chunk: bool,
}


One sequence’s contribution to a unified mixed-batch forward.

A unified batch lets a single model forward pass process a mix of per-sequence work units. Prefill chunks (q_tokens.len() ≥ 1, possibly continuing from pos_offset > 0 for chunked prefill) and decode steps (q_tokens.len() == 1, pos_offset = current cache length) can coexist in the same call. The model layer concatenates all q_tokens into one [M_total, hidden] tensor, where M_total is the sum of q_tokens.len() over all items, and runs every GEMM and norm once; only the attention kernel sees the per-item segmentation.
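The concatenation step can be pictured with a small sketch. It is not the crate's implementation: KvCacheHandle is replaced by an empty stand-in trait, the struct is mirrored locally so the example compiles on its own, and flatten_batch is a hypothetical helper. It returns the flat token list (one entry per row of the [M_total, hidden] tensor) plus the per-item row boundaries that an attention kernel would need for segmentation, similar in spirit to the cumulative sequence-length arrays that variable-length attention kernels take.

```rust
use std::sync::Arc;

/// Stand-in for the crate's `KvCacheHandle` trait (assumption: empty here;
/// the real trait has its own methods).
pub trait KvCacheHandle: Send + Sync {}

/// Local mirror of `UnifiedBatchItem` so this sketch is self-contained.
#[derive(Clone)]
pub struct UnifiedBatchItem {
    pub seq_id: String,
    pub q_tokens: Vec<u32>,
    pub kv_cache: Arc<dyn KvCacheHandle>,
    pub pos_offset: usize,
    pub is_final_chunk: bool,
}

/// Hypothetical helper: concatenates every item's `q_tokens` in order and records
/// row boundaries. Item `i` owns rows `starts[i]..starts[i + 1]` of the flat list,
/// and `starts.last()` equals M_total.
pub fn flatten_batch(items: &[UnifiedBatchItem]) -> (Vec<u32>, Vec<usize>) {
    let mut flat = Vec::new();
    let mut starts = Vec::with_capacity(items.len() + 1);
    starts.push(0);
    for item in items {
        flat.extend_from_slice(&item.q_tokens);
        starts.push(flat.len());
    }
    (flat, starts)
}
```

GEMMs and norms only ever see `flat` (shape [M_total] before embedding), while `starts` is the only per-item information the attention step needs.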

This is the abstraction that enables vLLM-style chunked prefill, in which decode tokens for already-running sequences are produced in the same iteration as a prefill chunk for a newly arriving sequence.
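How a scheduler might fill one such iteration is sketched below under explicit assumptions: a fixed per-iteration token_budget, a decode-first policy, and at most one prefill in flight. All names here (ItemPlan, plan_iteration, token_budget) are hypothetical and not part of this crate; each ItemPlan only records the values that would go into q_tokens.len(), pos_offset and is_final_chunk of a UnifiedBatchItem.

```rust
/// Hypothetical plan for one item: maps onto `q_tokens.len()`, `pos_offset`,
/// and `is_final_chunk` of a `UnifiedBatchItem`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ItemPlan {
    pub seq_id: String,
    pub q_len: usize,
    pub pos_offset: usize,
    pub is_final_chunk: bool,
}

/// Assumed policy: every running sequence gets one decode token first, then the
/// rest of the budget goes to the next chunk of the pending prompt.
pub fn plan_iteration(
    decodes: &[(&str, usize)],             // (seq_id, current kv_len)
    prefill: Option<(&str, usize, usize)>, // (seq_id, prompt_len, tokens already prefilled)
    token_budget: usize,
) -> Vec<ItemPlan> {
    let mut plan = Vec::new();
    let mut remaining = token_budget;
    for &(id, kv_len) in decodes {
        if remaining == 0 {
            break;
        }
        // Decode step: one token at the current cache length, logits always needed.
        plan.push(ItemPlan { seq_id: id.to_string(), q_len: 1, pos_offset: kv_len, is_final_chunk: true });
        remaining -= 1;
    }
    if let Some((id, prompt_len, done)) = prefill {
        let chunk = remaining.min(prompt_len.saturating_sub(done));
        if chunk > 0 {
            // Prefill chunk: continues where the previous chunk stopped; only the
            // chunk that reaches the end of the prompt asks for logits.
            plan.push(ItemPlan {
                seq_id: id.to_string(),
                q_len: chunk,
                pos_offset: done,
                is_final_chunk: done + chunk == prompt_len,
            });
        }
    }
    plan
}
```

With a budget of 6, two decodes and a 10-token prompt, the prompt gets a 4-token intermediate chunk; on the next iteration with a budget of 8, the remaining 6 tokens form the final chunk.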

Fields

seq_id: String

Identifier matching the sequence’s KV cache; the model side uses it as the cache key.

q_tokens: Vec<u32>

Tokens to process in this iteration. For a decode step this is exactly one token; for prefill (chunked or whole) it holds the chunk’s tokens.

kv_cache: Arc<dyn KvCacheHandle>

KV cache handle for this sequence.

pos_offset: usize

Absolute position of the first token in q_tokens: 0 for a fresh prefill, and kv_len (the current cache length) for a decode step or a continuing chunked-prefill slice.
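Since pos_offset fixes the position of the first query token, the positions of all rows in a unified batch follow directly. The sketch below assumes each item's tokens occupy consecutive positions and that the forward pass appends K/V for every token in q_tokens, so the next item for the same sequence starts at pos_offset + q_tokens.len(). Both helpers are hypothetical and take plain (pos_offset, q_len) pairs rather than the real struct.

```rust
/// Hypothetical helper: absolute position of every row in the flattened batch.
/// Item i contributes pos_offset_i, pos_offset_i + 1, ..., pos_offset_i + q_len_i - 1,
/// in the same item order used to concatenate `q_tokens`.
pub fn position_ids(items: &[(usize, usize)]) -> Vec<usize> {
    items
        .iter()
        .flat_map(|&(pos_offset, q_len)| pos_offset..pos_offset + q_len)
        .collect()
}

/// Cache length after this item's K/V are appended (assumption: every query
/// token is written to the cache). This is the `pos_offset` of the sequence's next item.
pub fn kv_len_after(pos_offset: usize, q_len: usize) -> usize {
    pos_offset + q_len
}
```

For a fresh 3-token prefill, a decode at position 17 and a 2-token continuing chunk starting at 8, the rows get positions 0, 1, 2, 17, 8, 9.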

is_final_chunk: bool

True iff this item completes the request’s prefill or is a decode item, i.e. the logits at the last token of q_tokens should be returned for sampling. Intermediate prefill chunks set this to false to skip the lm_head and sampling path.
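One way a model could honor is_final_chunk is to select only the last row of each final item from the flattened hidden states before applying lm_head, so intermediate chunks never pay for the vocabulary projection. This gather-then-project strategy is an assumption about how the skip might be implemented, and logit_rows is a hypothetical helper taking (q_tokens.len(), is_final_chunk) pairs in concatenation order.

```rust
/// Hypothetical helper: row indices (into the flattened [M_total, hidden] hidden
/// states) of the last token of every item that needs logits. Intermediate
/// prefill chunks contribute no row; empty items are ignored defensively.
pub fn logit_rows(items: &[(usize, bool)]) -> Vec<usize> {
    let mut rows = Vec::new();
    let mut start = 0;
    for &(q_len, is_final_chunk) in items {
        if is_final_chunk && q_len > 0 {
            rows.push(start + q_len - 1);
        }
        // Advance past this item's rows whether or not it needs logits.
        start += q_len;
    }
    rows
}
```

The result has one entry per item that will be sampled, so lm_head would run on that many rows instead of M_total.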


Trait Implementations

impl Clone for UnifiedBatchItem

fn clone(&self) -> UnifiedBatchItem

fn clone_from(&mut self, source: &Self)

impl Debug for UnifiedBatchItem

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl Freeze for UnifiedBatchItem

impl !RefUnwindSafe for UnifiedBatchItem

impl Send for UnifiedBatchItem

impl Sync for UnifiedBatchItem

impl Unpin for UnifiedBatchItem

impl UnsafeUnpin for UnifiedBatchItem

impl !UnwindSafe for UnifiedBatchItem

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> Same for T

type Output = T

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V
