Struct PagedKvCache 

pub struct PagedKvCache { /* private fields */ }

Paged KV cache.

Memory is allocated in fixed-size pages (blocks) of PAGE_SIZE tokens. New pages are allocated on demand, so short sequences don’t waste memory on unused context positions. The assemble_* methods reconstruct contiguous slices for attention computation.
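
The on-demand paging described here can be illustrated with a minimal sketch. The `ToyPagedStore` type, its field layout, and the `PAGE_SIZE` value of 16 are assumptions made for illustration only; the private fields of `PagedKvCache` and the crate's actual page size are not shown on this page.

```rust
/// Hypothetical page size in tokens (the crate's real PAGE_SIZE is not documented here).
const PAGE_SIZE: usize = 16;

/// Toy single-layer store: each page holds PAGE_SIZE tokens of `kv_dim` floats.
struct ToyPagedStore {
    kv_dim: usize,
    seq_len: usize,
    pages: Vec<Vec<f32>>,
}

impl ToyPagedStore {
    fn new(kv_dim: usize) -> Self {
        // No pages are allocated until the first token arrives.
        Self { kv_dim, seq_len: 0, pages: Vec::new() }
    }

    /// Append one token, allocating a new page only when the last one is full.
    fn push(&mut self, token: &[f32]) {
        assert_eq!(token.len(), self.kv_dim);
        let page_idx = self.seq_len / PAGE_SIZE;
        let slot = self.seq_len % PAGE_SIZE;
        if page_idx == self.pages.len() {
            self.pages.push(vec![0.0; PAGE_SIZE * self.kv_dim]);
        }
        let start = slot * self.kv_dim;
        self.pages[page_idx][start..start + self.kv_dim].copy_from_slice(token);
        self.seq_len += 1;
    }

    /// Borrow the data for the token at `pos`, if it has been stored.
    fn token(&self, pos: usize) -> Option<&[f32]> {
        if pos >= self.seq_len {
            return None;
        }
        let start = (pos % PAGE_SIZE) * self.kv_dim;
        Some(&self.pages[pos / PAGE_SIZE][start..start + self.kv_dim])
    }
}
```

In this sketch a position maps to a page with `pos / PAGE_SIZE` and to a slot inside that page with `pos % PAGE_SIZE`, so storing 17 tokens allocates two pages rather than a buffer sized for the maximum context.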

Implementations

impl PagedKvCache

pub fn new(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> Self

Create a new paged KV cache.

Unlike the contiguous cache, this does NOT pre-allocate all memory. Pages are allocated on demand as tokens are processed.
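
To make the difference from up-front allocation concrete, the following back-of-the-envelope sketch compares the two strategies. It assumes f32 storage, separate key and value buffers per layer, and a hypothetical `page_size` argument; it is plain arithmetic, not the crate's `memory_bytes` implementation.

```rust
/// Bytes used by a contiguous cache that reserves `max_seq_len` positions up front.
/// Counts keys and values (factor 2) stored as f32.
fn contiguous_bytes(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> usize {
    num_layers * max_seq_len * kv_dim * 2 * std::mem::size_of::<f32>()
}

/// Bytes used by a paged cache holding `seq_len` tokens, rounded up to whole pages.
fn paged_bytes(num_layers: usize, seq_len: usize, kv_dim: usize, page_size: usize) -> usize {
    let pages = (seq_len + page_size - 1) / page_size; // ceiling division
    num_layers * pages * page_size * kv_dim * 2 * std::mem::size_of::<f32>()
}
```

Under these assumptions, 32 layers with `kv_dim = 1024` and `max_seq_len = 4096` reserve 1 GiB contiguously, while a 100-token sequence with 16-token pages occupies 7 pages per buffer, or 28 MiB.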

pub fn page_size(&self) -> usize

Returns the page size (tokens per page).

pub fn max_seq_len(&self) -> usize

Returns the maximum sequence length.

pub fn kv_dim(&self) -> usize

Returns the KV dimension per token.

pub fn num_layers(&self) -> usize

Returns the number of layers.

pub fn total_pages(&self) -> usize

Returns the total number of allocated pages across all layers.

pub fn memory_bytes(&self) -> usize

Returns the approximate total memory usage in bytes.

pub fn clear(&mut self)

Reset the cache, freeing all pages.

pub fn shrink_to_fit(&mut self)

Shrink allocated pages to fit the current sequence length. Useful after trimming context.
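
One plausible reading of this operation is sketched below, assuming that shrinking means dropping whole pages past the last live token. The helper names `pages_needed` and `release_unused_pages` and the `Vec<Vec<f32>>` page layout are hypothetical; the crate's actual behavior may differ.

```rust
/// Number of pages needed to hold `seq_len` tokens.
fn pages_needed(seq_len: usize, page_size: usize) -> usize {
    (seq_len + page_size - 1) / page_size // ceiling division
}

/// Drop trailing pages that no longer hold live tokens.
fn release_unused_pages(pages: &mut Vec<Vec<f32>>, seq_len: usize, page_size: usize) {
    pages.truncate(pages_needed(seq_len, page_size));
    // Also return the page table's own spare capacity to the allocator.
    pages.shrink_to_fit();
}
```

With 16-token pages, trimming a sequence from 64 tokens to 17 would leave two pages in this sketch, since the partially filled second page must be kept.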

impl PagedKvCache

Extended paged KV cache operations (not part of the base trait).

pub fn get_keys_into(&self, layer: usize, buf: &mut Vec<f32>) -> ArchResult<()>

Copy all cached keys for a layer into a contiguous buffer.

This is the multi-page alternative to get_keys(). The caller provides a reusable buffer to avoid repeated allocation.

pub fn get_values_into(&self, layer: usize, buf: &mut Vec<f32>) -> ArchResult<()>

Copy all cached values for a layer into a contiguous buffer.
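
The reusable-buffer pattern behind the `*_into` methods can be sketched as follows. The free function `gather_into` and its explicit `pages`, `page_size`, and `kv_dim` parameters are assumptions for illustration; the real methods read these from the cache's private state.

```rust
/// Copy the first `seq_len` tokens out of fixed-size pages into `buf`.
/// `buf` is cleared but keeps its capacity, so repeated calls can reuse the allocation.
fn gather_into(
    pages: &[Vec<f32>],
    seq_len: usize,
    page_size: usize,
    kv_dim: usize,
    buf: &mut Vec<f32>,
) {
    buf.clear();
    buf.reserve(seq_len * kv_dim);
    let mut remaining = seq_len;
    for page in pages {
        if remaining == 0 {
            break;
        }
        // The last page may be only partially filled.
        let tokens = remaining.min(page_size);
        buf.extend_from_slice(&page[..tokens * kv_dim]);
        remaining -= tokens;
    }
}
```

Passing the same `Vec<f32>` on every decoding step means the buffer is reallocated only when the sequence outgrows its current capacity.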

pub fn get_key_token(&self, layer: usize, pos: usize) -> ArchResult<&[f32]>

Read a specific token’s key data from the cache.

pub fn get_value_token(&self, layer: usize, pos: usize) -> ArchResult<&[f32]>

Read a specific token’s value data from the cache.

pub fn iter_keys<F>(&self, layer: usize, f: F) -> ArchResult<()>
where F: FnMut(usize, &[f32]),

Iterate over key tokens for a layer, calling f for each (pos, key_data).

pub fn iter_values<F>(&self, layer: usize, f: F) -> ArchResult<()>
where F: FnMut(usize, &[f32]),

Iterate over value tokens for a layer, calling f for each (pos, value_data).
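
Callback-style iteration lets a caller consume tokens page by page without first assembling a contiguous buffer. The sketch below assumes a simple `Vec<Vec<f32>>` page layout; `for_each_token` and `attention_scores` are hypothetical helpers, not part of this crate.

```rust
/// Visit each stored token in order, passing its position and data slice to `f`.
fn for_each_token<F>(pages: &[Vec<f32>], seq_len: usize, page_size: usize, kv_dim: usize, mut f: F)
where
    F: FnMut(usize, &[f32]),
{
    for pos in 0..seq_len {
        let page = &pages[pos / page_size];
        let start = (pos % page_size) * kv_dim;
        f(pos, &page[start..start + kv_dim]);
    }
}

/// Unscaled dot-product scores of `query` against every cached key.
fn attention_scores(
    pages: &[Vec<f32>],
    seq_len: usize,
    page_size: usize,
    query: &[f32],
) -> Vec<f32> {
    let mut scores = vec![0.0f32; seq_len];
    for_each_token(pages, seq_len, page_size, query.len(), |pos, key| {
        scores[pos] = query.iter().zip(key).map(|(q, k)| q * k).sum::<f32>();
    });
    scores
}
```

The closure borrows `scores` mutably, which is why the callback bound is `FnMut` rather than `Fn`.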

Trait Implementations

impl KvCacheAccess for PagedKvCache

fn seq_len(&self) -> usize

Get the current sequence length (number of cached tokens).

fn store_kv(&mut self, layer: usize, key: &[f32], value: &[f32]) -> ArchResult<()>

Store key and value tensors for a layer at the current position.

fn get_keys(&self, layer: usize) -> ArchResult<&[f32]>

Retrieve all cached keys for a layer up to the current sequence length.

fn get_values(&self, layer: usize) -> ArchResult<&[f32]>

Retrieve all cached values for a layer up to the current sequence length.

fn advance(&mut self)

Advance the cache position by one token.
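
The descriptions of `store_kv` and `advance` suggest a store-then-advance pattern: write one token's key and value for every layer at the current position, then advance once. The sketch below illustrates that reading with a toy type; `ToyCache`, `step`, and the `String` error type are assumptions, and the exact contract is defined by `KvCacheAccess`, which is not reproduced here.

```rust
/// Toy stand-in for a per-layer KV cache (not the crate's `KvCacheAccess` trait).
struct ToyCache {
    kv_dim: usize,
    pos: usize,
    keys: Vec<Vec<f32>>,   // one flat buffer per layer
    values: Vec<Vec<f32>>, // one flat buffer per layer
}

impl ToyCache {
    fn new(num_layers: usize, kv_dim: usize) -> Self {
        Self {
            kv_dim,
            pos: 0,
            keys: vec![Vec::new(); num_layers],
            values: vec![Vec::new(); num_layers],
        }
    }

    /// Write one token's key and value for `layer` at the current position.
    fn store_kv(&mut self, layer: usize, key: &[f32], value: &[f32]) -> Result<(), String> {
        if layer >= self.keys.len() {
            return Err(format!("layer {layer} out of range"));
        }
        if key.len() != self.kv_dim || value.len() != self.kv_dim {
            return Err("kv_dim mismatch".to_string());
        }
        let start = self.pos * self.kv_dim;
        let end = start + self.kv_dim;
        self.keys[layer].resize(end, 0.0);
        self.values[layer].resize(end, 0.0);
        self.keys[layer][start..end].copy_from_slice(key);
        self.values[layer][start..end].copy_from_slice(value);
        Ok(())
    }

    /// Move to the next token position; called once per token, after all layers are stored.
    fn advance(&mut self) {
        self.pos += 1;
    }

    fn seq_len(&self) -> usize {
        self.pos
    }
}

/// Process one token: store for every layer, then advance exactly once.
fn step(cache: &mut ToyCache, per_layer: &[(Vec<f32>, Vec<f32>)]) -> Result<(), String> {
    for (layer, (k, v)) in per_layer.iter().enumerate() {
        cache.store_kv(layer, k, v)?;
    }
    cache.advance();
    Ok(())
}
```

Keeping `advance` separate from `store_kv` lets every layer write to the same position before the sequence length changes.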

fn kv_dim(&self) -> usize

KV dimension per token (num_kv_heads * head_dim).

fn for_each_key(&self, layer: usize, f: &mut dyn FnMut(usize, &[f32])) -> ArchResult<()>

Iterate over every cached key token for layer, calling f(pos, key_data).

fn for_each_value(&self, layer: usize, f: &mut dyn FnMut(usize, &[f32])) -> ArchResult<()>

Iterate over every cached value token for layer, calling f(pos, value_data).

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes an object with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.