pub struct PagedKvCache { /* private fields */ }
Paged KV cache.
Memory is allocated in fixed-size pages (blocks) of PAGE_SIZE tokens.
Pages grow on demand, so short sequences do not waste memory on unused
context positions. The assemble_* methods reconstruct contiguous
slices for attention computation.
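The paging idea above can be sketched with a toy, single-layer store. This is only an illustration under stated assumptions, not the crate's implementation: `ToyPagedStore`, its fields, and the page size of 4 are all hypothetical, and the real PAGE_SIZE value is not given on this page.

```rust
/// Tokens per page. Assumed value, chosen to keep the example small.
const PAGE_SIZE: usize = 4;

/// Hypothetical single-layer paged store (not the crate's type).
struct ToyPagedStore {
    dim: usize,           // floats per token
    len: usize,           // tokens stored so far
    pages: Vec<Vec<f32>>, // each page holds up to PAGE_SIZE * dim floats
}

impl ToyPagedStore {
    fn new(dim: usize) -> Self {
        // No page is allocated until the first token arrives.
        Self { dim, len: 0, pages: Vec::new() }
    }

    /// Append one token, allocating a new page only when the last one is full.
    fn push(&mut self, token: &[f32]) {
        assert_eq!(token.len(), self.dim);
        if self.len % PAGE_SIZE == 0 {
            self.pages.push(Vec::with_capacity(PAGE_SIZE * self.dim));
        }
        self.pages.last_mut().unwrap().extend_from_slice(token);
        self.len += 1;
    }

    /// Rebuild one contiguous buffer from all pages, in token order.
    fn assemble(&self) -> Vec<f32> {
        let mut out = Vec::with_capacity(self.len * self.dim);
        for page in &self.pages {
            out.extend_from_slice(page);
        }
        out
    }
}
```

With this layout, a sequence of five tokens occupies two pages rather than a buffer sized for the maximum sequence length.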
Implementations
impl PagedKvCache
pub fn new(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> Self
Create a new paged KV cache.
Unlike the contiguous cache, this does NOT pre-allocate all memory. Pages are allocated on demand as tokens are processed.
pub fn max_seq_len(&self) -> usize
Returns the maximum sequence length.
pub fn num_layers(&self) -> usize
Returns the number of layers.
pub fn total_pages(&self) -> usize
Returns the total number of allocated pages across all layers.
pub fn memory_bytes(&self) -> usize
Returns the approximate total memory usage in bytes.
pub fn shrink_to_fit(&mut self)
Shrink allocated pages to fit the current sequence length. Useful after trimming context.
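To make the page accounting concrete, here is a rough sketch of the arithmetic that a byte estimate and a shrink operation would plausibly rely on. Everything in it is an assumption: the page size of 16, the helper names `pages_for` and `estimate_bytes`, and the choice to count each page once for keys and once for values. The crate may count pages differently.

```rust
/// Assumed page size in tokens (the real value is not stated on this page).
const PAGE_SIZE: usize = 16;

/// Pages needed to hold `seq_len` tokens in one layer (ceiling division).
/// A shrink step could drop any pages beyond this count.
fn pages_for(seq_len: usize) -> usize {
    (seq_len + PAGE_SIZE - 1) / PAGE_SIZE
}

/// Rough byte estimate, assuming every page stores PAGE_SIZE tokens of
/// `kv_dim` f32 values, once for keys and once for values.
fn estimate_bytes(total_pages: usize, kv_dim: usize) -> usize {
    let floats_per_page = PAGE_SIZE * kv_dim;
    total_pages * floats_per_page * std::mem::size_of::<f32>() * 2
}
```

Under these assumptions, a 17-token sequence needs two pages per layer, and trimming it back to 16 tokens would leave one page that can be released.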
impl PagedKvCache
Extended paged KV cache operations (not part of the base trait).
pub fn get_keys_into(&self, layer: usize, buf: &mut Vec<f32>) -> ArchResult<()>
Copy all cached keys for a layer into a contiguous buffer.
This is the multi-page alternative to get_keys(). The caller
provides a reusable buffer to avoid repeated allocation.
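The reusable-buffer pattern can be sketched as follows. The helper `copy_pages_into` and the `&[Vec<f32>]` page representation are hypothetical stand-ins for the cache's private storage; only the shape of the `buf: &mut Vec<f32>` parameter comes from the signature above.

```rust
/// Hypothetical helper: copy every page into `buf`, reusing its allocation.
fn copy_pages_into(pages: &[Vec<f32>], buf: &mut Vec<f32>) {
    // `clear` drops the old contents but keeps the capacity, so a buffer
    // that was large enough on a previous call is not reallocated.
    buf.clear();
    let total: usize = pages.iter().map(|p| p.len()).sum();
    buf.reserve(total);
    for page in pages {
        buf.extend_from_slice(page);
    }
}
```

A caller would typically create the `Vec` once outside the decoding loop and pass it in on every step, so that steady-state calls only copy data.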
pub fn get_values_into(
    &self,
    layer: usize,
    buf: &mut Vec<f32>,
) -> ArchResult<()>
Copy all cached values for a layer into a contiguous buffer.
pub fn get_key_token(&self, layer: usize, pos: usize) -> ArchResult<&[f32]>
Read a specific token’s key data from the cache.
pub fn get_value_token(&self, layer: usize, pos: usize) -> ArchResult<&[f32]>
Read a specific token’s value data from the cache.
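A per-token read in a paged layout usually comes down to mapping a position to a page index and a slot within that page. The sketch below shows that mapping under assumptions: the page size of 4, the helpers `locate` and `token_slice`, and the use of `Option` in place of `ArchResult` are all illustrative, since the crate's internals and error type are not shown here.

```rust
/// Assumed page size in tokens.
const PAGE_SIZE: usize = 4;

/// Map a token position to (page index, slot within the page).
fn locate(pos: usize) -> (usize, usize) {
    (pos / PAGE_SIZE, pos % PAGE_SIZE)
}

/// Hypothetical lookup: borrow the `dim` floats of the token at `pos`,
/// or return `None` if the position has not been written yet.
fn token_slice(pages: &[Vec<f32>], dim: usize, pos: usize) -> Option<&[f32]> {
    let (page, slot) = locate(pos);
    let start = slot * dim;
    pages.get(page)?.get(start..start + dim)
}
```

Because a single token never straddles a page boundary in this layout, the lookup can return a borrowed slice without copying.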
pub fn iter_keys<F>(&self, layer: usize, f: F) -> ArchResult<()>
Iterate over key tokens for a layer, calling f for each (pos, key_data).
pub fn iter_values<F>(&self, layer: usize, f: F) -> ArchResult<()>
Iterate over value tokens for a layer, calling f for each (pos, value_data).
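Callback-style iteration over paged storage can be sketched like this. The bound `F: FnMut(usize, &[f32])` is an assumption, since the where clause is not shown in the signatures above, and `for_each_token` with its `&[Vec<f32>]` page representation is a hypothetical stand-in for the cache's internals.

```rust
/// Hypothetical walker: call `f(pos, token)` for every stored token,
/// in order, without first assembling a contiguous buffer.
fn for_each_token<F>(pages: &[Vec<f32>], dim: usize, mut f: F)
where
    F: FnMut(usize, &[f32]),
{
    assert!(dim > 0, "token dimension must be non-zero");
    let mut pos = 0;
    for page in pages {
        // Each page stores whole tokens back to back.
        for token in page.chunks_exact(dim) {
            f(pos, token);
            pos += 1;
        }
    }
}
```

Visiting tokens in place this way avoids the copy that a contiguous read would need, at the cost of one callback invocation per token.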
Trait Implementations
impl KvCacheAccess for PagedKvCache
fn store_kv(
    &mut self,
    layer: usize,
    key: &[f32],
    value: &[f32],
) -> ArchResult<()>
fn get_keys(&self, layer: usize) -> ArchResult<&[f32]>
fn get_values(&self, layer: usize) -> ArchResult<&[f32]>
fn for_each_key(
    &self,
    layer: usize,
    f: &mut dyn FnMut(usize, &[f32]),
) -> ArchResult<()>
fn for_each_value(
    &self,
    layer: usize,
    f: &mut dyn FnMut(usize, &[f32]),
) -> ArchResult<()>
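The trait methods take the callback as `&mut dyn FnMut(usize, &[f32])` rather than as a generic parameter. One plausible reason, offered here as an assumption and not as the crate's stated design, is that non-generic methods can be called through a trait object. The sketch below uses a hypothetical `TokenVisitor` trait and `FlatStore` type to show the calling convention; neither exists in the crate.

```rust
/// Hypothetical visitor trait, shaped like `for_each_key`.
trait TokenVisitor {
    fn for_each_token(&self, f: &mut dyn FnMut(usize, &[f32]));
}

/// Toy flat store: `dim` floats per token, stored back to back.
struct FlatStore {
    dim: usize,
    data: Vec<f32>,
}

impl TokenVisitor for FlatStore {
    fn for_each_token(&self, f: &mut dyn FnMut(usize, &[f32])) {
        for (pos, token) in self.data.chunks_exact(self.dim).enumerate() {
            f(pos, token);
        }
    }
}

/// Works through `&dyn TokenVisitor` because the method is not generic.
fn count_tokens(cache: &dyn TokenVisitor) -> usize {
    let mut n = 0;
    // The closure is passed by mutable reference and coerced to a trait object.
    cache.for_each_token(&mut |_pos: usize, _tok: &[f32]| n += 1);
    n
}
```

Callers pass `&mut |pos, data| { ... }` at the call site; the closure may mutate captured state because the callback type is `FnMut`.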
Auto Trait Implementations
impl Freeze for PagedKvCache
impl RefUnwindSafe for PagedKvCache
impl Send for PagedKvCache
impl Sync for PagedKvCache
impl Unpin for PagedKvCache
impl UnsafeUnpin for PagedKvCache
impl UnwindSafe for PagedKvCache
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.