pub struct KVCache { /* private fields */ }
KV cache for a single model.
Pre-allocates buffers for the maximum sequence length to avoid allocations during generation.
Implementations
impl KVCache
pub fn new(num_layers: usize, num_kv_heads: usize, head_dim: usize) -> Self
Create a new empty KV cache.
pub fn with_capacity(
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
    max_seq_len: usize,
) -> Self
Create a new KV cache with pre-allocated capacity.
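As a rough sizing aid, the sketch below estimates how much memory a pre-allocated cache might occupy. It assumes one K and one V buffer of f32 per layer, each sized for max_seq_len tokens. That matches the shapes documented for k and v, but this page does not confirm the internal layout. estimated_cache_bytes is a hypothetical helper, not part of the crate.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of the crate): estimated bytes held by a
/// cache built with `with_capacity`, assuming separate K and V `f32`
/// buffers per layer, each sized for `max_seq_len` tokens.
pub fn estimated_cache_bytes(
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
    max_seq_len: usize,
) -> usize {
    let entry_size = num_kv_heads * head_dim; // floats per token per layer
    let per_layer = max_seq_len * entry_size; // floats in one K (or V) buffer
    2 * num_layers * per_layer * size_of::<f32>() // factor 2 covers K and V
}
```

Under these assumptions, 32 layers, 8 KV heads, a head dimension of 128, and 4096 tokens work out to 1 GiB.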
pub fn append(&mut self, layer: usize, k_data: &[f32], v_data: &[f32])
Append K and V vectors for the current token to a specific layer.
k_data and v_data should each have length num_kv_heads * head_dim.
pub fn advance(&mut self)
Advance the sequence position by one token. Call this after appending K/V data to all layers.
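The append-then-advance order described above can be sketched with a small stand-in type. SketchCache and decode_step are hypothetical names introduced only for illustration. The stand-in mirrors the documented method signatures, but its fields and behavior are assumptions, since the real KVCache keeps its fields private.

```rust
/// Hypothetical stand-in mirroring the documented method names.
/// The internal layout here is an assumption, not the crate's implementation.
pub struct SketchCache {
    k: Vec<Vec<f32>>,  // one flat K buffer per layer
    v: Vec<Vec<f32>>,  // one flat V buffer per layer
    entry_size: usize, // num_kv_heads * head_dim
    len: usize,        // tokens committed via `advance`
}

impl SketchCache {
    pub fn with_capacity(
        num_layers: usize,
        num_kv_heads: usize,
        head_dim: usize,
        max_seq_len: usize,
    ) -> Self {
        let entry_size = num_kv_heads * head_dim;
        // Reserve room for the full sequence up front.
        let buf = || Vec::<f32>::with_capacity(max_seq_len * entry_size);
        Self {
            k: (0..num_layers).map(|_| buf()).collect(),
            v: (0..num_layers).map(|_| buf()).collect(),
            entry_size,
            len: 0,
        }
    }

    /// Append one token's K and V vectors to a single layer.
    pub fn append(&mut self, layer: usize, k_data: &[f32], v_data: &[f32]) {
        assert_eq!(k_data.len(), self.entry_size);
        assert_eq!(v_data.len(), self.entry_size);
        self.k[layer].extend_from_slice(k_data);
        self.v[layer].extend_from_slice(v_data);
    }

    /// Commit the current token. Call once, after every layer was appended.
    pub fn advance(&mut self) {
        self.len += 1;
    }

    /// Committed K entries for a layer (panics if a layer was skipped).
    pub fn k(&self, layer: usize) -> &[f32] {
        &self.k[layer][..self.len * self.entry_size]
    }

    /// Committed V entries for a layer (panics if a layer was skipped).
    pub fn v(&self, layer: usize) -> &[f32] {
        &self.v[layer][..self.len * self.entry_size]
    }

    pub fn num_layers(&self) -> usize {
        self.k.len()
    }
}

/// One decode step: append K/V for every layer, then advance exactly once.
pub fn decode_step(cache: &mut SketchCache, k_per_layer: &[Vec<f32>], v_per_layer: &[Vec<f32>]) {
    assert_eq!(k_per_layer.len(), cache.num_layers());
    assert_eq!(v_per_layer.len(), cache.num_layers());
    for (layer, (k, v)) in k_per_layer.iter().zip(v_per_layer).enumerate() {
        cache.append(layer, k, v);
    }
    cache.advance();
}
```

Calling advance once per token, rather than once per layer, keeps every layer at the same sequence position.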
pub fn k(&self, layer: usize) -> &[f32]
Get the full K cache for a layer as a flat slice of length len * num_kv_heads * head_dim, where len is the current sequence length.
pub fn v(&self, layer: usize) -> &[f32]
Get the full V cache for a layer as a flat slice of length len * num_kv_heads * head_dim, where len is the current sequence length.
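Because k and v return flat slices, reading a single head's vector for a given token takes some index arithmetic. The sketch below assumes a token-major layout with heads stored contiguously within each token, which is suggested by the per-token append but not stated on this page. head_vector is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper (not part of the crate): borrow one head's vector for
/// one token from the flat slice returned by `k(layer)` or `v(layer)`.
/// Assumes token-major ordering, then head, then dimension.
/// Returns `None` when the head or token is out of range.
pub fn head_vector(
    flat: &[f32],
    token: usize,
    head: usize,
    num_kv_heads: usize,
    head_dim: usize,
) -> Option<&[f32]> {
    if head >= num_kv_heads {
        return None;
    }
    // Offset of (token, head) in a [token][head][dim] layout.
    let start = (token * num_kv_heads + head) * head_dim;
    flat.get(start..start + head_dim)
}
```

Returning an Option instead of indexing directly avoids a panic when the requested token has not been cached yet.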
pub fn num_layers(&self) -> usize
Number of layers.
pub fn entry_size(&self) -> usize
Entry size (num_kv_heads * head_dim) per token per layer.
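No method listed on this page reports the number of cached tokens, but it can be derived from the documented shapes: the length of the slice returned by k(layer) divided by entry_size(). A minimal sketch, where cached_tokens is a hypothetical helper and not part of the crate:

```rust
/// Hypothetical helper (not part of the crate): number of tokens held in a
/// slice returned by `k(layer)`, given the value reported by `entry_size()`.
/// Returns `None` if the entry size is zero or does not divide the length.
pub fn cached_tokens(layer_len: usize, entry_size: usize) -> Option<usize> {
    if entry_size == 0 || layer_len % entry_size != 0 {
        return None;
    }
    Some(layer_len / entry_size)
}
```

With a real cache, the call would look like cached_tokens(cache.k(0).len(), cache.entry_size()).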
Trait Implementations
Auto Trait Implementations
impl Freeze for KVCache
impl RefUnwindSafe for KVCache
impl Send for KVCache
impl Sync for KVCache
impl Unpin for KVCache
impl UnsafeUnpin for KVCache
impl UnwindSafe for KVCache
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.