pub struct KvCache {
pub keys: Vec<ArrayD<f64>>,
pub values: Vec<ArrayD<f64>>,
pub seq_len: usize,
pub max_seq_len: usize,
pub num_layers: usize,
pub num_heads: usize,
}
Cached key-value pairs for autoregressive inference using ndarray tensors.
Stores per-layer key and value tensors as dynamic-rank ArrayD<f64>.
Tensors are concatenated along the sequence dimension on each append_kv call.
Fields
keys: Vec<ArrayD<f64>>
Cached K tensors per layer (shape: [seq_len, num_heads, head_dim] after appends).
values: Vec<ArrayD<f64>>
Cached V tensors per layer (shape: [seq_len, num_heads, head_dim] after appends).
seq_len: usize
Current cached sequence length.
max_seq_len: usize
Maximum allowed sequence length.
num_layers: usize
Number of transformer layers.
num_heads: usize
Number of attention heads.
Implementations
impl KvCache
pub fn new(
    num_layers: usize,
    num_heads: usize,
    head_dim: usize,
    max_seq_len: usize,
) -> Self
Create a new, empty KV-cache.
Initially all per-layer tensors are zero-sized along the sequence dimension.
pub fn append_kv(
    &mut self,
    layer: usize,
    new_k: ArrayD<f64>,
    new_v: ArrayD<f64>,
) -> Result<(), KvCacheError>
Append new key and value tensors for the given layer.
new_k and new_v must have shape [new_tokens, num_heads, head_dim].
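The shape requirement and the concatenation along the sequence dimension can be illustrated with a minimal, std-only sketch. It is not the crate's implementation: `LayerBuf`, `AppendError`, and `append_rows` are hypothetical names, a flat row-major `Vec<f64>` stands in for `ArrayD<f64>`, and treating an append past `max_seq_len` as an error is an assumption, since the variants of `KvCacheError` are not documented here.

```rust
/// Hypothetical stand-in for one layer's cached tensor: a row-major buffer
/// with logical shape [seq_len, num_heads, head_dim].
#[derive(Debug, Default)]
pub struct LayerBuf {
    pub data: Vec<f64>,
    pub seq_len: usize,
}

/// Hypothetical error type; the real `KvCacheError` variants are not shown
/// in the documentation above.
#[derive(Debug, PartialEq)]
pub enum AppendError {
    ShapeMismatch,
    CapacityExceeded { requested: usize, max: usize },
}

/// Append `new` (row-major, shape `new_shape = [new_tokens, num_heads, head_dim]`)
/// to `buf` along the sequence dimension.
pub fn append_rows(
    buf: &mut LayerBuf,
    new: &[f64],
    new_shape: [usize; 3],
    num_heads: usize,
    head_dim: usize,
    max_seq_len: usize,
) -> Result<(), AppendError> {
    let [new_tokens, h, d] = new_shape;
    // The trailing dimensions must match the cache, and the buffer length
    // must agree with the declared shape.
    if h != num_heads || d != head_dim || new.len() != new_tokens * h * d {
        return Err(AppendError::ShapeMismatch);
    }
    // Assumption: exceeding `max_seq_len` is rejected rather than truncated.
    let requested = buf.seq_len + new_tokens;
    if requested > max_seq_len {
        return Err(AppendError::CapacityExceeded { requested, max: max_seq_len });
    }
    // In row-major layout, concatenating along axis 0 is a plain extend.
    buf.data.extend_from_slice(new);
    buf.seq_len = requested;
    Ok(())
}
```

In this sketch a failed append leaves the buffer untouched, so a caller can recover from an error without rebuilding the cache.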
pub fn get_kv(&self, layer: usize) -> Option<(&ArrayD<f64>, &ArrayD<f64>)>
Retrieve cached keys and values for the given layer.
Returns None if the layer index is out of bounds.
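A bounds-checked lookup of this kind can be sketched with slice `get`, which already returns `None` for an out-of-range index. The helper `get_pair` below is hypothetical and generic over the element type, so it stands in for the per-layer `ArrayD<f64>` tensors without depending on ndarray; how the crate implements `get_kv` internally is not shown in the documentation.

```rust
/// Hypothetical helper: return references to the key and value entries for
/// `layer`, or `None` if the index is out of bounds for either slice.
pub fn get_pair<'a, T>(keys: &'a [T], values: &'a [T], layer: usize) -> Option<(&'a T, &'a T)> {
    // `?` propagates the `None` that `get` yields for an out-of-range index.
    Some((keys.get(layer)?, values.get(layer)?))
}
```

Callers would typically match on the result, or use `if let Some((k, v)) = ...`, rather than unwrapping, since a wrong layer index is an ordinary, recoverable condition.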
pub fn current_len(&self) -> usize
Current cached sequence length.
pub fn memory_usage_bytes(&self) -> usize
Approximate memory usage in bytes (f64 = 8 bytes per element).
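As a rough guide to the numbers involved, the estimate can be reproduced by hand. The sketch below assumes every layer holds one K and one V tensor of shape [seq_len, num_heads, head_dim] with the same seq_len, and counts element storage only; `estimate_kv_bytes` is a hypothetical helper, and the value the crate's method reports may differ (for example, if it accounts for allocation overhead).

```rust
/// Hypothetical estimate of the bytes held by a KV-cache in which every layer
/// stores K and V tensors of shape [seq_len, num_heads, head_dim] as f64.
pub fn estimate_kv_bytes(
    num_layers: usize,
    seq_len: usize,
    num_heads: usize,
    head_dim: usize,
) -> usize {
    let elems_per_tensor = seq_len * num_heads * head_dim;
    // Factor of 2: one K tensor and one V tensor per layer.
    2 * num_layers * elems_per_tensor * std::mem::size_of::<f64>()
}
```

For instance, 12 layers, 12 heads, a head dimension of 64, and 1024 cached tokens come to 2 × 12 × 1024 × 12 × 64 × 8 = 150,994,944 bytes, which is 144 MiB. Storing f32 instead of f64 would halve that figure.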
Auto Trait Implementations
impl Freeze for KvCache
impl RefUnwindSafe for KvCache
impl Send for KvCache
impl Sync for KvCache
impl Unpin for KvCache
impl UnsafeUnpin for KvCache
impl UnwindSafe for KvCache
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.