pub struct KVCache { /* private fields */ }
KV-cache for efficient autoregressive generation.
Stores the key and value tensors from previous positions so they don’t need to be recomputed at each generation step. Each layer has its own cache entry.
§Shapes
keys[i]: [batch, num_kv_heads, seq_len, head_dim]
values[i]: [batch, num_kv_heads, seq_len, head_dim]
where i is the layer index.
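To make the layout concrete, here is a minimal sketch of how a per-layer entry grows during generation. It assumes each step appends its new key/value rows along dim 2 (seq_len). Toy4d and ToyCache are hypothetical stand-ins written for illustration only (plain Vec<f32> storage in row-major order). They are not this crate's Tensor or KVCache, whose fields are private.

```rust
/// Hypothetical stand-in for a 4-D tensor stored row-major as
/// [batch, num_kv_heads, seq_len, head_dim]. Not the crate's `Tensor`.
#[derive(Clone, Debug, PartialEq)]
struct Toy4d {
    shape: [usize; 4],
    data: Vec<f32>,
}

impl Toy4d {
    fn new(shape: [usize; 4], data: Vec<f32>) -> Self {
        assert_eq!(data.len(), shape.iter().product::<usize>());
        Toy4d { shape, data }
    }

    /// Concatenate `other` after `self` along dim 2 (the sequence axis).
    fn cat_seq(&self, other: &Toy4d) -> Toy4d {
        let [b, h, s1, d] = self.shape;
        let [b2, h2, s2, d2] = other.shape;
        assert!(b == b2 && h == h2 && d == d2, "batch/heads/head_dim must match");
        let mut data = Vec::with_capacity(b * h * (s1 + s2) * d);
        // For each (batch, head) pair: the cached rows first, then the new rows.
        for bh in 0..b * h {
            data.extend_from_slice(&self.data[bh * s1 * d..(bh + 1) * s1 * d]);
            data.extend_from_slice(&other.data[bh * s2 * d..(bh + 1) * s2 * d]);
        }
        Toy4d::new([b, h, s1 + s2, d], data)
    }
}

/// Hypothetical cache: one (key, value) slot per layer.
struct ToyCache {
    layers: Vec<(Option<Toy4d>, Option<Toy4d>)>,
}

impl ToyCache {
    fn new(num_layers: usize) -> Self {
        ToyCache { layers: vec![(None, None); num_layers] }
    }

    /// Append this step's K/V for `layer` and return the full K/V to attend over.
    fn append(&mut self, layer: usize, k_new: Toy4d, v_new: Toy4d) -> (Toy4d, Toy4d) {
        let (k_slot, v_slot) = &mut self.layers[layer];
        let k = match k_slot.take() {
            Some(old) => old.cat_seq(&k_new),
            None => k_new,
        };
        let v = match v_slot.take() {
            Some(old) => old.cat_seq(&v_new),
            None => v_new,
        };
        *k_slot = Some(k.clone());
        *v_slot = Some(v.clone());
        (k, v)
    }
}
```

Only the newest position's projections are computed at each step; everything before it is read back from the cache, which is the saving described above.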
Implementations§
impl KVCache
pub fn seq_len(&self) -> Result<usize>
Current sequence length from the cache (0 if empty).
§Errors
Returns MIError::Model if a cached tensor has an unexpected shape.
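A minimal sketch of how a sequence length can be read from the cache, assuming it is taken from dim 2 of the first populated key tensor. ShapeOnly is a hypothetical shape-only stand-in, and the String error stands in for MIError::Model; the crate's actual lookup may differ.

```rust
/// Hypothetical stand-in for a tensor; only its shape matters here.
struct ShapeOnly {
    dims: Vec<usize>,
}

/// Sketch of `seq_len`: dim 2 of the first cached key, or 0 if nothing is cached.
/// A key that is not 4-D is treated as an unexpected shape.
fn seq_len(keys: &[Option<ShapeOnly>]) -> Result<usize, String> {
    match keys.iter().flatten().next() {
        None => Ok(0),
        Some(t) if t.dims.len() == 4 => Ok(t.dims[2]),
        Some(t) => Err(format!("expected a 4-D key tensor, got shape {:?}", t.dims)),
    }
}
```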
pub fn layer_mut( &mut self, layer: usize, ) -> Result<(&mut Option<Tensor>, &mut Option<Tensor>)>
Get mutable references to the cache entry for a specific layer.
Returns (&mut Option<Tensor>, &mut Option<Tensor>) for (key, value).
§Errors
Returns MIError::Hook if layer is out of range.
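The sketch below shows a bounds-checked accessor that returns both slots for one layer, plus a typical caller that appends to them in place. ToyCache, ToyTensor (a flat Vec<f32>) and push_step are hypothetical names for illustration, and the String error stands in for MIError::Hook.

```rust
/// Hypothetical stand-in for the crate's tensor type: a flat buffer.
type ToyTensor = Vec<f32>;

/// Hypothetical cache with one key slot and one value slot per layer.
struct ToyCache {
    keys: Vec<Option<ToyTensor>>,
    values: Vec<Option<ToyTensor>>,
}

impl ToyCache {
    /// Sketch of `layer_mut`: bounds-check, then hand out both slots at once.
    fn layer_mut(
        &mut self,
        layer: usize,
    ) -> Result<(&mut Option<ToyTensor>, &mut Option<ToyTensor>), String> {
        let n = self.keys.len();
        // `keys` and `values` are distinct fields, so both can be borrowed mutably.
        match (self.keys.get_mut(layer), self.values.get_mut(layer)) {
            (Some(k), Some(v)) => Ok((k, v)),
            _ => Err(format!("layer {layer} is out of range (cache has {n} layers)")),
        }
    }
}

/// Example caller: append one position's key/value data to `layer`.
fn push_step(cache: &mut ToyCache, layer: usize, k_new: &[f32], v_new: &[f32]) -> Result<(), String> {
    let (k, v) = cache.layer_mut(layer)?;
    // An empty slot (None) is initialised on first use, then extended.
    k.get_or_insert_with(Vec::new).extend_from_slice(k_new);
    v.get_or_insert_with(Vec::new).extend_from_slice(v_new);
    Ok(())
}
```

Returning both references from a single call lets the caller update key and value together without borrowing the cache twice.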
pub fn memory_usage(&self) -> usize
Estimate memory usage in bytes.
Returns the combined size, in bytes, of all cached key and value tensors.
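As a worked example, assume the estimate is element count × bytes per element, summed over every cached key and value tensor (an assumption; the formula is not stated here). For a hypothetical 32-layer model with 8 KV heads, head_dim 128, batch 1 and an f16 cache (2 bytes per element), each tensor at 4096 positions holds 1 × 8 × 4096 × 128 = 4,194,304 elements, or 8 MiB, so 32 layers × 2 tensors come to 512 MiB:

```rust
/// Bytes for one cached tensor of shape [batch, num_kv_heads, seq_len, head_dim].
fn tensor_bytes(shape: [usize; 4], dtype_bytes: usize) -> usize {
    shape.iter().product::<usize>() * dtype_bytes
}

/// Total over all layers, counting one key and one value tensor per layer
/// (assumes every layer is populated with the same shape).
fn cache_bytes(num_layers: usize, shape: [usize; 4], dtype_bytes: usize) -> usize {
    num_layers * 2 * tensor_bytes(shape, dtype_bytes)
}
```

Usage grows linearly with seq_len, which is why the trimming methods below operate on the sequence axis.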
pub fn trim_to(&mut self, max_seq_len: usize) -> Result<bool>
Trim the cache to keep only the last max_seq_len tokens.
Useful for memory-constrained scenarios with long sequences.
Returns Ok(true) if trimming occurred, or Ok(false) if no trimming was needed.
§Errors
Returns MIError::Model if tensor operations fail.
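A minimal sketch of the trimming step, assuming it keeps the last max_seq_len positions along dim 2 of every cached key and value tensor. Toy4d is a hypothetical row-major stand-in for the crate's Tensor, and this sketch returns a plain bool because its slicing cannot fail.

```rust
/// Hypothetical stand-in for a 4-D tensor stored row-major as
/// [batch, num_kv_heads, seq_len, head_dim].
#[derive(Clone, Debug, PartialEq)]
struct Toy4d {
    shape: [usize; 4],
    data: Vec<f32>,
}

impl Toy4d {
    /// Keep only the last `keep` positions along dim 2 (the sequence axis).
    fn keep_last(&self, keep: usize) -> Toy4d {
        let [b, h, s, d] = self.shape;
        let keep = keep.min(s);
        let start = s - keep;
        let mut data = Vec::with_capacity(b * h * keep * d);
        // Each (batch, head) pair owns a contiguous block of s * d values.
        for bh in 0..b * h {
            let row0 = bh * s * d;
            data.extend_from_slice(&self.data[row0 + start * d..row0 + s * d]);
        }
        Toy4d { shape: [b, h, keep, d], data }
    }
}

/// Sketch of `trim_to` over per-layer (key, value) slots.
fn trim_to(layers: &mut [(Option<Toy4d>, Option<Toy4d>)], max_seq_len: usize) -> bool {
    let mut trimmed = false;
    for (k, v) in layers.iter_mut() {
        for slot in [k, v] {
            if let Some(t) = slot {
                if t.shape[2] > max_seq_len {
                    *t = t.keep_last(max_seq_len);
                    trimmed = true;
                }
            }
        }
    }
    trimmed
}
```

Keeping the most recent positions preserves the local context the next token attends to, at the cost of forgetting the oldest tokens.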
pub fn enforce_memory_limit(&mut self, max_bytes: usize) -> Result<bool>
Check whether the cache exceeds max_bytes and trim it if so.
When the limit is exceeded, the cache is trimmed to ~75% of its current sequence length.
Returns Ok(true) if trimming occurred, Ok(false) otherwise.
§Errors
Returns MIError::Model if tensor operations fail.
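A minimal sketch of the policy described above, using a hypothetical shape-only model (ShapeCache) in which every layer stores key and value tensors of the same shape. "~75%" is taken here as the integer seq_len * 3 / 4; the crate's exact rounding is not documented.

```rust
/// Hypothetical shape-only model of the cache: every layer holds one key and
/// one value tensor of shape [batch, num_kv_heads, seq_len, head_dim].
struct ShapeCache {
    num_layers: usize,
    shape: [usize; 4],
    dtype_bytes: usize,
}

impl ShapeCache {
    fn memory_usage(&self) -> usize {
        2 * self.num_layers * self.shape.iter().product::<usize>() * self.dtype_bytes
    }

    /// Shape-only trim: shrink the sequence axis if it is longer than the limit.
    fn trim_to(&mut self, max_seq_len: usize) -> bool {
        if self.shape[2] > max_seq_len {
            self.shape[2] = max_seq_len;
            true
        } else {
            false
        }
    }

    /// Sketch: if over budget, trim once to ~75% of the current length.
    fn enforce_memory_limit(&mut self, max_bytes: usize) -> bool {
        if self.memory_usage() <= max_bytes {
            return false;
        }
        let target = self.shape[2] * 3 / 4;
        self.trim_to(target)
    }
}
```

In this sketch a single call trims only once, so usage can stay above max_bytes afterwards (512 MiB → 384 MiB against a 256 MiB limit in the example below); a caller needing a hard bound could repeat the call until it returns false.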
Trait Implementations§
Auto Trait Implementations§
impl Freeze for KVCache
impl !RefUnwindSafe for KVCache
impl Send for KVCache
impl Sync for KVCache
impl Unpin for KVCache
impl UnsafeUnpin for KVCache
impl !UnwindSafe for KVCache
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.