pub struct KvCache { /* private fields */ }
Simple contiguous KV cache implementation.
Stores key and value tensors for all layers in contiguous FP32 buffers. Each layer has a separate key buffer and value buffer, sized for the maximum context length.
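The sketch below makes this layout concrete. It makes two assumptions:
- Each per-layer buffer holds max_seq_len * kv_dim f32 values.
- Token t occupies the half-open range [t * kv_dim, (t + 1) * kv_dim).

ContiguousKv, store_at_end, advance, keys and values are hypothetical names chosen for illustration. They are not KvCache's private fields or its KvCacheAccess methods, whose exact write and length-tracking behaviour this page does not describe.

```rust
/// Hypothetical model of a contiguous KV cache: one key buffer and one value
/// buffer per layer, each preallocated for `max_seq_len` tokens.
pub struct ContiguousKv {
    keys: Vec<Vec<f32>>,   // keys[layer].len() == max_seq_len * kv_dim
    values: Vec<Vec<f32>>, // values[layer].len() == max_seq_len * kv_dim
    kv_dim: usize,
    max_seq_len: usize,
    seq_len: usize, // tokens currently valid in every layer (assumed shared)
}

impl ContiguousKv {
    pub fn new(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> Self {
        let zeroed = vec![0.0f32; max_seq_len * kv_dim];
        Self {
            keys: vec![zeroed.clone(); num_layers],
            values: vec![zeroed; num_layers],
            kv_dim,
            max_seq_len,
            seq_len: 0,
        }
    }

    /// Copies one token's key and value into slot `seq_len` of `layer`.
    pub fn store_at_end(&mut self, layer: usize, key: &[f32], value: &[f32]) -> Result<(), String> {
        if layer >= self.keys.len() {
            return Err(format!("layer {layer} out of range"));
        }
        if key.len() != self.kv_dim || value.len() != self.kv_dim {
            return Err("key and value must each have kv_dim elements".into());
        }
        if self.seq_len >= self.max_seq_len {
            return Err("cache is full".into());
        }
        let start = self.seq_len * self.kv_dim;
        let end = start + self.kv_dim;
        self.keys[layer][start..end].copy_from_slice(key);
        self.values[layer][start..end].copy_from_slice(value);
        Ok(())
    }

    /// Marks the token just written to every layer as valid.
    pub fn advance(&mut self) {
        assert!(self.seq_len < self.max_seq_len, "cache is full");
        self.seq_len += 1;
    }

    /// Only the first seq_len * kv_dim floats of a layer are meaningful.
    pub fn keys(&self, layer: usize) -> &[f32] {
        &self.keys[layer][..self.seq_len * self.kv_dim]
    }

    pub fn values(&self, layer: usize) -> &[f32] {
        &self.values[layer][..self.seq_len * self.kv_dim]
    }
}
```

Keeping one flat Vec<f32> per layer means an attention kernel can read all cached keys of a layer as a single slice, with no per-token indirection.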
Implementations
impl KvCache
pub fn new(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> Self
Allocate a new KV cache.
Arguments
num_layers - Number of transformer layers.
max_seq_len - Maximum context length.
kv_dim - KV dimension per token (num_kv_heads * head_dim).
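Every layer gets one key buffer and one value buffer sized for the maximum context length, so the memory new allocates can be estimated up front. The sketch below assumes each buffer holds exactly max_seq_len * kv_dim f32 values (4 bytes each) and ignores Vec bookkeeping. kv_cache_bytes is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: estimated bytes allocated by KvCache::new, assuming
/// two buffers (K and V) per layer of max_seq_len * kv_dim f32 values each.
pub fn kv_cache_bytes(num_layers: usize, max_seq_len: usize, kv_dim: usize) -> usize {
    let floats_per_buffer = max_seq_len * kv_dim;
    2 * num_layers * floats_per_buffer * std::mem::size_of::<f32>()
}
```

As an illustration, take 32 layers, a 4096-token context and kv_dim = 8 * 128 = 1024 (8 KV heads of dimension 128). The estimate is 2 * 32 * 4096 * 1024 * 4 = 1,073,741,824 bytes, i.e. 1 GiB, allocated whether or not the context is ever filled.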
pub fn max_seq_len(&self) -> usize
Returns the maximum sequence length.
pub fn num_layers(&self) -> usize
Returns the number of layers.
pub fn restore_from_snapshot(
    &mut self,
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    seq_len: usize,
)
Restore from a prefix cache snapshot.
Copies the provided per-layer key/value data into the internal buffers
and sets seq_len to the snapshot's length. The caller must ensure
that keys.len() and values.len() both equal num_layers, and that each
inner Vec holds seq_len * kv_dim elements.
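The caller's obligations can be made explicit with a check that runs before any copying. The sketch below is a hypothetical helper, check_snapshot_shape, and the documented method may simply assume these preconditions rather than check them. The seq_len <= max_seq_len condition is an added assumption: this page does not state it, but it follows from the buffers being sized for the maximum context length.

```rust
/// Hypothetical pre-flight check mirroring the documented preconditions of
/// restore_from_snapshot: one key Vec and one value Vec per layer, each with
/// exactly seq_len * kv_dim elements.
/// Assumption (not stated in the docs): seq_len must not exceed max_seq_len.
pub fn check_snapshot_shape(
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    seq_len: usize,
    num_layers: usize,
    kv_dim: usize,
    max_seq_len: usize,
) -> Result<(), String> {
    if keys.len() != num_layers || values.len() != num_layers {
        return Err(format!(
            "expected {num_layers} layers, got {} key and {} value layers",
            keys.len(),
            values.len()
        ));
    }
    if seq_len > max_seq_len {
        return Err(format!("seq_len {seq_len} exceeds max_seq_len {max_seq_len}"));
    }
    let expected = seq_len * kv_dim;
    for (layer, (k, v)) in keys.iter().zip(values).enumerate() {
        if k.len() != expected || v.len() != expected {
            return Err(format!("layer {layer}: expected {expected} elements per buffer"));
        }
    }
    Ok(())
}
```

Running a check like this first means a malformed snapshot is rejected before any internal buffer has been partially overwritten.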
pub fn truncate(&mut self, n: usize)
Truncate the KV cache to n tokens.
After this call, seq_len() returns the smaller of n and the previous
seq_len. If n is at or beyond the current length, the cache is left
unchanged, because truncate never extends it. The underlying buffers
are not zeroed; the truncated region is simply treated as invalid and
will be overwritten by subsequent store_kv calls.
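Here is a minimal model of the truncation rule. It assumes the cache tracks a single seq_len and that the valid data is always the prefix [0, seq_len * kv_dim) of each buffer. The TruncModel type is hypothetical. The point is that truncation only moves the length, clamping it so the cache never grows, and leaves the stale floats in place.

```rust
/// Hypothetical model of one layer's buffer: truncation changes only the
/// length; the stored floats are left as they are (not zeroed).
pub struct TruncModel {
    pub buf: Vec<f32>, // max_seq_len * kv_dim floats
    pub kv_dim: usize,
    pub seq_len: usize,
}

impl TruncModel {
    /// Clamp to the current length: truncate never extends the cache.
    pub fn truncate(&mut self, n: usize) {
        self.seq_len = self.seq_len.min(n);
    }

    /// The region readers treat as valid.
    pub fn valid(&self) -> &[f32] {
        &self.buf[..self.seq_len * self.kv_dim]
    }
}
```

Because no memory is freed or cleared, truncation is O(1) regardless of how many tokens are discarded.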
This is the low-level primitive for speculative-decoding rollback: the
target engine calls truncate(divergence_pos) after rejecting a draft
token, then continues generating from divergence_pos.
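The rollback flow can be sketched as follows. The sketch assumes:
- Verification is greedy: the target model's own choice at each draft position is compared against the draft token.
- The cache held prefix_len tokens before the draft tokens' KV entries were appended.

The Truncate trait, verify_and_rollback, and all argument names are hypothetical. Only the truncate call mirrors the documented method. Schemes that also emit a bonus token when every draft token is accepted are not modelled here.

```rust
/// Hypothetical stand-in for the length-related part of the cache API.
pub trait Truncate {
    fn seq_len(&self) -> usize;
    fn truncate(&mut self, n: usize);
}

/// Hypothetical greedy verification step.
/// `prefix_len`: tokens cached before the draft was scored.
/// `draft`: tokens proposed by the draft model (their KV is now cached).
/// `target`: the target model's greedy choice at each draft position.
/// Returns the accepted draft tokens plus the target's correction, if any.
pub fn verify_and_rollback<C: Truncate>(
    cache: &mut C,
    prefix_len: usize,
    draft: &[u32],
    target: &[u32],
) -> Vec<u32> {
    assert!(target.len() >= draft.len(), "need a target token per draft token");
    // Position of the first draft token the target disagrees with.
    let diverge = draft
        .iter()
        .zip(target)
        .position(|(d, t)| d != t)
        .unwrap_or(draft.len());
    let mut out = draft[..diverge].to_vec();
    if diverge < draft.len() {
        // divergence_pos: drop KV entries from the first rejected token onwards.
        cache.truncate(prefix_len + diverge);
        // The target's own token replaces the rejected draft token.
        out.push(target[diverge]);
    }
    out
}
```

For example, with 10 cached prefix tokens and four drafts of which the third is rejected, the cache is truncated to 12 entries. Generation then resumes from that position with the target's corrected token.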
pub fn snapshot(&self) -> KvCacheSnapshot
Capture a snapshot of the current KV state.
Only the data up to seq_len * kv_dim is copied per layer, keeping
the snapshot compact.
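The compact copy amounts to slicing each layer's buffer at seq_len * kv_dim before cloning it. KvCacheSnapshot's real fields are not shown on this page, so the Snapshot struct and the take_snapshot function below are hypothetical stand-ins that illustrate the idea.

```rust
/// Hypothetical snapshot type holding only the valid prefix of each layer.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub keys: Vec<Vec<f32>>,
    pub values: Vec<Vec<f32>>,
    pub seq_len: usize,
}

/// Copies buffers[layer][..seq_len * kv_dim] for every layer instead of the
/// full max_seq_len-sized allocation.
pub fn take_snapshot(
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    seq_len: usize,
    kv_dim: usize,
) -> Snapshot {
    let used = seq_len * kv_dim;
    Snapshot {
        keys: keys.iter().map(|k| k[..used].to_vec()).collect(),
        values: values.iter().map(|v| v[..used].to_vec()).collect(),
        seq_len,
    }
}
```

Each inner Vec then has exactly seq_len * kv_dim elements, which is the shape restore_from_snapshot requires of its inputs.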
pub fn to_payload(&self) -> KvStatePayload
Build a serializable crate::snapshot::KvStatePayload from the current state.
pub fn restore_from_payload(
    &mut self,
    payload: &KvStatePayload,
) -> RuntimeResult<()>
Restore cache state from a crate::snapshot::KvStatePayload.
Validates that layer count and dimensions match the cache configuration, then restores the key/value buffers and sequence length.
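The validate-then-restore sequence might look like the sketch below. KvStatePayload's actual fields are not shown on this page. The Payload and Cache structs, their field names, and the String error (standing in for RuntimeResult) are therefore assumptions made for illustration. This sketch also chooses to leave the cache untouched when validation fails.

```rust
/// Hypothetical payload layout; the real crate::snapshot::KvStatePayload may differ.
pub struct Payload {
    pub num_layers: usize,
    pub kv_dim: usize,
    pub seq_len: usize,
    pub keys: Vec<Vec<f32>>,   // one Vec of seq_len * kv_dim floats per layer
    pub values: Vec<Vec<f32>>, // one Vec of seq_len * kv_dim floats per layer
}

/// Hypothetical cache state with full-capacity per-layer buffers.
pub struct Cache {
    pub keys: Vec<Vec<f32>>,
    pub values: Vec<Vec<f32>>,
    pub kv_dim: usize,
    pub max_seq_len: usize,
    pub seq_len: usize,
}

impl Cache {
    /// Validate everything first, then copy; on error nothing is modified.
    pub fn restore(&mut self, p: &Payload) -> Result<(), String> {
        let layers = self.keys.len();
        if p.num_layers != layers || p.keys.len() != layers || p.values.len() != layers {
            return Err("layer count mismatch".into());
        }
        if p.kv_dim != self.kv_dim {
            return Err(format!("kv_dim mismatch: {} vs {}", p.kv_dim, self.kv_dim));
        }
        if p.seq_len > self.max_seq_len {
            return Err("payload longer than cache capacity".into());
        }
        let used = p.seq_len * self.kv_dim;
        if p.keys.iter().chain(&p.values).any(|buf| buf.len() != used) {
            return Err("per-layer buffer length mismatch".into());
        }
        for (dst, src) in self.keys.iter_mut().zip(&p.keys) {
            dst[..used].copy_from_slice(src);
        }
        for (dst, src) in self.values.iter_mut().zip(&p.values) {
            dst[..used].copy_from_slice(src);
        }
        self.seq_len = p.seq_len;
        Ok(())
    }
}
```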
Trait Implementations
impl KvCacheAccess for KvCache
fn store_kv(
    &mut self,
    layer: usize,
    key: &[f32],
    value: &[f32],
) -> ArchResult<()>
fn get_keys(&self, layer: usize) -> ArchResult<&[f32]>
fn get_values(&self, layer: usize) -> ArchResult<&[f32]>
Auto Trait Implementations
impl Freeze for KvCache
impl RefUnwindSafe for KvCache
impl Send for KvCache
impl Sync for KvCache
impl Unpin for KvCache
impl UnsafeUnpin for KvCache
impl UnwindSafe for KvCache
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.