pub struct LayerKvCache {
pub past_len: usize,
pub layers_k: Vec<Vec<f32>>,
pub layers_v: Vec<Vec<f32>>,
}
Layer-wise past K/V tensors in row-major [past_len * kv_dim] layout per layer.
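As an illustration of the row-major layout described above, the sketch below reads the row for one token position out of a flat buffer. The helper name `kv_row` is hypothetical and not part of this crate; it only assumes that row `t` occupies elements `t * kv_dim .. (t + 1) * kv_dim`.

```rust
/// Hypothetical helper (not part of the crate): returns the `kv_dim`-wide
/// row for token position `t` in a row-major `[past_len * kv_dim]` buffer,
/// or `None` if `kv_dim` is zero or the row lies outside the buffer.
fn kv_row(buf: &[f32], kv_dim: usize, t: usize) -> Option<&[f32]> {
    if kv_dim == 0 {
        return None;
    }
    // Checked arithmetic so an absurd `t` cannot overflow the index.
    let start = t.checked_mul(kv_dim)?;
    let end = start.checked_add(kv_dim)?;
    buf.get(start..end)
}
```

With `kv_dim = 3`, a six-element buffer holds two rows, so `kv_row(&buf, 3, 1)` yields the last three elements and `kv_row(&buf, 3, 2)` yields `None`.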
Fields§
past_len: usize
layers_k: Vec<Vec<f32>>
layers_v: Vec<Vec<f32>>
Implementations§
impl LayerKvCache
pub fn from_layer_outputs(
    num_layers: usize,
    batch: usize,
    past_seq: usize,
    kv_dim: usize,
    outputs: &[Vec<f32>],
) -> Result<Self, String>
pub fn from_layer_outputs_per_layer(
    num_layers: usize,
    batch: usize,
    past_seq: usize,
    kv_dims: &[usize],
    outputs: &[Vec<f32>],
) -> Result<Self, String>
Like Self::from_layer_outputs, but accepts a per-layer
kv_dims slice. Gemma 4 12B’s full-attention layers have
kv_dim = 1 * 512 = 512 while sliding layers have 8 * 256 = 2048; this constructor handles that heterogeneity.
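To make the heterogeneity concrete, here is a minimal sketch of the size check a caller might run before handing buffers to a per-layer constructor. Both helpers are hypothetical and not part of this crate. The sketch assumes `batch = 1` and a row-major `[past_seq * kv_dim]` buffer per layer; how the real constructor treats `batch` is not stated here.

```rust
/// Hypothetical helper: expected element count of each layer's K (or V)
/// buffer, assuming batch = 1 and a row-major [past_seq * kv_dim] layout.
fn expected_layer_lens(past_seq: usize, kv_dims: &[usize]) -> Vec<usize> {
    kv_dims.iter().map(|&d| past_seq * d).collect()
}

/// Hypothetical check: every buffer must match its layer's expected length.
fn check_layer_lens(
    bufs: &[Vec<f32>],
    past_seq: usize,
    kv_dims: &[usize],
) -> Result<(), String> {
    if bufs.len() != kv_dims.len() {
        return Err(format!(
            "got {} buffers for {} kv_dims",
            bufs.len(),
            kv_dims.len()
        ));
    }
    let wants = expected_layer_lens(past_seq, kv_dims);
    for (i, (buf, want)) in bufs.iter().zip(wants).enumerate() {
        if buf.len() != want {
            return Err(format!(
                "layer {i}: expected {want} elements, got {}",
                buf.len()
            ));
        }
    }
    Ok(())
}
```

For two sliding layers (`kv_dim = 2048`) followed by one full-attention layer (`kv_dim = 512`) and `past_seq = 4`, the expected lengths are 8192, 8192, and 2048.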
pub fn pad_layers_to_upper(
    &self,
    upper: u64,
    kv_dim: usize,
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>)
Pad each layer’s K/V to upper rows along the sequence axis (kv_dim inner).
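The padding step can be sketched for a single buffer as follows. `pad_rows` is a hypothetical free function, not the crate's implementation. It assumes the padding is zero-filled and appended after the real rows, and that a buffer already holding `upper` rows or more is returned unchanged; the actual method may behave differently on both points.

```rust
/// Hypothetical sketch: pads a row-major [rows * kv_dim] buffer to `upper`
/// rows. Assumes zero fill appended at the end of the sequence axis.
fn pad_rows(buf: &[f32], upper: u64, kv_dim: usize) -> Vec<f32> {
    let target = (upper as usize) * kv_dim;
    let mut out = buf.to_vec();
    if out.len() < target {
        // Extend with zeros up to `upper` full rows.
        out.resize(target, 0.0);
    }
    out
}
```

With `kv_dim = 2`, a two-row buffer padded to `upper = 3` gains one extra row of two zeros.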
pub fn pad_layers_to_upper_per_layer(
    &self,
    upper: u64,
    kv_dims: &[usize],
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>)
Like Self::pad_layers_to_upper, but pads each layer using its
own kv_dim. The length of kv_dims must equal the number of
cached layers.
pub fn advance_from_decode_outputs(
    &mut self,
    outputs: Vec<Vec<f32>>,
    batch: usize,
    kv_dim: usize,
) -> Result<(), String>
Update cache from decode outputs: [logits, k0, v0, k1, v1, …] (bucket-padded).
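The output ordering above can be unpacked as in the sketch below. `split_decode_outputs` and the `KvPair` alias are hypothetical names, not part of this crate. The sketch only separates the logits from the per-layer K/V pairs; it does not strip bucket padding or update any cache state.

```rust
/// Hypothetical alias for one layer's (K, V) buffers.
type KvPair = (Vec<f32>, Vec<f32>);

/// Hypothetical sketch: splits `[logits, k0, v0, k1, v1, ...]` into the
/// logits and a list of per-layer (K, V) pairs.
fn split_decode_outputs(
    outputs: Vec<Vec<f32>>,
) -> Result<(Vec<f32>, Vec<KvPair>), String> {
    // One logits tensor plus two tensors per layer means an odd length.
    if outputs.len() % 2 != 1 {
        return Err(format!(
            "expected 1 + 2 * num_layers outputs, got {}",
            outputs.len()
        ));
    }
    let mut it = outputs.into_iter();
    let logits = it.next().ok_or_else(|| "empty outputs".to_string())?;
    let mut pairs = Vec::new();
    while let (Some(k), Some(v)) = (it.next(), it.next()) {
        pairs.push((k, v));
    }
    Ok((logits, pairs))
}
```

Five output tensors therefore yield the logits plus two (K, V) pairs, while an even number of tensors is rejected.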
pub fn trim_sliding_window_per_layer(
    &mut self,
    kv_dims_keep: &[Option<(usize, usize)>],
) -> Result<(), String>
Trim each layer’s K/V history to at most window rows on
the sequence axis, keeping the most recent rows. Used by
Gemma 3/4 sliding-attention layers — long contexts can keep
only the last window (e.g. 1024) tokens per sliding layer
without affecting attention semantics (those layers mask out
older positions anyway).
kv_dims_keep selects which layers to trim and with what dim:
kv_dims_keep[i] = Some((dim, window)) trims layer i to at most
window rows of width dim; None leaves the layer untouched.
Pass None for layers whose attention is full-causal.
Note: past_len is unchanged — the per-layer K/V buffers
just hold fewer real rows now; the decode flow’s per-layer
past_k_{i} input shape will see the trimmed length. Caller
is responsible for ensuring the graph’s declared past_seq
matches the trimmed length OR the trimmed layer is bound
dynamically.
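A minimal sketch of the trimming described above, operating on plain buffers rather than on the cache itself. `trim_to_window` and `trim_layers` are hypothetical names, not the crate's implementation. The sketch assumes each buffer is row-major with `dim` elements per row and that the oldest rows sit at the front.

```rust
/// Hypothetical sketch: keeps only the last `window` rows of a row-major
/// [rows * dim] buffer by dropping the oldest rows from the front.
fn trim_to_window(buf: &mut Vec<f32>, dim: usize, window: usize) -> Result<(), String> {
    if dim == 0 || buf.len() % dim != 0 {
        return Err(format!("buffer length {} is not a multiple of dim {dim}", buf.len()));
    }
    let rows = buf.len() / dim;
    if rows > window {
        buf.drain(..(rows - window) * dim);
    }
    Ok(())
}

/// Hypothetical sketch: applies `trim_to_window` to every layer whose
/// entry is `Some((dim, window))` and skips layers marked `None`.
fn trim_layers(
    layers: &mut [Vec<f32>],
    keep: &[Option<(usize, usize)>],
) -> Result<(), String> {
    if layers.len() != keep.len() {
        return Err(format!("{} layers but {} keep entries", layers.len(), keep.len()));
    }
    for (buf, spec) in layers.iter_mut().zip(keep) {
        if let Some((dim, window)) = *spec {
            trim_to_window(buf, dim, window)?;
        }
    }
    Ok(())
}
```

In the real method the same trim would presumably apply to both the K and V buffers of a layer; the sketch handles one list of buffers to stay short.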
Trait Implementations§
impl Clone for LayerKvCache
fn clone(&self) -> LayerKvCache
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for LayerKvCache
impl RefUnwindSafe for LayerKvCache
impl Send for LayerKvCache
impl Sync for LayerKvCache
impl Unpin for LayerKvCache
impl UnsafeUnpin for LayerKvCache
impl UnwindSafe for LayerKvCache
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.