pub struct CacheManager { /* private fields */ }
Manages a quantized KV-cache with configurable eviction.
Provides append, get, evict, and diagnostic methods such as
compression_ratio and memory_bytes.
Implementations
impl CacheManager
pub fn new(config: KVCacheConfig) -> Self
Create a new cache manager with the given configuration.
pub fn append(&mut self, key: &[f32], value: &[f32], _layer_idx: usize)
Append a new key-value pair to the cache.
key and value must each have length num_heads * head_dim.
_layer_idx is used by the PyramidKV eviction policy to determine
the per-layer budget.
pub fn get(&self, positions: &[usize]) -> (Vec<Vec<f32>>, Vec<Vec<f32>>)
Retrieve dequantized key-value pairs at the given logical positions.
Returns (keys, values) where each inner Vec<f32> has length
num_heads * head_dim.
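The quantize/dequantize round trip behind append and get might look roughly like the sketch below. It assumes 4-bit asymmetric quantization with one scale and one minimum per vector, packed two values per byte. The names Quantized4, quantize4, and dequantize4 are hypothetical and are not part of the documented API; the crate's actual storage layout may differ.

```rust
/// Hypothetical 4-bit container (illustration only, not the crate's type).
struct Quantized4 {
    packed: Vec<u8>, // two 4-bit codes per byte, low nibble first
    scale: f32,      // step size between adjacent codes
    min: f32,        // value that code 0 maps back to
    len: usize,      // number of original f32 elements
}

/// Quantize a slice to 4-bit codes (assumed scheme: min/max affine mapping).
fn quantize4(x: &[f32]) -> Quantized4 {
    let min = x.iter().copied().fold(f32::INFINITY, f32::min);
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // 16 codes span 15 steps; fall back to 1.0 for constant or empty input.
    let scale = if range > 0.0 { range / 15.0 } else { 1.0 };
    let mut packed = vec![0u8; (x.len() + 1) / 2];
    for (i, &v) in x.iter().enumerate() {
        let code = ((v - min) / scale).round().clamp(0.0, 15.0) as u8;
        packed[i / 2] |= code << ((i % 2) * 4);
    }
    Quantized4 { packed, scale, min, len: x.len() }
}

/// Recover approximate f32 values from the packed codes.
fn dequantize4(q: &Quantized4) -> Vec<f32> {
    (0..q.len)
        .map(|i| {
            let code = (q.packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
            q.min + code as f32 * q.scale
        })
        .collect()
}
```

Under this scheme the round-trip error per element is bounded by about half a quantization step (scale / 2), which is why get returns approximations of the appended values rather than the exact inputs.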
pub fn evict(&mut self, budget: usize)
Evict entries until the cache contains at most budget entries.
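How entries are chosen for eviction depends on the configured policy, which this page does not spell out. As one possible illustration, the sketch below keeps the budget highest-scoring entries in their original order, in the spirit of an H2O-style policy. The function keep_indices is hypothetical and not part of the documented API.

```rust
/// Hypothetical selection step: return the indices of entries to keep,
/// in original (positional) order, so that at most `budget` remain.
/// Assumes one cumulative score per cache entry.
fn keep_indices(scores: &[f64], budget: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    if order.len() <= budget {
        return order; // already within budget, nothing to evict
    }
    // Highest score first; the stable sort keeps earlier entries on ties.
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    order.truncate(budget);
    // Restore positional order for the surviving entries.
    order.sort_unstable();
    order
}
```

Returning indices rather than mutating storage keeps the selection logic separate from however the quantized entries are actually laid out.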
pub fn update_attention_scores(&mut self, scores: &[f64])
Update cumulative attention scores for the H2O eviction policy.
scores should have one value per current cache entry.
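A minimal sketch of the accumulation step, assuming scores are simply summed per entry across calls. The function accumulate_scores and the choice to start newly appended entries at zero are assumptions for illustration, not the crate's confirmed behavior.

```rust
/// Hypothetical accumulator: add the latest attention scores to the
/// running totals, one slot per cache entry.
fn accumulate_scores(cumulative: &mut Vec<f64>, scores: &[f64]) {
    // Entries appended since the last update start from zero (assumption).
    if cumulative.len() < scores.len() {
        cumulative.resize(scores.len(), 0.0);
    }
    for (total, s) in cumulative.iter_mut().zip(scores) {
        *total += *s;
    }
}
```

Entries whose totals stay low across many decoding steps are the natural eviction candidates under a heavy-hitter policy.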
pub fn pyramid_budget(&self, layer_idx: usize, total_layers: usize) -> usize
Compute the budget for a given layer under PyramidKV.
Lower layers get a proportionally larger share of max_seq_len.
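The exact formula is not given here. One way to realize "lower layers get a larger share" is a linear taper, sketched below under that assumption. The function pyramid_budget_sketch and its explicit max_seq_len parameter are hypothetical; the real method reads max_seq_len from the configuration and may use a different schedule.

```rust
/// Hypothetical linear taper: layer 0 gets the full `max_seq_len`, and each
/// later layer gets one `1 / total_layers` share less.
fn pyramid_budget_sketch(max_seq_len: usize, layer_idx: usize, total_layers: usize) -> usize {
    if total_layers == 0 {
        return max_seq_len; // no layer information, fall back to the full budget
    }
    // Clamp out-of-range indices to the last layer.
    let idx = layer_idx.min(total_layers - 1);
    let remaining = total_layers - idx;
    max_seq_len * remaining / total_layers
}
```

With max_seq_len = 1024 and 4 layers, this taper yields budgets of 1024, 768, 512, and 256 for layers 0 through 3.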
pub fn compression_ratio(&self) -> f64
Compression ratio: f32 bytes / quantized bytes for a single entry.
A 4-bit cache over f32 baseline yields roughly 8x compression (before accounting for scale/zero-point overhead).
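The arithmetic behind the 8x figure can be checked with a small helper. The function entry_compression_ratio and its overhead_bytes parameter (modeling per-entry scale and zero-point storage) are hypothetical and only illustrate the calculation.

```rust
/// Hypothetical helper: ratio of f32 bytes to quantized bytes for `n` elements.
/// `overhead_bytes` models scale/zero-point metadata (an assumption).
fn entry_compression_ratio(n: usize, bits: usize, overhead_bytes: usize) -> f64 {
    let f32_bytes = n * 4;
    // Round the packed payload up to a whole number of bytes.
    let packed_bytes = (n * bits + 7) / 8;
    f32_bytes as f64 / (packed_bytes + overhead_bytes) as f64
}
```

For 128 elements at 4 bits, the payload is 64 bytes against 512 bytes of f32, giving exactly 8x. Adding 8 bytes of metadata (for example an f32 scale and an f32 zero-point) lowers the ratio to 512 / 72, about 7.1x.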
pub fn memory_bytes(&self) -> usize
Approximate total memory usage of the cache in bytes.