pub struct PrefixCache {
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
    /* private fields */
}
Prefix KV-cache with trie-based lookup and LRU eviction.
The trie is keyed by complete blocks of block_size tokens. Each
internal node in the trie corresponds to one block boundary; leaf
nodes that carry a block_idx have a fully populated CacheBlock.
Fields
hits: u64
Total cache hits since creation.
misses: u64
Total cache misses since creation.
evictions: u64
Total blocks evicted since creation.
Implementations
impl PrefixCache
pub fn new(
    max_blocks: usize,
    block_size: usize,
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
) -> Self
Create a new, empty prefix cache.
pub fn lookup(&mut self, token_ids: &[u32]) -> (usize, Vec<&CacheBlock>)
Look up the longest cached prefix of token_ids.
Walks the trie block-by-block. For every complete block whose
tokens match and whose trie node carries a cached block, the
block’s last_used stamp is refreshed and the block is returned.
Returns (matched_len, Vec<&CacheBlock>).
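The block-by-block walk described above can be sketched with a small stand-alone trie. This is a minimal illustration, not the crate's implementation: `Node`, `BlockTrie`, and `insert_block` are hypothetical names, blocks are represented only by their indices, and the matched length is assumed to be counted in tokens.

```rust
use std::collections::HashMap;

/// Hypothetical trie node: children are keyed by one complete block of tokens.
#[derive(Default)]
struct Node {
    children: HashMap<Vec<u32>, Node>,
    /// Index of the cached block stored at this node, if any.
    block_idx: Option<usize>,
}

/// Hypothetical block-keyed trie (an assumption, not the crate's own type).
struct BlockTrie {
    root: Node,
    block_size: usize,
}

impl BlockTrie {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { root: Node::default(), block_size }
    }

    /// Attach `block_idx` to the node reached by `prefix`, whose length
    /// must be a non-zero multiple of `block_size`.
    fn insert_block(&mut self, prefix: &[u32], block_idx: usize) {
        let block_size = self.block_size;
        assert!(!prefix.is_empty() && prefix.len() % block_size == 0);
        let mut node = &mut self.root;
        for chunk in prefix.chunks_exact(block_size) {
            node = node.children.entry(chunk.to_vec()).or_default();
        }
        node.block_idx = Some(block_idx);
    }

    /// Returns (matched token count, block indices along the matched path).
    /// Trailing tokens that do not fill a complete block are never matched.
    fn lookup(&self, token_ids: &[u32]) -> (usize, Vec<usize>) {
        let mut node = &self.root;
        let mut matched = 0;
        let mut blocks = Vec::new();
        for chunk in token_ids.chunks_exact(self.block_size) {
            // Stop at the first block that is missing from the trie...
            let Some(child) = node.children.get(chunk) else { break };
            // ...or whose node exists but carries no cached block.
            let Some(idx) = child.block_idx else { break };
            node = child;
            matched += self.block_size;
            blocks.push(idx);
        }
        (matched, blocks)
    }
}
```

With a block size of 2, inserting blocks for `[1, 2]` and `[1, 2, 3, 4]` and then looking up `[1, 2, 3, 4, 5]` matches four tokens; the fifth token does not fill a block and is ignored.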
pub fn insert(
    &mut self,
    token_ids: &[u32],
    block_start: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
) -> usize
Insert a new block for token_ids[block_start .. block_start + block_size].
Evicts the LRU block if the cache is at capacity.
Returns the index of the inserted block in self.blocks.
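The insert-then-evict behaviour can be illustrated with a fixed-capacity arena that stamps each slot with a logical clock. This is a sketch under stated assumptions: `Slot`, `LruArena`, and `touch` are hypothetical names, a `u32` payload stands in for the KV tensors, and reference counts are ignored, so every slot is treated as evictable.

```rust
/// Hypothetical arena slot; `payload` stands in for a block's KV tensors.
struct Slot {
    payload: u32,
    /// Logical timestamp of the most recent access.
    last_used: u64,
}

/// Hypothetical fixed-capacity arena with least-recently-used replacement.
struct LruArena {
    slots: Vec<Slot>,
    max_blocks: usize,
    clock: u64,
    evictions: u64,
}

impl LruArena {
    fn new(max_blocks: usize) -> Self {
        assert!(max_blocks > 0, "max_blocks must be non-zero");
        Self {
            slots: Vec::with_capacity(max_blocks),
            max_blocks,
            clock: 0,
            evictions: 0,
        }
    }

    /// Mark a slot as recently used.
    fn touch(&mut self, idx: usize) {
        self.clock += 1;
        self.slots[idx].last_used = self.clock;
    }

    /// Store a payload, overwriting the least recently used slot when full.
    /// Returns the index of the slot that now holds the payload.
    fn insert(&mut self, payload: u32) -> usize {
        self.clock += 1;
        let slot = Slot { payload, last_used: self.clock };
        if self.slots.len() < self.max_blocks {
            self.slots.push(slot);
            return self.slots.len() - 1;
        }
        // Linear scan for the oldest stamp: O(n) per eviction, fine for a sketch.
        let victim = self
            .slots
            .iter()
            .enumerate()
            .min_by_key(|(_, s)| s.last_used)
            .map(|(i, _)| i)
            .expect("a full arena is never empty");
        self.slots[victim] = slot;
        self.evictions += 1;
        victim
    }
}
```

Reusing the victim's slot keeps existing indices stable, which matters when callers hold on to block indices. A cache that tracks reference counts would also need to skip blocks that are still in use when choosing the victim.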
pub fn release(&mut self, block_idx: usize)
Decrement the reference count of a block, making it eligible for eviction.
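As a rough illustration of how a reference count can gate eviction, the sketch below treats a block as evictable once its count reaches zero. The `RefCounted` type and its methods are hypothetical, and the zero threshold and saturating decrement are assumptions rather than documented behaviour.

```rust
/// Hypothetical reference-counted block header.
struct RefCounted {
    ref_count: u32,
}

impl RefCounted {
    /// Register one more user of the block.
    fn acquire(&mut self) {
        self.ref_count += 1;
    }

    /// Drop one user; saturates at zero so an extra release cannot underflow.
    fn release(&mut self) {
        self.ref_count = self.ref_count.saturating_sub(1);
    }

    /// A block with no remaining users may be evicted.
    fn evictable(&self) -> bool {
        self.ref_count == 0
    }
}
```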
pub fn memory_bytes(&self) -> usize
Total memory consumed by all live blocks’ KV tensors.
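One way to compute such a total is to sum the `f32` payload of every key and value buffer. The sketch below assumes each block stores its tensors as `Vec<Vec<f32>>`, mirroring the `keys` and `values` parameters of `insert`; `KvBlock`, `block_bytes`, and `total_bytes` are hypothetical names, and `Vec` headers and spare capacity are not counted.

```rust
use std::mem::size_of;

/// Hypothetical block holding key and value buffers.
struct KvBlock {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

/// Bytes of f32 payload in one block (element count times 4).
fn block_bytes(block: &KvBlock) -> usize {
    let elems: usize = block
        .keys
        .iter()
        .chain(block.values.iter())
        .map(Vec::len)
        .sum();
    elems * size_of::<f32>()
}

/// Bytes of f32 payload across all live blocks.
fn total_bytes(blocks: &[KvBlock]) -> usize {
    blocks.iter().map(block_bytes).sum()
}
```

For example, a block with two key buffers and two value buffers of 8 elements each holds 32 floats, or 128 bytes.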
pub fn hit_rate(&self) -> f32
Cache hit rate in [0, 1]. Returns 0.0 if no lookups have been made.
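A rate with this contract can be derived from the public counters. The sketch below assumes the rate is `hits / (hits + misses)`, which the documentation does not state explicitly; the free function is hypothetical and guards the zero-lookup case to avoid dividing by zero.

```rust
/// Hit rate in [0, 1]; returns 0.0 when no lookups have been recorded.
fn hit_rate(hits: u64, misses: u64) -> f32 {
    let total = hits + misses;
    if total == 0 {
        0.0
    } else {
        hits as f32 / total as f32
    }
}
```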
pub fn block_size(&self) -> usize
Tokens per block.
pub fn get_block(&self, idx: usize) -> Option<&CacheBlock>
Borrow a cached block by its index in the underlying arena.
Block indices are returned by CacheSession::block_indices when a
session is prepared via
PrefixAwarePrefill::prepare.
Returns None if the index is out of bounds, for example the sentinel
usize::MAX that marks a trie path failure.
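The sentinel convention works because a bounds-checked slice lookup can never succeed for `usize::MAX`. The sketch below shows this with a plain slice standing in for the arena; `NO_BLOCK` and `resolve` are hypothetical names, and `u32` values stand in for cached blocks.

```rust
/// Hypothetical sentinel meaning "no block at this position".
const NO_BLOCK: usize = usize::MAX;

/// Resolve a list of indices, silently skipping any that are out of bounds.
fn resolve(arena: &[u32], indices: &[usize]) -> Vec<u32> {
    indices
        .iter()
        // `slice::get` returns None instead of panicking on a bad index.
        .filter_map(|&i| arena.get(i).copied())
        .collect()
}
```

Callers that need to tell "sentinel" apart from "stale index" would have to compare against the sentinel explicitly, since both cases yield `None`.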
Auto Trait Implementations
impl Freeze for PrefixCache
impl RefUnwindSafe for PrefixCache
impl Send for PrefixCache
impl Sync for PrefixCache
impl Unpin for PrefixCache
impl UnsafeUnpin for PrefixCache
impl UnwindSafe for PrefixCache
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.