Struct PrefixCache 

pub struct PrefixCache {
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
    /* private fields */
}

Prefix KV-cache with trie-based lookup and LRU eviction.

The trie is keyed by complete blocks of block_size tokens. Each internal node in the trie corresponds to one block boundary; leaf nodes that carry a block_idx have a fully populated CacheBlock.
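
As an illustration of block-keyed trie paths, here is a minimal sketch. The `Node` type and `path_mut` helper are hypothetical and are not this crate's private types; the sketch assumes `block_size > 0` and that a trailing partial block is simply ignored.

```rust
use std::collections::HashMap;

/// Hypothetical trie node: one edge per complete block of tokens.
#[derive(Default, Debug)]
pub struct Node {
    /// Children keyed by the full token contents of the next block.
    pub children: HashMap<Vec<u32>, Node>,
    /// Arena index of the block cached at this boundary, if any.
    pub block_idx: Option<usize>,
}

/// Walk the path for every complete block of `token_ids`, creating nodes
/// as needed, and return the node at the last block boundary.
/// `chunks_exact` drops a trailing partial block and panics if
/// `block_size` is zero.
pub fn path_mut<'a>(root: &'a mut Node, token_ids: &[u32], block_size: usize) -> &'a mut Node {
    let mut node = root;
    for block in token_ids.chunks_exact(block_size) {
        node = node.children.entry(block.to_vec()).or_default();
    }
    node
}
```

With `block_size = 2`, the tokens `[1, 2, 3, 4, 5]` produce a path of two nodes, one for `[1, 2]` and one for `[3, 4]`; the final token `5` does not form a complete block and creates no node.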

Fields§

§hits: u64

Total cache hits since creation.

§misses: u64

Total cache misses since creation.

§evictions: u64

Total blocks evicted since creation.

Implementations§

impl PrefixCache

pub fn new(
    max_blocks: usize,
    block_size: usize,
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
) -> Self

Create a new, empty prefix cache that holds at most max_blocks blocks of block_size tokens each.

pub fn lookup(&mut self, token_ids: &[u32]) -> (usize, Vec<&CacheBlock>)

Look up the longest cached prefix of token_ids.

Walks the trie block-by-block. For every complete block whose tokens match and whose trie node carries a cached block, the block’s last_used stamp is refreshed and the block is returned.

Returns (matched_len, Vec<&CacheBlock>).
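
The block-by-block walk can be sketched as follows. `ToyCache`, `insert_all`, and the logical `clock` are hypothetical stand-ins, not this crate's internals. The sketch returns arena indices rather than `&CacheBlock` references, assumes `matched_len` counts tokens, and assumes the walk stops at the first block that is missing or has no cached data.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Node {
    pub children: HashMap<Vec<u32>, Node>,
    pub block_idx: Option<usize>,
}

/// Toy cache: `last_used[i]` is the LRU stamp of block `i`.
pub struct ToyCache {
    pub root: Node,
    pub last_used: Vec<u64>,
    pub block_size: usize,
    pub clock: u64,
}

impl ToyCache {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { root: Node::default(), last_used: Vec::new(), block_size, clock: 0 }
    }

    /// Cache every complete block of `token_ids` (no eviction in this sketch).
    pub fn insert_all(&mut self, token_ids: &[u32]) {
        let mut node = &mut self.root;
        for block in token_ids.chunks_exact(self.block_size) {
            node = node.children.entry(block.to_vec()).or_default();
            if node.block_idx.is_none() {
                self.last_used.push(0);
                node.block_idx = Some(self.last_used.len() - 1);
            }
        }
    }

    /// Longest cached prefix: returns (matched tokens, matched block indices)
    /// and refreshes the stamp of every matched block.
    pub fn lookup(&mut self, token_ids: &[u32]) -> (usize, Vec<usize>) {
        self.clock += 1;
        let mut node = &self.root;
        let mut matched = Vec::new();
        for block in token_ids.chunks_exact(self.block_size) {
            // Stop at the first block with no trie edge or no cached data.
            let Some(child) = node.children.get(block) else { break };
            let Some(idx) = child.block_idx else { break };
            self.last_used[idx] = self.clock;
            matched.push(idx);
            node = child;
        }
        (matched.len() * self.block_size, matched)
    }
}
```

Under these assumptions, after caching `[1, 2, 3, 4]` with `block_size = 2`, looking up `[1, 2, 3, 4, 9, 9]` matches four tokens and two blocks, while `[7, 7]` matches nothing.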

pub fn insert(
    &mut self,
    token_ids: &[u32],
    block_start: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
) -> usize

Insert a new block for token_ids[block_start .. block_start + block_size].

Evicts the LRU block if the cache is at capacity. Returns the index of the inserted block in self.blocks.
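
One way LRU eviction over a fixed-capacity arena might look is sketched below. `Slot`, `lru_victim`, and `insert_slot` are hypothetical names. The sketch assumes that blocks with a non-zero reference count are never evicted, and it returns `None` when every block is pinned; how the real cache behaves in that case is not documented here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Slot {
    pub last_used: u64,
    pub ref_count: u32,
}

/// Index of the least recently used slot with no outstanding references.
pub fn lru_victim(slots: &[Option<Slot>]) -> Option<usize> {
    slots
        .iter()
        .enumerate()
        .filter_map(|(i, s)| s.as_ref().map(|s| (i, s)))
        .filter(|(_, s)| s.ref_count == 0)
        .min_by_key(|(_, s)| s.last_used)
        .map(|(i, _)| i)
}

/// Place `slot` in the arena, evicting the LRU victim when the arena is full.
/// Returns the index used and whether an eviction happened, or `None` if the
/// arena is full and every slot is still referenced.
pub fn insert_slot(
    slots: &mut Vec<Option<Slot>>,
    max_blocks: usize,
    slot: Slot,
) -> Option<(usize, bool)> {
    // Reuse a previously freed slot first.
    if let Some(i) = slots.iter().position(|s| s.is_none()) {
        slots[i] = Some(slot);
        return Some((i, false));
    }
    // Grow the arena while below capacity.
    if slots.len() < max_blocks {
        slots.push(Some(slot));
        return Some((slots.len() - 1, false));
    }
    // At capacity: overwrite the least recently used unreferenced slot.
    let victim = lru_victim(slots.as_slice())?;
    slots[victim] = Some(slot);
    Some((victim, true))
}
```

A full implementation would also clear the evicted block's index from its trie node and increment an eviction counter such as `evictions`; both are omitted to keep the sketch short.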

pub fn release(&mut self, block_idx: usize)

Decrement the reference count of a block, making it eligible for eviction.
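
A minimal sketch of reference-count release, assuming counts are stored per block index and that a block becomes evictable once its count reaches zero. The free function and the `ref_counts` slice are hypothetical; the real method operates on private state.

```rust
/// Drop one reference to `block_idx`.
/// Returns true when the block is now unreferenced (evictable).
/// Out-of-range indices are ignored, and the count never underflows.
pub fn release(ref_counts: &mut [u32], block_idx: usize) -> bool {
    match ref_counts.get_mut(block_idx) {
        Some(count) => {
            *count = count.saturating_sub(1);
            *count == 0
        }
        None => false,
    }
}
```

Using `saturating_sub` makes a double release harmless in this sketch; whether the real cache tolerates one is not stated in these docs.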

pub fn len(&self) -> usize

Number of currently live (occupied) blocks.

pub fn is_empty(&self) -> bool

Returns true if the cache contains no live blocks.

pub fn capacity(&self) -> usize

Maximum number of blocks this cache can hold.

pub fn memory_bytes(&self) -> usize

Total memory consumed by all live blocks’ KV tensors.
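
For a rough sense of scale, the per-block footprint can be estimated as below. Both helpers are hypothetical. The first counts only the `f32` payload of the `keys` and `values` vectors and ignores `Vec` overhead; the second assumes a dense layout of `block_size * num_kv_heads * head_dim` floats per layer for each of K and V, which this page does not confirm.

```rust
use std::mem::size_of;

/// Bytes of f32 payload held by one block's key and value tensors.
pub fn block_bytes(keys: &[Vec<f32>], values: &[Vec<f32>]) -> usize {
    keys.iter()
        .chain(values.iter())
        .map(|t| t.len() * size_of::<f32>())
        .sum()
}

/// Expected bytes per block under the assumed dense per-layer layout
/// (the factor 2 covers keys plus values).
pub fn expected_block_bytes(
    block_size: usize,
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
) -> usize {
    2 * num_layers * block_size * num_kv_heads * head_dim * size_of::<f32>()
}
```

For example, with 3 layers, 2 tokens per block, 4 KV heads, and a head dimension of 8, both functions give 1536 bytes per block under these assumptions.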

pub fn hit_rate(&self) -> f32

Cache hit rate in [0, 1]. Returns 0.0 if no lookups have been made.
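
The rate is presumably computed from the public `hits` and `misses` counters. A sketch under that assumption, written as a hypothetical free function:

```rust
/// Fraction of lookups that hit, assuming every lookup increments
/// exactly one of `hits` or `misses`. Returns 0.0 before any lookup.
pub fn hit_rate(hits: u64, misses: u64) -> f32 {
    let total = hits + misses;
    if total == 0 {
        0.0
    } else {
        hits as f32 / total as f32
    }
}
```

The explicit zero check avoids a `0.0 / 0.0` division, which would otherwise yield NaN instead of the documented `0.0`.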

pub fn block_size(&self) -> usize

Tokens per block.

pub fn get_block(&self, idx: usize) -> Option<&CacheBlock>

Borrow a cached block by its index in the underlying arena.

Block indices are returned by CacheSession::block_indices when a session is prepared via PrefixAwarePrefill::prepare. None means the index is out of bounds (e.g. the sentinel usize::MAX placed for trie path failures).
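
The sentinel convention works because `usize::MAX` can never be a valid slice index, so a bounds-checked lookup maps it to `None` without special handling. A sketch with a hypothetical `resolve` helper over a generic arena:

```rust
/// Sentinel marking a block that could not be placed in the trie.
pub const NO_BLOCK: usize = usize::MAX;

/// Resolve a list of block indices against an arena.
/// Out-of-bounds entries, including the sentinel, come back as `None`.
pub fn resolve<'a, T>(arena: &'a [T], indices: &[usize]) -> Vec<Option<&'a T>> {
    indices.iter().map(|&i| arena.get(i)).collect()
}
```

Callers can then treat a `None` entry as a block that must be recomputed rather than read from the cache.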

pub fn clear(&mut self)

Remove all cached blocks, resetting the trie to an empty root.
