pub struct PrefixKvCache { /* private fields */ }
A radix-tree based prefix KV cache.
Stores KV cache states indexed by token prefix sequences. When a new prompt shares a prefix with a previously-cached sequence, the matching KV state is reused and only the remaining tokens need prefill.
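The reuse idea can be illustrated without the cache itself. The sketch below is a minimal illustration, not this crate's code: `shared_prefix_len` and `remaining_tokens` are hypothetical helper names that compute how many leading tokens two sequences share and which tokens would still need prefill.

```rust
/// Length of the longest common prefix of two token sequences.
/// (Hypothetical helper, not part of PrefixKvCache.)
fn shared_prefix_len(cached: &[u32], prompt: &[u32]) -> usize {
    cached
        .iter()
        .zip(prompt)
        .take_while(|(a, b)| a == b)
        .count()
}

/// Tokens that still need prefill once `reused` positions come from the cache.
/// `reused` is clamped to the prompt length so the slice never panics.
fn remaining_tokens(prompt: &[u32], reused: usize) -> &[u32] {
    &prompt[reused.min(prompt.len())..]
}
```

For a cached sequence [1, 2, 3, 4] and a new prompt [1, 2, 3, 9, 10], three positions are shared, so only [9, 10] would go through prefill.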
Implementations
impl PrefixKvCache
pub fn new(config: PrefixCacheConfig) -> Self
Create a new prefix KV cache with the given configuration.
pub fn lookup(&mut self, tokens: &[u32]) -> Option<(usize, &CachedKvState)>
Look up the longest matching prefix for the given tokens.
Returns (matching_prefix_length, cached_kv_state_ref). Returns None
if no prefix matches or the match is shorter than min_prefix_len.
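The lookup rule described above (longest stored prefix, rejected when shorter than min_prefix_len) might be sketched as follows. This is a toy model under stated assumptions, not the crate's implementation: `ToyPrefixIndex` and `Node` are hypothetical names, the tree is an uncompressed per-token trie rather than a true radix tree, the payload is a plain `usize` id instead of a CachedKvState, and a match is only reported at depths where a snapshot was stored.

```rust
use std::collections::HashMap;

/// One node per token; a real radix tree would compress single-child chains.
#[derive(Default)]
struct Node {
    children: HashMap<u32, Node>,
    /// Hypothetical payload: an id standing in for a cached KV snapshot.
    snapshot: Option<usize>,
}

/// Toy index keyed by token prefixes (assumed name, for illustration only).
#[derive(Default)]
struct ToyPrefixIndex {
    root: Node,
    min_prefix_len: usize,
}

impl ToyPrefixIndex {
    /// Record a snapshot id at the node reached by walking `tokens`.
    fn insert(&mut self, tokens: &[u32], snapshot: usize) {
        let mut node = &mut self.root;
        for &t in tokens {
            node = node.children.entry(t).or_default();
        }
        node.snapshot = Some(snapshot);
    }

    /// Return (matched_len, snapshot_id) for the deepest stored prefix of
    /// `tokens`, or None if nothing matches or the match is too short.
    fn lookup(&self, tokens: &[u32]) -> Option<(usize, usize)> {
        let mut node = &self.root;
        let mut best = None;
        for (i, t) in tokens.iter().enumerate() {
            match node.children.get(t) {
                Some(child) => node = child,
                None => break,
            }
            if let Some(id) = node.snapshot {
                best = Some((i + 1, id));
            }
        }
        best.filter(|&(len, _)| len >= self.min_prefix_len)
    }
}
```

Note that the documented lookup takes &mut self, which suggests it also updates internal bookkeeping; the toy version skips any such state and takes &self.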
pub fn store(&mut self, tokens: &[u32], kv_cache: &dyn KvCacheAccess, seq_len: usize, kv_dim: usize, num_layers: usize)
Store KV cache state for a token prefix.
Extracts the relevant KV data from the live cache via the
KvCacheAccess trait. If the prefix is shorter than
min_prefix_len, the store is silently skipped.
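A rough sketch of what extracting a prefix through an accessor trait could look like is given below. Everything in it is an assumption made for illustration: this page does not show the methods of KvCacheAccess, so the sketch defines its own hypothetical `LayerSource` trait, a `ToySnapshot` type, and a `ToyLiveCache`. It also assumes each layer stores keys and values as one flat buffer laid out position by position, and it treats seq_len as the prefix length for the threshold check.

```rust
/// Hypothetical accessor trait; the real KvCacheAccess methods are not
/// shown in this page and may differ.
trait LayerSource {
    fn keys(&self, layer: usize) -> &[f32];
    fn values(&self, layer: usize) -> &[f32];
}

/// Hypothetical snapshot: one flat buffer per layer, truncated to the prefix.
#[derive(Debug, PartialEq)]
struct ToySnapshot {
    seq_len: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

/// Toy live cache used only to exercise the sketch.
struct ToyLiveCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl LayerSource for ToyLiveCache {
    fn keys(&self, layer: usize) -> &[f32] {
        &self.keys[layer]
    }
    fn values(&self, layer: usize) -> &[f32] {
        &self.values[layer]
    }
}

/// Copy the first `seq_len * kv_dim` values of every layer out of `src`.
/// Returns None (a silent skip) when the prefix is below the threshold.
fn snapshot_prefix(
    src: &dyn LayerSource,
    seq_len: usize,
    kv_dim: usize,
    num_layers: usize,
    min_prefix_len: usize,
) -> Option<ToySnapshot> {
    if seq_len < min_prefix_len {
        return None;
    }
    let n = seq_len * kv_dim;
    let keys = (0..num_layers).map(|l| src.keys(l)[..n].to_vec()).collect();
    let values = (0..num_layers).map(|l| src.values(l)[..n].to_vec()).collect();
    Some(ToySnapshot { seq_len, keys, values })
}
```

Reading through a trait object keeps the snapshot logic independent of how the live cache stores its buffers, which is presumably why store accepts &dyn KvCacheAccess rather than a concrete type.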
pub fn store_snapshot(&mut self, tokens: &[u32], snapshot: CachedKvState)
Store a pre-built CachedKvState directly for a token prefix.
This is useful when the caller has already constructed the snapshot.
pub fn restore(cached: &CachedKvState, target: &mut KvCache)
Restore a cached prefix into a live KV cache.
Copies the cached KV data into the target cache’s buffers and resets the target’s sequence position to match the snapshot.
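The copy-then-reset behaviour can be pictured with a small stand-alone model. The types below (`ToySnapshot`, `ToyLiveCache`) and the function `restore_into` are hypothetical and do not reflect the actual fields of CachedKvState or KvCache; the sketch assumes one flat f32 buffer per layer and a single position counter.

```rust
/// Hypothetical snapshot of a cached prefix (one flat buffer per layer).
struct ToySnapshot {
    seq_len: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

/// Hypothetical live cache with preallocated per-layer buffers.
struct ToyLiveCache {
    pos: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

/// Overwrite the front of each layer buffer with the snapshot data and
/// move the sequence position to the end of the restored prefix.
/// Panics if a snapshot layer is longer than the matching target buffer.
fn restore_into(cached: &ToySnapshot, target: &mut ToyLiveCache) {
    for (dst, src) in target.keys.iter_mut().zip(&cached.keys) {
        dst[..src.len()].copy_from_slice(src);
    }
    for (dst, src) in target.values.iter_mut().zip(&cached.values) {
        dst[..src.len()].copy_from_slice(src);
    }
    // Data past this position is stale and gets overwritten by later decoding.
    target.pos = cached.seq_len;
}
```

After a restore in this model, prefill would resume at `target.pos`, so only the tokens beyond the cached prefix are processed.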
pub fn memory_usage(&self) -> usize
Current estimated memory usage in bytes.
Auto Trait Implementations
impl Freeze for PrefixKvCache
impl RefUnwindSafe for PrefixKvCache
impl Send for PrefixKvCache
impl Sync for PrefixKvCache
impl Unpin for PrefixKvCache
impl UnsafeUnpin for PrefixKvCache
impl UnwindSafe for PrefixKvCache
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.