pub struct PrefixAwarePrefill {
pub cache: PrefixCache,
}
Wraps a PrefixCache and exposes a higher-level prefill API.
The typical call pattern for one request is:
let (session, uncached_start) = prefill.prepare(&token_ids);
// run your model prefill on token_ids[uncached_start..]
prefill.store_blocks(&token_ids, uncached_start, new_kv_blocks);
prefill.release_session(session);
Fields§
cache: PrefixCache
The underlying prefix cache.
Implementations§
impl PrefixAwarePrefill
pub fn new(cache: PrefixCache) -> Self
Wrap an existing PrefixCache.
pub fn prepare(&mut self, token_ids: &[u32]) -> (CacheSession, usize)
Determine how much of token_ids is already cached.
Returns (session, uncached_start) where uncached_start is the
index of the first token that must be processed by the model.
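How the cached prefix is located is not described on this page. The following is a minimal sketch of one plausible scheme, not the crate's implementation: it assumes fixed-size blocks, each keyed by the full token prefix ending at that block, so that uncached_start always falls on a block boundary. BLOCK_SIZE, find_uncached_start, and the HashSet stand-in for the cache are all hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical block size (tokens per block); the real value is not stated here.
const BLOCK_SIZE: usize = 4;

/// Returns the index of the first token not covered by a cached block.
/// Assumption: a block is reusable only if every block before it is cached too.
fn find_uncached_start(cached_prefixes: &HashSet<Vec<u32>>, token_ids: &[u32]) -> usize {
    let mut start = 0;
    // Walk full blocks from the front and stop at the first miss.
    while start + BLOCK_SIZE <= token_ids.len() {
        let end = start + BLOCK_SIZE;
        // Each block is keyed by the whole prefix up to its end, so equal
        // blocks that follow different contexts are never confused.
        if !cached_prefixes.contains(&token_ids[..end]) {
            break;
        }
        start = end;
    }
    start
}
```

Under these assumptions, a request whose first two blocks are cached gets uncached_start equal to 2 * BLOCK_SIZE, and a trailing partial block is always recomputed.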
pub fn store_blocks(
    &mut self,
    token_ids: &[u32],
    uncached_start: usize,
    keys_by_block: Vec<KvBlockPair>,
)
After prefill, store the newly computed KV blocks back into the cache.
keys_by_block is a list of (keys, values) for each newly computed
block, in order, starting from the block at uncached_start.
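To illustrate how the i-th new block lines up with a token range, here is a rough sketch under stated assumptions; it is not the crate's code. It assumes fixed-size blocks keyed by the token prefix ending at each block, and that a trailing partial block is not stored. BLOCK_SIZE, KvPair (a stand-in for KvBlockPair), and store_blocks_sketch are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical block size (tokens per block).
const BLOCK_SIZE: usize = 4;

/// Stand-in for the crate's `KvBlockPair`: a (keys, values) pair.
type KvPair = (Vec<f32>, Vec<f32>);

/// Stores each new block under the token prefix that ends at that block.
/// Returns the number of blocks actually stored.
fn store_blocks_sketch(
    cache: &mut HashMap<Vec<u32>, KvPair>,
    token_ids: &[u32],
    uncached_start: usize,
    new_blocks: Vec<KvPair>,
) -> usize {
    let mut stored = 0;
    for (i, block) in new_blocks.into_iter().enumerate() {
        // Block i covers tokens [uncached_start + i * BLOCK_SIZE, end).
        let end = uncached_start + (i + 1) * BLOCK_SIZE;
        // Assumption: a block extending past the prompt is partial and is skipped.
        if end > token_ids.len() {
            break;
        }
        cache.insert(token_ids[..end].to_vec(), block);
        stored += 1;
    }
    stored
}
```

With a 10-token prompt and uncached_start = 4, only the block covering tokens 4..8 is stored in this sketch; the remaining two tokens do not fill a block.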
pub fn release_session(&mut self, session: CacheSession)
Release all blocks held by a session (decrement their ref counts).
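The reference-count bookkeeping might look roughly like the sketch below. This is an illustration only: Session (a stand-in for CacheSession), the HashMap of counts, and both function names are hypothetical, and treating a zero count as "evictable" is an assumption that this page does not confirm.

```rust
use std::collections::HashMap;

/// Stand-in for `CacheSession`: the ids of the blocks a request has pinned.
struct Session {
    block_ids: Vec<u64>,
}

/// Decrements the ref count of every block held by `session`.
/// Taking `session` by value means the same session cannot be released twice.
fn release_sketch(ref_counts: &mut HashMap<u64, u32>, session: Session) {
    for id in session.block_ids {
        if let Some(count) = ref_counts.get_mut(&id) {
            // saturating_sub avoids underflow if the counts are out of sync.
            *count = count.saturating_sub(1);
        }
    }
}

/// Assumption: blocks whose count has dropped to zero may be evicted.
fn evictable_sketch(ref_counts: &HashMap<u64, u32>) -> Vec<u64> {
    let mut ids = Vec::new();
    for (id, count) in ref_counts {
        if *count == 0 {
            ids.push(*id);
        }
    }
    // Sort so the result does not depend on HashMap iteration order.
    ids.sort_unstable();
    ids
}
```

Passing the session by value mirrors the release_session signature above, which also consumes its CacheSession argument.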
pub fn stats(&self) -> PrefixCacheStats
Snapshot of current cache statistics.
Auto Trait Implementations§
impl Freeze for PrefixAwarePrefill
impl RefUnwindSafe for PrefixAwarePrefill
impl Send for PrefixAwarePrefill
impl Sync for PrefixAwarePrefill
impl Unpin for PrefixAwarePrefill
impl UnsafeUnpin for PrefixAwarePrefill
impl UnwindSafe for PrefixAwarePrefill
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.