pub struct PagedKvCache { /* private fields */ }
A vLLM-style block-paged KV cache. Instead of one contiguous pre-allocated
tensor (which may fragment memory or require reallocation as it grows), it
manages memory in fixed-size 16-token blocks via a BlockTable.
Benefits:
- No single large allocation — blocks are page-sized
- Zero reallocation on append (new blocks allocated on demand from pool)
- Logical-to-physical mapping via block table
- Each block is independently cache-line friendly
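The logical-to-physical mapping can be illustrated with a small sketch. The internals of BlockTable are not shown in these docs, so the `physical` field and the `locate` method below are hypothetical names chosen for illustration; only the 16-token block size comes from the description above.

```rust
/// Tokens per block, matching the 16-token block size described above.
const BLOCK_SIZE: usize = 16;

/// Hypothetical block table: entry `i` holds the physical block id
/// that backs logical block `i`. Not the crate's actual layout.
struct BlockTable {
    physical: Vec<usize>,
}

impl BlockTable {
    /// Translate a logical token index into (physical block id, slot in block).
    /// Returns `None` if no block has been mapped for that token yet.
    fn locate(&self, token_idx: usize) -> Option<(usize, usize)> {
        let logical_block = token_idx / BLOCK_SIZE;
        let slot = token_idx % BLOCK_SIZE;
        self.physical.get(logical_block).map(|&p| (p, slot))
    }
}
```

With a table of `[7, 2, 5]`, token 17 falls in logical block 1, which maps to physical block 2 at slot 1.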
Implementations
impl PagedKvCache
pub fn new(max_tokens: usize, dim: usize) -> Self
Create a paged KV-cache for max_tokens tokens of dimension dim.
Pre-allocates all blocks upfront so that no heap allocation happens during the inference loop. The number of blocks is ceil(max_tokens / 16).
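The block-count formula can be checked with integer arithmetic. This is a minimal sketch, not the crate's code: `blocks_needed` and `pooled_elements` are hypothetical helpers, and the assumption that each block stores 16 × dim f64 values is inferred from the `&[f64]` token type rather than stated in the docs.

```rust
/// Tokens per block, as stated in the struct description.
const BLOCK_SIZE: usize = 16;

/// ceil(max_tokens / BLOCK_SIZE) using integer arithmetic only.
fn blocks_needed(max_tokens: usize) -> usize {
    (max_tokens + BLOCK_SIZE - 1) / BLOCK_SIZE
}

/// Total f64 slots reserved across all blocks for vectors of width `dim`,
/// assuming every block holds BLOCK_SIZE * dim values.
fn pooled_elements(max_tokens: usize, dim: usize) -> usize {
    blocks_needed(max_tokens) * BLOCK_SIZE * dim
}
```

For example, 100 tokens need 7 blocks, so the last block is only partly usable: the pool reserves room for 112 tokens.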
pub fn max_tokens(&self) -> usize
Maximum tokens this cache can hold.
pub fn num_blocks(&self) -> usize
Number of blocks allocated.
pub fn blocks_in_use(&self) -> usize
Number of blocks currently in use (partially or fully).
pub fn append(&mut self, token: &[f64]) -> Result<(), RuntimeError>
Append a single token vector. Zero allocation — writes into the next available slot in the current block.
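A toy version of this append path might look like the following. It is a sketch under stated assumptions: `ToyPagedCache` and `AppendError` are hypothetical stand-ins (the variants of the real RuntimeError are not documented here), and the guess that a wrong-length vector or a full cache produces an error is inferred from the `Result` return type.

```rust
const BLOCK_SIZE: usize = 16;

/// Hypothetical error type standing in for the crate's `RuntimeError`.
#[derive(Debug, PartialEq)]
enum AppendError {
    DimMismatch { expected: usize, got: usize },
    Full,
}

/// Toy paged cache: every block is a pre-allocated BLOCK_SIZE * dim buffer.
struct ToyPagedCache {
    blocks: Vec<Vec<f64>>,
    dim: usize,
    len: usize, // tokens stored so far
    max_tokens: usize,
}

impl ToyPagedCache {
    /// Allocate every block once, up front.
    fn new(max_tokens: usize, dim: usize) -> Self {
        let num_blocks = (max_tokens + BLOCK_SIZE - 1) / BLOCK_SIZE;
        let blocks = vec![vec![0.0; BLOCK_SIZE * dim]; num_blocks];
        Self { blocks, dim, len: 0, max_tokens }
    }

    /// Copy one token vector into the next free slot; nothing is allocated here.
    fn append(&mut self, token: &[f64]) -> Result<(), AppendError> {
        if token.len() != self.dim {
            return Err(AppendError::DimMismatch { expected: self.dim, got: token.len() });
        }
        if self.len == self.max_tokens {
            return Err(AppendError::Full);
        }
        let block = self.len / BLOCK_SIZE;
        let start = (self.len % BLOCK_SIZE) * self.dim;
        self.blocks[block][start..start + self.dim].copy_from_slice(token);
        self.len += 1;
        Ok(())
    }

    /// Blocks holding at least one token (partially or fully filled).
    fn blocks_in_use(&self) -> usize {
        (self.len + BLOCK_SIZE - 1) / BLOCK_SIZE
    }
}
```

Because the slot address is computed from the running token count, crossing a block boundary needs no special case: the seventeenth token simply lands in slot 0 of the second block.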
pub fn append_tensor(&mut self, t: &Tensor) -> Result<(), RuntimeError>
Append a batch of tokens from a 2D tensor [n, dim].
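One plausible way to implement a batch append is to walk the rows of a row-major [n, dim] buffer and hand each row to the single-token path. The sketch below assumes a row-major layout, which the docs do not confirm; `append_rows` and its callback parameter are hypothetical and do not use the crate's Tensor type.

```rust
/// Split a row-major [n, dim] buffer into rows and feed each one to `append_one`.
/// Returns the number of rows appended, or the first error encountered.
/// Trailing elements that do not fill a whole row are ignored in this sketch.
fn append_rows<E>(
    data: &[f64],
    dim: usize,
    mut append_one: impl FnMut(&[f64]) -> Result<(), E>,
) -> Result<usize, E> {
    if dim == 0 {
        // chunks_exact panics on a chunk size of zero, so bail out early.
        return Ok(0);
    }
    let mut count = 0;
    for row in data.chunks_exact(dim) {
        append_one(row)?;
        count += 1;
    }
    Ok(count)
}
```

Stopping at the first error means a batch that overflows the cache is appended only partially; whether the real method behaves this way is not stated in the docs.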
pub fn as_tensor(&self) -> Tensor
Materialize all stored tokens into a contiguous Tensor [current_len, dim].
This is a read operation that copies data from blocks into a flat buffer. The copy is required since blocks are non-contiguous.
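The gather step can be sketched as follows. This is an illustration rather than the crate's code: `materialize` is a hypothetical free function, blocks are modelled as plain `Vec<f64>` buffers of 16 × dim values each, and the result is a flat row-major buffer instead of a Tensor.

```rust
const BLOCK_SIZE: usize = 16;

/// Gather the first `len` tokens from fixed-size blocks into one row-major
/// buffer of shape [len, dim]. Each block is assumed to hold
/// BLOCK_SIZE * dim values, with tokens stored back to back.
fn materialize(blocks: &[Vec<f64>], len: usize, dim: usize) -> Vec<f64> {
    let mut flat = Vec::with_capacity(len * dim);
    let mut remaining = len;
    for block in blocks {
        if remaining == 0 {
            break;
        }
        // The last block in use may be only partially filled.
        let tokens_here = remaining.min(BLOCK_SIZE);
        flat.extend_from_slice(&block[..tokens_here * dim]);
        remaining -= tokens_here;
    }
    flat
}
```

The copy costs one pass over the stored tokens, so a caller that needs the contiguous view repeatedly may prefer to materialize once and reuse the result.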
Trait Implementations
impl Clone for PagedKvCache
fn clone(&self) -> PagedKvCache
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for PagedKvCache
Auto Trait Implementations
impl Freeze for PagedKvCache
impl RefUnwindSafe for PagedKvCache
impl Send for PagedKvCache
impl Sync for PagedKvCache
impl Unpin for PagedKvCache
impl UnsafeUnpin for PagedKvCache
impl UnwindSafe for PagedKvCache
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.