pub struct PagedSeqState<B: Backend> {
    pub blocks: Vec<u32>,
    pub block_table_buf: B::Buffer,
    pub context_lens_buf: B::Buffer,
    pub len: usize,
    pub block_size: usize,
    pub max_blocks_per_seq: usize,
}
Per-sequence paged-KV state.
Holds the logical→physical block mapping for ONE sequence (one
cache_id) plus its current token count. The mapping is stored in
two places:
- blocks (Vec<u32>): the host-side source of truth, used by the block allocator and the grow logic.
- block_table_buf (B::Buffer): a device-side u32 buffer that mirrors blocks and is read directly by the paged Metal kernels (PR #68 / #69). Kept in sync via Self::ensure_capacity.
context_lens_buf is a 1-element u32 device buffer holding len.
The kernel reads it on every forward pass; we update it via B::write_u32.
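To make the logical→physical mapping concrete, here is a minimal sketch of how a token position could be resolved through blocks. It assumes the common paged-KV layout where a token's slot is physical_block * block_size + offset_in_block; the helpers physical_slot and blocks_needed are hypothetical and are not part of this crate's API, and the real Metal kernels may compute addresses differently.

```rust
/// Hypothetical helper: map a logical token position to a physical KV slot,
/// ASSUMING slot = physical_block * block_size + offset_in_block.
fn physical_slot(blocks: &[u32], block_size: usize, pos: usize) -> Option<usize> {
    let logical_block = pos / block_size; // which entry of the block table
    let offset = pos % block_size; // position inside that block
    let physical_block = *blocks.get(logical_block)? as usize; // None if past the table
    Some(physical_block * block_size + offset)
}

/// Hypothetical helper: blocks needed to hold `len` tokens (ceiling division).
fn blocks_needed(len: usize, block_size: usize) -> usize {
    len.div_ceil(block_size)
}

fn main() {
    // Logical blocks 0, 1, 2 live in physical blocks 7, 2, 9 of the shared pool.
    let blocks = [7u32, 2, 9];
    let block_size = 16;
    println!("token 17 -> slot {:?}", physical_slot(&blocks, block_size, 17));
    println!("33 tokens need {} blocks", blocks_needed(33, block_size));
}
```

Because the table is indirect, a sequence's blocks need not be contiguous in the pool, which is what lets the allocator hand out any free block when a sequence grows.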
Fields
blocks: Vec<u32>
block_table_buf: B::Buffer
context_lens_buf: B::Buffer
len: usize
block_size: usize
max_blocks_per_seq: usize
Implementations
impl<B: Backend> PagedSeqState<B>
pub fn new(block_size: usize, max_blocks_per_seq: usize) -> Self
Allocate buffers for a sequence that hasn’t yet allocated any
blocks. The allocator isn’t touched here — the first call to
Self::ensure_capacity does the real work.
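The sketch below illustrates the "allocate buffers, but not blocks" split that new describes. Because new takes no context, it assumes a hypothetical context-free allocation hook, alloc_u32, on a stripped-down Backend trait; the real Backend trait is not shown on this page. It also assumes the device table is sized once for max_blocks_per_seq entries. Both are illustrative assumptions, and HostBackend is a host-only mock that models a device buffer as a Vec<u32>.

```rust
/// Hypothetical, stripped-down stand-in for the crate's `Backend` trait.
/// `alloc_u32` is an ASSUMED name; the real trait may allocate differently.
trait Backend {
    type Buffer;
    fn alloc_u32(len: usize) -> Self::Buffer;
}

/// Host-only mock backend: a "device buffer" is just a Vec<u32> of zeros.
struct HostBackend;

impl Backend for HostBackend {
    type Buffer = Vec<u32>;
    fn alloc_u32(len: usize) -> Vec<u32> {
        vec![0; len]
    }
}

struct PagedSeqState<B: Backend> {
    blocks: Vec<u32>,
    block_table_buf: B::Buffer,
    context_lens_buf: B::Buffer,
    len: usize,
    block_size: usize,
    max_blocks_per_seq: usize,
}

impl<B: Backend> PagedSeqState<B> {
    fn new(block_size: usize, max_blocks_per_seq: usize) -> Self {
        Self {
            blocks: Vec::new(), // no physical blocks yet; the allocator is untouched
            // ASSUMED: sized once for the worst case so it never needs reallocating.
            block_table_buf: B::alloc_u32(max_blocks_per_seq),
            context_lens_buf: B::alloc_u32(1), // holds `len` as a single u32
            len: 0,
            block_size,
            max_blocks_per_seq,
        }
    }
}

fn main() {
    let s: PagedSeqState<HostBackend> = PagedSeqState::new(16, 64);
    println!("blocks={:?} table_len={}", s.blocks, s.block_table_buf.len());
}
```

Deferring block allocation keeps an idle cache_id from holding pool capacity until it actually receives tokens.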
pub fn ensure_capacity(
    &mut self,
    ctx: &mut B::Context,
    alloc: &mut BlockAllocator,
    target_len: usize,
) -> Result<()>
Ensure the sequence has enough blocks to hold target_len tokens.
Allocates additional blocks from the pool if needed and re-syncs
block_table_buf to the device. A no-op if the sequence already has enough blocks.
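A host-only sketch of the grow logic described above, under stated assumptions: SimpleAllocator is a hypothetical free-list stand-in for BlockAllocator (whose interface is not shown here), the device re-sync is modeled as a copy into a Vec<u32>, and errors are plain Strings rather than the crate's Result type. Checking pool availability before allocating anything is a choice made in this sketch so a failed call leaves the sequence unchanged; the real method may behave differently.

```rust
/// Hypothetical free-list allocator standing in for `BlockAllocator`.
struct SimpleAllocator {
    free: Vec<u32>,
}

impl SimpleAllocator {
    fn new(num_blocks: u32) -> Self {
        // Reversed so that `pop` hands out block 0 first.
        Self { free: (0..num_blocks).rev().collect() }
    }
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }
}

/// Host-only model of the sequence; `block_table` plays the role of `block_table_buf`.
struct SeqModel {
    blocks: Vec<u32>,
    block_table: Vec<u32>,
    block_size: usize,
    max_blocks_per_seq: usize,
}

impl SeqModel {
    fn ensure_capacity(&mut self, alloc: &mut SimpleAllocator, target_len: usize) -> Result<(), String> {
        let needed = target_len.div_ceil(self.block_size);
        if needed <= self.blocks.len() {
            return Ok(()); // already big enough: nothing allocated, nothing re-synced
        }
        if needed > self.max_blocks_per_seq {
            return Err(format!("need {needed} blocks, max_blocks_per_seq is {}", self.max_blocks_per_seq));
        }
        let extra = needed - self.blocks.len();
        if alloc.free.len() < extra {
            return Err("block pool exhausted".to_string()); // fail before mutating anything
        }
        for _ in 0..extra {
            let id = alloc.alloc().expect("availability checked above");
            self.blocks.push(id);
        }
        // Re-sync the mirror; a real backend would write this into device memory.
        self.block_table[..self.blocks.len()].copy_from_slice(&self.blocks);
        Ok(())
    }
}

fn main() {
    let mut alloc = SimpleAllocator::new(8);
    let mut seq = SeqModel { blocks: Vec::new(), block_table: vec![0; 4], block_size: 16, max_blocks_per_seq: 4 };
    seq.ensure_capacity(&mut alloc, 20).unwrap(); // 20 tokens -> 2 blocks
    seq.ensure_capacity(&mut alloc, 32).unwrap(); // still 2 blocks: no-op
    println!("{:?}", seq.blocks); // prints [0, 1]
}
```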
pub fn sync_context_len(&mut self, ctx: &mut B::Context)
Update the on-device context_lens_buf to the current self.len.
Call this after Self::ensure_capacity but before dispatching
the paged attention kernel for this seq.
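The call order above can be sketched as a single decode step. This is a host-only model under several assumptions: a trivial bump counter replaces the block allocator, context_lens is a one-element array standing in for context_lens_buf, write_u32 is a hypothetical stand-in for B::write_u32, and the caller (not ensure_capacity) is assumed to advance len. The checked u32::try_from reflects that len is a usize on the host while the kernel reads a u32.

```rust
/// Hypothetical stand-in for `B::write_u32`: write one u32 into a device buffer.
fn write_u32(buf: &mut [u32; 1], value: u32) {
    buf[0] = value;
}

struct SeqModel {
    blocks: Vec<u32>,
    next_block: u32, // hypothetical bump allocator replacing BlockAllocator
    len: usize,
    block_size: usize,
    context_lens: [u32; 1], // models the 1-element context_lens_buf
}

impl SeqModel {
    fn ensure_capacity(&mut self, target_len: usize) {
        while self.blocks.len() * self.block_size < target_len {
            self.blocks.push(self.next_block);
            self.next_block += 1;
        }
    }

    fn sync_context_len(&mut self) {
        // `len` is a usize on the host, but the kernel reads a u32.
        let v = u32::try_from(self.len).expect("context length must fit in u32");
        write_u32(&mut self.context_lens, v);
    }

    /// One decode step in the documented order; returns what the kernel would read.
    fn decode_step(&mut self) -> u32 {
        self.ensure_capacity(self.len + 1); // 1. make room for the new token
        self.len += 1; // 2. ASSUMED: the caller advances len
        self.sync_context_len(); // 3. publish len to the device buffer
        self.context_lens[0] // 4. the value the paged kernel would see
    }
}

fn main() {
    let mut seq = SeqModel { blocks: Vec::new(), next_block: 0, len: 0, block_size: 4, context_lens: [0] };
    for _ in 0..5 {
        seq.decode_step();
    }
    println!("len={} blocks={:?}", seq.context_lens[0], seq.blocks);
}
```

Syncing after the capacity check matters because the kernel trusts context_lens_buf to bound how far it walks the block table.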
pub fn release(&mut self, alloc: &mut BlockAllocator)
Release all blocks back to the allocator. Buffers are kept (cheap
to reuse for a future cache_id), but blocks become available for
other sequences. Sets len back to 0.
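A minimal sketch of the release semantics, again host-only: SimpleAllocator and its free_block method are hypothetical stand-ins for BlockAllocator, and block_table models block_table_buf. Blocks go back to the pool and len resets to 0, while the table buffer itself is kept for reuse. Stale ids left in the table are assumed harmless because a kernel bounded by context length would not read past len.

```rust
/// Hypothetical free-list allocator standing in for `BlockAllocator`.
struct SimpleAllocator {
    free: Vec<u32>,
}

impl SimpleAllocator {
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }
    fn free_block(&mut self, id: u32) {
        self.free.push(id);
    }
}

struct SeqModel {
    blocks: Vec<u32>,
    block_table: Vec<u32>, // kept across release, like block_table_buf
    len: usize,
}

impl SeqModel {
    fn release(&mut self, alloc: &mut SimpleAllocator) {
        // drain() empties `blocks` while handing each id back to the pool.
        for id in self.blocks.drain(..) {
            alloc.free_block(id);
        }
        self.len = 0;
        // block_table is deliberately not freed: it is reused by the next cache_id.
    }
}

fn main() {
    let mut alloc = SimpleAllocator { free: vec![3, 2, 1, 0] };
    let mut seq = SeqModel { blocks: Vec::new(), block_table: vec![0; 4], len: 0 };
    seq.blocks.push(alloc.alloc().unwrap());
    seq.blocks.push(alloc.alloc().unwrap());
    seq.len = 30;
    seq.release(&mut alloc);
    println!("free blocks: {}, len: {}", alloc.free.len(), seq.len);
}
```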
Auto Trait Implementations
impl<B> Freeze for PagedSeqState<B>
impl<B> RefUnwindSafe for PagedSeqState<B>
impl<B> Send for PagedSeqState<B>
impl<B> Sync for PagedSeqState<B>
impl<B> Unpin for PagedSeqState<B>
impl<B> UnsafeUnpin for PagedSeqState<B>
impl<B> UnwindSafe for PagedSeqState<B>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.