Struct KvCache 

pub struct KvCache<B: Backend> {
    pub k: B::Buffer,
    pub v: B::Buffer,
    pub len: usize,
    pub capacity: usize,
    pub num_kv_heads: usize,
    pub head_dim: usize,
    pub block_size: usize,
    pub block_table: Option<B::Buffer>,
    pub context_lens: Option<B::Buffer>,
    pub paged_block_indices: Vec<u32>,
}

Per-layer KV cache. Each model owns its own Vec<KvCache<B>> per sequence.

Two layouts are supported, selected at allocation time:

  1. Contiguous (default): k and v are [num_kv_heads, capacity, head_dim] f32 buffers. block_size == 0, and block_table and context_lens are None. This is the original ferrum layout, used when FERRUM_METAL_PAGED_KV is unset.
  2. Paged (vLLM-style): k and v are [num_blocks, num_kv_heads, block_size, head_dim] block pools. block_size > 0, and both block_table (u32[max_num_blocks_per_seq]) and context_lens (u32[1], single-sequence for now) are populated. Multi-sequence sharing is a Phase 4 concern: today every paged cache_id has its own pool, but the kernel-level indirection already works.
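As a rough illustration of how the two shapes above differ in size, the following sketch computes the element count of one k or v buffer for each layout. The `Layout` enum and both functions are hypothetical helpers, not part of this crate, and the choice of `num_blocks` (just enough blocks to cover `capacity`) is an assumption made for the example.

```rust
/// Hypothetical host-side description of the two layouts (not a crate type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layout {
    /// Shape: [num_kv_heads, capacity, head_dim]
    Contiguous { capacity: usize },
    /// Shape: [num_blocks, num_kv_heads, block_size, head_dim]
    Paged { num_blocks: usize, block_size: usize },
}

/// Follows the documented convention: block_size == 0 means contiguous.
fn layout_of(block_size: usize, capacity: usize) -> Layout {
    if block_size == 0 {
        Layout::Contiguous { capacity }
    } else {
        // Assumption: the pool holds just enough blocks to cover `capacity`.
        let num_blocks = (capacity + block_size - 1) / block_size;
        Layout::Paged { num_blocks, block_size }
    }
}

/// Number of elements in a single k (or v) buffer for the given layout.
fn num_elements(layout: Layout, num_kv_heads: usize, head_dim: usize) -> usize {
    match layout {
        Layout::Contiguous { capacity } => num_kv_heads * capacity * head_dim,
        Layout::Paged { num_blocks, block_size } => {
            num_blocks * num_kv_heads * block_size * head_dim
        }
    }
}
```

Under these assumptions, a paged pool rounds its capacity up to a whole number of blocks, so it can hold slightly more positions than the `capacity` it was sized for.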

Fields§

§k: B::Buffer

§v: B::Buffer

§len: usize

§capacity: usize

§num_kv_heads: usize

§head_dim: usize

§block_size: usize

Paged: KV positions per physical block. 0 ⇒ contiguous layout.

§block_table: Option<B::Buffer>

Paged: [max_num_blocks_per_seq] u32 buffer mapping each logical block index to its physical block.
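The logical-to-physical indirection can be sketched on the host side as follows. Both functions are hypothetical and operate on a plain `&[u32]` copy of the table rather than on a `B::Buffer`. The flat offset assumes a row-major [num_blocks, num_kv_heads, block_size, head_dim] pool, which this page does not confirm for the actual kernels.

```rust
/// Hypothetical helper: map a token position to (physical_block, offset_in_block).
/// Returns None for the contiguous layout or when the position falls past the table.
fn locate(block_table: &[u32], block_size: usize, pos: usize) -> Option<(u32, usize)> {
    if block_size == 0 {
        return None; // contiguous layout: there is no block table to consult
    }
    let logical = pos / block_size; // which logical block the position lives in
    let offset = pos % block_size; // slot within that block
    block_table.get(logical).map(|&physical| (physical, offset))
}

/// Flat element offset of the first head_dim element for (physical block, head, slot),
/// assuming a row-major [num_blocks, num_kv_heads, block_size, head_dim] pool.
fn paged_offset(
    physical: u32,
    head: usize,
    offset: usize,
    num_kv_heads: usize,
    block_size: usize,
    head_dim: usize,
) -> usize {
    ((physical as usize * num_kv_heads + head) * block_size + offset) * head_dim
}
```

For example, with a table of `[5, 2, 9]` and `block_size = 16`, position 20 falls in logical block 1 at slot 4, which resolves to physical block 2.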

§context_lens: Option<B::Buffer>

Paged: [1] u32 buffer holding the current context length for the kernel to read.

§paged_block_indices: Vec<u32>

Paged: host-side mirror of the physical block indices owned by this cache. Lets the model’s release path return blocks to the shared allocator without reading them back from device.
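A minimal sketch of why the host-side mirror helps: releasing a cache only needs to drain a `Vec<u32>` back into a free list, with no device readback. The `BlockAllocator` type and its methods are hypothetical. The crate's shared allocator is not shown on this page and may work differently.

```rust
/// Hypothetical free-list allocator standing in for the shared block allocator.
struct BlockAllocator {
    free: Vec<u32>,
}

impl BlockAllocator {
    fn new(num_blocks: u32) -> Self {
        // Blocks are popped from the end, so reverse to hand out low indices first.
        Self { free: (0..num_blocks).rev().collect() }
    }

    /// Take one physical block, or None if the pool is exhausted.
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }

    /// Return every block listed in the host-side mirror and leave the mirror empty.
    fn release(&mut self, paged_block_indices: &mut Vec<u32>) {
        self.free.extend(paged_block_indices.drain(..));
    }

    fn available(&self) -> usize {
        self.free.len()
    }
}
```

A cache that allocated two blocks would push their indices into its `paged_block_indices` as it goes, then pass that vector to `release` when the sequence finishes.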

Auto Trait Implementations§

impl<B> Freeze for KvCache<B>
where <B as Backend>::Buffer: Freeze,

impl<B> RefUnwindSafe for KvCache<B>
where <B as Backend>::Buffer: RefUnwindSafe,

impl<B> Send for KvCache<B>

impl<B> Sync for KvCache<B>

impl<B> Unpin for KvCache<B>
where <B as Backend>::Buffer: Unpin,

impl<B> UnsafeUnpin for KvCache<B>
where <B as Backend>::Buffer: UnsafeUnpin,

impl<B> UnwindSafe for KvCache<B>
where <B as Backend>::Buffer: UnwindSafe,
Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V