
Module scratchpad

Pre-allocated KV-cache scratchpad for zero-allocation transformer inference.

Provides Scratchpad, a pre-allocated linear buffer for appending key/value token vectors without per-token heap allocation. The entire [max_seq_len, dim] storage is allocated once at construction; subsequent append calls copy data into that existing storage.

§NoGC guarantee

After construction, append performs no heap allocation – it writes directly into the pre-allocated Buffer. The as_tensor method returns a zero-copy view via Rc clone of the underlying buffer.

§Relationship to PagedKvCache

Scratchpad uses a single contiguous buffer (simpler; well suited to small sequences). PagedKvCache uses block paging (better for large sequences, where a single large contiguous allocation may fragment memory).
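The layout difference can be illustrated by how each scheme locates a token's storage; the block_size parameter and the (block, offset) addressing below are assumptions for illustration, not PagedKvCache's actual API:

```rust
/// Contiguous layout (Scratchpad-style): one linear buffer,
/// so a token's data lives at a single flat offset.
fn contiguous_offset(token: usize, dim: usize) -> usize {
    token * dim
}

/// Paged layout (PagedKvCache-style, hypothetical): tokens are grouped
/// into fixed-size blocks that need not be contiguous in memory, so an
/// address is a (block index, offset within block) pair.
fn paged_offset(token: usize, dim: usize, block_size: usize) -> (usize, usize) {
    (token / block_size, (token % block_size) * dim)
}

fn main() {
    // Token 5 with dim = 64:
    assert_eq!(contiguous_offset(5, 64), 320);
    // With block_size = 4, token 5 is the second entry of block 1.
    assert_eq!(paged_offset(5, 64, 4), (1, 64));
}
```

Contiguous addressing is a single multiply, but growing past the buffer requires a reallocation; paged addressing adds an indirection in exchange for allocating blocks on demand.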

Structs§

Scratchpad
A pre-allocated scratch buffer for KV-cache. Allows appending new key/value vectors without re-allocation, up to a fixed max_seq_len.