Pre-allocated KV-cache scratchpad: zero-allocation state persistence for transformer inference.
Provides Scratchpad, a pre-allocated linear buffer for appending
key/value token vectors without per-token heap allocation. The entire
[max_seq_len, dim] storage is allocated once at construction; subsequent
append calls copy data into existing storage.
§NoGC guarantee
After construction, append performs no heap allocation – it writes
directly into the pre-allocated Buffer. The as_tensor
method returns a zero-copy view via Rc clone of the underlying buffer.
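The guarantee above can be sketched in standalone code. This is a minimal illustration, not the crate's actual implementation: the field names, the `Result` error type, and the use of `Rc<RefCell<Vec<f32>>>` in place of the crate's `Buffer` and tensor types are all assumptions.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical sketch of Scratchpad; names and signatures are assumptions.
pub struct Scratchpad {
    buf: Rc<RefCell<Vec<f32>>>, // allocated once: max_seq_len * dim floats
    dim: usize,
    max_seq_len: usize,
    len: usize, // tokens appended so far
}

impl Scratchpad {
    /// The only allocation happens here, at construction.
    pub fn new(max_seq_len: usize, dim: usize) -> Self {
        Self {
            buf: Rc::new(RefCell::new(vec![0.0; max_seq_len * dim])),
            dim,
            max_seq_len,
            len: 0,
        }
    }

    /// Copies one token vector into pre-allocated storage; no heap allocation.
    pub fn append(&mut self, v: &[f32]) -> Result<(), &'static str> {
        if self.len == self.max_seq_len {
            return Err("scratchpad full");
        }
        assert_eq!(v.len(), self.dim);
        let start = self.len * self.dim;
        self.buf.borrow_mut()[start..start + self.dim].copy_from_slice(v);
        self.len += 1;
        Ok(())
    }

    /// Zero-copy view: clones the Rc handle, never the underlying data.
    pub fn as_tensor(&self) -> (Rc<RefCell<Vec<f32>>>, usize, usize) {
        (Rc::clone(&self.buf), self.len, self.dim)
    }
}
```

Appending past `max_seq_len` returns an error rather than reallocating, which is what keeps the hot path allocation-free.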
§Relationship to PagedKvCache
Scratchpad uses a single contiguous buffer (simpler, better for small
sequences). PagedKvCache uses block
paging (better for large sequences where contiguous allocation may
fragment).
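The layout difference behind that trade-off can be shown with index arithmetic. This is an illustration of paging in general, not the `PagedKvCache` API; the constants and the `block_table` shape are assumptions.

```rust
// Illustration only: contiguous vs. paged KV-cache addressing.
// DIM and BLOCK_TOKENS are made-up constants, not crate values.
const DIM: usize = 4; // channels per token vector
const BLOCK_TOKENS: usize = 2; // tokens stored per block

// Contiguous (Scratchpad-style): token t, channel c lives at t * DIM + c.
fn contiguous_index(t: usize, c: usize) -> usize {
    t * DIM + c
}

// Paged (PagedKvCache-style): token t maps through a block table to a
// physical block, so blocks need not be adjacent in memory and a long
// sequence never requires one large contiguous allocation.
fn paged_index(block_table: &[usize], t: usize, c: usize) -> usize {
    let block = block_table[t / BLOCK_TOKENS]; // physical block id
    let offset = t % BLOCK_TOKENS; // slot within the block
    (block * BLOCK_TOKENS + offset) * DIM + c
}
```

With a scrambled block table the paged address diverges from the contiguous one, which is exactly the flexibility that avoids fragmentation for large sequences.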
Structs§
- Scratchpad
- A pre-allocated scratch buffer for KV-cache. Allows appending new
key/value vectors without re-allocation, up to a fixed
max_seq_len.