KV cache GPU copy dispatch.
Copies new K or V data directly from a source GPU buffer into a pre-allocated KV cache buffer at the correct write position, with optional modulo wrapping for sliding window (ring buffer) caches.
This eliminates the CPU round-trip that append_bf16 requires:
instead of GPU -> CPU (as_slice) -> CPU (copy loop) -> shared buffer,
the GPU copies directly between two shared Metal buffers.
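
To make the write-position logic concrete, here is a minimal CPU-side sketch of how a cache row could be chosen, with and without modulo wrapping. It is not the crate's kernel or its API: the names `cache_row` and `copy_row`, the flat row-major layout, and the `f32` element type are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of the crate): map a logical token
/// position to a row in the cache. With `wrap` set, the cache is treated
/// as a ring buffer of `capacity` rows, as in a sliding-window cache.
fn cache_row(write_pos: usize, capacity: usize, wrap: bool) -> Option<usize> {
    if capacity == 0 {
        return None;
    }
    if wrap {
        // Sliding window: older rows are overwritten once the buffer is full.
        Some(write_pos % capacity)
    } else if write_pos < capacity {
        Some(write_pos)
    } else {
        // A linear (non-wrapping) cache has no room left.
        None
    }
}

/// Hypothetical CPU reference for a single copy: writes the first `row_len`
/// elements of `src` into `cache` at the row selected by `cache_row`.
/// Assumes `cache` is a flat buffer of `cache.len() / row_len` rows.
/// Returns `false` if nothing was written.
fn copy_row(src: &[f32], cache: &mut [f32], write_pos: usize, row_len: usize, wrap: bool) -> bool {
    if row_len == 0 || src.len() < row_len {
        return false;
    }
    let capacity = cache.len() / row_len;
    match cache_row(write_pos, capacity, wrap) {
        Some(row) => {
            let start = row * row_len;
            cache[start..start + row_len].copy_from_slice(&src[..row_len]);
            true
        }
        None => false,
    }
}
```

Under these assumptions, a cache of 4 rows with wrapping enabled places position 5 in row 1, while the same write into a non-wrapping cache is rejected. The GPU path described above would perform the equivalent index computation inside the kernel, so the data never passes through CPU memory.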
Statics
- KV_CACHE_COPY_SHADER_SOURCE - MSL source for the KV cache copy kernel (embedded at compile time).
Functions
- dispatch_kv_cache_copy - Dispatch a GPU copy from a source bf16 buffer into a KV cache buffer.
- dispatch_kv_cache_copy_batch_f32 - Dispatch a batched GPU copy from a source f32 buffer into an f32 KV cache.
- dispatch_kv_cache_copy_batch_f32_to_f16 - Dispatch a batched F32→F16 copy from a source f32 buffer into an f16 KV cache.
- dispatch_kv_cache_copy_f32 - Dispatch a GPU copy from a source f32 buffer into an f32 KV cache buffer.
- dispatch_kv_cache_copy_seq_f32 - Multi-position, all-heads KV cache copy (F32 source → F32 cache, batched prefill).
- dispatch_kv_cache_copy_seq_f32_to_f16 - Multi-position, all-heads KV cache copy (F32 source → F16 cache, batched prefill).
- register - Register KV cache copy shader source with the given kernel registry.