Module kv_cache_copy

KV cache GPU copy dispatch.

Copies new K or V data directly from a source GPU buffer into a pre-allocated KV cache buffer at the correct write position, with optional modulo wrapping for sliding window (ring buffer) caches.

This eliminates the CPU round-trip that append_bf16 requires. Instead of going GPU -> CPU (as_slice) -> CPU (copy loop) -> shared buffer, the GPU copies directly between two shared Metal buffers.
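
The write-position logic described above can be illustrated with a CPU-side reference sketch. This is not the crate's API or its Metal kernel: the function names `cache_slot` and `copy_into_cache`, the `sliding_window` flag, and the flat one-row-per-token layout are all assumptions made for illustration. It only shows how a write position might map to a cache slot, with modulo wrapping for a ring buffer cache.

```rust
/// Hypothetical helper (not part of this module): map a logical write
/// position to a slot index in a cache holding `capacity` token slots.
/// With `sliding_window` set, the position wraps modulo the capacity
/// (ring buffer); otherwise positions past the end are rejected.
fn cache_slot(write_pos: usize, capacity: usize, sliding_window: bool) -> Option<usize> {
    if capacity == 0 {
        return None;
    }
    if sliding_window {
        Some(write_pos % capacity)
    } else if write_pos < capacity {
        Some(write_pos)
    } else {
        None
    }
}

/// Hypothetical CPU reference for the copy: write one row of `row_len`
/// elements from `src` into `cache` at the slot chosen by `cache_slot`.
/// Assumes a flat layout of `capacity * row_len` elements.
/// Returns `false` if the shapes do not match or the position is out of range.
fn copy_into_cache(
    cache: &mut [f32],
    src: &[f32],
    write_pos: usize,
    row_len: usize,
    sliding_window: bool,
) -> bool {
    if row_len == 0 || src.len() != row_len || cache.len() % row_len != 0 {
        return false;
    }
    let capacity = cache.len() / row_len;
    match cache_slot(write_pos, capacity, sliding_window) {
        Some(slot) => {
            let start = slot * row_len;
            cache[start..start + row_len].copy_from_slice(src);
            true
        }
        None => false,
    }
}
```

Under these assumptions, writing position 5 into a 4-slot sliding-window cache lands in slot 1, while the same write into a non-wrapping cache is rejected. The real kernels perform the equivalent indexing on the GPU, so no intermediate CPU slice is needed.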

Statics§

KV_CACHE_COPY_SHADER_SOURCE
MSL source for the KV cache copy kernel (embedded at compile time).

Functions§

dispatch_kv_cache_copy
Dispatch a GPU copy from a source bf16 buffer into a KV cache buffer.
dispatch_kv_cache_copy_batch_f32
Dispatch a batched GPU copy from a source f32 buffer into an f32 KV cache.
dispatch_kv_cache_copy_batch_f32_to_f16
Dispatch a batched F32→F16 copy from a source f32 buffer into an f16 KV cache.
dispatch_kv_cache_copy_f32
Dispatch a GPU copy from a source f32 buffer into an f32 KV cache buffer.
dispatch_kv_cache_copy_seq_f32
Multi-position, all-heads KV cache copy (F32 → F32 cache, batched prefill).
dispatch_kv_cache_copy_seq_f32_to_f16
Multi-position, all-heads KV cache copy (F32 source → F16 cache, batched prefill).
register
Register KV cache copy shader source with the given kernel registry.