//! KV Cache operations traits
//!
//! Fused kernel operations for efficient KV cache management during inference.
use crateResult;
use Runtime;
use Tensor;
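
// Illustrative sketch only; not part of the original module. It spells out, on
// the CPU, the layout contract documented on the item below, assuming
// contiguous row-major `f32` buffers. `KvDimsSketch` and
// `kv_cache_update_sketch` are hypothetical names chosen for this example and
// say nothing about how the fused kernel or the crate's `Tensor` type work.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct KvDimsSketch {
    batch: usize,
    num_kv_heads: usize,
    max_seq_len: usize,
    head_dim: usize,
}

#[allow(dead_code)]
fn kv_cache_update_sketch(
    k_cache: &mut [f32],
    v_cache: &mut [f32],
    new_k: &[f32],
    new_v: &[f32],
    dims: KvDimsSketch,
    new_len: usize,
    position: usize,
) {
    let KvDimsSketch { batch, num_kv_heads, max_seq_len, head_dim } = dims;

    // Assumption of this sketch: the write must fit inside the preallocated cache.
    assert!(position + new_len <= max_seq_len, "write would run past max_seq_len");
    let cache_elems = batch * num_kv_heads * max_seq_len * head_dim;
    let new_elems = batch * num_kv_heads * new_len * head_dim;
    assert_eq!(k_cache.len(), cache_elems);
    assert_eq!(v_cache.len(), cache_elems);
    assert_eq!(new_k.len(), new_elems);
    assert_eq!(new_v.len(), new_elems);

    // For a fixed (batch, head) pair, the new tokens form one contiguous run of
    // `new_len * head_dim` elements in both the source and the cache.
    let run = new_len * head_dim;
    for bh in 0..batch * num_kv_heads {
        let src = bh * run;
        let dst = (bh * max_seq_len + position) * head_dim;
        // K and V are written in the same pass, mirroring the single fused update.
        k_cache[dst..dst + run].copy_from_slice(&new_k[src..src + run]);
        v_cache[dst..dst + run].copy_from_slice(&new_v[src..src + run]);
    }
}
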
/// Fused KV cache update — writes new K and V tokens into caches in a single kernel.
///
/// Reduces kernel launches from 2 to 1 per layer.
///
/// # Layout contract
///
/// - `k_cache`, `v_cache`: `[B, num_kv_heads, max_seq_len, head_dim]` — preallocated cache
/// - `new_k`, `new_v`: `[B, num_kv_heads, new_len, head_dim]` — new tokens to insert
/// - `position`: starting write position in the sequence dimension
///
/// After this call, `k_cache[:, :, position:position+new_len, :] = new_k` and
/// `v_cache[:, :, position:position+new_len, :] = new_v`.