pub fn attention(
output: &mut [f32],
q: &[f32],
k_cache: &[f32],
v_cache: &[f32],
seq_len: usize,
num_heads: usize,
num_kv_heads: usize,
head_dim: usize,
)
Optimized grouped-query attention using NEON dot products.
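Computes, for each query head: score = softmax(Q·K^T / sqrt(d)), output = score·V, where d is head_dim. Each group of num_heads / num_kv_heads query heads shares one key/value head.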
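
To make the formula concrete, here is a minimal scalar sketch of the same computation. It is not the NEON implementation documented here. The name `attention_reference` is hypothetical, and the memory layouts are assumptions that the signature alone does not confirm: `q` and `output` are taken to be `[num_heads][head_dim]`, and both caches `[seq_len][num_kv_heads][head_dim]`, all row-major. A reference like this is mainly useful for checking an optimized kernel against a slow, obviously correct version.

```rust
/// Hypothetical scalar reference for grouped-query attention.
/// Assumed layouts (not confirmed by the documented signature):
///   q, output:        [num_heads][head_dim]
///   k_cache, v_cache: [seq_len][num_kv_heads][head_dim]
pub fn attention_reference(
    output: &mut [f32],
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    seq_len: usize,
    num_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
) {
    assert!(seq_len > 0 && num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    assert_eq!(q.len(), num_heads * head_dim);
    assert_eq!(output.len(), num_heads * head_dim);
    assert!(k_cache.len() >= seq_len * num_kv_heads * head_dim);
    assert!(v_cache.len() >= seq_len * num_kv_heads * head_dim);

    let group = num_heads / num_kv_heads; // query heads per shared KV head
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut scores = vec![0.0f32; seq_len];

    for h in 0..num_heads {
        let kv = h / group; // KV head shared by this query head
        let qh = &q[h * head_dim..(h + 1) * head_dim];

        // Scaled dot products q·k_t, tracking the max for a stable softmax.
        let mut max = f32::NEG_INFINITY;
        for t in 0..seq_len {
            let off = (t * num_kv_heads + kv) * head_dim;
            let k = &k_cache[off..off + head_dim];
            let dot: f32 = qh.iter().zip(k).map(|(a, b)| a * b).sum();
            scores[t] = dot * scale;
            max = max.max(scores[t]);
        }

        // Softmax numerators and their sum.
        let mut sum = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            sum += *s;
        }

        // output_h = sum over t of softmax(score)_t * v_t
        let out = &mut output[h * head_dim..(h + 1) * head_dim];
        out.fill(0.0);
        for t in 0..seq_len {
            let w = scores[t] / sum;
            let off = (t * num_kv_heads + kv) * head_dim;
            let v = &v_cache[off..off + head_dim];
            for (o, x) in out.iter_mut().zip(v) {
                *o += w * x;
            }
        }
    }
}
```

Subtracting the per-head maximum before exponentiating does not change the softmax result but avoids overflow for large scores. Whether the optimized routine does the same is not stated in the source.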