Function attention 

pub fn attention(
    output: &mut [f32],
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    seq_len: usize,
    num_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
)
Optimized grouped-query attention using NEON dot products.

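Computes, for each query head: score = softmax(Q·K^T / sqrt(head_dim)) over the seq_len cached positions, then output = score·V. Because this is grouped-query attention, each group of num_heads / num_kv_heads query heads shares one key/value head.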
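
As an illustration of the computation described above, here is a minimal scalar sketch without NEON. It is not this crate's implementation. The name `attention_reference` is hypothetical, and the memory layouts are assumptions the source does not confirm: `q` and `output` are taken to be `[num_heads][head_dim]` for a single query position, and `k_cache` and `v_cache` are taken to be `[seq_len][num_kv_heads][head_dim]`, all row-major.

```rust
/// Scalar reference sketch of grouped-query attention for one query position.
///
/// Assumed layouts (hypothetical, not confirmed by the documented function):
///   q, output:        [num_heads][head_dim]
///   k_cache, v_cache: [seq_len][num_kv_heads][head_dim]
pub fn attention_reference(
    output: &mut [f32],
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    seq_len: usize,
    num_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
) {
    assert!(num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    assert_eq!(q.len(), num_heads * head_dim);
    assert_eq!(output.len(), num_heads * head_dim);
    assert!(k_cache.len() >= seq_len * num_kv_heads * head_dim);
    assert!(v_cache.len() >= seq_len * num_kv_heads * head_dim);

    let group = num_heads / num_kv_heads; // query heads per KV head
    let scale = 1.0 / (head_dim as f32).sqrt();
    let kv_stride = num_kv_heads * head_dim; // floats per cached position
    let mut scores = vec![0.0f32; seq_len];

    for h in 0..num_heads {
        let kv_h = h / group; // KV head shared by this query head's group
        let q_h = &q[h * head_dim..(h + 1) * head_dim];
        let out_h = &mut output[h * head_dim..(h + 1) * head_dim];

        // Scaled dot product of the query with every cached key.
        let mut max_score = f32::NEG_INFINITY;
        for t in 0..seq_len {
            let off = t * kv_stride + kv_h * head_dim;
            let k_t = &k_cache[off..off + head_dim];
            let dot: f32 = q_h.iter().zip(k_t).map(|(a, b)| a * b).sum();
            scores[t] = dot * scale;
            max_score = max_score.max(scores[t]);
        }

        // Softmax numerator, shifted by the maximum for numerical stability.
        let mut sum = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max_score).exp();
            sum += *s;
        }

        // Weighted sum of the cached values, normalizing by the softmax sum.
        out_h.fill(0.0);
        for t in 0..seq_len {
            let w = scores[t] / sum;
            let off = t * kv_stride + kv_h * head_dim;
            let v_t = &v_cache[off..off + head_dim];
            for (o, v) in out_h.iter_mut().zip(v_t) {
                *o += w * v;
            }
        }
    }
}
```

A scalar version like this is mainly useful as a correctness baseline: with all-zero keys every position receives equal weight, so each head's output is the plain average of the cached value vectors for its KV head.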