Function attention

pub fn attention(
    output: &mut [f32],
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    seq_len: usize,
    num_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
)

Optimized grouped-query attention using NEON dot products.
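The docs do not show the kernel itself. The following is a minimal sketch of how a NEON dot product is commonly written. It assumes 4-lane fused multiply-add accumulation (`vfmaq_f32`) followed by a horizontal add (`vaddvq_f32`), with a scalar loop for the tail and a scalar fallback on non-aarch64 targets. The function name `dot` is hypothetical and is not this crate's API.

```rust
/// Hypothetical helper: dot product of two equal-length f32 slices.
/// On aarch64 this uses NEON 4-lane FMA; the kernel behind `attention` may differ.
#[cfg(target_arch = "aarch64")]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::{vaddvq_f32, vdupq_n_f32, vfmaq_f32, vld1q_f32};
    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 4;
    // SAFETY: NEON is always available on aarch64 targets, and each load
    // reads 4 f32s starting at i * 4, which stays below chunks * 4 <= len.
    let mut sum = unsafe {
        let mut acc = vdupq_n_f32(0.0);
        for i in 0..chunks {
            let va = vld1q_f32(a.as_ptr().add(i * 4));
            let vb = vld1q_f32(b.as_ptr().add(i * 4));
            acc = vfmaq_f32(acc, va, vb); // acc += va * vb, lane-wise
        }
        vaddvq_f32(acc) // horizontal sum of the 4 lanes
    };
    // Scalar tail for lengths that are not a multiple of 4.
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Portable scalar fallback with the same contract on other targets.
#[cfg(not(target_arch = "aarch64"))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The NEON path sums in a different order than the scalar loop, so results can differ in the last bits for general inputs.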

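Computes, for each query head: scores = softmax(Q·K^T / sqrt(d)) and output = scores·V, where d = head_dim. In grouped-query attention, each of the num_kv_heads key/value heads is shared by a group of num_heads / num_kv_heads query heads.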
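A scalar reference can help when checking an optimized kernel such as this one. Below is a minimal sketch under stated assumptions, since the docs do not specify buffer layouts. It assumes a single query token (a decode step), `q` and `output` laid out as `[num_heads * head_dim]`, and `k_cache`/`v_cache` laid out position-major as `[seq_len * num_kv_heads * head_dim]`. It also assumes query head `h` reads KV head `h / (num_heads / num_kv_heads)`. The name `attention_ref` is hypothetical.

```rust
/// Hypothetical scalar reference for grouped-query attention (one query token).
/// Assumed layouts, not confirmed by the docs:
///   q, output:        [num_heads * head_dim]
///   k_cache, v_cache: [seq_len * num_kv_heads * head_dim], position-major
#[allow(clippy::too_many_arguments)]
pub fn attention_ref(
    output: &mut [f32],
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    seq_len: usize,
    num_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
) {
    assert!(num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    let group = num_heads / num_kv_heads; // query heads per shared KV head
    let kv_stride = num_kv_heads * head_dim; // floats per cached position
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut scores = vec![0.0f32; seq_len];

    for h in 0..num_heads {
        let kv_h = h / group; // assumed head-to-group mapping
        let q_h = &q[h * head_dim..(h + 1) * head_dim];

        // scores[t] = (q_h · k_t) / sqrt(d)
        for t in 0..seq_len {
            let off = t * kv_stride + kv_h * head_dim;
            let k_t = &k_cache[off..off + head_dim];
            scores[t] = q_h.iter().zip(k_t).map(|(a, b)| a * b).sum::<f32>() * scale;
        }

        // Numerically stable softmax: subtract the max before exponentiating.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut denom = 0.0;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            denom += *s;
        }

        // output_h = sum over t of (softmax weight t) * v_t
        let out_h = &mut output[h * head_dim..(h + 1) * head_dim];
        out_h.fill(0.0);
        for t in 0..seq_len {
            let off = t * kv_stride + kv_h * head_dim;
            let w = scores[t] / denom;
            for (o, v) in out_h.iter_mut().zip(&v_cache[off..off + head_dim]) {
                *o += w * v;
            }
        }
    }
}
```

For example, with two query heads sharing one KV head and identical keys at both positions, the softmax weights are uniform. Each head's output is then the mean of the two value vectors.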
