Module gqa

Grouped Query Attention kernel.

Matches gqa-kernel-v1.yaml. KV head broadcasting maps each query head to the KV head shared by its group, using integer division: kv_head = query_head / (num_heads / num_kv_heads)
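The broadcasting rule above can be sketched as a small helper. The function name `kv_head_for` and the requirement that `num_heads` be a multiple of `num_kv_heads` are assumptions made for illustration; they are not taken from the crate.

```rust
/// Hypothetical helper: maps a query head index to the KV head it reads from.
/// Assumes `num_heads` is a non-zero multiple of `num_kv_heads`.
fn kv_head_for(query_head: usize, num_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    assert!(query_head < num_heads);
    // Number of query heads that share a single KV head.
    let group_size = num_heads / num_kv_heads;
    // Integer division assigns consecutive query heads to the same KV head.
    query_head / group_size
}
```

With 8 query heads and 2 KV heads, query heads 0 to 3 map to KV head 0 and query heads 4 to 7 map to KV head 1. When `num_kv_heads == num_heads` the mapping is the identity (standard multi-head attention), and when `num_kv_heads == 1` every query head shares one KV head (multi-query attention).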

The kernel is exposed through three backends, one function per backend:

  • fn gqa_scalar(...) – Pure Rust scalar reference (ground truth)
  • unsafe fn gqa_avx2(...) – AVX2 SIMD entry point (delegates to the scalar implementation)
  • fn gqa_ptx() -> &'static str – PTX assembly source string
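To make the role of a scalar reference concrete, here is a minimal sketch of grouped query attention in plain Rust. It is not the crate's `gqa_scalar`: the name `gqa_sketch`, the parameter list, the row-major `[heads, seq_len, head_dim]` layout, the `1/sqrt(head_dim)` scaling, and the absence of masking are all assumptions chosen for illustration.

```rust
/// Hypothetical scalar GQA sketch (not the crate's `gqa_scalar` signature).
/// Assumed row-major layouts: `q` and the output are [num_heads, seq_len, head_dim];
/// `k` and `v` are [num_kv_heads, seq_len, head_dim]. No mask is applied.
fn gqa_sketch(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    num_heads: usize,
    num_kv_heads: usize,
    seq_len: usize,
    head_dim: usize,
) -> Vec<f32> {
    assert!(head_dim > 0 && num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    assert_eq!(q.len(), num_heads * seq_len * head_dim);
    assert_eq!(k.len(), num_kv_heads * seq_len * head_dim);
    assert_eq!(v.len(), k.len());

    let group_size = num_heads / num_kv_heads;
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![0.0f32; q.len()];
    let mut scores = vec![0.0f32; seq_len];

    for h in 0..num_heads {
        // KV head broadcasting: every query head in a group reads the same K/V.
        let kv_h = h / group_size;
        for i in 0..seq_len {
            let q_row = &q[(h * seq_len + i) * head_dim..][..head_dim];

            // Scaled dot-product scores against every key position.
            let mut max_score = f32::NEG_INFINITY;
            for j in 0..seq_len {
                let k_row = &k[(kv_h * seq_len + j) * head_dim..][..head_dim];
                let dot: f32 = q_row.iter().zip(k_row).map(|(a, b)| a * b).sum();
                scores[j] = dot * scale;
                max_score = max_score.max(scores[j]);
            }

            // Numerically stable softmax: subtract the row maximum before exp.
            let mut denom = 0.0f32;
            for s in scores.iter_mut() {
                *s = (*s - max_score).exp();
                denom += *s;
            }

            // Output row is the softmax-weighted sum of the value rows.
            let out_row = &mut out[(h * seq_len + i) * head_dim..][..head_dim];
            for j in 0..seq_len {
                let w = scores[j] / denom;
                let v_row = &v[(kv_h * seq_len + j) * head_dim..][..head_dim];
                for (o, x) in out_row.iter_mut().zip(v_row) {
                    *o += w * x;
                }
            }
        }
    }
    out
}
```

A reference written this way is slow but easy to audit, which is what makes it usable as ground truth when checking a SIMD or GPU backend against it.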

Functions

gqa_avx2
AVX2 Grouped Query Attention – delegates to scalar.
gqa_ptx
PTX assembly for Grouped Query Attention.
gqa_scalar
Grouped Query Attention (scalar reference).
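Because `gqa_avx2` is an `unsafe fn`, callers are expected to confirm AVX2 support before invoking it. The sketch below shows one common dispatch pattern on a toy summation kernel. The functions `sum_scalar`, `sum_avx2`, and `sum_dispatch` are hypothetical stand-ins, since the real signatures are not shown on this page; the AVX2 variant here simply delegates to scalar, mirroring the description of `gqa_avx2`.

```rust
/// Hypothetical scalar reference for a toy kernel.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Hypothetical AVX2 entry point that delegates to the scalar path.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    sum_scalar(xs)
}

/// Safe wrapper: checks for AVX2 at runtime, otherwise falls back to scalar.
fn sum_dispatch(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

Keeping the feature check inside a safe wrapper confines the `unsafe` block to one place, and the scalar fallback keeps the code working on non-x86_64 targets.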