Expand description
Grouped Query Attention kernel.
Matches gqa-kernel-v1.yaml.
KV head broadcasting: kv_head = query_head / (num_heads / num_kv_heads)
Each function provides one of three backends:
fn gqa_scalar(...)– Pure Rust scalar reference (ground truth)unsafe fn gqa_avx2(...)– AVX2 SIMD implementationfn gqa_ptx() -> &'static str– PTX assembly source string
Functions§
- gqa_
avx2 ⚠ - AVX2 Grouped Query Attention – delegates to scalar.
- gqa_ptx
- PTX assembly for Grouped Query Attention.
- gqa_
scalar - Grouped Query Attention (scalar reference).