Skip to main content

Module attention

Module attention 

Source
Expand description

Scaled dot-product attention kernel.

Matches attention-kernel-v1.yaml. Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V

Each function provides one of three backends:

  • fn attention_scalar(...) – Pure Rust scalar reference (ground truth)
  • unsafe fn attention_avx2(...) – AVX2 SIMD implementation
  • fn attention_ptx() -> &'static str – PTX assembly source string

Functions§

attention_avx2
AVX2 scaled dot-product attention – delegates to scalar.
attention_ptx
PTX assembly for fused scaled dot-product attention.
attention_scalar
Scaled dot-product attention (scalar reference).