pub fn tmp_buffer_bytes(num_heads: u32, head_dim: u32) -> usize
Compute the size in bytes of the temporary buffer needed for TQ SDPA.
The buffer is sized for the maximum NWG of 32, regardless of the NWG chosen adaptively at run time. It is allocated once at model load time and reused for every context length, so it must be large enough for the worst case.
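The sketch below illustrates why sizing for the maximum NWG is safe. It assumes a split-K (flash-decoding style) layout in which each workgroup writes, per head, `head_dim` partial output floats plus a running max and a running sum. That layout, the constant `MAX_NWG`, and the functions `tmp_buffer_bytes_sketch` and `adaptive_nwg` are hypothetical illustrations, not this crate's actual implementation.

```rust
/// Upper bound on workgroups per head (the docs state max NWG = 32).
const MAX_NWG: u32 = 32;

/// HYPOTHETICAL layout: per (head, workgroup) partial result of
/// `head_dim` f32 outputs plus a running max and a running sum.
pub fn tmp_buffer_bytes_sketch(num_heads: u32, head_dim: u32) -> usize {
    let floats_per_partial = head_dim as usize + 2; // partial O + (max, sum)
    let partials = num_heads as usize * MAX_NWG as usize;
    partials * floats_per_partial * std::mem::size_of::<f32>()
}

/// HYPOTHETICAL adaptive heuristic: one workgroup per `tokens_per_wg`
/// tokens, clamped to [1, MAX_NWG]. It never exceeds MAX_NWG, so a buffer
/// sized by `tmp_buffer_bytes_sketch` always fits.
pub fn adaptive_nwg(context_len: u32, tokens_per_wg: u32) -> u32 {
    context_len.div_ceil(tokens_per_wg).clamp(1, MAX_NWG)
}

fn main() {
    // Allocated once, e.g. at model load: 32 heads, head_dim 128.
    let bytes = tmp_buffer_bytes_sketch(32, 128);
    println!("{bytes}"); // 32 * 32 * (128 + 2) * 4 = 532480

    // Reused for any context length: the adaptive NWG only shrinks usage.
    for ctx in [16u32, 1024, 131_072] {
        let nwg = adaptive_nwg(ctx, 256);
        let needed = 32 * nwg as usize * (128 + 2) * std::mem::size_of::<f32>();
        assert!(needed <= bytes);
    }
}
```

The design trades some memory at short context lengths (where the adaptive NWG is small) for never having to reallocate the buffer as the context grows.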