pub enum FlashAttnPrefillLayout {
    HeadMajor,
    SeqMajor,
}
Layout selector for dispatch_flash_attn_prefill_bf16_d64.
The kernel reads from raw device pointers via integer strides, so any
element layout that keeps head_dim (D) as the contiguous innermost
axis is valid input. The two layouts named here both satisfy that
constraint and cover every BERT/embedding caller in hf2q today:
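The sketch below is a minimal CPU-side illustration of why only the innermost axis matters. It assumes an `f32` host buffer in place of bf16 device memory, and the `Strides` struct and `head_vector` helper are hypothetical, not hf2q APIs. Because D has stride 1, the D elements for any (batch, head, position) form one contiguous slice, whichever order the outer axes are stored in.

```rust
/// Hypothetical stride record for the three outer axes. The head_dim (D)
/// axis is assumed to have stride 1, i.e. to be the contiguous innermost axis.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Strides {
    batch: usize,
    head: usize,
    seq: usize,
}

/// Hypothetical helper: return the D contiguous elements for
/// (batch b, head h, position l). Only the outer strides differ between
/// layouts; the head vector itself is always one slice.
fn head_vector<'a>(buf: &'a [f32], s: Strides, d: usize, b: usize, h: usize, l: usize) -> &'a [f32] {
    let base = b * s.batch + h * s.head + l * s.seq;
    &buf[base..base + d]
}

fn main() {
    // Tiny [B=1, H=2, L=3, D=4] tensor stored as [B, H, L, D], filled with 0..24.
    let (h_n, l_n, d) = (2, 3, 4);
    let buf: Vec<f32> = (0..(h_n * l_n * d)).map(|x| x as f32).collect();
    let head_major = Strides { batch: h_n * l_n * d, head: l_n * d, seq: d };

    // Head 1, position 2 starts at 1 * 12 + 2 * 4 = 20.
    assert_eq!(head_vector(&buf, head_major, d, 0, 1, 2), &[20.0, 21.0, 22.0, 23.0]);

    // The same logical tensor stored as [B, L, H, D].
    let mut seq_buf = vec![0.0f32; buf.len()];
    for h in 0..h_n {
        for l in 0..l_n {
            for i in 0..d {
                seq_buf[l * h_n * d + h * d + i] = buf[h * l_n * d + l * d + i];
            }
        }
    }
    let seq_major = Strides { batch: l_n * h_n * d, head: d, seq: h_n * d };

    // Different strides, identical head vector.
    assert_eq!(
        head_vector(&seq_buf, seq_major, d, 0, 1, 2),
        head_vector(&buf, head_major, d, 0, 1, 2)
    );
    println!("{:?}", head_vector(&seq_buf, seq_major, d, 0, 1, 2));
}
```

A layout that made some other axis innermost would break this: a head vector would no longer be one slice, and a single base offset plus D consecutive reads would not be enough.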
- HeadMajor: [B, H, L, D], the same layout the D=256/D=512 dispatchers assume. Stride math: seq = D, head = L * D, batch = H * L * D.
- SeqMajor: [B, L, H, D], the natural output of BERT linear projections (hidden = H * D, row-major). Stride math: seq = H * D, head = D, batch = L * H * D. Choosing this layout avoids three host-side transpose dispatches per layer (Q + K + V) plus one for the output. That saving is the whole reason the D=64 dispatcher exists: the BERT family wins on dispatch count, not on raw FA performance.
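The stride math above can be checked with a short sketch. It mirrors the enum locally (a hypothetical copy, since this snippet cannot import hf2q), and `strides`, `offset`, and `host_transposes_per_layer` are illustrative helpers, not the dispatcher's real API. The last helper only encodes the counting stated above (Q, K, V transposed in, output transposed back), assuming the inputs arrive straight from BERT projections.

```rust
/// Hypothetical local mirror of the documented enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FlashAttnPrefillLayout {
    HeadMajor,
    SeqMajor,
}

/// Outer-axis strides in elements; D always has stride 1.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Strides {
    batch: usize,
    head: usize,
    seq: usize,
}

/// Stride math as listed for each layout.
fn strides(layout: FlashAttnPrefillLayout, h: usize, l: usize, d: usize) -> Strides {
    match layout {
        // [B, H, L, D]
        FlashAttnPrefillLayout::HeadMajor => Strides { batch: h * l * d, head: l * d, seq: d },
        // [B, L, H, D]
        FlashAttnPrefillLayout::SeqMajor => Strides { batch: l * h * d, head: d, seq: h * d },
    }
}

/// Flat element offset of (b, h, l, d) under the given strides.
fn offset(s: Strides, b: usize, h: usize, l: usize, d: usize) -> usize {
    b * s.batch + h * s.head + l * s.seq + d
}

/// Hypothetical bookkeeping: host-side transposes needed per layer when the
/// inputs come from BERT projections ([B, L, H, D]) and the output must go back.
fn host_transposes_per_layer(layout: FlashAttnPrefillLayout) -> usize {
    match layout {
        FlashAttnPrefillLayout::HeadMajor => 3 + 1, // Q + K + V in, output out
        FlashAttnPrefillLayout::SeqMajor => 0,
    }
}

fn main() {
    // BERT-base-like sizes (12 heads, head_dim 64), chosen only for illustration.
    let (b_n, h_n, l_n, d_n) = (2, 12, 128, 64);
    let hm = strides(FlashAttnPrefillLayout::HeadMajor, h_n, l_n, d_n);
    let sm = strides(FlashAttnPrefillLayout::SeqMajor, h_n, l_n, d_n);

    // SeqMajor addresses a row-major [B * L, H * D] projection output directly:
    // row (b * L + l), column (h * D + d).
    for &(b, h, l, d) in &[(0, 0, 0, 0), (1, 5, 77, 31), (1, 11, 127, 63)] {
        let proj_index = (b * l_n + l) * (h_n * d_n) + h * d_n + d;
        assert_eq!(offset(sm, b, h, l, d), proj_index);
    }

    // Both layouts span exactly B * H * L * D elements.
    let last = b_n * h_n * l_n * d_n - 1;
    assert_eq!(offset(hm, b_n - 1, h_n - 1, l_n - 1, d_n - 1), last);
    assert_eq!(offset(sm, b_n - 1, h_n - 1, l_n - 1, d_n - 1), last);

    // Transpose dispatches avoided over an illustrative 12-layer encoder.
    let saved = 12
        * (host_transposes_per_layer(FlashAttnPrefillLayout::HeadMajor)
            - host_transposes_per_layer(FlashAttnPrefillLayout::SeqMajor));
    println!("SeqMajor saves {saved} transpose dispatches over 12 layers");
}
```

The projection check is the key point: with SeqMajor strides, the kernel can read Q, K, and V straight out of the projection buffers, so no reshuffling pass is needed before or after attention.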
Variants
HeadMajor
[B, H, L, D], the same layout as the D=256/D=512 dispatchers.
SeqMajor
[B, L, H, D], the natural BERT/embedding-model layout.
Trait Implementations
impl Clone for FlashAttnPrefillLayout
fn clone(&self) -> FlashAttnPrefillLayout
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for FlashAttnPrefillLayout
impl PartialEq for FlashAttnPrefillLayout
fn eq(&self, other: &FlashAttnPrefillLayout) -> bool
Tests for self and other values to be equal, and is used by ==.
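The listed trait implementations let callers copy, compare, and print a layout. The sketch below mirrors the enum with `#[derive(Clone, Debug, PartialEq)]`, which is an assumption: the page lists those impls but does not show whether they are derived or hand-written. `shape_label` is a hypothetical helper for illustration only.

```rust
/// Hypothetical local mirror; assumes the listed impls are derived.
#[derive(Clone, Debug, PartialEq)]
enum FlashAttnPrefillLayout {
    HeadMajor,
    SeqMajor,
}

/// Hypothetical helper: the tensor shape each layout expects.
fn shape_label(layout: &FlashAttnPrefillLayout) -> &'static str {
    match layout {
        FlashAttnPrefillLayout::HeadMajor => "[B, H, L, D]",
        FlashAttnPrefillLayout::SeqMajor => "[B, L, H, D]",
    }
}

fn main() {
    let layout = FlashAttnPrefillLayout::SeqMajor;
    // Clone: no Copy impl is listed, so duplicate explicitly.
    let copy = layout.clone();
    // PartialEq backs == and !=.
    assert!(copy == layout);
    assert_ne!(copy, FlashAttnPrefillLayout::HeadMajor);
    // With a derived Debug impl, a unit variant formats as its name.
    assert_eq!(format!("{:?}", copy), "SeqMajor");
    println!("{:?} expects {}", copy, shape_label(&copy));
}
```

Since only Clone, Debug, and PartialEq appear here, the sketch avoids relying on Copy, Eq, or Hash.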