pub struct DispatchKey {
    pub arch: SmArch,
    pub dtype: DType,
    pub head_dim: u32,
    pub causal: bool,
    pub varlen: bool,
    pub sliding_window: Option<u32>,
    pub alibi: bool,
    pub sink: u32,
    pub paged: bool,
    pub gqa_ratio: u32,
}
Cell key for the FlashAttention dispatch table.
Every field directly affects the generated CUDA C++ template
instantiation — flipping any one of them changes the resulting
cubin. The table refuses to resolve unsupported combinations
(e.g. fp8 on Sm80, head_dim > 256).
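As a rough illustration of how a table keyed by such a cell might refuse unsupported combinations, here is a minimal sketch. Everything in it is an assumption made for the example: `MiniKey` is a pared-down stand-in for `DispatchKey`, the `SmArch` and `DType` variants are guesses, and `Table`, `resolve`, and the kernel labels are hypothetical names that do not come from this crate.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real SmArch and DType variants are not shown on this page.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
enum SmArch { Sm80, Sm90 }

#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
enum DType { F16, Fp8 }

// Pared-down key: the real DispatchKey has ten fields.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
struct MiniKey { arch: SmArch, dtype: DType, head_dim: u32 }

/// Hypothetical table mapping each supported cell to a kernel label.
struct Table { cells: HashMap<MiniKey, &'static str> }

impl Table {
    /// Returns the label for a supported cell, or an error for anything else.
    fn resolve(&self, key: &MiniKey) -> Result<&'static str, String> {
        self.cells
            .get(key)
            .copied()
            .ok_or_else(|| format!("unsupported cell: {:?}", key))
    }
}

fn demo_table() -> Table {
    let mut cells = HashMap::new();
    cells.insert(MiniKey { arch: SmArch::Sm80, dtype: DType::F16, head_dim: 128 }, "sm80_f16_d128");
    cells.insert(MiniKey { arch: SmArch::Sm90, dtype: DType::Fp8, head_dim: 128 }, "sm90_fp8_d128");
    Table { cells }
}

fn main() {
    let table = demo_table();
    let ok = MiniKey { arch: SmArch::Sm80, dtype: DType::F16, head_dim: 128 };
    let bad = MiniKey { arch: SmArch::Sm80, dtype: DType::Fp8, head_dim: 128 };
    println!("{:?}", table.resolve(&ok));
    println!("{:?}", table.resolve(&bad));
}
```

Because the key derives `Hash` and `Eq`, changing any single field selects a different cell, which matches the one-cubin-per-combination behaviour described above.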
Fields
arch: SmArch
Target SM architecture.
dtype: DType
Element type for Q/K/V.
head_dim: u32
Per-head dimension (D). Supported: 64, 80, 96, 128, 192, 256.
causal: bool
Causal masking (autoregressive attention).
varlen: bool
Variable-length (cu_seqlens). When false, batched attention with uniform seqlen.
sliding_window: Option<u32>
Sliding-window size; None means full attention. Window size is the number of past tokens each query attends to.
alibi: bool
ALiBi linear-position biases.
sink: u32
Number of “sink” tokens (StreamingLLM); each query unconditionally attends to the first sink keys regardless of sliding_window.
paged: bool
vLLM-style paged KV-cache.
gqa_ratio: u32
Q heads per KV head. 1 = MHA, >1 = GQA, equal to num_heads = MQA.
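The interaction between causal, sliding_window, and sink can be sketched as a per-position visibility predicate. This is only an interpretation of the field descriptions above, not the kernel's actual masking code: `may_attend` is a hypothetical helper, and the sketch assumes that causal masking takes precedence over sink tokens and that the window restricts only past keys.

```rust
/// Hypothetical helper, not part of the documented API: decides whether
/// query position `q` may attend to key position `k`.
fn may_attend(q: u32, k: u32, causal: bool, sliding_window: Option<u32>, sink: u32) -> bool {
    // Causal masking: a query never sees keys that come after it.
    // Assumption: this also applies to sink keys.
    if causal && k > q {
        return false;
    }
    // Sink tokens: the first `sink` keys stay visible regardless of the window.
    if k < sink {
        return true;
    }
    match sliding_window {
        // None means full attention.
        None => true,
        // Assumption: a window of `w` past tokens keeps keys q-w..=q visible.
        Some(w) => q.saturating_sub(w) <= k,
    }
}

fn main() {
    // Query 5, window of 2 past tokens, one sink token.
    for k in 0..=5 {
        println!("k={} visible={}", k, may_attend(5, k, true, Some(2), 1));
    }
}
```

With these assumptions, query 5 sees key 0 (the sink) and keys 3 through 5 (the window), while keys 1 and 2 are masked out.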
Implementations
impl DispatchKey
pub fn validate_fwd(&self) -> Result<(), DispatchError>
Validate the cell for a forward path. Returns Err for
unreachable combinations.
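A forward-path check along these lines could look like the sketch below. It encodes only the two rules this page states (the supported head_dim list and the fp8-on-Sm80 refusal). The enum variants, the `DispatchError` variants, and the free function `validate_fwd_sketch` are assumptions for illustration; the real method may check more than this.

```rust
// Hypothetical stand-ins: the real definitions are not shown on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SmArch { Sm80, Sm90 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DType { F16, Fp8 }

#[derive(Debug, PartialEq, Eq)]
enum DispatchError {
    UnsupportedHeadDim(u32),
    UnsupportedDType { arch: SmArch, dtype: DType },
}

// Head dimensions listed as supported in the field documentation.
const SUPPORTED_HEAD_DIMS: [u32; 6] = [64, 80, 96, 128, 192, 256];

fn validate_fwd_sketch(arch: SmArch, dtype: DType, head_dim: u32) -> Result<(), DispatchError> {
    // Reject any head dimension outside the documented list.
    if !SUPPORTED_HEAD_DIMS.contains(&head_dim) {
        return Err(DispatchError::UnsupportedHeadDim(head_dim));
    }
    // Reject fp8 on Sm80, the example given in the struct description.
    if dtype == DType::Fp8 && arch == SmArch::Sm80 {
        return Err(DispatchError::UnsupportedDType { arch, dtype });
    }
    Ok(())
}

fn main() {
    println!("{:?}", validate_fwd_sketch(SmArch::Sm90, DType::Fp8, 128));
    println!("{:?}", validate_fwd_sketch(SmArch::Sm80, DType::Fp8, 128));
    println!("{:?}", validate_fwd_sketch(SmArch::Sm80, DType::F16, 512));
}
```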
pub fn validate_bwd(&self) -> Result<(), DispatchError>
Validate the cell for a backward path. Currently the same as forward, but kept distinct so we can refuse e.g. fp8 backward (numerically too lossy in the stock FA3) without affecting the forward whitelist.
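The reason for keeping the two validators separate can be shown with a small sketch in which the backward check delegates to the forward check and then applies its own refusals. The fp8 rule below is hypothetical (the description above says the two paths currently behave the same), and `MiniKey`, the `DType` variants, and `DispatchError::Unsupported` are stand-ins invented for the example.

```rust
// Hypothetical stand-ins: the real DType and DispatchError are not shown on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DType { F16, Fp8 }

#[derive(Debug, PartialEq, Eq)]
enum DispatchError { Unsupported(&'static str) }

// Pared-down key with only the two fields this sketch needs.
struct MiniKey { dtype: DType, head_dim: u32 }

impl MiniKey {
    fn validate_fwd(&self) -> Result<(), DispatchError> {
        if self.head_dim > 256 {
            return Err(DispatchError::Unsupported("head_dim > 256"));
        }
        Ok(())
    }

    fn validate_bwd(&self) -> Result<(), DispatchError> {
        // Everything the forward path refuses is refused here too.
        self.validate_fwd()?;
        // Hypothetical backward-only rule; per the docs it is not enforced today.
        if self.dtype == DType::Fp8 {
            return Err(DispatchError::Unsupported("fp8 backward"));
        }
        Ok(())
    }
}

fn main() {
    let fp8 = MiniKey { dtype: DType::Fp8, head_dim: 128 };
    let f16 = MiniKey { dtype: DType::F16, head_dim: 128 };
    println!("fp8 fwd: {:?}, bwd: {:?}", fp8.validate_fwd(), fp8.validate_bwd());
    println!("f16 fwd: {:?}, bwd: {:?}", f16.validate_fwd(), f16.validate_bwd());
}
```

Adding the backward-only rule leaves `validate_fwd` untouched, so the forward whitelist is unaffected.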
pub fn validate_paged(&self) -> Result<(), DispatchError>
Validate the cell for a paged forward path.
pub fn stable_hash(&self) -> u64
Stable 64-bit hash of the key. Useful as a cubin-cache index alongside the kernel-name string.
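A cache index needs a hash that does not change between runs or toolchain versions, which the standard library's default hasher does not promise. One common way to get that property is to feed a fixed byte encoding of the fields through a fully specified hash such as FNV-1a. The sketch below shows the idea on a pared-down key; whether stable_hash actually uses FNV-1a or this field encoding is not stated on this page, so treat `MiniKey` and `stable_hash_sketch` as hypothetical.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into the running FNV-1a state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

// Pared-down stand-in for DispatchKey with three of its fields.
struct MiniKey { head_dim: u32, causal: bool, sliding_window: Option<u32> }

impl MiniKey {
    /// Hypothetical stable hash: fixed field order and little-endian encoding.
    fn stable_hash_sketch(&self) -> u64 {
        let mut h = FNV_OFFSET;
        h = fnv1a(h, &self.head_dim.to_le_bytes());
        h = fnv1a(h, &[self.causal as u8]);
        // Tag byte distinguishes None from Some(0).
        match self.sliding_window {
            None => h = fnv1a(h, &[0]),
            Some(w) => {
                h = fnv1a(h, &[1]);
                h = fnv1a(h, &w.to_le_bytes());
            }
        }
        h
    }
}

fn main() {
    let key = MiniKey { head_dim: 128, causal: true, sliding_window: None };
    println!("{:016x}", key.stable_hash_sketch());
}
```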
pub fn kernel_name(&self) -> String
Build the canonical mangled kernel-name expression. Mirrors the
FA2/FA3 csrc naming convention so we can resolve it via NVRTC’s
nvrtcGetLoweredName.
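With NVRTC, a name expression is a C++ template-instantiation string that is registered through nvrtcAddNameExpression before compilation and then mapped to its mangled symbol through nvrtcGetLoweredName. A minimal sketch of building such a string follows. The template name `hypothetical_fwd_kernel`, its parameter order, and the helper `kernel_name_sketch` are invented for illustration and do not reproduce the FA2/FA3 naming convention.

```rust
/// Hypothetical builder for an NVRTC name expression.
/// `dtype_cpp` is the C++ spelling of the element type, e.g. "__half".
fn kernel_name_sketch(dtype_cpp: &str, head_dim: u32, causal: bool) -> String {
    // Rust prints bool as `true`/`false`, which matches the C++ literals.
    format!("hypothetical_fwd_kernel<{}, {}, {}>", dtype_cpp, head_dim, causal)
}

fn main() {
    println!("{}", kernel_name_sketch("__half", 128, true));
}
```

The resulting string must match the template instantiation in the compiled source exactly, which is why the name is built from the same fields that select the instantiation.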
Trait Implementations
impl Clone for DispatchKey
fn clone(&self) -> DispatchKey
1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for DispatchKey
impl Hash for DispatchKey
impl PartialEq for DispatchKey
fn eq(&self, other: &DispatchKey) -> bool
Tests for self and other values to be equal, and is used by ==.
impl Copy for DispatchKey
impl Eq for DispatchKey
impl StructuralPartialEq for DispatchKey
Auto Trait Implementations
impl Freeze for DispatchKey
impl RefUnwindSafe for DispatchKey
impl Send for DispatchKey
impl Sync for DispatchKey
impl Unpin for DispatchKey
impl UnsafeUnpin for DispatchKey
impl UnwindSafe for DispatchKey
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone,
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.