Struct FlashDecodingDescriptor 

pub struct FlashDecodingDescriptor {
    pub batch_size: i32,
    pub num_heads: i32,
    pub num_kv_heads: i32,
    pub k_len: i32,
    pub head_dim: i32,
    pub scale: f32,
    pub element: ElementKind,
}

Descriptor for a FlashDecoding op.

num_kv_heads is the GQA grouping signal. When it equals num_heads, the workload is full MHA; when it is smaller (e.g. 8 for Llama 3 8B, where H_q = 32), each K/V head is shared by group_size = num_heads / num_kv_heads Q heads. The launcher uses group_size to choose between the warp-cooperative SIMT kernel (Tier-1) and the GQA-batched WMMA kernel (Tier-2, which requires group_size ≥ 4 and head_dim aligned to 16).
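The selection rule described above can be sketched roughly as follows. This is an illustration only, not the launcher's actual code: the `KernelTier` enum and the `select_tier` function are hypothetical names, and "aligned to 16" is assumed here to mean `head_dim % 16 == 0`.

```rust
/// Hypothetical stand-in for the two kernel tiers named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelTier {
    /// Warp-cooperative SIMT kernel (Tier-1).
    SimtWarpCooperative,
    /// GQA-batched WMMA kernel (Tier-2).
    WmmaGqaBatched,
}

/// Sketch of the documented gate. Assumes the inputs were already validated
/// (positive values, num_kv_heads divides num_heads).
pub fn select_tier(num_heads: i32, num_kv_heads: i32, head_dim: i32) -> KernelTier {
    // group_size = num_heads / num_kv_heads, as defined in the description.
    let group_size = num_heads / num_kv_heads;
    // Assumption: "aligned to 16" means head_dim is a multiple of 16.
    if group_size >= 4 && head_dim % 16 == 0 {
        KernelTier::WmmaGqaBatched
    } else {
        KernelTier::SimtWarpCooperative
    }
}
```

Under these assumptions, a Llama 3 8B shape (32 query heads, 8 K/V heads, head_dim 128) gives group_size = 4 and lands on the WMMA tier, while pure MHA (group_size = 1) stays on the SIMT tier.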

Fields§

§batch_size: i32

Batch size (B).

§num_heads: i32

Number of query / output heads (H_q).

§num_kv_heads: i32

Number of K/V heads (H_kv). Must divide num_heads evenly. num_kv_heads == num_heads → pure MHA; num_kv_heads == 1 → MQA; 1 < num_kv_heads < num_heads → GQA.
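The three regimes and the divisibility requirement could be checked along these lines. The `AttentionKind` enum and `classify` function are hypothetical helpers written for this sketch; they are not part of the documented API, and the page does not say how the real type reports an invalid head count.

```rust
/// Hypothetical label for the three head-sharing regimes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttentionKind {
    Mha,
    Mqa,
    Gqa,
}

/// Classifies a head configuration, rejecting counts that are non-positive
/// or where num_kv_heads does not divide num_heads evenly.
pub fn classify(num_heads: i32, num_kv_heads: i32) -> Result<AttentionKind, String> {
    if num_heads <= 0 || num_kv_heads <= 0 {
        return Err("head counts must be positive".to_string());
    }
    if num_heads % num_kv_heads != 0 {
        return Err(format!(
            "num_kv_heads ({num_kv_heads}) must divide num_heads ({num_heads})"
        ));
    }
    // MHA is checked first, so a single-head model counts as MHA here.
    Ok(if num_kv_heads == num_heads {
        AttentionKind::Mha
    } else if num_kv_heads == 1 {
        AttentionKind::Mqa
    } else {
        AttentionKind::Gqa
    })
}
```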

§k_len: i32

K/V sequence length (the full attended prefix, not just the new step). Any length is accepted; the split-K factor adapts via CHUNK_K.
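One plausible reading of "adapts via CHUNK_K" is that the number of splits is the ceiling of k_len divided by the chunk length. The sketch below illustrates that reading only. The value of CHUNK_K is not stated on this page, so `ASSUMED_CHUNK_K` is a placeholder, and `num_k_splits` is a hypothetical helper rather than the crate's own function.

```rust
/// Placeholder chunk length; the real CHUNK_K value is not given on this page.
pub const ASSUMED_CHUNK_K: i32 = 256;

/// Number of K/V chunks needed to cover `k_len` (ceiling division).
/// Assumption: the split-K factor is simply this chunk count.
pub fn num_k_splits(k_len: i32, chunk_k: i32) -> i32 {
    assert!(k_len > 0 && chunk_k > 0, "k_len and chunk_k must be positive");
    (k_len + chunk_k - 1) / chunk_k
}
```

With the placeholder value, a prefix of 257 tokens would need two chunks, and a prefix of 4096 tokens would need sixteen.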

§head_dim: i32

Per-head feature dimension. d_q == d_k == d_v is enforced, because the decode regime does not justify the d_k != d_v complication that the prefill kernel handles.

§scale: f32

Score scaling factor — typically 1.0 / sqrt(head_dim).
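As a small worked example of the typical value, the helper below computes 1.0 / sqrt(head_dim) in f32. The name `default_scale` is made up for this sketch and is not a documented function.

```rust
/// Typical attention score scale: 1 / sqrt(head_dim), computed in f32.
pub fn default_scale(head_dim: i32) -> f32 {
    1.0 / (head_dim as f32).sqrt()
}
```

For head_dim = 64 this gives exactly 0.125; for head_dim = 128 it gives roughly 0.0884.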

§element: ElementKind

Element type — must match the plan’s type parameter.

Implementations§

impl FlashDecodingDescriptor

pub fn new(batch_size: i32, num_heads: i32, k_len: i32, head_dim: i32, element: ElementKind) -> Self

Convenience constructor for pure MHA (num_kv_heads == num_heads) with the standard 1/sqrt(D) scale.

pub fn new_gqa(batch_size: i32, num_heads: i32, num_kv_heads: i32, k_len: i32, head_dim: i32, element: ElementKind) -> Self

Convenience constructor for GQA / MQA. num_kv_heads must divide num_heads.

pub fn with_scale(self, scale: f32) -> Self

Builder: override the score scale (e.g. for QK-norm models that pre-divide by something other than sqrt(head_dim)).

pub fn group_size(&self) -> i32

GQA group size — number of Q heads sharing each K/V head.
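To show how the constructors, the builder, and group_size fit together, here is a self-contained sketch built on a local replica of the struct. Everything in it is an assumption made for illustration: the method bodies are guesses consistent with the descriptions above, the `ElementKind` variants are invented, and the real constructors may validate their arguments in ways not shown here.

```rust
/// Stand-in for the crate's ElementKind; these variants are invented here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    F16,
    F32,
}

/// Local replica of the documented fields, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct FlashDecodingDescriptor {
    pub batch_size: i32,
    pub num_heads: i32,
    pub num_kv_heads: i32,
    pub k_len: i32,
    pub head_dim: i32,
    pub scale: f32,
    pub element: ElementKind,
}

impl FlashDecodingDescriptor {
    /// Pure MHA: num_kv_heads is set equal to num_heads.
    pub fn new(batch_size: i32, num_heads: i32, k_len: i32, head_dim: i32, element: ElementKind) -> Self {
        Self::new_gqa(batch_size, num_heads, num_heads, k_len, head_dim, element)
    }

    /// GQA / MQA with the standard 1/sqrt(D) scale (assumed default).
    pub fn new_gqa(
        batch_size: i32,
        num_heads: i32,
        num_kv_heads: i32,
        k_len: i32,
        head_dim: i32,
        element: ElementKind,
    ) -> Self {
        let scale = 1.0 / (head_dim as f32).sqrt();
        Self { batch_size, num_heads, num_kv_heads, k_len, head_dim, scale, element }
    }

    /// Builder-style override of the score scale.
    pub fn with_scale(mut self, scale: f32) -> Self {
        self.scale = scale;
        self
    }

    /// Number of Q heads sharing each K/V head.
    pub fn group_size(&self) -> i32 {
        self.num_heads / self.num_kv_heads
    }
}
```

With this replica, `FlashDecodingDescriptor::new_gqa(1, 32, 8, 4096, 128, ElementKind::F16).with_scale(0.1)` describes a GQA decode step with group_size() == 4 and a custom scale, while `FlashDecodingDescriptor::new(1, 32, 4096, 128, ElementKind::F16)` describes pure MHA with group_size() == 1.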

Trait Implementations§

impl Clone for FlashDecodingDescriptor

fn clone(&self) -> FlashDecodingDescriptor

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for FlashDecodingDescriptor

impl Debug for FlashDecodingDescriptor

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.