Enum MoeKind

#[non_exhaustive]
#[repr(u16)]
pub enum MoeKind {
    ScalarGguf = 0,
    Wmma = 1,
    WmmaGguf = 2,
}
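
Because the enum is declared `#[repr(u16)]` with explicit discriminants, a variant can be cast to its `u16` tag with `as`. The sketch below uses a local stand-in for the enum, since the crate path is not shown on this page. `moe_kind_from_u16` is a hypothetical helper for illustration, not part of the documented API.

```rust
// Local stand-in mirroring the declaration above; real code would import
// MoeKind from its defining crate instead.
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MoeKind {
    ScalarGguf = 0,
    Wmma = 1,
    WmmaGguf = 2,
}

/// Hypothetical helper: map a raw discriminant back to a variant.
/// Unknown tags yield `None` rather than panicking.
pub fn moe_kind_from_u16(raw: u16) -> Option<MoeKind> {
    match raw {
        0 => Some(MoeKind::ScalarGguf),
        1 => Some(MoeKind::Wmma),
        2 => Some(MoeKind::WmmaGguf),
        _ => None,
    }
}
```

Returning `Option` keeps the reverse mapping safe if a raw tag comes from a newer build that knows more variants.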

Mixture-of-Experts (MoE) variant selector, used as the op discriminant for kernel SKUs whose crate::OpCategory is crate::OpCategory::Moe. Phase 8 Milestone 8.5 wires the three fused kernels, each of which combines per-token dispatch, expert matmul, and accumulation.

MoE forward pass shape:

Input activations [T, D_model].
Per-token top-k expert indices [T, top_k] (i32).
Per-token top-k expert weights [T, top_k] (FP).
Per-expert weight matrices [num_experts, D_model, D_expert] (dtype depends on the variant: FP for Wmma, GGUF-packed bytes for ScalarGguf / WmmaGguf).
Output [T, D_model] (after expert mixing).
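
The shapes listed above translate into buffer lengths that a caller could check on the host before launching a kernel. The following is a minimal sketch under stated assumptions: `MoeShape` and its methods are hypothetical names, routing weights are taken to be `f32`, and the expert-weight length covers only the dense (FP) layout, because the byte count of GGUF-packed weights depends on the block format.

```rust
/// Hypothetical shape bundle; fields follow the dimension names listed above.
#[derive(Clone, Copy, Debug)]
pub struct MoeShape {
    pub t: usize,
    pub d_model: usize,
    pub d_expert: usize,
    pub num_experts: usize,
    pub top_k: usize,
}

impl MoeShape {
    /// Element count of the input activations [T, D_model].
    pub fn activation_len(&self) -> usize {
        self.t * self.d_model
    }

    /// Element count of the [T, top_k] index and weight tensors.
    pub fn routing_len(&self) -> usize {
        self.t * self.top_k
    }

    /// Element count of dense expert weights [num_experts, D_model, D_expert].
    pub fn dense_expert_weight_len(&self) -> usize {
        self.num_experts * self.d_model * self.d_expert
    }

    /// Element count of the output [T, D_model].
    pub fn output_len(&self) -> usize {
        self.t * self.d_model
    }

    /// Check routing tensor lengths and that every expert index is in range.
    pub fn check_routing(&self, indices: &[i32], weights: &[f32]) -> Result<(), String> {
        if indices.len() != self.routing_len() || weights.len() != self.routing_len() {
            return Err(format!("expected {} routing entries", self.routing_len()));
        }
        for (pos, &expert) in indices.iter().enumerate() {
            if expert < 0 || expert as usize >= self.num_experts {
                return Err(format!("expert index {expert} at position {pos} is out of range"));
            }
        }
        Ok(())
    }
}
```

Validating indices on the host is a design choice for the sketch; whether the kernels themselves tolerate out-of-range indices is not stated on this page.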

All three variants are inference-only by convention; backward passes are not shipped (MoE training uses higher-level autograd surfaces that compose the per-expert FFN ops manually).

Lineage: vendored from attention.rs via fuel-cuda-kernels. See crates/baracuda-kernels-sys/LICENSE-thirdparty.md for the full attribution chain.

Variants (Non-exhaustive)

This enum is marked as non-exhaustive.

Non-exhaustive enums may gain additional variants in the future. When matching against the variants of a non-exhaustive enum from outside its defining crate, add a wildcard arm to account for any future variants.
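
A match with the wildcard arm might look like the sketch below. The enum is a local stand-in and `needs_tensor_cores` is a hypothetical function. Inside the defining crate the wildcard is unreachable, so the sketch silences that lint; in a downstream crate the arm is mandatory.

```rust
// Local stand-in mirroring the documented enum.
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MoeKind {
    ScalarGguf = 0,
    Wmma = 1,
    WmmaGguf = 2,
}

/// Hypothetical query: does this variant use tensor cores?
/// Returns `None` for variants this code does not know about.
#[allow(unreachable_patterns)] // the wildcard is only reachable from other crates
pub fn needs_tensor_cores(kind: MoeKind) -> Option<bool> {
    match kind {
        MoeKind::ScalarGguf => Some(false),
        MoeKind::Wmma | MoeKind::WmmaGguf => Some(true),
        // Required downstream because the enum is #[non_exhaustive].
        _ => None,
    }
}
```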

ScalarGguf = 0

Scalar dispatch path operating on GGUF-quantized expert weights staged through a q8_1 intermediate (FP32 activations in, FP32 output). No tensor cores. Used as a portability fallback and as the slower-but-simpler reference for the WMMA + GGUF hot path. Block formats: Q8_0, Q2_K, Q3_K, Q4_K, Q5_K, Q6_K (matches Fuel’s moe_gemm_gguf switch).

Wmma = 1

WMMA tensor-core path operating on dense FP expert weights (f16 / bf16). The FP MoE hot path used when full-precision expert weights are available — typically training-time or FP-deployment inference. sm_70+ required.

WmmaGguf = 2

Combined WMMA tensor-core + GGUF-quantized weight path. The dispatcher dequantizes one GGUF block per N-row into shared memory, then issues a 16×16×16 WMMA mma.sync against the dense activation tile. The production hot path for quantized LLM inference. Activation dtype: f16 / bf16. Weight block formats: same set as Self::ScalarGguf. sm_70+ required.
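
The variant descriptions suggest a simple selection rule based on weight storage and compute capability. The sketch below is one possible policy, not the crate's dispatcher: `ExpertWeights` and `pick_moe_kind` are hypothetical, the enum is a local stand-in, and the rule ignores activation dtype (FP32 for ScalarGguf, f16 / bf16 for the WMMA paths), which a real caller would also have to match.

```rust
// Local stand-in mirroring the documented enum.
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MoeKind {
    ScalarGguf = 0,
    Wmma = 1,
    WmmaGguf = 2,
}

/// Hypothetical description of how the expert weights are stored.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExpertWeights {
    DenseFp,
    Gguf,
}

/// Hypothetical policy: prefer the WMMA paths on sm_70+ and fall back to the
/// scalar GGUF path otherwise. `sm` is the compute capability as an integer
/// (for example 70 for sm_70).
pub fn pick_moe_kind(weights: ExpertWeights, sm: u32) -> Option<MoeKind> {
    let has_wmma = sm >= 70;
    match (weights, has_wmma) {
        (ExpertWeights::Gguf, true) => Some(MoeKind::WmmaGguf),
        (ExpertWeights::Gguf, false) => Some(MoeKind::ScalarGguf),
        (ExpertWeights::DenseFp, true) => Some(MoeKind::Wmma),
        // No documented variant covers dense FP weights below sm_70.
        (ExpertWeights::DenseFp, false) => None,
    }
}
```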

Trait Implementations

impl Clone for MoeKind

fn clone(&self) -> MoeKind

fn clone_from(&mut self, source: &Self)

impl Copy for MoeKind

impl Debug for MoeKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Eq for MoeKind

impl Hash for MoeKind

fn hash<__H>(&self, state: &mut __H) where __H: Hasher,

fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher, Self: Sized,

impl PartialEq for MoeKind

fn eq(&self, other: &MoeKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for MoeKind
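
Taken together, the Copy, Eq, Hash, and Debug implementations listed above let a variant serve as a map key and print by name. A small sketch, with a local stand-in for the enum and a hypothetical `count_by_kind` helper:

```rust
use std::collections::HashMap;

// Local stand-in mirroring the documented enum and its derived traits.
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MoeKind {
    ScalarGguf = 0,
    Wmma = 1,
    WmmaGguf = 2,
}

/// Hypothetical helper: tally how often each variant appears.
pub fn count_by_kind(kinds: &[MoeKind]) -> HashMap<MoeKind, usize> {
    let mut counts = HashMap::new();
    for &kind in kinds {
        // `Copy` lets us take the key by value; `Hash + Eq` make it a valid key.
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```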

Auto Trait Implementations

impl Freeze for MoeKind

impl RefUnwindSafe for MoeKind

impl Send for MoeKind

impl Sync for MoeKind

impl Unpin for MoeKind

impl UnsafeUnpin for MoeKind

impl UnwindSafe for MoeKind

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
