
Enum BackendKind 

#[non_exhaustive]
pub enum BackendKind {
    Bespoke,
    Cutlass,
    Cublas,
    Cudnn,
    Cufft,
    Cusparse,
    Cusolver,
    Curand,
    Cutensor,
    Npp,
    Cvcuda,
    FlashAttentionV2,
    FlashInfer,
    Ozaki { slices: u8 },
}
Which underlying compute backend served a kernel SKU.

Surfaced through KernelSku::backend for telemetry, autotuner cache keys, and selector debugging.

#[non_exhaustive] — new backends (TensorRT, custom JIT-emitted kernels via baracuda-nvrtc, …) may land in future phases. Match arms must include a _ => catch-all.

Variants (Non-exhaustive)

This enum is marked as non-exhaustive
Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.
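
The wildcard requirement can be illustrated with a minimal sketch. The enum below is a local stand-in that reproduces only a few of the variants so the example compiles on its own, and backend_label is a hypothetical helper, not part of the crate:

```rust
// Local stand-in for the real enum (assumption: only a subset of the
// variants is reproduced so the example compiles without the crate).
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum BackendKind {
    Bespoke,
    Cutlass,
    Cublas,
    FlashInfer,
    Ozaki { slices: u8 },
}

/// Hypothetical helper: a short label for telemetry output.
pub fn backend_label(kind: BackendKind) -> &'static str {
    match kind {
        BackendKind::Bespoke => "bespoke",
        BackendKind::Cutlass => "cutlass",
        BackendKind::Cublas => "cublas",
        // `..` ignores the packed `slices` byte.
        BackendKind::Ozaki { .. } => "ozaki",
        // Downstream crates must keep this arm because the enum is
        // #[non_exhaustive]; here it also covers the variants not listed.
        _ => "other",
    }
}
```

Code written this way keeps compiling when a later phase adds a backend; the new variant simply falls into the catch-all arm until it is handled explicitly.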

Bespoke

Hand-rolled kernel in baracuda-kernels-sys.

Cutlass

CUTLASS template instantiation in baracuda-cutlass-kernels-sys.

Cublas

baracuda-cublas wrapper of cuBLAS / cuBLASLt.

Cudnn

baracuda-cudnn wrapper of cuDNN (graph or legacy API).

Cufft

baracuda-cufft wrapper of cuFFT.

Cusparse

baracuda-cusparse wrapper of cuSPARSE / cuSPARSELt.

Cusolver

baracuda-cusolver wrapper of cuSOLVER.

Curand

baracuda-curand wrapper of cuRAND.

Cutensor

baracuda-cutensor wrapper of cuTENSOR.

Npp

baracuda-npp wrapper of NPP.

Cvcuda

baracuda-cvcuda wrapper of CV-CUDA.

FlashAttentionV2

Vendored Dao-AILab FlashAttention v2 (BSD-3-Clause). Phase 42 added this as a backend choice on FlashSdpaPlan for the long-context regime where FA2’s tiling wins over the bespoke kernel.

FlashInfer

Vendored FlashInfer (Apache-2.0). Phase 46 added three plan families backed by FlashInfer cherry-picked headers: BatchPagedDecodePlan (batched paged-KV decode for vLLM-style serving), TopKTopPSamplingPlan (sort-free combined top-K / top-P / min-P sampling), and CascadeAttentionPlan (LSE-merge for prefix-cache sharing across requests).

Ozaki

Vendored ozIMMU (MIT). Phase 44 backend choice on FP64 GemmPlan that splits each operand into S int8 slices, where S is the slice count encoded in the slices field, and runs S² tensor-core matmuls (the Ozaki scheme) to synthesize a DGEMM on hardware that has no FP64 tensor cores (RTX 4070, L4, etc.). Opt-in: results are NOT bit-equivalent to native DGEMM. slices = 8 is the upstream-recommended sweet spot for well-conditioned inputs.

Slice-count + variant discriminant encoding (Phase 44c)

The slices byte is split into two bit-fields:

  • Low 5 bits (slices & 0x1F) — slice count S:

    • 0 = auto (fp64_int8_auto, runtime selection based on mantissa-loss histogram).
    • 3..=18 = fixed slice count (fp64_int8_3 .. fp64_int8_18).
  • High 3 bits (slices >> 5) — Phase 44c variant flag:

    • 0 = Base (original ozIMMU; default for back-compat with Phase 44/44b callers).
    • 1 = EF (group-wise error-free summation; ~5–15% faster at the same accuracy).
    • 2 = RN (nearest-rounding split; ~2 extra effective bits per slice).
    • 3 = H (EF + RN combined).

Use the ozaki_slices helper constructors for ergonomic construction: for example, ozaki_slices::ef(8) yields 40, which is (1 << 5) | 8, the EF variant at S=8. Values whose bit pattern falls outside the ranges listed above are rejected at plan-select time.
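
The bit-field layout above can be sketched as a pair of pack/unpack functions. This is an illustration under stated assumptions, not the crate's implementation: OzakiVariant, encode_slices, and decode_slices are hypothetical names, and the real ozaki_slices helpers may have different signatures.

```rust
/// Variant flag stored in the high 3 bits of the `slices` byte
/// (hypothetical type; discriminants follow the list above).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OzakiVariant {
    Base = 0,
    Ef = 1,
    Rn = 2,
    H = 3,
}

/// A slice count is valid when it is 0 (auto) or within 3..=18.
fn valid_count(count: u8) -> bool {
    count == 0 || (3..=18).contains(&count)
}

/// Pack a variant flag and slice count into one byte.
/// Returns None when the slice count is outside the documented ranges.
pub fn encode_slices(variant: OzakiVariant, count: u8) -> Option<u8> {
    if valid_count(count) {
        Some(((variant as u8) << 5) | count)
    } else {
        None
    }
}

/// Unpack a byte into (variant, slice count).
/// Returns None for bit patterns outside the documented ranges.
pub fn decode_slices(byte: u8) -> Option<(OzakiVariant, u8)> {
    let count = byte & 0x1F; // low 5 bits
    let variant = match byte >> 5 {
        // high 3 bits
        0 => OzakiVariant::Base,
        1 => OzakiVariant::Ef,
        2 => OzakiVariant::Rn,
        3 => OzakiVariant::H,
        _ => return None,
    };
    if valid_count(count) {
        Some((variant, count))
    } else {
        None
    }
}
```

Under this sketch, encode_slices(OzakiVariant::Ef, 8) gives Some(40), and a raw value of 8 decodes as the Base variant at S=8, which matches the back-compat default described above.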

n-blocking (chunk large-N int8 GEMMs into 8192-wide pieces) is applied automatically by the C++ shim regardless of the variant flag.
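
As a rough illustration of what chunking the N dimension into 8192-wide pieces means, the sketch below computes the column ranges such a scheme would visit. It is written in Rust for consistency with the rest of this page; the actual logic lives in the C++ shim, and n_blocks and N_BLOCK are hypothetical names.

```rust
use std::ops::Range;

/// Block width along N, taken from the description above.
pub const N_BLOCK: usize = 8192;

/// Hypothetical helper: split 0..n into consecutive column ranges of at
/// most N_BLOCK columns each; the last range may be shorter.
pub fn n_blocks(n: usize) -> Vec<Range<usize>> {
    (0..n)
        .step_by(N_BLOCK)
        .map(|start| start..(start + N_BLOCK).min(n))
        .collect()
}
```

For N = 20000 this yields three ranges (0..8192, 8192..16384, 16384..20000), and a matrix narrower than 8192 columns is processed as a single block.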

Fields

slices: u8

Slice count + variant discriminant — see the BackendKind::Ozaki doc-comment for the bit-field layout. Prefer the ozaki_slices helpers over raw integer construction.

Trait Implementations

impl Clone for BackendKind

fn clone(&self) -> BackendKind

Returns a duplicate of the value.

Since 1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for BackendKind

impl Debug for BackendKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Eq for BackendKind

impl Hash for BackendKind

fn hash<__H>(&self, state: &mut __H)
where __H: Hasher,

Feeds this value into the given Hasher.

Since 1.3.0

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for BackendKind

fn eq(&self, other: &BackendKind) -> bool

Tests for self and other values to be equal, and is used by ==.

Since 1.0.0 (const: unstable)

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for BackendKind
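
Because the type implements Copy, Eq, and Hash, it can be used directly as a hash-map key, which is what telemetry counters and autotuner cache keys typically need. A minimal sketch follows, assuming a local stand-in enum with a subset of the variants and a hypothetical count_backends helper:

```rust
use std::collections::HashMap;

// Local stand-in for the real enum (assumption: subset of variants only),
// with the same derived traits as listed above.
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum BackendKind {
    Bespoke,
    Cublas,
    Ozaki { slices: u8 },
}

/// Hypothetical telemetry helper: count how often each backend served a kernel.
pub fn count_backends(served: &[BackendKind]) -> HashMap<BackendKind, usize> {
    let mut counts = HashMap::new();
    for &kind in served {
        // Copy lets us take `kind` by value; Eq + Hash make it a valid key.
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```

Note that two Ozaki values with different slices bytes hash and compare as distinct keys, so each slice-count and variant combination gets its own entry.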

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.