baracuda-kernels-types 0.0.1-alpha.68

Shared type vocabulary for the baracuda ML kernel facade: Element / IntElement / FpElement / BiasElement trait hierarchy, layout / epilogue / activation tags, MatrixRef / TensorRef views, PlanPreference, PrecisionGuarantee, and Workspace. Lifted from baracuda-cutlass so that baracuda-kernels and the per-library wrapper crates can share one vocabulary.
Documentation

baracuda-kernels-types

Shared type vocabulary for the baracuda ML kernel facade.

This crate has no GPU code and no behavior of its own. It ships pure-data types — traits, enums, structs — that are the contracts the safe Rust layer (baracuda-kernels) and the raw FFI layer (baracuda-kernels-sys) both agree on. The same vocabulary is also consumed by every per-library wrapper crate (baracuda-cublas, baracuda-cudnn, baracuda-cusolver, …) so the whole workspace speaks one dtype + tag language instead of each crate re-declaring its own.

The types were previously defined in baracuda-cutlass::types; they were lifted out in Phase 0 so the facade could share them without forcing every consumer to pull in CUTLASS. baracuda-cutlass keeps the old names as re-exports for back-compat.

What's in here

src/
  element.rs    KernelDtype (umbrella marker) + Element / IntElement / FpElement /
                BinElement / BiasElement sibling traits
                + ElementKind / MathPrecision / BiasElementKind tag enums
                + scalar wrappers: S8, U8, S4, U4, Bin, F32Strict, Fp8E4M3, Fp8E5M2,
                                   Bool, Complex32, Complex64
                + ScalarType (alpha / beta projection)
  layout.rs     LayoutSku (Rcr, Rrr), ArchSku (Sm80, Sm89, Sm90a),
                EpilogueKind, ActivationKind
  matrix.rs     MatrixRef<T> + MatrixMut<T> + VectorRef<T> — 2-D matrix views for GEMM
  tensor.rs     TensorRef<T, N> + TensorMut<T, N> — rank-N tensor views for
                everything else (const-generic rank, element strides, broadcast convention)
  plan.rs       PlanPreference, PrecisionGuarantee, Workspace
  sku.rs        OpCategory, BackendKind, KernelSku — the structural identity tuple
                returned by every plan's sku() accessor
  ops.rs        Per-category op-discriminant enums: BinaryKind, UnaryKind,
                ReduceKind, ScanKind, SoftmaxKind, NormalizationKind, LossKind,
                AttentionKind, IndexingKind, EmbeddingKind, ShapeLayoutKind,
                SortKind, QuantizeKind, RandomKind, SegmentKind, ImageKind,
                FftKind, LinalgKind, MoeKind, GgufBlockFormat, … (and the
                supporting tag enums: PadMode, FillMode, LossReduction,
                CrossEntropyTargetKind, …)

Element vs KernelDtype — which to bound on

[KernelDtype] is the umbrella marker every kernel-usable dtype implements — including the sub-byte / FP8 / packed-bit newtypes (S4, U4, S8, U8, Fp8E4M3, Fp8E5M2, Bin) that have their own kernel families. Element, IntElement, FpElement, and BinElement all use KernelDtype as a supertrait, so a function bounded by <T: KernelDtype> accepts any kernel-usable type.

The op-shaped sub-traits are what plans actually parameterize on:

Bound Accepts When to use
<T: Element> f16, bf16, f32, F32Strict, f64, i32, i64, Bool, Complex32, Complex64 The elementwise / reduce / scan / norm / loss / shape-layout plans — they consume BinaryPlan<T, N> / UnaryPlan<T, N> shape with a type Scalar projection for α/β.
<T: IntElement> S8, U8, S4, U4 The int-GEMM plan family.
<T: FpElement> Fp8E4M3, Fp8E5M2 The FP8 GEMM plan family (sm_89+).
<T: BinElement> Bin The 1-bit binary GEMM plan family (XOR + popcount).
<T: KernelDtype> union of the four above Generic utility code that wants to accept any dtype — telemetry helpers, dtype-size queries, downstream framework wrappers.

KernelDtype::KIND: ElementKind is the single source of truth for the runtime dtype tag — pre-Phase-28 code that wrote <T as Element>::KIND should switch to plain T::KIND (works under any of the sub-trait bounds via supertrait inheritance) or <T as KernelDtype>::KIND for the fully-qualified form.

#[non_exhaustive] and forward-compat

Phase 28 marked the op-family discriminant enums plus several tag enums #[non_exhaustive]. Downstream code that matches on them must include a _ => catch-all — adding new variants then no longer breaks the build. The covered enums:

  • Op-family: BinaryKind, UnaryKind, TernaryKind, GatedActivationKind, PadMode, ShapeLayoutKind, ArgReduceKind, ReduceKind, SoftmaxKind, ScanKind, BinaryCmpKind, NormalizationKind, LossKind, RandomKind, LinalgKind, FftKind, ConvKind, PoolKind, AttentionKind, IndexingKind, SegmentKind, EmbeddingKind, QuantizeKind, GgufBlockFormat, MoeKind, SortKind, ImageKind.
  • Auxiliary tags: OpCategory, BackendKind, IndexElementKind, IndexOutputKind.

Intentionally LEFT exhaustive (deliberate breaking-change events on new variants):

  • ElementKind — every kernel dtype is enumerated; a new dtype is a workspace-wide event that should surface as a build break across every match.
  • LayoutSku, ArchSku, EpilogueKind, ActivationKind, BiasElementKind — these are the keys cutlass GEMM and int-GEMM dispatchers exhaustively match on to pick per-arch / per-fused-epilogue / per-bias-dtype kernel SKUs; adding a variant deserves to surface at every match site so each can wire or reject.
  • Workspace<'a> — hot-path-matched by every plan's run method; the None / Borrowed split has been stable through every alpha.
  • EmbeddingBagMode, FillMode, LossReduction, CrossEntropyTargetKind, BatchedOrmqrSide, BatchedOrmqrOp — closed mathematical / convention sets (Sum/Mean for the bag, Lower/Upper for triangular fill, the LAPACK Left/Right and N/T/C ops, the PyTorch reduction modes).

Phase 32 — descriptor builder retrofit

Phase 32 propagated the same #[non_exhaustive] marker to the descriptor structs that have been amended in recent phases (Conv{1,2,3}dDescriptor, ConvTranspose{1,2,3}dDescriptor, Pool{1,2,3}dDescriptor, AdaptivePool{1,2,3}dDescriptor, LpPool{1,2}dDescriptor, FractionalMaxPool{2,3}dDescriptor, InterpolateDescriptor, InterpolateBackwardDescriptor). These structs live in baracuda-kernels, not in this crate — see the "Phase 32 builder migration" section in baracuda-kernels/README.md for the full list of new ::new(...) constructors and .with_* setters.

The marker means downstream callers MUST use the new builder (Conv2dDescriptor::new(...).with_padding(...).with_stride(...)) instead of a struct literal. Adding optional fields in future phases then no longer breaks downstream builds.

Why this crate is split out

A few load-bearing reasons:

  1. Zero CUDA dependency at the type level. Downstream crates that only want the dtype vocabulary (e.g. a tensor library that needs Element::KIND to identify dtypes for printing) don't have to pull in CUDA, CUTLASS, or any *-sys crate. The runtime dependency surface is baracuda-driver (for DeviceSlice) + baracuda-types (for DeviceRepr / Half / BFloat16) + half + float8.
  2. One vocabulary per scalar dtype. Without this crate, each per-library wrapper would re-derive its own enum Dtype { F32, F16, … } from the underlying NVIDIA library's tag enum (cudaDataType_t, cudnnDataType_t, cufftType, …) and the safe facade would spend its life translating between them. Centralizing the vocabulary here means the translation table lives in one place per wrapper.
  3. KernelSku is the autotuner cache key. It needs to be Copy + Eq + Hash and stable across versions. Defining it here, away from any one library wrapper, keeps it neutral.

Dependencies

[dependencies]

baracuda-types  = "...features = [\"half-crate\", \"f8-crate\"]"

baracuda-driver = "..."

half            = "2"

float8          = "0.7"

baracuda-driver is the source of the lifetimed device-slice types (DeviceSlice / DeviceSliceMut) that back TensorRef / TensorMut. The half and float8 crates supply the precise IEEE half-precision and 8-bit-float wrappers (half::f16, half::bf16, float8::F8E4M3, float8::F8E5M2) that the FpElement impls re-export.

Usage

You generally don't depend on this crate directly — depend on baracuda-kernels (which re-exports the entire surface) and import from there:

use baracuda_kernels::{
    Element, ElementKind, KernelSku, PrecisionGuarantee,
    TensorRef, TensorMut, MatrixRef, MatrixMut,
    LayoutSku, EpilogueKind, ArchSku, PlanPreference, Workspace,
};

Depending on baracuda-kernels-types directly is only useful if you're writing a sibling wrapper crate that needs the vocabulary but doesn't want the rest of the facade (baracuda-cublas / baracuda-cudnn etc. do exactly this).

See also

  • baracuda-kernels — the safe facade that re-exports everything in this crate.
  • baracuda-kernels-sys — the raw FFI layer this vocabulary describes the contracts for.
  • ARCHITECTURE.md — the layered design, the Plan / Descriptor / Args triple, the KernelSku taxonomy, the workspace contract.