
Crate baracuda_kernels


baracuda-kernels

Unified ML op facade for the baracuda CUDA ecosystem.

Exposes every primitive an ML framework would expect (the union of PyTorch's torch.* and nn.functional with JAX's lax.* and numpy-style ops) through a single Plan-based Rust surface, internally dispatching to:

  1. An NVIDIA-library wrapper crate when one already covers the op (baracuda-cublas, baracuda-cudnn, baracuda-cufft, baracuda-cusparse, baracuda-cusolver, baracuda-curand, baracuda-cutensor, baracuda-npp, baracuda-cvcuda, baracuda-cutlass).
  2. A bespoke .cu kernel shipped in baracuda-kernels-sys when no NVIDIA library covers it (or covers it poorly at relevant shapes).

Callers import one crate and reach for one API style; the dispatch decision is an internal detail driven by select.
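A minimal sketch of that routing idea follows. Every name in it (Backend, Op, the select signature, the skinny-GEMM rule) is hypothetical and chosen only for illustration; it does not reproduce the crate's actual selector, whose logic this page does not document.

```rust
// Illustrative only: these names are hypothetical, not baracuda-kernels' API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cublas,
    Cudnn,
    Cufft,
    /// A bespoke .cu kernel from baracuda-kernels-sys.
    Bespoke,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    /// C (m x n) = A (m x k) * B (k x n); k is omitted because the rule ignores it.
    Gemm { m: usize, n: usize },
    Conv2d,
    Fft,
    Sparsemax,
}

/// Route an op to a library wrapper when one covers it, otherwise to a
/// bespoke kernel. The shape rule is a made-up stand-in for real heuristics.
fn select(op: Op) -> Backend {
    match op {
        // Hypothetical rule: treat matrix-vector shapes as "covered poorly".
        Op::Gemm { m, n } if m == 1 || n == 1 => Backend::Bespoke,
        Op::Gemm { .. } => Backend::Cublas,
        Op::Conv2d => Backend::Cudnn,
        Op::Fft => Backend::Cufft,
        // Modeled here as having no NVIDIA-library coverage.
        Op::Sparsemax => Backend::Bespoke,
    }
}
```

The caller never sees the Backend value in the real crate; the point of the sketch is only that the decision is a pure function of the op and its shape, made behind the Plan API.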

§Status

Active. Covers ~2700 FFI launch points across Phase 1–66 work, including:

  - the full elementwise unary/binary/ternary matrix (fwd + bwd, contig + strided)
  - all standard reductions and scans
  - the normalizer family (RMS / Layer / Batch / Group / Instance with in-place SMEM-staged kernels for f32/f16/bf16/f64)
  - softmax / log-softmax / sparsemax / gumbel-softmax (+ BW)
  - the full attention suite (SDPA contig + strided + BW, Flash SDPA sm_80 + sm_89 + varlen + Tri Dao FA2 v2.8.3, RoPE / ALiBi / KV-cache, paged-KV decode/prefill via FlashInfer, ring attention, block-sparse SDPA, arbitrary-mask SDPA)
  - GEMM (f16/bf16/tf32/f32/f64/s8/u8/s4/u4/bin/fp8 with optional bias + ReLU/GELU/SiLU epilogues)
  - GGUF MMVQ (11 block formats × {contig, strided, batched, multi-M})
  - the complete loss family (15 losses × FW+BW + CTC)
  - conv + pool (cuDNN-backed + bit-exact bespoke Adaptive / LpPool / FractionalMaxPool)
  - image ops (interpolate / upsample / grid sample / ROI / NMS / pixel shuffle)
  - linalg (cuSOLVER facade + bespoke batched Ormqr WY + QR materialize, real + complex)
  - FFT / cuRAND facades
  - the full quantize family + GGUF + NF4 + AWQ + Marlin + STE backward
  - segment + embedding + indexing + scatter
  - Mamba-2 SSD + causal conv1d
  - TransformerEngine FP8 cast / recipe
  - mHC hyper-connections
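The GGUF formats listed above are block-quantized: each block stores a group of integer quants plus a shared scale. As a hedged illustration of what a dequantize path computes, here is a CPU reference for the simplest format, Q8_0, following ggml's layout of 32 signed 8-bit quants per block with output = d × q. The type and function names are hypothetical, and the sketch stores the scale as f32 where the on-disk format uses f16.

```rust
/// Elements per Q8_0 block in ggml's layout.
const QK8_0: usize = 32;

/// Hypothetical CPU-side block. The real format stores `d` as f16;
/// f32 is used here to stay within stable Rust without extra crates.
struct RefQ8_0Block {
    d: f32,          // per-block scale
    qs: [i8; QK8_0], // signed 8-bit quants
}

/// Dequantize a run of blocks: each output element is scale * quant.
fn dequantize_q8_0(blocks: &[RefQ8_0Block], out: &mut Vec<f32>) {
    out.clear();
    out.reserve(blocks.len() * QK8_0);
    for b in blocks {
        out.extend(b.qs.iter().map(|&q| b.d * q as f32));
    }
}
```

A GPU kernel (or an MMVQ kernel that fuses this into a matrix-vector product) performs the same per-element arithmetic, just spread across threads.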

Every public FFI symbol ending in _run has a matching _can_implement companion that validates arguments before launch (Phase 66 closure, alpha.64).
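The validator-companion pattern can be sketched on the host side as below. AddPlan, can_implement and run are hypothetical stand-ins for a _can_implement/_run pair, assuming the validator reports an error instead of letting an invalid launch proceed; the loop in run stands in for the kernel launch.

```rust
/// Hypothetical plan for an elementwise add over `n` elements.
struct AddPlan {
    n: usize,
}

impl AddPlan {
    /// Pre-launch check, in the spirit of a `_can_implement` companion:
    /// reject bad arguments on the host before any work is issued.
    fn can_implement(&self, a: &[f32], b: &[f32], out: &[f32]) -> Result<(), &'static str> {
        if a.len() != self.n || b.len() != self.n || out.len() != self.n {
            return Err("length mismatch");
        }
        Ok(())
    }

    /// Stand-in for the `_run` launch: validate first, then compute.
    fn run(&self, a: &[f32], b: &[f32], out: &mut [f32]) -> Result<(), &'static str> {
        self.can_implement(a, b, out)?;
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x + y;
        }
        Ok(())
    }
}
```

Keeping the check in a separate function lets a caller probe whether an argument set is supported (and fall back elsewhere) without attempting the launch.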

Cargo features are documented in the workspace README.md. The default build (sm80 only) covers Ampere-baseline kernels; the sm89 feature adds Ada specializations (FP8 GEMM, sm_89 Flash SDPA), and the sm90a feature reserves the Hopper namespace. Feature flags for the vendored kernel families (fa2, mhc, ozimmu, flashinfer, mamba, bnb_nf4, marlin, awq, xformers_*, tensor_engine, optim, ring_attention, megatron_tp, nvshmem) are off by default.
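To illustrate how the architecture tiers relate to a device, the sketch below picks the most specialized compiled tier that a device's compute capability can run. Tier and pick_tier are hypothetical and not part of the crate; the one hard fact used is that FP8 tensor-core MMA requires compute capability 8.9 (Ada) or newer, and the sketch assumes lower-tier kernels stay runnable on newer devices.

```rust
/// Hypothetical compiled kernel tiers, ordered from least to most specialized.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Sm80, // Ampere baseline
    Sm89, // Ada specializations, e.g. FP8 GEMM
}

impl Tier {
    /// Minimum compute capability (major, minor) needed to run this tier.
    fn min_capability(self) -> (u32, u32) {
        match self {
            Tier::Sm80 => (8, 0),
            Tier::Sm89 => (8, 9), // FP8 MMA needs sm_89 or newer
        }
    }
}

/// Choose the most specialized tier among those compiled in that the
/// device can run; None if the device predates every compiled tier.
fn pick_tier(compiled: &[Tier], device: (u32, u32)) -> Option<Tier> {
    compiled
        .iter()
        .copied()
        .filter(|t| device >= t.min_capability())
        .max()
}
```

With only the default build, an Ada device still gets the Ampere-baseline tier; enabling the sm89 feature is what makes the specialized path available.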

See ROADMAP.md for the live backlog and OP-MATRIX.md for per-op support status.

Re-exports

pub use gemm::BinGemmArgs;
pub use gemm::BinGemmDescriptor;
pub use gemm::BinGemmPlan;
pub use gemm::DenseGemmArgs;
pub use gemm::DenseGemmDescriptor;
pub use gemm::DenseGemmLayout;
pub use gemm::DenseGemmPlan;
pub use gemm::Fp8GemmArgs;
pub use gemm::Fp8GemmDescriptor;
pub use gemm::Fp8GemmPlan;
pub use gemm::GemmSparse24Args;
pub use gemm::GemmSparse24Descriptor;
pub use gemm::GemmSparse24Plan;
pub use gemm::Int4GemmArgs;
pub use gemm::Int4GemmDescriptor;
pub use gemm::Int4GemmPlan;
pub use gemm::IntGemmPlan;
pub use gemm::gptq_to_marlin_repack;
pub use gemm::AwqActivation;
pub use gemm::GptqWeights;
pub use gemm::Int4AwqGemmArgs;
pub use gemm::Int4AwqGemmDescriptor;
pub use gemm::Int4AwqGemmPlan;
pub use gemm::Int4MarlinGemmArgs;
pub use gemm::Int4MarlinGemmDescriptor;
pub use gemm::Int4MarlinGemmPlan;
pub use gemm::MarlinActivation;
pub use gemm::MarlinWeights;
pub use gemm::MARLIN_PERM_LEN;
pub use gemm::MARLIN_SCALE_PERM_LEN;
pub use elementwise::AffineArgs;
pub use elementwise::AffineDescriptor;
pub use elementwise::AffinePlan;
pub use elementwise::BinaryArgs;
pub use elementwise::BinaryBackwardArgs;
pub use elementwise::BinaryBackwardDescriptor;
pub use elementwise::BinaryBackwardPlan;
pub use elementwise::BinaryCmpArgs;
pub use elementwise::BinaryCmpDescriptor;
pub use elementwise::BinaryCmpPlan;
pub use elementwise::BinaryDescriptor;
pub use elementwise::BinaryParamArgs;
pub use elementwise::BinaryParamBackwardArgs;
pub use elementwise::BinaryParamBackwardDescriptor;
pub use elementwise::BinaryParamBackwardPlan;
pub use elementwise::BinaryParamDescriptor;
pub use elementwise::BinaryParamPlan;
pub use elementwise::BinaryPlan;
pub use elementwise::CastArgs;
pub use elementwise::CastDescriptor;
pub use elementwise::CastPlan;
pub use elementwise::CastSubByteArgs;
pub use elementwise::CastSubByteDescriptor;
pub use elementwise::CastSubBytePlan;
pub use elementwise::GatedActivationArgs;
pub use elementwise::GatedActivationBackwardArgs;
pub use elementwise::GatedActivationBackwardDescriptor;
pub use elementwise::GatedActivationBackwardPlan;
pub use elementwise::GatedActivationDescriptor;
pub use elementwise::GatedActivationPlan;
pub use elementwise::TernaryArgs;
pub use elementwise::TernaryBackwardArgs;
pub use elementwise::TernaryBackwardDescriptor;
pub use elementwise::TernaryBackwardPlan;
pub use elementwise::TernaryDescriptor;
pub use elementwise::TernaryPlan;
pub use elementwise::UnaryArgs;
pub use elementwise::UnaryBackwardArgs;
pub use elementwise::UnaryBackwardDescriptor;
pub use elementwise::UnaryBackwardPlan;
pub use elementwise::UnaryDescriptor;
pub use elementwise::UnaryParamArgs;
pub use elementwise::UnaryParamBackwardArgs;
pub use elementwise::UnaryParamBackwardDescriptor;
pub use elementwise::UnaryParamBackwardPlan;
pub use elementwise::UnaryParamDescriptor;
pub use elementwise::UnaryParamPlan;
pub use elementwise::UnaryPlan;
pub use elementwise::WhereArgs;
pub use elementwise::WhereBackwardArgs;
pub use elementwise::WhereBackwardDescriptor;
pub use elementwise::WhereBackwardPlan;
pub use elementwise::WhereDescriptor;
pub use elementwise::WherePlan;
pub use elementwise::PReluArgs;
pub use elementwise::PReluBackwardArgs;
pub use elementwise::PReluBackwardDescriptor;
pub use elementwise::PReluBackwardPlan;
pub use elementwise::PReluDescriptor;
pub use elementwise::PReluPlan;
pub use shape_layout::ConcatArgs;
pub use shape_layout::ConcatBackwardArgs;
pub use shape_layout::ConcatBackwardDescriptor;
pub use shape_layout::ConcatBackwardPlan;
pub use shape_layout::ConcatDescriptor;
pub use shape_layout::ConcatPlan;
pub use shape_layout::ContiguizeArgs;
pub use shape_layout::ContiguizeDescriptor;
pub use shape_layout::ContiguizePlan;
pub use shape_layout::FillArgs;
pub use shape_layout::FillDescriptor;
pub use shape_layout::FillPlan;
pub use shape_layout::FlipArgs;
pub use shape_layout::FlipBackwardArgs;
pub use shape_layout::FlipBackwardDescriptor;
pub use shape_layout::FlipBackwardPlan;
pub use shape_layout::FlipDescriptor;
pub use shape_layout::FlipPlan;
pub use shape_layout::PadArgs;
pub use shape_layout::PadBackwardArgs;
pub use shape_layout::PadBackwardDescriptor;
pub use shape_layout::PadBackwardPlan;
pub use shape_layout::PadDescriptor;
pub use shape_layout::PadPlan;
pub use shape_layout::PermuteArgs;
pub use shape_layout::PermuteBackwardArgs;
pub use shape_layout::PermuteBackwardDescriptor;
pub use shape_layout::PermuteBackwardPlan;
pub use shape_layout::PermuteDescriptor;
pub use shape_layout::PermutePlan;
pub use shape_layout::RepeatArgs;
pub use shape_layout::RepeatBackwardArgs;
pub use shape_layout::RepeatBackwardDescriptor;
pub use shape_layout::RepeatBackwardPlan;
pub use shape_layout::RepeatDescriptor;
pub use shape_layout::RepeatPlan;
pub use shape_layout::RollArgs;
pub use shape_layout::RollBackwardArgs;
pub use shape_layout::RollBackwardDescriptor;
pub use shape_layout::RollBackwardPlan;
pub use shape_layout::RollDescriptor;
pub use shape_layout::RollPlan;
pub use shape_layout::TrilArgs;
pub use shape_layout::TrilBackwardArgs;
pub use shape_layout::TrilBackwardDescriptor;
pub use shape_layout::TrilBackwardPlan;
pub use shape_layout::TrilDescriptor;
pub use shape_layout::TrilPlan;
pub use shape_layout::TriuArgs;
pub use shape_layout::TriuBackwardArgs;
pub use shape_layout::TriuBackwardDescriptor;
pub use shape_layout::TriuBackwardPlan;
pub use shape_layout::TriuDescriptor;
pub use shape_layout::TriuPlan;
pub use shape_layout::WriteSliceArgs;
pub use shape_layout::WriteSliceDescriptor;
pub use shape_layout::WriteSlicePlan;
pub use reduce::ArgReduceArgs;
pub use reduce::ArgReduceDescriptor;
pub use reduce::ArgReducePlan;
pub use reduce::BoolReduceArgs;
pub use reduce::BoolReduceDescriptor;
pub use reduce::BoolReducePlan;
pub use reduce::CountReduceArgs;
pub use reduce::CountReduceDescriptor;
pub use reduce::CountReducePlan;
pub use reduce::ReduceArgs;
pub use reduce::ReduceBackwardArgs;
pub use reduce::ReduceBackwardDescriptor;
pub use reduce::ReduceBackwardPlan;
pub use reduce::ReduceDescriptor;
pub use reduce::ReducePlan;
pub use reduce::ReduceToArgs;
pub use reduce::ReduceToDescriptor;
pub use reduce::ReduceToPlan;
pub use reduce::TraceArgs;
pub use reduce::TraceDescriptor;
pub use reduce::TracePlan;
pub use scan::ScanArgs;
pub use scan::ScanBackwardArgs;
pub use scan::ScanBackwardDescriptor;
pub use scan::ScanBackwardPlan;
pub use scan::ScanDescriptor;
pub use scan::ScanPlan;
pub use softmax::GumbelSoftmaxArgs;
pub use softmax::GumbelSoftmaxBackwardArgs;
pub use softmax::GumbelSoftmaxBackwardDescriptor;
pub use softmax::GumbelSoftmaxBackwardPlan;
pub use softmax::GumbelSoftmaxDescriptor;
pub use softmax::GumbelSoftmaxPlan;
pub use softmax::SoftmaxArgs;
pub use softmax::SoftmaxBackwardArgs;
pub use softmax::SoftmaxBackwardDescriptor;
pub use softmax::SoftmaxBackwardPlan;
pub use softmax::SoftmaxDescriptor;
pub use softmax::SoftmaxPlan;
pub use softmax::SparsemaxArgs;
pub use softmax::SparsemaxBackwardArgs;
pub use softmax::SparsemaxBackwardDescriptor;
pub use softmax::SparsemaxBackwardPlan;
pub use softmax::SparsemaxDescriptor;
pub use softmax::SparsemaxPlan;
pub use softmax::SPARSEMAX_MAX_EXTENT;
pub use norm::BatchNormArgs;
pub use norm::BatchNormBackwardArgs;
pub use norm::BatchNormBackwardDescriptor;
pub use norm::BatchNormBackwardPlan;
pub use norm::BatchNormDescriptor;
pub use norm::BatchNormPlan;
pub use norm::GroupNormArgs;
pub use norm::GroupNormBackwardArgs;
pub use norm::GroupNormBackwardDescriptor;
pub use norm::GroupNormBackwardPlan;
pub use norm::GroupNormDescriptor;
pub use norm::GroupNormPlan;
pub use norm::InstanceNormArgs;
pub use norm::InstanceNormBackwardArgs;
pub use norm::InstanceNormBackwardDescriptor;
pub use norm::InstanceNormBackwardPlan;
pub use norm::InstanceNormDescriptor;
pub use norm::InstanceNormPlan;
pub use norm::LayerNormArgs;
pub use norm::LayerNormBackwardArgs;
pub use norm::LayerNormBackwardDescriptor;
pub use norm::LayerNormBackwardPlan;
pub use norm::LayerNormDescriptor;
pub use norm::LayerNormPlan;
pub use norm::RMSNormArgs;
pub use norm::RMSNormBackwardArgs;
pub use norm::RMSNormBackwardDescriptor;
pub use norm::RMSNormBackwardPlan;
pub use norm::RMSNormDescriptor;
pub use norm::RMSNormPlan;
pub use loss::BceLossArgs;
pub use loss::BceLossBackwardArgs;
pub use loss::BceLossBackwardDescriptor;
pub use loss::BceLossBackwardPlan;
pub use loss::BceLossDescriptor;
pub use loss::BceLossPlan;
pub use loss::BceWithLogitsLossArgs;
pub use loss::BceWithLogitsLossBackwardArgs;
pub use loss::BceWithLogitsLossBackwardDescriptor;
pub use loss::BceWithLogitsLossBackwardPlan;
pub use loss::BceWithLogitsLossDescriptor;
pub use loss::BceWithLogitsLossPlan;
pub use loss::CrossEntropyLossArgs;
pub use loss::CrossEntropyLossBackwardArgs;
pub use loss::CrossEntropyLossBackwardDescriptor;
pub use loss::CrossEntropyLossBackwardPlan;
pub use loss::CrossEntropyLossDescriptor;
pub use loss::CrossEntropyLossPlan;
pub use loss::FusedLinearCrossEntropyArgs;
pub use loss::FusedLinearCrossEntropyBackwardArgs;
pub use loss::FusedLinearCrossEntropyBackwardDescriptor;
pub use loss::FusedLinearCrossEntropyBackwardPlan;
pub use loss::FusedLinearCrossEntropyDescriptor;
pub use loss::FusedLinearCrossEntropyPlan;
pub use loss::FLCE_DEFAULT_IGNORE_INDEX;
pub use loss::GaussianNllLossArgs;
pub use loss::GaussianNllLossBackwardArgs;
pub use loss::GaussianNllLossBackwardDescriptor;
pub use loss::GaussianNllLossBackwardPlan;
pub use loss::GaussianNllLossDescriptor;
pub use loss::GaussianNllLossPlan;
pub use loss::HuberLossArgs;
pub use loss::HuberLossBackwardArgs;
pub use loss::HuberLossBackwardDescriptor;
pub use loss::HuberLossBackwardPlan;
pub use loss::HuberLossDescriptor;
pub use loss::HuberLossPlan;
pub use loss::KlDivLossArgs;
pub use loss::KlDivLossBackwardArgs;
pub use loss::KlDivLossBackwardDescriptor;
pub use loss::KlDivLossBackwardPlan;
pub use loss::KlDivLossDescriptor;
pub use loss::KlDivLossPlan;
pub use loss::L1LossArgs;
pub use loss::L1LossBackwardArgs;
pub use loss::L1LossBackwardDescriptor;
pub use loss::L1LossBackwardPlan;
pub use loss::L1LossDescriptor;
pub use loss::L1LossPlan;
pub use loss::MseLossArgs;
pub use loss::MseLossBackwardArgs;
pub use loss::MseLossBackwardDescriptor;
pub use loss::MseLossBackwardPlan;
pub use loss::MseLossDescriptor;
pub use loss::MseLossPlan;
pub use loss::NllLossArgs;
pub use loss::NllLossBackwardArgs;
pub use loss::NllLossBackwardDescriptor;
pub use loss::NllLossBackwardPlan;
pub use loss::NllLossDescriptor;
pub use loss::NllLossPlan;
pub use loss::PoissonNllLossArgs;
pub use loss::PoissonNllLossBackwardArgs;
pub use loss::PoissonNllLossBackwardDescriptor;
pub use loss::PoissonNllLossBackwardPlan;
pub use loss::PoissonNllLossDescriptor;
pub use loss::PoissonNllLossPlan;
pub use loss::SmoothL1LossArgs;
pub use loss::SmoothL1LossBackwardArgs;
pub use loss::SmoothL1LossBackwardDescriptor;
pub use loss::SmoothL1LossBackwardPlan;
pub use loss::SmoothL1LossDescriptor;
pub use loss::SmoothL1LossPlan;
pub use loss::CosineEmbeddingLossArgs;
pub use loss::CosineEmbeddingLossBackwardArgs;
pub use loss::CosineEmbeddingLossBackwardDescriptor;
pub use loss::CosineEmbeddingLossBackwardPlan;
pub use loss::CosineEmbeddingLossDescriptor;
pub use loss::CosineEmbeddingLossPlan;
pub use loss::HingeEmbeddingLossArgs;
pub use loss::HingeEmbeddingLossBackwardArgs;
pub use loss::HingeEmbeddingLossBackwardDescriptor;
pub use loss::HingeEmbeddingLossBackwardPlan;
pub use loss::HingeEmbeddingLossDescriptor;
pub use loss::HingeEmbeddingLossPlan;
pub use loss::MarginRankingLossArgs;
pub use loss::MarginRankingLossBackwardArgs;
pub use loss::MarginRankingLossBackwardDescriptor;
pub use loss::MarginRankingLossBackwardPlan;
pub use loss::MarginRankingLossDescriptor;
pub use loss::MarginRankingLossPlan;
pub use loss::MultiMarginLossArgs;
pub use loss::MultiMarginLossBackwardArgs;
pub use loss::MultiMarginLossBackwardDescriptor;
pub use loss::MultiMarginLossBackwardPlan;
pub use loss::MultiMarginLossDescriptor;
pub use loss::MultiMarginLossPlan;
pub use loss::MultilabelMarginLossArgs;
pub use loss::MultilabelMarginLossBackwardArgs;
pub use loss::MultilabelMarginLossBackwardDescriptor;
pub use loss::MultilabelMarginLossBackwardPlan;
pub use loss::MultilabelMarginLossDescriptor;
pub use loss::MultilabelMarginLossPlan;
pub use loss::MultilabelSoftMarginLossArgs;
pub use loss::MultilabelSoftMarginLossBackwardArgs;
pub use loss::MultilabelSoftMarginLossBackwardDescriptor;
pub use loss::MultilabelSoftMarginLossBackwardPlan;
pub use loss::MultilabelSoftMarginLossDescriptor;
pub use loss::MultilabelSoftMarginLossPlan;
pub use loss::TripletMarginLossArgs;
pub use loss::TripletMarginLossBackwardArgs;
pub use loss::TripletMarginLossBackwardDescriptor;
pub use loss::TripletMarginLossBackwardPlan;
pub use loss::TripletMarginLossDescriptor;
pub use loss::TripletMarginLossPlan;
pub use loss::CtcLossArgs;
pub use loss::CtcLossBackwardArgs;
pub use loss::CtcLossBackwardDescriptor;
pub use loss::CtcLossBackwardPlan;
pub use loss::CtcLossDescriptor;
pub use loss::CtcLossPlan;
pub use random::DropoutArgs;
pub use random::DropoutBackwardArgs;
pub use random::DropoutBackwardDescriptor;
pub use random::DropoutBackwardPlan;
pub use random::DropoutDescriptor;
pub use random::DropoutPlan;
pub use random::RandomArgs;
pub use random::RandomBoolArgs;
pub use random::RandomDescriptor;
pub use random::RandomPlan;
pub use attention::AlibiArgs;
pub use attention::AlibiBackwardArgs;
pub use attention::AlibiBackwardDescriptor;
pub use attention::AlibiBackwardPlan;
pub use attention::AlibiDescriptor;
pub use attention::AlibiPlan;
pub use attention::FlashDecodingArgs;
pub use attention::FlashDecodingDescriptor;
pub use attention::FlashDecodingPlan;
pub use attention::FLASH_DECODING_MAX_D;
pub use attention::FlashSdpaArgs;
pub use attention::FlashSdpaBackwardArgs;
pub use attention::FlashSdpaBackwardDescriptor;
pub use attention::FlashSdpaBackwardPlan;
pub use attention::FlashSdpaDescriptor;
pub use attention::FlashSdpaPlan;
pub use attention::FlashSdpaVarlenArgs;
pub use attention::FlashSdpaVarlenBackwardArgs;
pub use attention::FlashSdpaVarlenBackwardPlan;
pub use attention::FlashSdpaVarlenDescriptor;
pub use attention::FlashSdpaVarlenPlan;
pub use attention::HyperConnectionArgs;
pub use attention::HyperConnectionDescriptor;
pub use attention::HyperConnectionPlan;
pub use attention::KvCacheAppendArgs;
pub use attention::KvCacheAppendDescriptor;
pub use attention::KvCacheAppendPlan;
pub use attention::RopeArgs;
pub use attention::RopeBackwardArgs;
pub use attention::RopeBackwardDescriptor;
pub use attention::RopeBackwardPlan;
pub use attention::RopeDescriptor;
pub use attention::RopePlan;
pub use attention::SdpaArgs;
pub use attention::SdpaBackwardArgs;
pub use attention::SdpaBackwardDescriptor;
pub use attention::SdpaBackwardPlan;
pub use attention::SdpaBlockSparseArgs;
pub use attention::SdpaBlockSparseDescriptor;
pub use attention::SdpaBlockSparsePlan;
pub use attention::SdpaDescriptor;
pub use attention::SdpaPlan;
pub use attention::FLASH_SDPA_MAX_D;
pub use attention::ROPE_DEFAULT_BASE;
pub use attention::SDPA_BLOCK_SPARSE_MAX_BLOCK;
pub use attention::SDPA_BLOCK_SPARSE_MAX_D;
pub use attention::RopeScaledTableBuilder;
pub use attention::RopeScaling;
pub use linalg::BatchedOrmqrArgs;
pub use linalg::BatchedOrmqrDescriptor;
pub use linalg::BatchedOrmqrOp;
pub use linalg::BatchedOrmqrPlan;
pub use linalg::BatchedOrmqrSide;
pub use linalg::BatchedOrmqrWyArgs;
pub use linalg::BatchedOrmqrWyDescriptor;
pub use linalg::BatchedOrmqrWyPlan;
pub use linalg::BatchedQrArgs;
pub use linalg::BatchedQrDescriptor;
pub use linalg::BatchedQrMaterializeArgs;
pub use linalg::BatchedQrMaterializeDescriptor;
pub use linalg::BatchedQrMaterializePlan;
pub use linalg::BatchedQrPlan;
pub use linalg::BatchedSvdArgs;
pub use linalg::BatchedSvdDescriptor;
pub use linalg::BatchedSvdPlan;
pub use linalg::BatchedSvdaArgs;
pub use linalg::BatchedSvdaDescriptor;
pub use linalg::BatchedSvdaPlan;
pub use linalg::CholeskyArgs;
pub use linalg::CholeskyDescriptor;
pub use linalg::CholeskyPlan;
pub use linalg::EigArgs;
pub use linalg::EigDescriptor;
pub use linalg::EigPlan;
pub use linalg::EighArgs;
pub use linalg::EighDescriptor;
pub use linalg::EighPlan;
pub use linalg::InverseArgs;
pub use linalg::InverseDescriptor;
pub use linalg::InversePlan;
pub use linalg::LstSqArgs;
pub use linalg::LstSqDescriptor;
pub use linalg::LstSqPlan;
pub use linalg::LuArgs;
pub use linalg::LuDescriptor;
pub use linalg::LuPlan;
pub use linalg::QrArgs;
pub use linalg::QrDescriptor;
pub use linalg::QrPlan;
pub use linalg::SolveArgs;
pub use linalg::SolveDescriptor;
pub use linalg::SolvePlan;
pub use linalg::SvdArgs;
pub use linalg::SvdDescriptor;
pub use linalg::SvdPlan;
pub use linalg::WY_NB;
pub use fft::FftArgs;
pub use fft::FftDescriptor;
pub use fft::FftNdArgs;
pub use fft::FftNdDescriptor;
pub use fft::FftNdPlan;
pub use fft::FftPlan;
pub use fft::FftShiftArgs;
pub use fft::FftShiftDescriptor;
pub use fft::FftShiftNdArgs;
pub use fft::FftShiftNdDescriptor;
pub use fft::FftShiftNdPlan;
pub use fft::FftShiftPlan;
pub use fft::IrfftArgs;
pub use fft::IrfftDescriptor;
pub use fft::IrfftNdArgs;
pub use fft::IrfftNdDescriptor;
pub use fft::IrfftNdPlan;
pub use fft::IrfftPlan;
pub use fft::RfftArgs;
pub use fft::RfftDescriptor;
pub use fft::RfftNdArgs;
pub use fft::RfftNdDescriptor;
pub use fft::RfftNdPlan;
pub use fft::RfftPlan;
pub use fft::FFTSHIFT_ND_MAX_RANK;
pub use fft::FFTSHIFT_ND_MAX_SHIFT_AXES;
pub use indexing::GatherArgs;
pub use indexing::GatherBackwardArgs;
pub use indexing::GatherBackwardDescriptor;
pub use indexing::GatherBackwardPlan;
pub use indexing::GatherDescriptor;
pub use indexing::GatherPlan;
pub use indexing::IndexAddArgs;
pub use indexing::IndexAddDescriptor;
pub use indexing::IndexAddPlan;
pub use indexing::IndexSelectArgs;
pub use indexing::IndexSelectBackwardArgs;
pub use indexing::IndexSelectBackwardDescriptor;
pub use indexing::IndexSelectBackwardPlan;
pub use indexing::IndexSelectDescriptor;
pub use indexing::IndexSelectPlan;
pub use indexing::MaskedFillArgs;
pub use indexing::MaskedFillBackwardArgs;
pub use indexing::MaskedFillBackwardDescriptor;
pub use indexing::MaskedFillBackwardPlan;
pub use indexing::MaskedFillDescriptor;
pub use indexing::MaskedFillPlan;
pub use indexing::NonzeroArgs;
pub use indexing::NonzeroDescriptor;
pub use indexing::NonzeroPlan;
pub use indexing::OneHotArgs;
pub use indexing::OneHotDescriptor;
pub use indexing::OneHotPlan;
pub use indexing::ScatterArgs;
pub use indexing::ScatterDescriptor;
pub use indexing::ScatterPlan;
pub use indexing::ScatterAddArgs;
pub use indexing::ScatterAddDescriptor;
pub use indexing::ScatterAddPlan;
pub use embedding::EmbeddingArgs;
pub use embedding::EmbeddingBackwardArgs;
pub use embedding::EmbeddingBackwardDescriptor;
pub use embedding::EmbeddingBackwardPlan;
pub use embedding::EmbeddingBagArgs;
pub use embedding::EmbeddingBagBackwardArgs;
pub use embedding::EmbeddingBagBackwardDescriptor;
pub use embedding::EmbeddingBagBackwardPlan;
pub use embedding::EmbeddingBagDescriptor;
pub use embedding::EmbeddingBagMaxArgs;
pub use embedding::EmbeddingBagMaxBackwardArgs;
pub use embedding::EmbeddingBagMaxBackwardDescriptor;
pub use embedding::EmbeddingBagMaxBackwardPlan;
pub use embedding::EmbeddingBagMaxDescriptor;
pub use embedding::EmbeddingBagMaxPlan;
pub use embedding::EmbeddingBagMode;
pub use embedding::EmbeddingBagPlan;
pub use embedding::EmbeddingDescriptor;
pub use embedding::EmbeddingPlan;
pub use segment::SegmentMaxArgs;
pub use segment::SegmentMaxBackwardArgs;
pub use segment::SegmentMaxBackwardDescriptor;
pub use segment::SegmentMaxBackwardPlan;
pub use segment::SegmentMaxDescriptor;
pub use segment::SegmentMaxPlan;
pub use segment::SegmentMeanArgs;
pub use segment::SegmentMeanBackwardArgs;
pub use segment::SegmentMeanBackwardDescriptor;
pub use segment::SegmentMeanBackwardPlan;
pub use segment::SegmentMeanDescriptor;
pub use segment::SegmentMeanPlan;
pub use segment::SegmentMinArgs;
pub use segment::SegmentMinBackwardArgs;
pub use segment::SegmentMinBackwardDescriptor;
pub use segment::SegmentMinBackwardPlan;
pub use segment::SegmentMinDescriptor;
pub use segment::SegmentMinPlan;
pub use segment::SegmentProdArgs;
pub use segment::SegmentProdBackwardArgs;
pub use segment::SegmentProdBackwardDescriptor;
pub use segment::SegmentProdBackwardPlan;
pub use segment::SegmentProdDescriptor;
pub use segment::SegmentProdPlan;
pub use segment::SegmentSumArgs;
pub use segment::SegmentSumBackwardArgs;
pub use segment::SegmentSumBackwardDescriptor;
pub use segment::SegmentSumBackwardPlan;
pub use segment::SegmentSumDescriptor;
pub use segment::SegmentSumPlan;
pub use segment::UnsortedSegmentMaxArgs;
pub use segment::UnsortedSegmentMaxBackwardArgs;
pub use segment::UnsortedSegmentMaxBackwardDescriptor;
pub use segment::UnsortedSegmentMaxBackwardPlan;
pub use segment::UnsortedSegmentMaxDescriptor;
pub use segment::UnsortedSegmentMaxPlan;
pub use segment::UnsortedSegmentMeanArgs;
pub use segment::UnsortedSegmentMeanBackwardArgs;
pub use segment::UnsortedSegmentMeanBackwardDescriptor;
pub use segment::UnsortedSegmentMeanBackwardPlan;
pub use segment::UnsortedSegmentMeanDescriptor;
pub use segment::UnsortedSegmentMeanPlan;
pub use segment::UnsortedSegmentMinArgs;
pub use segment::UnsortedSegmentMinBackwardArgs;
pub use segment::UnsortedSegmentMinBackwardDescriptor;
pub use segment::UnsortedSegmentMinBackwardPlan;
pub use segment::UnsortedSegmentMinDescriptor;
pub use segment::UnsortedSegmentMinPlan;
pub use segment::UnsortedSegmentProdArgs;
pub use segment::UnsortedSegmentProdBackwardArgs;
pub use segment::UnsortedSegmentProdBackwardDescriptor;
pub use segment::UnsortedSegmentProdBackwardPlan;
pub use segment::UnsortedSegmentProdDescriptor;
pub use segment::UnsortedSegmentProdPlan;
pub use segment::UnsortedSegmentSumArgs;
pub use segment::UnsortedSegmentSumBackwardArgs;
pub use segment::UnsortedSegmentSumBackwardDescriptor;
pub use segment::UnsortedSegmentSumBackwardPlan;
pub use segment::UnsortedSegmentSumDescriptor;
pub use segment::UnsortedSegmentSumPlan;
pub use quantize::DequantizePerGroupArgs;
pub use quantize::DequantizePerGroupBackwardArgs;
pub use quantize::DequantizePerGroupBackwardDescriptor;
pub use quantize::DequantizePerGroupBackwardPlan;
pub use quantize::DequantizePerGroupDescriptor;
pub use quantize::DequantizePerGroupPlan;
pub use quantize::DequantizePerTokenArgs;
pub use quantize::DequantizePerTokenBackwardArgs;
pub use quantize::DequantizePerTokenBackwardDescriptor;
pub use quantize::DequantizePerTokenBackwardPlan;
pub use quantize::DequantizePerTokenDescriptor;
pub use quantize::DequantizePerTokenPlan;
pub use quantize::QuantizePerGroupArgs;
pub use quantize::QuantizePerGroupBackwardArgs;
pub use quantize::QuantizePerGroupBackwardDescriptor;
pub use quantize::QuantizePerGroupBackwardPlan;
pub use quantize::QuantizePerGroupDescriptor;
pub use quantize::QuantizePerGroupPlan;
pub use quantize::QuantizePerTokenArgs;
pub use quantize::QuantizePerTokenBackwardArgs;
pub use quantize::QuantizePerTokenBackwardDescriptor;
pub use quantize::QuantizePerTokenBackwardPlan;
pub use quantize::QuantizePerTokenDescriptor;
pub use quantize::QuantizePerTokenPlan;
pub use quantize::DequantizePerChannelArgs;
pub use quantize::DequantizePerChannelBackwardArgs;
pub use quantize::DequantizePerChannelBackwardDescriptor;
pub use quantize::DequantizePerChannelBackwardPlan;
pub use quantize::DequantizePerChannelDescriptor;
pub use quantize::DequantizePerChannelPlan;
pub use quantize::DequantizePerTensorArgs;
pub use quantize::DequantizePerTensorBackwardArgs;
pub use quantize::DequantizePerTensorBackwardDescriptor;
pub use quantize::DequantizePerTensorBackwardPlan;
pub use quantize::DequantizePerTensorDescriptor;
pub use quantize::DequantizePerTensorPlan;
pub use quantize::FakeQuantizeArgs;
pub use quantize::FakeQuantizeBackwardArgs;
pub use quantize::FakeQuantizeBackwardDescriptor;
pub use quantize::FakeQuantizeBackwardPlan;
pub use quantize::FakeQuantizeDescriptor;
pub use quantize::FakeQuantizePlan;
pub use quantize::QuantizePerChannelArgs;
pub use quantize::QuantizePerChannelBackwardArgs;
pub use quantize::QuantizePerChannelBackwardDescriptor;
pub use quantize::QuantizePerChannelBackwardPlan;
pub use quantize::QuantizePerChannelDescriptor;
pub use quantize::QuantizePerChannelPlan;
pub use quantize::QuantizePerTensorArgs;
pub use quantize::QuantizePerTensorBackwardArgs;
pub use quantize::QuantizePerTensorBackwardDescriptor;
pub use quantize::QuantizePerTensorBackwardPlan;
pub use quantize::QuantizePerTensorDescriptor;
pub use quantize::QuantizePerTensorPlan;
pub use quantize::DynamicRangeMode;
pub use quantize::DynamicRangeQuantizeArgs;
pub use quantize::DynamicRangeQuantizeDescriptor;
pub use quantize::DynamicRangeQuantizePlan;
pub use quantize::DynamicRangeScope;
pub use quantize::QuantizedLinearArgs;
pub use quantize::QuantizedLinearDescriptor;
pub use quantize::QuantizedLinearPlan;
pub use quantize::SmoothQuantLinearArgs;
pub use quantize::SmoothQuantLinearDescriptor;
pub use quantize::SmoothQuantLinearPlan;
pub use quantize::BlockQ2K;
pub use quantize::BlockQ3K;
pub use quantize::BlockQ4_0;
pub use quantize::BlockQ4_1;
pub use quantize::BlockQ4K;
pub use quantize::BlockQ5_0;
pub use quantize::BlockQ5_1;
pub use quantize::BlockQ5K;
pub use quantize::BlockQ6K;
pub use quantize::BlockQ8_0;
pub use quantize::BlockQ8K;
pub use quantize::GgufDequantizeArgs;
pub use quantize::GgufDequantizeDescriptor;
pub use quantize::GgufDequantizePlan;
pub use quantize::GgufMmvqArgs;
pub use quantize::GgufMmvqDescriptor;
pub use quantize::GgufMmvqPlan;
pub use quantize::GgufMmvqBatchedActivation;
pub use quantize::GgufMmvqBatchedArgs;
pub use quantize::GgufMmvqBatchedDescriptor;
pub use quantize::GgufMmvqBatchedFormat;
pub use quantize::GgufMmvqBatchedPlan;
pub use quantize::GgufMmvqMultiMArgs;
pub use quantize::GgufMmvqMultiMDescriptor;
pub use quantize::GgufMmvqMultiMPlan;
pub use quantize::Nf4Activation;
pub use quantize::Nf4DequantizeArgs;
pub use quantize::Nf4DequantizePlan;
pub use quantize::Nf4Descriptor;
pub use quantize::Nf4MmvqArgs;
pub use quantize::Nf4MmvqMultiMArgs;
pub use quantize::Nf4MmvqMultiMDescriptor;
pub use quantize::Nf4MmvqMultiMPlan;
pub use quantize::Nf4MmvqPlan;
pub use quantize::NF4_CODEBOOK;
pub use moe::MoeArgs;
pub use moe::MoeDescriptor;
pub use moe::MoePlan;
pub use moe::MoeVariant;
pub use image::AffineGridArgs;
pub use image::AffineGridDescriptor;
pub use image::AffineGridPlan;
pub use image::GridSampleArgs;
pub use image::GridSampleBackwardArgs;
pub use image::GridSampleBackwardDescriptor;
pub use image::GridSampleBackwardPlan;
pub use image::GridSampleDescriptor;
pub use image::GridSamplePlan;
pub use image::InterpolateArgs;
pub use image::InterpolateBackwardArgs;
pub use image::InterpolateBackwardDescriptor;
pub use image::InterpolateBackwardPlan;
pub use image::InterpolateDescriptor;
pub use image::InterpolateMode;
pub use image::InterpolatePlan;
pub use image::NmsArgs;
pub use image::NmsDescriptor;
pub use image::NmsPlan;
pub use image::PixelShuffleArgs;
pub use image::PixelShuffleDescriptor;
pub use image::PixelShufflePlan;
pub use image::PixelUnshuffleArgs;
pub use image::PixelUnshuffleDescriptor;
pub use image::PixelUnshufflePlan;
pub use image::RoiAlignArgs;
pub use image::RoiAlignBackwardArgs;
pub use image::RoiAlignBackwardDescriptor;
pub use image::RoiAlignBackwardPlan;
pub use image::RoiAlignDescriptor;
pub use image::RoiAlignPlan;
pub use image::RoiPoolArgs;
pub use image::RoiPoolBackwardArgs;
pub use image::RoiPoolBackwardDescriptor;
pub use image::RoiPoolBackwardPlan;
pub use image::RoiPoolDescriptor;
pub use image::RoiPoolPlan;
pub use sort::ArgsortArgs;
pub use sort::ArgsortDescriptor;
pub use sort::ArgsortPlan;
pub use sort::BincountArgs;
pub use sort::BincountDescriptor;
pub use sort::BincountPlan;
pub use sort::HistogramArgs;
pub use sort::HistogramDescriptor;
pub use sort::HistogramPlan;
pub use sort::HistogramddArgs;
pub use sort::HistogramddDescriptor;
pub use sort::HistogramddPlan;
pub use sort::KthvalueArgs;
pub use sort::KthvalueBackwardArgs;
pub use sort::KthvalueBackwardDescriptor;
pub use sort::KthvalueBackwardPlan;
pub use sort::KthvalueDescriptor;
pub use sort::KthvaluePlan;
pub use sort::MsortArgs;
pub use sort::MsortBackwardArgs;
pub use sort::MsortBackwardDescriptor;
pub use sort::MsortBackwardPlan;
pub use sort::MsortDescriptor;
pub use sort::MsortPlan;
pub use sort::SearchsortedArgs;
pub use sort::SearchsortedDescriptor;
pub use sort::SearchsortedPlan;
pub use sort::SortArgs;
pub use sort::SortBackwardArgs;
pub use sort::SortBackwardDescriptor;
pub use sort::SortBackwardPlan;
pub use sort::SortDescriptor;
pub use sort::SortPlan;
pub use sort::TopkArgs;
pub use sort::TopkBackwardArgs;
pub use sort::TopkBackwardDescriptor;
pub use sort::TopkBackwardPlan;
pub use sort::TopkDescriptor;
pub use sort::TopkPlan;
pub use sort::UniqueArgs;
pub use sort::UniqueConsecutiveArgs;
pub use sort::UniqueConsecutiveDescriptor;
pub use sort::UniqueConsecutivePlan;
pub use sort::UniqueDescriptor;
pub use sort::UniquePlan;
pub use sort::SORT_MAX_ROW;
pub use sort::TOPK_MAX_K;
pub use attention::RingAttentionArgs;
pub use attention::RingAttentionDescriptor;
pub use attention::RingAttentionPlan;
pub use attention::RING_ATTENTION_HEAD_DIM;
pub use attention::BatchPagedDecodeArgs;
pub use attention::BatchPagedDecodeDescriptor;
pub use attention::BatchPagedDecodePlan;
pub use attention::BatchPagedDecodeFp8Args;
pub use attention::BatchPagedDecodeFp8Descriptor;
pub use attention::BatchPagedDecodeFp8Plan;
pub use attention::BatchPagedPrefillArgs;
pub use attention::BatchPagedPrefillDescriptor;
pub use attention::BatchPagedPrefillPlan;
pub use attention::BatchRaggedPrefillArgs;
pub use attention::BatchRaggedPrefillDescriptor;
pub use attention::BatchRaggedPrefillPlan;
pub use attention::CascadeAttentionArgs;
pub use attention::CascadeAttentionDescriptor;
pub use attention::CascadeAttentionPlan;
pub use attention::CascadeMergeStatesArgs;
pub use attention::CascadeMergeStatesDescriptor;
pub use attention::CascadeMergeStatesPlan;
pub use attention::Fp8KvDtype;
pub use attention::PagedKvAppendArgs;
pub use attention::PagedKvAppendDescriptor;
pub use attention::PagedKvAppendPlan;
pub use attention::PagedKvCacheDescriptor;
pub use random::PerRowSampler;
pub use random::PerRowSamplingArgs;
pub use random::PerRowSamplingDescriptor;
pub use random::PerRowSamplingPlan;
pub use random::SamplerKind;
pub use random::SpeculativeSamplingArgs;
pub use random::SpeculativeSamplingDescriptor;
pub use random::SpeculativeSamplingPlan;
pub use random::TokenPenaltyArgs;
pub use random::TokenPenaltyDescriptor;
pub use random::TokenPenaltyPlan;
pub use random::TopKTopPSamplingArgs;
pub use random::TopKTopPSamplingDescriptor;
pub use random::TopKTopPSamplingPlan;
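Each op above is re-exported as an Args / Descriptor / Plan triple (for example TopkDescriptor, TopkPlan, TopkArgs). The sketch below is a CPU-only mock of that split. It assumes the descriptor carries the problem shape, which is validated once at select time, and the args carry per-launch buffers consumed by run. Every Mock* name, the String error type, and the MOCK_TOPK_MAX_K value are hypothetical and are not the crate's signatures. Host slices stand in for device memory.

```rust
/// Hypothetical stand-in for the crate's TOPK_MAX_K; the real value is not assumed.
const MOCK_TOPK_MAX_K: usize = 64;

/// Problem shape fixed at plan-selection time (hypothetical stand-in for TopkDescriptor).
#[derive(Clone, Copy, Debug)]
struct MockTopkDescriptor {
    rows: usize,
    row_len: usize,
    k: usize,
}

/// Per-launch buffers (hypothetical stand-in for TopkArgs); host slices replace device views.
struct MockTopkArgs<'a> {
    input: &'a [f32],
    values: &'a mut [f32],
    indices: &'a mut [i64],
}

/// Validated plan (hypothetical stand-in for TopkPlan).
struct MockTopkPlan {
    desc: MockTopkDescriptor,
}

impl MockTopkPlan {
    /// Validate the descriptor once; a real plan would also pick a kernel here.
    fn select(desc: MockTopkDescriptor) -> Result<Self, String> {
        if desc.k == 0 || desc.k > desc.row_len {
            return Err(format!("k = {} out of range for row length {}", desc.k, desc.row_len));
        }
        if desc.k > MOCK_TOPK_MAX_K {
            return Err(format!("k = {} exceeds cap {}", desc.k, MOCK_TOPK_MAX_K));
        }
        Ok(Self { desc })
    }

    /// Execute against concrete buffers; only buffer lengths are re-checked per launch.
    fn run(&self, args: MockTopkArgs<'_>) -> Result<(), String> {
        let MockTopkDescriptor { rows, row_len, k } = self.desc;
        if args.input.len() != rows * row_len
            || args.values.len() != rows * k
            || args.indices.len() != rows * k
        {
            return Err("buffer length mismatch".into());
        }
        for r in 0..rows {
            let row = &args.input[r * row_len..(r + 1) * row_len];
            let mut order: Vec<usize> = (0..row_len).collect();
            // Descending by value; sort_by is stable, so ties keep the lower index first.
            order.sort_by(|&a, &b| {
                row[b].partial_cmp(&row[a]).unwrap_or(std::cmp::Ordering::Equal)
            });
            for (j, &idx) in order.iter().take(k).enumerate() {
                args.values[r * k + j] = row[idx];
                args.indices[r * k + j] = idx as i64;
            }
        }
        Ok(())
    }
}
```

Keeping validation in select means one plan can be reused across many launches with only the buffers changing.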

Modules§

attention
Attention op family — Phase 6 Category K.
elementwise
Elementwise op family — unified plan-based API.
embedding
Embedding op family — Category M.
fft
FFT op family — Milestone 6.4 (Category Fft).
gemm
GEMM family — unified plan-based API.
image
Image / spatial-transform op family — Category T.
indexing
Indexing / scatter / gather op family — Category L.
linalg
Dense linear algebra op family — Phase 6 (Category Linalg).
loss
Loss op family — Phase 5 Category R.
moe
Mixture-of-Experts (MoE) inference forward — Phase 8 Milestone 8.5 (Category V).
norm
Normalization op family — Phase 5 Category G.
quantize
Quantization op family — Category P.
random
Random / sampling op family — Phase 4.5 (Category Q).
reduce
Reduction op family — Phase 4 (Category E).
scan
Scan (associative prefix) op family — Phase 4 Category F.
segment
Segment / scatter-reduce op family — Category S.
shape_layout
Shape / layout op family — Category N from the comprehensive plan.
softmax
Softmax op family — Phase 5 Category H.
sort
Sorting / order-statistics op family — Category O.
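The scan module is described as an associative prefix family. Associativity is what lets a GPU split one long scan into independent per-block scans plus a short scan of the block totals. The following is a CPU sketch of that three-pass decomposition; the function names are hypothetical and this is not the crate's kernel.

```rust
/// Sequential inclusive prefix scan of one block with an associative operator.
fn scan_block<T: Copy, F: Fn(T, T) -> T>(xs: &mut [T], op: F) {
    for i in 1..xs.len() {
        xs[i] = op(xs[i - 1], xs[i]);
    }
}

/// Three-pass blocked inclusive scan:
/// 1) scan each block independently (one thread block each on a GPU),
/// 2) scan the per-block totals,
/// 3) combine each block's preceding total into its elements.
/// Correct for any associative `op`; commutativity is not required because
/// the offset is always applied on the left.
fn blocked_inclusive_scan<T, F>(xs: &mut [T], block: usize, op: F)
where
    T: Copy,
    F: Fn(T, T) -> T + Copy,
{
    assert!(block > 0, "block size must be positive");
    // Pass 1: independent per-block scans.
    for chunk in xs.chunks_mut(block) {
        scan_block(chunk, op);
    }
    // Pass 2: scan of block totals (the last element of each scanned block).
    let mut totals: Vec<T> = xs.chunks(block).map(|c| c[c.len() - 1]).collect();
    scan_block(&mut totals, op);
    // Pass 3: block b (b >= 1) receives the combined total of blocks 0..b.
    for (b, chunk) in xs.chunks_mut(block).enumerate().skip(1) {
        let offset = totals[b - 1];
        for x in chunk.iter_mut() {
            *x = op(offset, *x);
        }
    }
}
```

Composing affine maps x -> a*x + b is associative but not commutative, so it is a useful check that the blocked result matches a sequential scan.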

Structs§

BatchedGemmArgs
Per-launch arguments for a BatchedGemmPlan::run call.
BatchedGemmDescriptor
Problem shape and configuration handed to BatchedGemmPlan::select.
BatchedGemmPlan
Plan for a uniform-shape batched GEMM launch.
Bin
1-bit binary element marker — packed-byte storage.
Bool
Boolean element marker. #[repr(transparent)] wrapper around u8 (1-byte storage).
Complex32
Single-precision complex element. #[repr(C)] struct of two f32 fields (real, imag) — ABI-compatible with cuFFT’s cufftComplex (which is itself an alias for CUDA’s float2), with NumPy’s complex64, and with PyTorch’s torch.complex64.
Complex64
Double-precision complex element. #[repr(C)] struct of two f64 fields — ABI-compatible with cuFFT’s cufftDoubleComplex, NumPy’s complex128, and PyTorch’s torch.complex128. Sibling to Complex32.
F32Strict
Strict-precision f32 element marker.
Fp8E4M3
8-bit floating-point, E4M3 encoding (1 sign + 4 exponent + 3 mantissa, exponent bias 7).
Fp8E5M2
8-bit floating-point, E5M2 encoding (1 sign + 5 exponent + 2 mantissa, exponent bias 15).
GemmArgs
Per-launch arguments for a GemmPlan::run call.
GemmDescriptor
Problem shape and configuration handed to GemmPlan::select.
GemmPlan
Selected GEMM kernel and the host-side metadata needed to launch it.
GemmSku
Identity of the kernel a plan picked.
GroupedGemmPlan
Plan for a grouped (per-problem variable shape) GEMM launch.
GroupedPlanPreference
Hints for GroupedGemmPlan::select.
GroupedProblem
One per-group entry for a grouped GEMM launch.
IntGemmArgs
Per-launch arguments for an IntGemmPlan::run call.
IntGemmDescriptor
Problem shape and configuration handed to IntGemmPlan::select.
KernelSku
Generalized kernel SKU — covers every op category.
MatrixMut
Mutable view of a device-resident matrix (used for the output D).
MatrixRef
Read-only view of a device-resident matrix.
PlanPreference
Hints that influence kernel selection inside a plan’s select method.
PrecisionGuarantee
Numerical guarantees a kernel provides.
PreparedGroupedGemm
A GroupedGemmPlan bound to a concrete set of per-group problems.
S4
Signed 4-bit integer element marker — packed-pair storage.
S8
Signed 8-bit integer element marker. #[repr(transparent)] around i8.
TensorMut
Mutable view of a device-resident rank-N tensor.
TensorRef
Read-only view of a device-resident rank-N tensor.
U4
Unsigned 4-bit integer element marker — packed-pair storage.
U8
Unsigned 8-bit integer element marker. #[repr(transparent)] around u8.
VectorRef
Read-only view of a device-resident vector.
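The Fp8E4M3 and Fp8E5M2 summaries give each format's field widths and exponent bias. A minimal sketch of decoding either byte to f32 follows. It assumes the OCP FP8 conventions also used by NVIDIA's FP8 types: E4M3 has no infinities and reserves S.1111.111 for NaN, while E5M2 follows IEEE-754 special values. The function names are hypothetical, not crate API.

```rust
/// Decode an E4M3 byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits.
/// Assumes the "E4M3FN" convention: no infinities, S.1111.111 is NaN.
fn decode_e4m3(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let man_bits = bits & 0x07;
    if exp == 0x0F && man_bits == 0x07 {
        return f32::NAN;
    }
    let man = man_bits as f32;
    if exp == 0 {
        // Subnormal: no implicit leading 1, exponent fixed at 1 - 7 = -6.
        sign * (man / 8.0) * 2f32.powi(-6)
    } else {
        sign * (1.0 + man / 8.0) * 2f32.powi(exp - 7)
    }
}

/// Decode an E5M2 byte: 1 sign, 5 exponent (bias 15), 2 mantissa bits.
/// IEEE-style: exponent 31 encodes +/-Inf (mantissa 0) or NaN.
fn decode_e5m2(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 2) & 0x1F) as i32;
    let man_bits = bits & 0x03;
    if exp == 0x1F {
        return if man_bits == 0 { sign * f32::INFINITY } else { f32::NAN };
    }
    let man = man_bits as f32;
    if exp == 0 {
        // Subnormal: exponent fixed at 1 - 15 = -14.
        sign * (man / 4.0) * 2f32.powi(-14)
    } else {
        sign * (1.0 + man / 4.0) * 2f32.powi(exp - 15)
    }
}
```

The trade-off between the two is visible in the extremes: E4M3 tops out at 448 with finer mantissa steps, while E5M2 reaches 57344 with coarser ones.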

Enums§

ActivationKind
Activation functions implemented by the Bias*Activation EpilogueKind variants. Surfaced for telemetry and selector logic; the kernel selection itself is driven by the enum variant.
ArchSku
Compute capability bucket the selected kernel was compiled for.
ArgReduceKind
Index-returning reduction discriminant used by ArgReducePlan (Phase 4).
AttentionKind
Attention-family op discriminant — Category K from the comprehensive plan.
BackendKind
Which underlying compute backend served a kernel SKU.
BiasElementKind
Runtime tag for a BiasElement.
BinaryCmpKind
Binary comparison op discriminant.
BinaryKind
Binary elementwise op discriminant.
CrossEntropyTargetKind
CrossEntropy target-tensor kind. Selects between PyTorch’s two target formats: class indices (i64[N]) and soft probabilities (T[N, C] — used for label smoothing / distillation).
ElementKind
Runtime tag for an Element or IntElement.
EmbeddingKind
Embedding-family op discriminant — Category M from the comprehensive plan.
EpilogueKind
Epilogue applied after the matrix-multiply accumulation.
Error
Errors raised by the safe CUTLASS wrapper.
FftKind
FFT-family op discriminant — Category U from the comprehensive plan.
FillMode
Fill-mode tag for triangular linalg ops (Cholesky / triangular solve).
GatedActivationKind
Gated-activation op discriminant (category C’).
GgufBlockFormat
GGUF block-format selector for QuantizeKind::GgufDequantize / QuantizeKind::GgufMmvq plans. Mirrors the discriminants used by llama.cpp / ggml so a descriptor can be round-tripped to a GGUF file header without translation.
GroupedScheduleMode
How CUTLASS schedules tiles across the grouped problem set.
ImageKind
Image / spatial-transform op discriminant — Category T from the comprehensive plan.
IndexElementKind
Runtime tag for an IndexElement. i32 is the legacy default; i64 was added in Phase 11.5 to match PyTorch’s int64 index convention without an extra cast pass.
IndexOutputKind
Runtime tag for an IndexOutputElement. i64 is the default (PyTorch convention) and was the only variant before Phase 12.2; u32 and i32 were added so that downstream frameworks preferring narrower index dtypes (Fuel uses u32) can avoid a post-pass cast.
IndexingKind
Indexing / scatter / gather op discriminant — Category L from the comprehensive plan.
LayoutSku
Layout SKU. Describes the row/column orientation of A, B, C, and D for matrix-multiply-shaped kernels.
LinalgKind
Linear-algebra (dense) op discriminant — covers the cuSOLVER family shipped in Milestone 6.3.
LossKind
Loss op discriminant — category R from the comprehensive plan.
LossReduction
Loss reduction mode. Selects the output shape and the final scalar scaling for a LossKind plan; corresponds to PyTorch’s reduction parameter.
MathPrecision
Math precision used by the FMA / tensor-core instruction.
MoeKind
Mixture-of-Experts (MoE) variant selector, used as the op discriminant for kernel SKUs whose crate::OpCategory is crate::OpCategory::Moe. Phase 8 Milestone 8.5 wires up the three fused per-token-dispatch + expert-matmul + accumulate kernels.
NormalizationKind
Normalization op discriminant — category G from the comprehensive plan.
OpCategory
Op category — the top-level taxonomy a kernel SKU belongs to.
PadMode
Padding mode for crate::ops::ShapeLayoutKind::Pad.
PoolKind
Pooling-family op discriminant — Category J from the comprehensive plan.
QuantizeKind
Quantization op discriminant — Category P from the comprehensive plan.
RandomKind
Random / sampling op discriminant.
ReduceKind
Reduction op discriminant — Phase 4 (Category E).
ReduceToOp
Broadcast-reverse reduction op discriminant used by ReduceToPlan.
ScanKind
Scan (associative prefix) op discriminant — category F from the comprehensive plan.
SegmentKind
Segment / scatter-reduce op discriminant — Category S from the comprehensive plan.
ShapeLayoutKind
Shape / layout op discriminant — Category N.
SoftmaxKind
Softmax-family op discriminant — category H from the comprehensive plan.
SortKind
Sorting / order-statistics op discriminant — Category O from the comprehensive plan (Phase 9).
TernaryKind
Ternary elementwise op discriminant.
UnaryKind
Unary elementwise op discriminant.
Workspace
Caller-supplied workspace for a launch.
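CrossEntropyTargetKind and LossReduction together select the target format and the output reduction. Below is a CPU reference sketch of both target formats under three PyTorch-style reduction modes, assuming no class weights, ignore_index, or label smoothing. Reduction and Targets are hypothetical stand-ins, not the crate's enums.

```rust
/// Hypothetical reduction selector mirroring PyTorch's none / mean / sum.
#[derive(Clone, Copy)]
enum Reduction {
    None,
    Mean,
    Sum,
}

/// Hypothetical target representation mirroring the two target formats.
enum Targets<'a> {
    /// One class index per row (i64[N]).
    ClassIndices(&'a [i64]),
    /// Row-major soft probabilities ([N, C]).
    Probabilities(&'a [f32]),
}

/// Numerically stable log-softmax of one row (subtract the row max before exp).
fn log_softmax(row: &[f32]) -> Vec<f32> {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let lse = max + row.iter().map(|&x| (x - max).exp()).sum::<f32>().ln();
    row.iter().map(|&x| x - lse).collect()
}

/// CPU reference cross-entropy over row-major logits of shape [n, c].
/// Returns n values for Reduction::None, otherwise a single value.
fn cross_entropy(logits: &[f32], n: usize, c: usize, targets: Targets<'_>, reduction: Reduction) -> Vec<f32> {
    assert_eq!(logits.len(), n * c);
    let per_row: Vec<f32> = (0..n)
        .map(|i| {
            let lp = log_softmax(&logits[i * c..(i + 1) * c]);
            match &targets {
                // Negative log-probability of the labelled class.
                Targets::ClassIndices(t) => -lp[t[i] as usize],
                // Negative expected log-probability under the soft target.
                Targets::Probabilities(p) => -(0..c).map(|j| p[i * c + j] * lp[j]).sum::<f32>(),
            }
        })
        .collect();
    match reduction {
        Reduction::None => per_row,
        Reduction::Sum => vec![per_row.iter().sum::<f32>()],
        // Unweighted mean: divide by the batch size for both target kinds.
        Reduction::Mean => vec![per_row.iter().sum::<f32>() / n as f32],
    }
}
```

A one-hot probability target reproduces the class-index loss exactly, which is why soft targets generalize the index form for label smoothing and distillation.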

Traits§

BiasElement
Bias element types accepted by the int-GEMM bias epilogue family.
BinElement
Binary (1-bit) element types supported by the kernel facade.
Element
Element types supported by the kernel facade.
FpElement
8-bit floating-point element types supported by the kernel facade.
IndexElement
Sealed marker trait for index-element types accepted by the indexing / embedding / segment kernel families.
IndexOutputElement
Sealed marker trait for the output index dtype produced by arg-reduction kernels (argmax / argmin axis ops).
IntElement
Integer element types supported by the int-GEMM kernel set.
KernelDtype
Umbrella marker trait for every dtype usable as a kernel input or output.
ScalarType
Sealed marker for the alpha/beta scalar type an Element uses.
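Several traits above (IndexElement, IndexOutputElement, ScalarType) are described as sealed. One common way to seal a trait in Rust is a public supertrait declared inside a private module, sketched below with hypothetical names (IndexLike, IndexKind); the crate's actual definitions may differ. The associated KIND constant shows one way a runtime tag such as IndexElementKind could be tied to a type.

```rust
mod dtype {
    /// Private module: `Sealed` cannot be named outside `dtype`, so downstream
    /// crates cannot implement the public trait that requires it.
    mod private {
        pub trait Sealed {}
    }

    /// Hypothetical runtime tag, standing in for a *Kind enum.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum IndexKind {
        I32,
        I64,
    }

    /// Sealed marker trait: the impls in this module are the only ones possible.
    pub trait IndexLike: private::Sealed + Copy + 'static {
        const KIND: IndexKind;
        /// Assumes non-negative indices; negative values would wrap.
        fn to_usize(self) -> usize;
    }

    impl private::Sealed for i32 {}
    impl private::Sealed for i64 {}

    impl IndexLike for i32 {
        const KIND: IndexKind = IndexKind::I32;
        fn to_usize(self) -> usize {
            self as usize
        }
    }

    impl IndexLike for i64 {
        const KIND: IndexKind = IndexKind::I64;
        fn to_usize(self) -> usize {
            self as usize
        }
    }
}

use dtype::{IndexKind, IndexLike};

/// Generic gather that accepts any sealed index type.
fn gather<I: IndexLike>(src: &[f32], idx: &[I]) -> Vec<f32> {
    idx.iter().map(|&i| src[i.to_usize()]).collect()
}
```

Sealing lets the crate add methods or associated constants to the trait later without breaking downstream code, since no outside impls can exist.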

Functions§

contiguous_stride
Compute the row-major contiguous stride for the given shape.
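A sketch of the row-major stride computation that contiguous_stride is summarized as performing. It assumes strides are measured in elements and returned as usize; row_major_strides and linear_offset are hypothetical names, and the crate's actual signature is not assumed.

```rust
/// Row-major (C-order) contiguous strides, in elements, for `shape`.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1usize; shape.len()];
    // Walk from the second-to-last axis outward: each stride is the
    // product of all extents to its right.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Linear element offset of a multi-index under the given strides.
fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape [2, 3, 4] this yields strides [12, 4, 1], so element [1, 2, 3] lives at offset 1*12 + 2*4 + 3*1 = 23.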

Type Aliases§

Result
Crate-local result alias.