§baracuda-kernels
Unified ML op facade for the baracuda CUDA ecosystem.
Exposes every primitive an ML framework would expect (union of
PyTorch torch.* + nn.functional and JAX lax.* / numpy ops)
through a single Plan-based Rust surface, internally dispatching to:
- An NVIDIA-library wrapper crate when one already covers the op (baracuda-cublas, baracuda-cudnn, baracuda-cufft, baracuda-cusparse, baracuda-cusolver, baracuda-curand, baracuda-cutensor, baracuda-npp, baracuda-cvcuda, baracuda-cutlass).
- A bespoke .cu kernel shipped in baracuda-kernels-sys when no NVIDIA library covers it (or covers it poorly at relevant shapes).
Callers import one crate and reach for one API style; the
dispatch decision is an internal detail driven by select.
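The descriptor, plan, and run flow described above can be sketched in plain Rust. This is a CPU-only illustration, not the crate's actual API: `Backend`, `Descriptor`, `Plan`, and the rule that routes strided input to the bespoke path are all hypothetical names and assumptions introduced here.

```rust
/// Hypothetical backends a plan could dispatch to (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// Delegate to an NVIDIA-library wrapper crate.
    VendorLibrary,
    /// Launch a bespoke kernel.
    BespokeKernel,
}

/// Hypothetical problem description: what the caller wants computed.
#[derive(Debug, Clone, Copy)]
struct Descriptor {
    rows: usize,
    cols: usize,
    contiguous: bool,
}

/// Hypothetical plan: the descriptor plus the backend chosen for it.
#[derive(Debug)]
struct Plan {
    desc: Descriptor,
    backend: Backend,
}

impl Plan {
    /// Chooses a backend once, up front. The selection rule is invented
    /// for this sketch: contiguous input goes to the vendor path.
    fn select(desc: Descriptor) -> Result<Plan, String> {
        if desc.rows == 0 || desc.cols == 0 {
            return Err("descriptor has a zero extent".to_string());
        }
        let backend = if desc.contiguous {
            Backend::VendorLibrary
        } else {
            Backend::BespokeKernel
        };
        Ok(Plan { desc, backend })
    }

    /// Executes with per-launch arguments. CPU stand-in that doubles each
    /// element; the caller never branches on `backend`.
    fn run(&self, input: &[f32], output: &mut [f32]) -> Result<(), String> {
        let n = self.desc.rows * self.desc.cols;
        if input.len() != n || output.len() != n {
            return Err(format!("expected {} elements", n));
        }
        for (out, x) in output.iter_mut().zip(input) {
            *out = 2.0 * *x;
        }
        Ok(())
    }
}

/// Caller-side view: build a descriptor, select a plan, run it.
fn example_usage() -> Result<Vec<f32>, String> {
    let plan = Plan::select(Descriptor { rows: 2, cols: 2, contiguous: false })?;
    let mut out = vec![0.0f32; 4];
    plan.run(&[1.0, 2.0, 3.0, 4.0], &mut out)?;
    Ok(out)
}
```

The point of the shape is that the backend choice is made once in `select` and stored in the plan, so calling code looks the same whichever path is taken.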
§Status
Active. Covers ~2700 FFI launch points across Phase 1–66 work, including:
- full elementwise unary/binary/ternary matrix (fwd + bwd, contig + strided)
- all standard reductions and scans
- the normalizer family (RMS / Layer / Batch / Group / Instance with in-place SMEM-staged kernels for f32/f16/bf16/f64)
- softmax / log-softmax / sparsemax / gumbel-softmax (+ BW)
- full attention suite (SDPA contig + strided + BW, Flash SDPA sm_80 + sm_89 + varlen + Tri Dao FA2 v2.8.3, RoPE / ALiBi / KV-cache, paged-KV decode/prefill via FlashInfer, ring attention, block-sparse SDPA, arbitrary-mask SDPA)
- GEMM (f16/bf16/tf32/f32/f64/s8/u8/s4/u4/bin/fp8 with optional bias + ReLU/GELU/SiLU epilogues)
- GGUF MMVQ (11 block formats × {contig, strided, batched, multi-M})
- the complete loss family (15 losses × FW+BW + CTC)
- conv + pool (cuDNN-backed + bit-exact bespoke Adaptive / LpPool / FractionalMaxPool)
- image ops (interpolate / upsample / grid sample / ROI / NMS / pixel shuffle)
- linalg (cuSOLVER facade + bespoke batched Ormqr WY + QR materialize, real + complex)
- FFT / cuRAND facades
- full quantize family + GGUF + NF4 + AWQ + Marlin + STE backward
- segment + embedding + indexing + scatter
- Mamba-2 SSD + causal conv1d
- TransformerEngine FP8 cast / recipe
- mHC hyper-connections
Every public _run FFI symbol has a matching _can_implement
pre-launch validator companion (Phase 66 closure, alpha.64).
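The pairing of a launch entry point with a pre-launch validator can be illustrated with a small CPU-only sketch. Everything named here (`LaunchParams`, `MAX_COLS`, `can_implement`, `run`) is hypothetical and stands in for the real FFI symbols, whose signatures are not shown in this page.

```rust
/// Hypothetical launch parameters for an illustrative row-wise kernel.
#[derive(Debug, Clone, Copy)]
struct LaunchParams {
    rows: usize,
    cols: usize,
}

/// Hypothetical upper bound on row length, invented for this sketch.
const MAX_COLS: usize = 1024;

/// Validator: reports whether `run` would accept these parameters,
/// without touching any data or launching anything.
fn can_implement(p: &LaunchParams) -> Result<(), String> {
    if p.rows == 0 || p.cols == 0 {
        return Err("shape has a zero extent".to_string());
    }
    if p.cols > MAX_COLS {
        return Err(format!("cols {} exceeds limit {}", p.cols, MAX_COLS));
    }
    Ok(())
}

/// Launch: CPU stand-in that computes one sum per row. It goes through
/// the same validator first, so the two entry points cannot disagree.
fn run(p: &LaunchParams, input: &[f32]) -> Result<Vec<f32>, String> {
    can_implement(p)?;
    if input.len() != p.rows * p.cols {
        return Err("input length does not match shape".to_string());
    }
    Ok(input
        .chunks(p.cols)
        .map(|row| row.iter().sum::<f32>())
        .collect())
}
```

A caller can probe `can_implement` while planning and fall back to another path on `Err`, rather than discovering an unsupported shape at launch time.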
Cargo features are documented in the workspace README.md. The
default build (sm80 only) covers Ampere-baseline kernels;
sm89 adds Ada specializations (FP8 GEMM, sm_89 Flash SDPA);
sm90a reserves the Hopper namespace. Feature flags for the
vendored kernel families (fa2, mhc, ozimmu, flashinfer,
mamba, bnb_nf4, marlin, awq, xformers_*,
tensor_engine, optim, ring_attention, megatron_tp,
nvshmem) are off by default.
See ROADMAP.md for the live backlog and OP-MATRIX.md for
per-op support status.
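As a rough illustration of how architecture features like those above can gate code at compile time, the sketch below uses Rust's `cfg!` macro. The feature names `sm89` and `sm90a` come from the text; the function itself and the choice to treat `sm80` as always present are assumptions made for this example, not a description of how the crate is organized.

```rust
/// Hypothetical helper listing the architecture tiers compiled into a build.
/// `sm80` is treated as an always-on baseline here, which is an assumption
/// of this sketch.
#[allow(unexpected_cfgs)]
fn enabled_arch_tiers() -> Vec<&'static str> {
    let mut tiers = vec!["sm80"];
    // `cfg!` evaluates to a compile-time boolean, so a disabled feature
    // costs nothing at run time.
    if cfg!(feature = "sm89") {
        tiers.push("sm89");
    }
    if cfg!(feature = "sm90a") {
        tiers.push("sm90a");
    }
    tiers
}
```

In a real crate the same conditions would more commonly appear as `#[cfg(feature = "...")]` attributes on modules or functions, so that gated code is not compiled at all when the feature is off.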
Re-exports§
pub use gemm::BinGemmArgs;pub use gemm::BinGemmDescriptor;pub use gemm::BinGemmPlan;pub use gemm::DenseGemmArgs;pub use gemm::DenseGemmDescriptor;pub use gemm::DenseGemmLayout;pub use gemm::DenseGemmPlan;pub use gemm::Fp8GemmArgs;pub use gemm::Fp8GemmDescriptor;pub use gemm::Fp8GemmPlan;pub use gemm::GemmSparse24Args;pub use gemm::GemmSparse24Descriptor;pub use gemm::GemmSparse24Plan;pub use gemm::Int4GemmArgs;pub use gemm::Int4GemmDescriptor;pub use gemm::Int4GemmPlan;pub use gemm::IntGemmPlan;pub use gemm::gptq_to_marlin_repack;pub use gemm::AwqActivation;pub use gemm::GptqWeights;pub use gemm::Int4AwqGemmArgs;pub use gemm::Int4AwqGemmDescriptor;pub use gemm::Int4AwqGemmPlan;pub use gemm::Int4MarlinGemmArgs;pub use gemm::Int4MarlinGemmDescriptor;pub use gemm::Int4MarlinGemmPlan;pub use gemm::MarlinActivation;pub use gemm::MarlinWeights;pub use gemm::MARLIN_PERM_LEN;pub use gemm::MARLIN_SCALE_PERM_LEN;pub use elementwise::AffineArgs;pub use elementwise::AffineDescriptor;pub use elementwise::AffinePlan;pub use elementwise::BinaryArgs;pub use elementwise::BinaryBackwardArgs;pub use elementwise::BinaryBackwardDescriptor;pub use elementwise::BinaryBackwardPlan;pub use elementwise::BinaryCmpArgs;pub use elementwise::BinaryCmpDescriptor;pub use elementwise::BinaryCmpPlan;pub use elementwise::BinaryDescriptor;pub use elementwise::BinaryParamArgs;pub use elementwise::BinaryParamBackwardArgs;pub use elementwise::BinaryParamBackwardDescriptor;pub use elementwise::BinaryParamBackwardPlan;pub use elementwise::BinaryParamDescriptor;pub use elementwise::BinaryParamPlan;pub use elementwise::BinaryPlan;pub use elementwise::CastArgs;pub use elementwise::CastDescriptor;pub use elementwise::CastPlan;pub use elementwise::CastSubByteArgs;pub use elementwise::CastSubByteDescriptor;pub use elementwise::CastSubBytePlan;pub use elementwise::GatedActivationArgs;pub use elementwise::GatedActivationBackwardArgs;pub use elementwise::GatedActivationBackwardDescriptor;pub use 
elementwise::GatedActivationBackwardPlan;pub use elementwise::GatedActivationDescriptor;pub use elementwise::GatedActivationPlan;pub use elementwise::TernaryArgs;pub use elementwise::TernaryBackwardArgs;pub use elementwise::TernaryBackwardDescriptor;pub use elementwise::TernaryBackwardPlan;pub use elementwise::TernaryDescriptor;pub use elementwise::TernaryPlan;pub use elementwise::UnaryArgs;pub use elementwise::UnaryBackwardArgs;pub use elementwise::UnaryBackwardDescriptor;pub use elementwise::UnaryBackwardPlan;pub use elementwise::UnaryDescriptor;pub use elementwise::UnaryParamArgs;pub use elementwise::UnaryParamBackwardArgs;pub use elementwise::UnaryParamBackwardDescriptor;pub use elementwise::UnaryParamBackwardPlan;pub use elementwise::UnaryParamDescriptor;pub use elementwise::UnaryParamPlan;pub use elementwise::UnaryPlan;pub use elementwise::WhereArgs;pub use elementwise::WhereBackwardArgs;pub use elementwise::WhereBackwardDescriptor;pub use elementwise::WhereBackwardPlan;pub use elementwise::WhereDescriptor;pub use elementwise::WherePlan;pub use elementwise::PReluArgs;pub use elementwise::PReluBackwardArgs;pub use elementwise::PReluBackwardDescriptor;pub use elementwise::PReluBackwardPlan;pub use elementwise::PReluDescriptor;pub use elementwise::PReluPlan;pub use shape_layout::ConcatArgs;pub use shape_layout::ConcatBackwardArgs;pub use shape_layout::ConcatBackwardDescriptor;pub use shape_layout::ConcatBackwardPlan;pub use shape_layout::ConcatDescriptor;pub use shape_layout::ConcatPlan;pub use shape_layout::ContiguizeArgs;pub use shape_layout::ContiguizeDescriptor;pub use shape_layout::ContiguizePlan;pub use shape_layout::FillArgs;pub use shape_layout::FillDescriptor;pub use shape_layout::FillPlan;pub use shape_layout::FlipArgs;pub use shape_layout::FlipBackwardArgs;pub use shape_layout::FlipBackwardDescriptor;pub use shape_layout::FlipBackwardPlan;pub use shape_layout::FlipDescriptor;pub use shape_layout::FlipPlan;pub use shape_layout::PadArgs;pub use 
shape_layout::PadBackwardArgs;pub use shape_layout::PadBackwardDescriptor;pub use shape_layout::PadBackwardPlan;pub use shape_layout::PadDescriptor;pub use shape_layout::PadPlan;pub use shape_layout::PermuteArgs;pub use shape_layout::PermuteBackwardArgs;pub use shape_layout::PermuteBackwardDescriptor;pub use shape_layout::PermuteBackwardPlan;pub use shape_layout::PermuteDescriptor;pub use shape_layout::PermutePlan;pub use shape_layout::RepeatArgs;pub use shape_layout::RepeatBackwardArgs;pub use shape_layout::RepeatBackwardDescriptor;pub use shape_layout::RepeatBackwardPlan;pub use shape_layout::RepeatDescriptor;pub use shape_layout::RepeatPlan;pub use shape_layout::RollArgs;pub use shape_layout::RollBackwardArgs;pub use shape_layout::RollBackwardDescriptor;pub use shape_layout::RollBackwardPlan;pub use shape_layout::RollDescriptor;pub use shape_layout::RollPlan;pub use shape_layout::TrilArgs;pub use shape_layout::TrilBackwardArgs;pub use shape_layout::TrilBackwardDescriptor;pub use shape_layout::TrilBackwardPlan;pub use shape_layout::TrilDescriptor;pub use shape_layout::TrilPlan;pub use shape_layout::TriuArgs;pub use shape_layout::TriuBackwardArgs;pub use shape_layout::TriuBackwardDescriptor;pub use shape_layout::TriuBackwardPlan;pub use shape_layout::TriuDescriptor;pub use shape_layout::TriuPlan;pub use shape_layout::WriteSliceArgs;pub use shape_layout::WriteSliceDescriptor;pub use shape_layout::WriteSlicePlan;pub use reduce::ArgReduceArgs;pub use reduce::ArgReduceDescriptor;pub use reduce::ArgReducePlan;pub use reduce::BoolReduceArgs;pub use reduce::BoolReduceDescriptor;pub use reduce::BoolReducePlan;pub use reduce::CountReduceArgs;pub use reduce::CountReduceDescriptor;pub use reduce::CountReducePlan;pub use reduce::ReduceArgs;pub use reduce::ReduceBackwardArgs;pub use reduce::ReduceBackwardDescriptor;pub use reduce::ReduceBackwardPlan;pub use reduce::ReduceDescriptor;pub use reduce::ReducePlan;pub use reduce::ReduceToArgs;pub use reduce::ReduceToDescriptor;pub 
use reduce::ReduceToPlan;pub use reduce::TraceArgs;pub use reduce::TraceDescriptor;pub use reduce::TracePlan;pub use scan::ScanArgs;pub use scan::ScanBackwardArgs;pub use scan::ScanBackwardDescriptor;pub use scan::ScanBackwardPlan;pub use scan::ScanDescriptor;pub use scan::ScanPlan;pub use softmax::GumbelSoftmaxArgs;pub use softmax::GumbelSoftmaxBackwardArgs;pub use softmax::GumbelSoftmaxBackwardDescriptor;pub use softmax::GumbelSoftmaxBackwardPlan;pub use softmax::GumbelSoftmaxDescriptor;pub use softmax::GumbelSoftmaxPlan;pub use softmax::SoftmaxArgs;pub use softmax::SoftmaxBackwardArgs;pub use softmax::SoftmaxBackwardDescriptor;pub use softmax::SoftmaxBackwardPlan;pub use softmax::SoftmaxDescriptor;pub use softmax::SoftmaxPlan;pub use softmax::SparsemaxArgs;pub use softmax::SparsemaxBackwardArgs;pub use softmax::SparsemaxBackwardDescriptor;pub use softmax::SparsemaxBackwardPlan;pub use softmax::SparsemaxDescriptor;pub use softmax::SparsemaxPlan;pub use softmax::SPARSEMAX_MAX_EXTENT;pub use norm::BatchNormArgs;pub use norm::BatchNormBackwardArgs;pub use norm::BatchNormBackwardDescriptor;pub use norm::BatchNormBackwardPlan;pub use norm::BatchNormDescriptor;pub use norm::BatchNormPlan;pub use norm::GroupNormArgs;pub use norm::GroupNormBackwardArgs;pub use norm::GroupNormBackwardDescriptor;pub use norm::GroupNormBackwardPlan;pub use norm::GroupNormDescriptor;pub use norm::GroupNormPlan;pub use norm::InstanceNormArgs;pub use norm::InstanceNormBackwardArgs;pub use norm::InstanceNormBackwardDescriptor;pub use norm::InstanceNormBackwardPlan;pub use norm::InstanceNormDescriptor;pub use norm::InstanceNormPlan;pub use norm::LayerNormArgs;pub use norm::LayerNormBackwardArgs;pub use norm::LayerNormBackwardDescriptor;pub use norm::LayerNormBackwardPlan;pub use norm::LayerNormDescriptor;pub use norm::LayerNormPlan;pub use norm::RMSNormArgs;pub use norm::RMSNormBackwardArgs;pub use norm::RMSNormBackwardDescriptor;pub use norm::RMSNormBackwardPlan;pub use 
norm::RMSNormDescriptor;pub use norm::RMSNormPlan;pub use loss::BceLossArgs;pub use loss::BceLossBackwardArgs;pub use loss::BceLossBackwardDescriptor;pub use loss::BceLossBackwardPlan;pub use loss::BceLossDescriptor;pub use loss::BceLossPlan;pub use loss::BceWithLogitsLossArgs;pub use loss::BceWithLogitsLossBackwardArgs;pub use loss::BceWithLogitsLossBackwardDescriptor;pub use loss::BceWithLogitsLossBackwardPlan;pub use loss::BceWithLogitsLossDescriptor;pub use loss::BceWithLogitsLossPlan;pub use loss::CrossEntropyLossArgs;pub use loss::CrossEntropyLossBackwardArgs;pub use loss::CrossEntropyLossBackwardDescriptor;pub use loss::CrossEntropyLossBackwardPlan;pub use loss::CrossEntropyLossDescriptor;pub use loss::CrossEntropyLossPlan;pub use loss::FusedLinearCrossEntropyArgs;pub use loss::FusedLinearCrossEntropyBackwardArgs;pub use loss::FusedLinearCrossEntropyBackwardDescriptor;pub use loss::FusedLinearCrossEntropyBackwardPlan;pub use loss::FusedLinearCrossEntropyDescriptor;pub use loss::FusedLinearCrossEntropyPlan;pub use loss::FLCE_DEFAULT_IGNORE_INDEX;pub use loss::GaussianNllLossArgs;pub use loss::GaussianNllLossBackwardArgs;pub use loss::GaussianNllLossBackwardDescriptor;pub use loss::GaussianNllLossBackwardPlan;pub use loss::GaussianNllLossDescriptor;pub use loss::GaussianNllLossPlan;pub use loss::HuberLossArgs;pub use loss::HuberLossBackwardArgs;pub use loss::HuberLossBackwardDescriptor;pub use loss::HuberLossBackwardPlan;pub use loss::HuberLossDescriptor;pub use loss::HuberLossPlan;pub use loss::KlDivLossArgs;pub use loss::KlDivLossBackwardArgs;pub use loss::KlDivLossBackwardDescriptor;pub use loss::KlDivLossBackwardPlan;pub use loss::KlDivLossDescriptor;pub use loss::KlDivLossPlan;pub use loss::L1LossArgs;pub use loss::L1LossBackwardArgs;pub use loss::L1LossBackwardDescriptor;pub use loss::L1LossBackwardPlan;pub use loss::L1LossDescriptor;pub use loss::L1LossPlan;pub use loss::MseLossArgs;pub use loss::MseLossBackwardArgs;pub use 
loss::MseLossBackwardDescriptor;pub use loss::MseLossBackwardPlan;pub use loss::MseLossDescriptor;pub use loss::MseLossPlan;pub use loss::NllLossArgs;pub use loss::NllLossBackwardArgs;pub use loss::NllLossBackwardDescriptor;pub use loss::NllLossBackwardPlan;pub use loss::NllLossDescriptor;pub use loss::NllLossPlan;pub use loss::PoissonNllLossArgs;pub use loss::PoissonNllLossBackwardArgs;pub use loss::PoissonNllLossBackwardDescriptor;pub use loss::PoissonNllLossBackwardPlan;pub use loss::PoissonNllLossDescriptor;pub use loss::PoissonNllLossPlan;pub use loss::SmoothL1LossArgs;pub use loss::SmoothL1LossBackwardArgs;pub use loss::SmoothL1LossBackwardDescriptor;pub use loss::SmoothL1LossBackwardPlan;pub use loss::SmoothL1LossDescriptor;pub use loss::SmoothL1LossPlan;pub use loss::CosineEmbeddingLossArgs;pub use loss::CosineEmbeddingLossBackwardArgs;pub use loss::CosineEmbeddingLossBackwardDescriptor;pub use loss::CosineEmbeddingLossBackwardPlan;pub use loss::CosineEmbeddingLossDescriptor;pub use loss::CosineEmbeddingLossPlan;pub use loss::HingeEmbeddingLossArgs;pub use loss::HingeEmbeddingLossBackwardArgs;pub use loss::HingeEmbeddingLossBackwardDescriptor;pub use loss::HingeEmbeddingLossBackwardPlan;pub use loss::HingeEmbeddingLossDescriptor;pub use loss::HingeEmbeddingLossPlan;pub use loss::MarginRankingLossArgs;pub use loss::MarginRankingLossBackwardArgs;pub use loss::MarginRankingLossBackwardDescriptor;pub use loss::MarginRankingLossBackwardPlan;pub use loss::MarginRankingLossDescriptor;pub use loss::MarginRankingLossPlan;pub use loss::MultiMarginLossArgs;pub use loss::MultiMarginLossBackwardArgs;pub use loss::MultiMarginLossBackwardDescriptor;pub use loss::MultiMarginLossBackwardPlan;pub use loss::MultiMarginLossDescriptor;pub use loss::MultiMarginLossPlan;pub use loss::MultilabelMarginLossArgs;pub use loss::MultilabelMarginLossBackwardArgs;pub use loss::MultilabelMarginLossBackwardDescriptor;pub use loss::MultilabelMarginLossBackwardPlan;pub use 
loss::MultilabelMarginLossDescriptor;pub use loss::MultilabelMarginLossPlan;pub use loss::MultilabelSoftMarginLossArgs;pub use loss::MultilabelSoftMarginLossBackwardArgs;pub use loss::MultilabelSoftMarginLossBackwardDescriptor;pub use loss::MultilabelSoftMarginLossBackwardPlan;pub use loss::MultilabelSoftMarginLossDescriptor;pub use loss::MultilabelSoftMarginLossPlan;pub use loss::TripletMarginLossArgs;pub use loss::TripletMarginLossBackwardArgs;pub use loss::TripletMarginLossBackwardDescriptor;pub use loss::TripletMarginLossBackwardPlan;pub use loss::TripletMarginLossDescriptor;pub use loss::TripletMarginLossPlan;pub use loss::CtcLossArgs;pub use loss::CtcLossBackwardArgs;pub use loss::CtcLossBackwardDescriptor;pub use loss::CtcLossBackwardPlan;pub use loss::CtcLossDescriptor;pub use loss::CtcLossPlan;pub use random::DropoutArgs;pub use random::DropoutBackwardArgs;pub use random::DropoutBackwardDescriptor;pub use random::DropoutBackwardPlan;pub use random::DropoutDescriptor;pub use random::DropoutPlan;pub use random::RandomArgs;pub use random::RandomBoolArgs;pub use random::RandomDescriptor;pub use random::RandomPlan;pub use attention::AlibiArgs;pub use attention::AlibiBackwardArgs;pub use attention::AlibiBackwardDescriptor;pub use attention::AlibiBackwardPlan;pub use attention::AlibiDescriptor;pub use attention::AlibiPlan;pub use attention::FlashDecodingArgs;pub use attention::FlashDecodingDescriptor;pub use attention::FlashDecodingPlan;pub use attention::FLASH_DECODING_MAX_D;pub use attention::FlashSdpaArgs;pub use attention::FlashSdpaBackwardArgs;pub use attention::FlashSdpaBackwardDescriptor;pub use attention::FlashSdpaBackwardPlan;pub use attention::FlashSdpaDescriptor;pub use attention::FlashSdpaPlan;pub use attention::FlashSdpaVarlenArgs;pub use attention::FlashSdpaVarlenBackwardArgs;pub use attention::FlashSdpaVarlenBackwardPlan;pub use attention::FlashSdpaVarlenDescriptor;pub use attention::FlashSdpaVarlenPlan;pub use attention::HyperConnectionArgs;pub 
use attention::HyperConnectionDescriptor;pub use attention::HyperConnectionPlan;pub use attention::KvCacheAppendArgs;pub use attention::KvCacheAppendDescriptor;pub use attention::KvCacheAppendPlan;pub use attention::RopeArgs;pub use attention::RopeBackwardArgs;pub use attention::RopeBackwardDescriptor;pub use attention::RopeBackwardPlan;pub use attention::RopeDescriptor;pub use attention::RopePlan;pub use attention::SdpaArgs;pub use attention::SdpaBackwardArgs;pub use attention::SdpaBackwardDescriptor;pub use attention::SdpaBackwardPlan;pub use attention::SdpaBlockSparseArgs;pub use attention::SdpaBlockSparseDescriptor;pub use attention::SdpaBlockSparsePlan;pub use attention::SdpaDescriptor;pub use attention::SdpaPlan;pub use attention::FLASH_SDPA_MAX_D;pub use attention::ROPE_DEFAULT_BASE;pub use attention::SDPA_BLOCK_SPARSE_MAX_BLOCK;pub use attention::SDPA_BLOCK_SPARSE_MAX_D;pub use attention::RopeScaledTableBuilder;pub use attention::RopeScaling;pub use linalg::BatchedOrmqrArgs;pub use linalg::BatchedOrmqrDescriptor;pub use linalg::BatchedOrmqrOp;pub use linalg::BatchedOrmqrPlan;pub use linalg::BatchedOrmqrSide;pub use linalg::BatchedOrmqrWyArgs;pub use linalg::BatchedOrmqrWyDescriptor;pub use linalg::BatchedOrmqrWyPlan;pub use linalg::BatchedQrArgs;pub use linalg::BatchedQrDescriptor;pub use linalg::BatchedQrMaterializeArgs;pub use linalg::BatchedQrMaterializeDescriptor;pub use linalg::BatchedQrMaterializePlan;pub use linalg::BatchedQrPlan;pub use linalg::BatchedSvdArgs;pub use linalg::BatchedSvdDescriptor;pub use linalg::BatchedSvdPlan;pub use linalg::BatchedSvdaArgs;pub use linalg::BatchedSvdaDescriptor;pub use linalg::BatchedSvdaPlan;pub use linalg::CholeskyArgs;pub use linalg::CholeskyDescriptor;pub use linalg::CholeskyPlan;pub use linalg::EigArgs;pub use linalg::EigDescriptor;pub use linalg::EigPlan;pub use linalg::EighArgs;pub use linalg::EighDescriptor;pub use linalg::EighPlan;pub use linalg::InverseArgs;pub use linalg::InverseDescriptor;pub use 
linalg::InversePlan;pub use linalg::LstSqArgs;pub use linalg::LstSqDescriptor;pub use linalg::LstSqPlan;pub use linalg::LuArgs;pub use linalg::LuDescriptor;pub use linalg::LuPlan;pub use linalg::QrArgs;pub use linalg::QrDescriptor;pub use linalg::QrPlan;pub use linalg::SolveArgs;pub use linalg::SolveDescriptor;pub use linalg::SolvePlan;pub use linalg::SvdArgs;pub use linalg::SvdDescriptor;pub use linalg::SvdPlan;pub use linalg::WY_NB;pub use fft::FftArgs;pub use fft::FftDescriptor;pub use fft::FftNdArgs;pub use fft::FftNdDescriptor;pub use fft::FftNdPlan;pub use fft::FftPlan;pub use fft::FftShiftArgs;pub use fft::FftShiftDescriptor;pub use fft::FftShiftNdArgs;pub use fft::FftShiftNdDescriptor;pub use fft::FftShiftNdPlan;pub use fft::FftShiftPlan;pub use fft::IrfftArgs;pub use fft::IrfftDescriptor;pub use fft::IrfftNdArgs;pub use fft::IrfftNdDescriptor;pub use fft::IrfftNdPlan;pub use fft::IrfftPlan;pub use fft::RfftArgs;pub use fft::RfftDescriptor;pub use fft::RfftNdArgs;pub use fft::RfftNdDescriptor;pub use fft::RfftNdPlan;pub use fft::RfftPlan;pub use fft::FFTSHIFT_ND_MAX_RANK;pub use fft::FFTSHIFT_ND_MAX_SHIFT_AXES;pub use indexing::GatherArgs;pub use indexing::GatherBackwardArgs;pub use indexing::GatherBackwardDescriptor;pub use indexing::GatherBackwardPlan;pub use indexing::GatherDescriptor;pub use indexing::GatherPlan;pub use indexing::IndexAddArgs;pub use indexing::IndexAddDescriptor;pub use indexing::IndexAddPlan;pub use indexing::IndexSelectArgs;pub use indexing::IndexSelectBackwardArgs;pub use indexing::IndexSelectBackwardDescriptor;pub use indexing::IndexSelectBackwardPlan;pub use indexing::IndexSelectDescriptor;pub use indexing::IndexSelectPlan;pub use indexing::MaskedFillArgs;pub use indexing::MaskedFillBackwardArgs;pub use indexing::MaskedFillBackwardDescriptor;pub use indexing::MaskedFillBackwardPlan;pub use indexing::MaskedFillDescriptor;pub use indexing::MaskedFillPlan;pub use indexing::NonzeroArgs;pub use indexing::NonzeroDescriptor;pub use 
indexing::NonzeroPlan;pub use indexing::OneHotArgs;pub use indexing::OneHotDescriptor;pub use indexing::OneHotPlan;pub use indexing::ScatterArgs;pub use indexing::ScatterDescriptor;pub use indexing::ScatterPlan;pub use indexing::ScatterAddArgs;pub use indexing::ScatterAddDescriptor;pub use indexing::ScatterAddPlan;pub use embedding::EmbeddingArgs;pub use embedding::EmbeddingBackwardArgs;pub use embedding::EmbeddingBackwardDescriptor;pub use embedding::EmbeddingBackwardPlan;pub use embedding::EmbeddingBagArgs;pub use embedding::EmbeddingBagBackwardArgs;pub use embedding::EmbeddingBagBackwardDescriptor;pub use embedding::EmbeddingBagBackwardPlan;pub use embedding::EmbeddingBagDescriptor;pub use embedding::EmbeddingBagMaxArgs;pub use embedding::EmbeddingBagMaxBackwardArgs;pub use embedding::EmbeddingBagMaxBackwardDescriptor;pub use embedding::EmbeddingBagMaxBackwardPlan;pub use embedding::EmbeddingBagMaxDescriptor;pub use embedding::EmbeddingBagMaxPlan;pub use embedding::EmbeddingBagMode;pub use embedding::EmbeddingBagPlan;pub use embedding::EmbeddingDescriptor;pub use embedding::EmbeddingPlan;pub use segment::SegmentMaxArgs;pub use segment::SegmentMaxBackwardArgs;pub use segment::SegmentMaxBackwardDescriptor;pub use segment::SegmentMaxBackwardPlan;pub use segment::SegmentMaxDescriptor;pub use segment::SegmentMaxPlan;pub use segment::SegmentMeanArgs;pub use segment::SegmentMeanBackwardArgs;pub use segment::SegmentMeanBackwardDescriptor;pub use segment::SegmentMeanBackwardPlan;pub use segment::SegmentMeanDescriptor;pub use segment::SegmentMeanPlan;pub use segment::SegmentMinArgs;pub use segment::SegmentMinBackwardArgs;pub use segment::SegmentMinBackwardDescriptor;pub use segment::SegmentMinBackwardPlan;pub use segment::SegmentMinDescriptor;pub use segment::SegmentMinPlan;pub use segment::SegmentProdArgs;pub use segment::SegmentProdBackwardArgs;pub use segment::SegmentProdBackwardDescriptor;pub use segment::SegmentProdBackwardPlan;pub use 
segment::SegmentProdDescriptor;pub use segment::SegmentProdPlan;pub use segment::SegmentSumArgs;pub use segment::SegmentSumBackwardArgs;pub use segment::SegmentSumBackwardDescriptor;pub use segment::SegmentSumBackwardPlan;pub use segment::SegmentSumDescriptor;pub use segment::SegmentSumPlan;pub use segment::UnsortedSegmentMaxArgs;pub use segment::UnsortedSegmentMaxBackwardArgs;pub use segment::UnsortedSegmentMaxBackwardDescriptor;pub use segment::UnsortedSegmentMaxBackwardPlan;pub use segment::UnsortedSegmentMaxDescriptor;pub use segment::UnsortedSegmentMaxPlan;pub use segment::UnsortedSegmentMeanArgs;pub use segment::UnsortedSegmentMeanBackwardArgs;pub use segment::UnsortedSegmentMeanBackwardDescriptor;pub use segment::UnsortedSegmentMeanBackwardPlan;pub use segment::UnsortedSegmentMeanDescriptor;pub use segment::UnsortedSegmentMeanPlan;pub use segment::UnsortedSegmentMinArgs;pub use segment::UnsortedSegmentMinBackwardArgs;pub use segment::UnsortedSegmentMinBackwardDescriptor;pub use segment::UnsortedSegmentMinBackwardPlan;pub use segment::UnsortedSegmentMinDescriptor;pub use segment::UnsortedSegmentMinPlan;pub use segment::UnsortedSegmentProdArgs;pub use segment::UnsortedSegmentProdBackwardArgs;pub use segment::UnsortedSegmentProdBackwardDescriptor;pub use segment::UnsortedSegmentProdBackwardPlan;pub use segment::UnsortedSegmentProdDescriptor;pub use segment::UnsortedSegmentProdPlan;pub use segment::UnsortedSegmentSumArgs;pub use segment::UnsortedSegmentSumBackwardArgs;pub use segment::UnsortedSegmentSumBackwardDescriptor;pub use segment::UnsortedSegmentSumBackwardPlan;pub use segment::UnsortedSegmentSumDescriptor;pub use segment::UnsortedSegmentSumPlan;pub use quantize::DequantizePerGroupArgs;pub use quantize::DequantizePerGroupBackwardArgs;pub use quantize::DequantizePerGroupBackwardDescriptor;pub use quantize::DequantizePerGroupBackwardPlan;pub use quantize::DequantizePerGroupDescriptor;pub use quantize::DequantizePerGroupPlan;pub use 
quantize::DequantizePerTokenArgs;pub use quantize::DequantizePerTokenBackwardArgs;pub use quantize::DequantizePerTokenBackwardDescriptor;pub use quantize::DequantizePerTokenBackwardPlan;pub use quantize::DequantizePerTokenDescriptor;pub use quantize::DequantizePerTokenPlan;pub use quantize::QuantizePerGroupArgs;pub use quantize::QuantizePerGroupBackwardArgs;pub use quantize::QuantizePerGroupBackwardDescriptor;pub use quantize::QuantizePerGroupBackwardPlan;pub use quantize::QuantizePerGroupDescriptor;pub use quantize::QuantizePerGroupPlan;pub use quantize::QuantizePerTokenArgs;pub use quantize::QuantizePerTokenBackwardArgs;pub use quantize::QuantizePerTokenBackwardDescriptor;pub use quantize::QuantizePerTokenBackwardPlan;pub use quantize::QuantizePerTokenDescriptor;pub use quantize::QuantizePerTokenPlan;pub use quantize::DequantizePerChannelArgs;pub use quantize::DequantizePerChannelBackwardArgs;pub use quantize::DequantizePerChannelBackwardDescriptor;pub use quantize::DequantizePerChannelBackwardPlan;pub use quantize::DequantizePerChannelDescriptor;pub use quantize::DequantizePerChannelPlan;pub use quantize::DequantizePerTensorArgs;pub use quantize::DequantizePerTensorBackwardArgs;pub use quantize::DequantizePerTensorBackwardDescriptor;pub use quantize::DequantizePerTensorBackwardPlan;pub use quantize::DequantizePerTensorDescriptor;pub use quantize::DequantizePerTensorPlan;pub use quantize::FakeQuantizeArgs;pub use quantize::FakeQuantizeBackwardArgs;pub use quantize::FakeQuantizeBackwardDescriptor;pub use quantize::FakeQuantizeBackwardPlan;pub use quantize::FakeQuantizeDescriptor;pub use quantize::FakeQuantizePlan;pub use quantize::QuantizePerChannelArgs;pub use quantize::QuantizePerChannelBackwardArgs;pub use quantize::QuantizePerChannelBackwardDescriptor;pub use quantize::QuantizePerChannelBackwardPlan;pub use quantize::QuantizePerChannelDescriptor;pub use quantize::QuantizePerChannelPlan;pub use quantize::QuantizePerTensorArgs;pub use 
quantize::QuantizePerTensorBackwardArgs;pub use quantize::QuantizePerTensorBackwardDescriptor;pub use quantize::QuantizePerTensorBackwardPlan;pub use quantize::QuantizePerTensorDescriptor;pub use quantize::QuantizePerTensorPlan;pub use quantize::DynamicRangeMode;pub use quantize::DynamicRangeQuantizeArgs;pub use quantize::DynamicRangeQuantizeDescriptor;pub use quantize::DynamicRangeQuantizePlan;pub use quantize::DynamicRangeScope;pub use quantize::QuantizedLinearArgs;pub use quantize::QuantizedLinearDescriptor;pub use quantize::QuantizedLinearPlan;pub use quantize::SmoothQuantLinearArgs;pub use quantize::SmoothQuantLinearDescriptor;pub use quantize::SmoothQuantLinearPlan;pub use quantize::BlockQ2K;pub use quantize::BlockQ3K;pub use quantize::BlockQ4_0;pub use quantize::BlockQ4_1;pub use quantize::BlockQ4K;pub use quantize::BlockQ5_0;pub use quantize::BlockQ5_1;pub use quantize::BlockQ5K;pub use quantize::BlockQ6K;pub use quantize::BlockQ8_0;pub use quantize::BlockQ8K;pub use quantize::GgufDequantizeArgs;pub use quantize::GgufDequantizeDescriptor;pub use quantize::GgufDequantizePlan;pub use quantize::GgufMmvqArgs;pub use quantize::GgufMmvqDescriptor;pub use quantize::GgufMmvqPlan;pub use quantize::GgufMmvqBatchedActivation;pub use quantize::GgufMmvqBatchedArgs;pub use quantize::GgufMmvqBatchedDescriptor;pub use quantize::GgufMmvqBatchedFormat;pub use quantize::GgufMmvqBatchedPlan;pub use quantize::GgufMmvqMultiMArgs;pub use quantize::GgufMmvqMultiMDescriptor;pub use quantize::GgufMmvqMultiMPlan;pub use quantize::Nf4Activation;pub use quantize::Nf4DequantizeArgs;pub use quantize::Nf4DequantizePlan;pub use quantize::Nf4Descriptor;pub use quantize::Nf4MmvqArgs;pub use quantize::Nf4MmvqMultiMArgs;pub use quantize::Nf4MmvqMultiMDescriptor;pub use quantize::Nf4MmvqMultiMPlan;pub use quantize::Nf4MmvqPlan;pub use quantize::NF4_CODEBOOK;pub use moe::MoeArgs;pub use moe::MoeDescriptor;pub use moe::MoePlan;pub use moe::MoeVariant;pub use image::AffineGridArgs;pub use 
image::AffineGridDescriptor;
pub use image::AffineGridPlan;
pub use image::GridSampleArgs;
pub use image::GridSampleBackwardArgs;
pub use image::GridSampleBackwardDescriptor;
pub use image::GridSampleBackwardPlan;
pub use image::GridSampleDescriptor;
pub use image::GridSamplePlan;
pub use image::InterpolateArgs;
pub use image::InterpolateBackwardArgs;
pub use image::InterpolateBackwardDescriptor;
pub use image::InterpolateBackwardPlan;
pub use image::InterpolateDescriptor;
pub use image::InterpolateMode;
pub use image::InterpolatePlan;
pub use image::NmsArgs;
pub use image::NmsDescriptor;
pub use image::NmsPlan;
pub use image::PixelShuffleArgs;
pub use image::PixelShuffleDescriptor;
pub use image::PixelShufflePlan;
pub use image::PixelUnshuffleArgs;
pub use image::PixelUnshuffleDescriptor;
pub use image::PixelUnshufflePlan;
pub use image::RoiAlignArgs;
pub use image::RoiAlignBackwardArgs;
pub use image::RoiAlignBackwardDescriptor;
pub use image::RoiAlignBackwardPlan;
pub use image::RoiAlignDescriptor;
pub use image::RoiAlignPlan;
pub use image::RoiPoolArgs;
pub use image::RoiPoolBackwardArgs;
pub use image::RoiPoolBackwardDescriptor;
pub use image::RoiPoolBackwardPlan;
pub use image::RoiPoolDescriptor;
pub use image::RoiPoolPlan;
pub use sort::ArgsortArgs;
pub use sort::ArgsortDescriptor;
pub use sort::ArgsortPlan;
pub use sort::BincountArgs;
pub use sort::BincountDescriptor;
pub use sort::BincountPlan;
pub use sort::HistogramArgs;
pub use sort::HistogramDescriptor;
pub use sort::HistogramPlan;
pub use sort::HistogramddArgs;
pub use sort::HistogramddDescriptor;
pub use sort::HistogramddPlan;
pub use sort::KthvalueArgs;
pub use sort::KthvalueBackwardArgs;
pub use sort::KthvalueBackwardDescriptor;
pub use sort::KthvalueBackwardPlan;
pub use sort::KthvalueDescriptor;
pub use sort::KthvaluePlan;
pub use sort::MsortArgs;
pub use sort::MsortBackwardArgs;
pub use sort::MsortBackwardDescriptor;
pub use sort::MsortBackwardPlan;
pub use sort::MsortDescriptor;
pub use sort::MsortPlan;
pub use sort::SearchsortedArgs;
pub use sort::SearchsortedDescriptor;
pub use sort::SearchsortedPlan;
pub use sort::SortArgs;
pub use sort::SortBackwardArgs;
pub use sort::SortBackwardDescriptor;
pub use sort::SortBackwardPlan;
pub use sort::SortDescriptor;
pub use sort::SortPlan;
pub use sort::TopkArgs;
pub use sort::TopkBackwardArgs;
pub use sort::TopkBackwardDescriptor;
pub use sort::TopkBackwardPlan;
pub use sort::TopkDescriptor;
pub use sort::TopkPlan;
pub use sort::UniqueArgs;
pub use sort::UniqueConsecutiveArgs;
pub use sort::UniqueConsecutiveDescriptor;
pub use sort::UniqueConsecutivePlan;
pub use sort::UniqueDescriptor;
pub use sort::UniquePlan;
pub use sort::SORT_MAX_ROW;
pub use sort::TOPK_MAX_K;
pub use attention::RingAttentionArgs;
pub use attention::RingAttentionDescriptor;
pub use attention::RingAttentionPlan;
pub use attention::RING_ATTENTION_HEAD_DIM;
pub use attention::BatchPagedDecodeArgs;
pub use attention::BatchPagedDecodeDescriptor;
pub use attention::BatchPagedDecodePlan;
pub use attention::BatchPagedDecodeFp8Args;
pub use attention::BatchPagedDecodeFp8Descriptor;
pub use attention::BatchPagedDecodeFp8Plan;
pub use attention::BatchPagedPrefillArgs;
pub use attention::BatchPagedPrefillDescriptor;
pub use attention::BatchPagedPrefillPlan;
pub use attention::BatchRaggedPrefillArgs;
pub use attention::BatchRaggedPrefillDescriptor;
pub use attention::BatchRaggedPrefillPlan;
pub use attention::CascadeAttentionArgs;
pub use attention::CascadeAttentionDescriptor;
pub use attention::CascadeAttentionPlan;
pub use attention::CascadeMergeStatesArgs;
pub use attention::CascadeMergeStatesDescriptor;
pub use attention::CascadeMergeStatesPlan;
pub use attention::Fp8KvDtype;
pub use attention::PagedKvAppendArgs;
pub use attention::PagedKvAppendDescriptor;
pub use attention::PagedKvAppendPlan;
pub use attention::PagedKvCacheDescriptor;
pub use random::PerRowSampler;
pub use random::PerRowSamplingArgs;
pub use random::PerRowSamplingDescriptor;
pub use random::PerRowSamplingPlan;
pub use random::SamplerKind;
pub use random::SpeculativeSamplingArgs;
pub use random::SpeculativeSamplingDescriptor;
pub use random::SpeculativeSamplingPlan;
pub use random::TokenPenaltyArgs;
pub use random::TokenPenaltyDescriptor;
pub use random::TokenPenaltyPlan;
pub use random::TopKTopPSamplingArgs;
pub use random::TopKTopPSamplingDescriptor;
pub use random::TopKTopPSamplingPlan;
Modules§
- attention
- Attention op family — Phase 6 Category K.
- elementwise
- Elementwise op family — unified plan-based API.
- embedding
- Embedding op family — Category M.
- fft
- FFT op family — Milestone 6.4 (Category Fft).
- gemm
- GEMM family — unified plan-based API.
- image
- Image / spatial-transform op family — Category T.
- indexing
- Indexing / scatter / gather op family — Category L.
- linalg
- Dense linear algebra op family — Phase 6 (Category Linalg).
- loss
- Loss op family — Phase 5 Category R.
- moe
- Mixture-of-Experts (MoE) inference forward — Phase 8 Milestone 8.5 (Category V).
- norm
- Normalization op family — Phase 5 Category G.
- quantize
- Quantization op family — Category P.
- random
- Random / sampling op family — Phase 4.5 (Category Q).
- reduce
- Reduction op family — Phase 4 (Category E).
- scan
- Scan (associative prefix) op family — Phase 4 Category F.
- segment
- Segment / scatter-reduce op family — Category S.
- shape_layout
- Shape / layout op family (Category N from the comprehensive plan).
- softmax
- Softmax op family — Phase 5 Category H.
- sort
- Sorting / order-statistics op family — Category O.
Structs§
- BatchedGemmArgs
- Per-launch arguments for a BatchedGemmPlan::run call.
- BatchedGemmDescriptor
- Problem shape and configuration handed to BatchedGemmPlan::select.
- BatchedGemmPlan
- Plan for a uniform-shape batched GEMM launch.
- Bin
- 1-bit binary element marker (packed-byte storage).
- Bool
- Boolean element marker. #[repr(transparent)] wrapper around u8 (1-byte storage).
- Complex32
- Single-precision complex element. #[repr(C)] struct of two f32 fields (real, imag), ABI-compatible with cuFFT’s cufftComplex (which is itself an alias for CUDA’s float2), with NumPy’s complex64, and with PyTorch’s torch.complex64.
- Complex64
- Double-precision complex element. #[repr(C)] struct of two f64 fields, ABI-compatible with cuFFT’s cufftDoubleComplex, NumPy’s complex128, and PyTorch’s torch.complex128. Sibling to Complex32.
- F32Strict
- Strict-precision f32 element marker.
- Fp8E4M3
- 8-bit floating-point, E4M3 encoding (1 sign + 4 exponent + 3 mantissa, exponent bias 7).
- Fp8E5M2
- 8-bit floating-point, E5M2 encoding (1 sign + 5 exponent + 2 mantissa, exponent bias 15).
- GemmArgs
- Per-launch arguments for a GemmPlan::run call.
- GemmDescriptor
- Problem shape and configuration handed to GemmPlan::select.
- GemmPlan
- Selected GEMM kernel and the host-side metadata needed to launch it.
- GemmSku
- Identity of the kernel a plan picked.
- GroupedGemmPlan
- Plan for a grouped (per-problem variable shape) GEMM launch.
- GroupedPlanPreference
- Hints for GroupedGemmPlan::select.
- GroupedProblem
- One per-group entry for a grouped GEMM launch.
- IntGemmArgs
- Per-launch arguments for an IntGemmPlan::run call.
- IntGemmDescriptor
- Problem shape and configuration handed to IntGemmPlan::select.
- KernelSku
- Generalized kernel SKU; covers every op category.
- MatrixMut
- Mutable view of a device-resident matrix (used for the output D).
- MatrixRef
- Read-only view of a device-resident matrix.
- PlanPreference
- Hints that influence kernel selection inside a plan’s select method.
- PrecisionGuarantee
- Numerical guarantees a kernel provides.
- PreparedGroupedGemm
- A GroupedGemmPlan bound to a concrete set of per-group problems.
- S4
- Signed 4-bit integer element marker (packed-pair storage).
- S8
- Signed 8-bit integer element marker. #[repr(transparent)] around i8.
- TensorMut
- Mutable view of a device-resident rank-N tensor.
- TensorRef
- Read-only view of a device-resident rank-N tensor.
- U4
- Unsigned 4-bit integer element marker (packed-pair storage).
- U8
- Unsigned 8-bit integer element marker. #[repr(transparent)] around u8.
- VectorRef
- Read-only view of a device-resident vector.
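The Fp8E4M3 and Fp8E5M2 entries above give the bit layout and exponent bias of each encoding. As a rough illustration of what those numbers mean, the following host-side sketch decodes a raw byte into an f32. The function names are hypothetical and are not part of this crate. The sketch only handles zero, subnormal, and normal bit patterns; it does not interpret NaN or infinity encodings, whose treatment depends on which FP8 variant the crate follows (not stated in this index).

```rust
/// Decode an 8-bit float given its field widths and exponent bias.
/// Hypothetical helper for illustration only; special values (NaN / Inf)
/// are NOT interpreted and are decoded as if they were ordinary numbers.
fn decode_fp8_sketch(bits: u8, exp_bits: u32, man_bits: u32, bias: i32) -> f32 {
    let sign = if bits >> 7 == 1 { -1.0f32 } else { 1.0f32 };
    // Exponent field sits directly above the mantissa field.
    let exp = ((bits >> man_bits) as u32 & ((1 << exp_bits) - 1)) as i32;
    let man = (bits as u32 & ((1 << man_bits) - 1)) as f32;
    let scale = (1u32 << man_bits) as f32;
    if exp == 0 {
        // Subnormal: no implicit leading one, exponent fixed at 1 - bias.
        sign * (man / scale) * 2f32.powi(1 - bias)
    } else {
        // Normal: implicit leading one, unbiased exponent.
        sign * (1.0 + man / scale) * 2f32.powi(exp - bias)
    }
}

/// E4M3: 1 sign + 4 exponent + 3 mantissa bits, bias 7.
fn decode_e4m3_sketch(bits: u8) -> f32 {
    decode_fp8_sketch(bits, 4, 3, 7)
}

/// E5M2: 1 sign + 5 exponent + 2 mantissa bits, bias 15.
fn decode_e5m2_sketch(bits: u8) -> f32 {
    decode_fp8_sketch(bits, 5, 2, 15)
}
```

For example, the byte 0x38 decodes to 1.0 under E4M3 (exponent field 7, mantissa 0), while 0x3C decodes to 1.0 under E5M2 (exponent field 15, mantissa 0).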
Enums§
- ActivationKind
- Activation functions implemented by the Bias*Activation EpilogueKind variants. Surfaced for telemetry and selector logic; the kernel selection itself is driven by the enum variant.
- ArchSku
- Compute capability bucket the selected kernel was compiled for.
- ArgReduceKind
- Index-returning reduction discriminant: Phase 4 (ArgReducePlan).
- AttentionKind
- Attention-family op discriminant (Category K from the comprehensive plan).
- BackendKind
- Which underlying compute backend served a kernel SKU.
- BiasElementKind
- Runtime tag for a BiasElement.
- BinaryCmpKind
- Binary comparison op discriminant.
- BinaryKind
- Binary elementwise op discriminant.
- CrossEntropyTargetKind
- CrossEntropy target-tensor kind. Selects between PyTorch’s two target formats: class indices (i64[N]) and soft probabilities (T[N, C], used for label smoothing / distillation).
- ElementKind
- Runtime tag for an Element or IntElement.
- EmbeddingKind
- Embedding-family op discriminant (Category M from the comprehensive plan).
- EpilogueKind
- Epilogue applied after the matrix-multiply accumulation.
- Error
- Errors raised by the safe CUTLASS wrapper.
- FftKind
- FFT-family op discriminant (Category U from the comprehensive plan).
- FillMode
- Fill-mode tag for triangular linalg ops (Cholesky / triangular solve).
- GatedActivationKind
- Gated-activation op discriminant (category C’).
- GgufBlockFormat
- GGUF block-format selector for QuantizeKind::GgufDequantize / QuantizeKind::GgufMmvq plans. Mirrors the discriminants used by llama.cpp / ggml so a descriptor can be round-tripped to a GGUF file header without translation.
- GroupedScheduleMode
- How CUTLASS schedules tiles across the grouped problem set.
- ImageKind
- Image / spatial-transform op discriminant (Category T from the comprehensive plan).
- IndexElementKind
- Runtime tag for an IndexElement. i32 is the legacy default; i64 was added in Phase 11.5 to match PyTorch’s int64 index convention without an extra cast pass.
- IndexOutputKind
- Runtime tag for an IndexOutputElement. i64 is the default (PyTorch convention) and the only variant prior to Phase 12.2; u32 and i32 were added so downstream frameworks that prefer narrower index dtypes (Fuel uses u32) can avoid a post-pass cast.
- IndexingKind
- Indexing / scatter / gather op discriminant (Category L from the comprehensive plan).
- LayoutSku
- Layout SKU. Describes the row/column orientation of A, B, C, and D for matrix-multiply-shaped kernels.
- LinalgKind
- Linear-algebra (dense) op discriminant; covers the cuSOLVER family shipped in Milestone 6.3.
- LossKind
- Loss op discriminant (category R from the comprehensive plan).
- LossReduction
- Loss reduction mode. Selects the output shape and the final scalar scaling for a LossKind plan. PyTorch’s reduction parameter.
- MathPrecision
- Math precision used by the FMA / tensor-core instruction.
- MoeKind
- Mixture-of-Experts (MoE) variant selector, used as the op discriminant for kernel SKUs whose crate::OpCategory is crate::OpCategory::Moe. Phase 8 Milestone 8.5 wires the three fused per-token-dispatch + expert-matmul + accumulate kernels.
- NormalizationKind
- Normalization op discriminant (category G from the comprehensive plan).
- OpCategory
- Op category: the top-level taxonomy a kernel SKU belongs to.
- PadMode
- Padding mode for crate::ops::ShapeLayoutKind::Pad.
- PoolKind
- Pooling-family op discriminant (Category J from the comprehensive plan).
- QuantizeKind
- Quantization op discriminant (Category P from the comprehensive plan).
- RandomKind
- Random / sampling op discriminant.
- ReduceKind
- Reduction op discriminant: Phase 4 (Category E).
- ReduceToOp
- Broadcast-reverse reduction op discriminant (ReduceToPlan).
- ScanKind
- Scan (associative prefix) op discriminant (category F from the comprehensive plan).
- SegmentKind
- Segment / scatter-reduce op discriminant (Category S from the comprehensive plan).
- ShapeLayoutKind
- Shape / layout op discriminant (Category N).
- SoftmaxKind
- Softmax-family op discriminant (category H from the comprehensive plan).
- SortKind
- Sorting / order-statistics op discriminant: Category O from the comprehensive plan (Phase 9).
- TernaryKind
- Ternary elementwise op discriminant.
- UnaryKind
- Unary elementwise op discriminant.
- Workspace
- Caller-supplied workspace for a launch.
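The LossReduction entry above says the mode selects both the output shape and the final scalar scaling, following PyTorch’s reduction parameter (which accepts none, mean, and sum). A minimal host-side sketch of that behaviour is shown below. The enum and function names are hypothetical, and the variant names of the crate’s own LossReduction are an assumption, since they are not listed in this index.

```rust
/// Hypothetical stand-in for a loss reduction mode; variant names are
/// assumed from PyTorch's `reduction` values ("none", "mean", "sum").
#[derive(Clone, Copy, Debug, PartialEq)]
enum ReductionSketch {
    None,
    Mean,
    Sum,
}

/// Apply a reduction mode to per-element losses.
/// `None` keeps the input shape; `Sum` and `Mean` collapse to one value,
/// with `Mean` additionally scaling by 1 / element count.
fn reduce_losses_sketch(per_elem: &[f32], mode: ReductionSketch) -> Vec<f32> {
    match mode {
        ReductionSketch::None => per_elem.to_vec(),
        ReductionSketch::Sum => vec![per_elem.iter().sum::<f32>()],
        ReductionSketch::Mean => {
            // An empty input yields 0.0 / 0.0 = NaN.
            vec![per_elem.iter().sum::<f32>() / per_elem.len() as f32]
        }
    }
}
```

With per-element losses [1.0, 2.0, 3.0], the None mode returns all three values, Sum returns a single 6.0, and Mean returns a single 2.0.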
Traits§
- BiasElement
- Bias element types accepted by the int-GEMM bias epilogue family.
- BinElement
- Binary (1-bit) element types supported by the kernel facade.
- Element
- Element types supported by the kernel facade.
- FpElement
- 8-bit floating-point element types supported by the kernel facade.
- IndexElement
- Sealed marker trait for index-element types accepted by the indexing / embedding / segment kernel families.
- IndexOutputElement
- Sealed marker trait for the output index dtype produced by arg-reduction kernels (argmax / argmin axis ops).
- IntElement
- Integer element types supported by the int-GEMM kernel set.
- KernelDtype
- Umbrella marker trait for every dtype usable as a kernel input or output.
- ScalarType
- Sealed marker for the alpha/beta scalar type an Element uses.
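Several traits above are described as sealed marker traits. For readers unfamiliar with the pattern, here is a minimal sketch of how sealing is commonly done in Rust: a public trait takes a supertrait that lives in a private module, so downstream crates cannot add implementations. The trait name, its associated items, and the gather helper are hypothetical; only the i32 and i64 index dtypes come from the IndexElementKind description, and the crate’s actual trait definitions may look different.

```rust
use std::convert::TryFrom;

// Private module: code outside the defining crate cannot name `Sealed`,
// so it cannot implement the public trait below for its own types.
mod sealed {
    pub trait Sealed {}
    impl Sealed for i32 {}
    impl Sealed for i64 {}
}

/// Hypothetical stand-in for a sealed index-element trait.
pub trait IndexElementSketch: sealed::Sealed + Copy {
    /// Storage width in bytes.
    const WIDTH_BYTES: usize;
    /// Convert to a host-side offset, panicking on negative indices.
    fn to_offset(self) -> usize;
}

impl IndexElementSketch for i32 {
    const WIDTH_BYTES: usize = 4;
    fn to_offset(self) -> usize {
        usize::try_from(self).expect("index must be non-negative")
    }
}

impl IndexElementSketch for i64 {
    const WIDTH_BYTES: usize = 8;
    fn to_offset(self) -> usize {
        usize::try_from(self).expect("index must be non-negative")
    }
}

/// Host-side gather that is generic over the index dtype.
pub fn gather_sketch<I: IndexElementSketch>(table: &[f32], indices: &[I]) -> Vec<f32> {
    indices.iter().map(|&i| table[i.to_offset()]).collect()
}
```

Sealing lets a library keep the set of supported dtypes closed, which matters when each implementation must correspond to a kernel that was actually compiled.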
Functions§
- contiguous_stride
- Compute the row-major contiguous stride for the given shape.
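As an illustration of what a row-major contiguous stride is, the sketch below computes it for a shape given as a slice. The function name and signature are hypothetical; the crate’s contiguous_stride may use const-generic arrays or a different integer type, which this index does not show.

```rust
/// Row-major (C-order) contiguous strides, in elements, for `shape`.
/// Hypothetical stand-in; not the crate's actual signature.
fn row_major_strides_sketch(shape: &[usize]) -> Vec<usize> {
    // The innermost axis always has stride 1.
    let mut strides = vec![1usize; shape.len()];
    // Walk outward from the second-innermost axis: each stride is the
    // product of all extents to its right.
    for axis in (0..shape.len().saturating_sub(1)).rev() {
        strides[axis] = strides[axis + 1] * shape[axis + 1];
    }
    strides
}
```

For a shape of [2, 3, 4] this gives strides [12, 4, 1]: stepping along the last axis moves one element, along the middle axis four elements, and along the first axis twelve.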
Type Aliases§
- Result
- Crate-local result alias.