Tensor Core / Matrix Multiply-Accumulate (MMA) operations
Emulates NVIDIA Tensor Core operations for mixed-precision matrix multiplication. On real hardware (SM 7.0+), these map to WMMA/MMA PTX instructions. The CPU fallback provides a functionally correct tiled matrix multiply with the same API semantics.
Supported precision modes (inputs→accumulator): fp16×fp16→fp32, bf16×bf16→fp32, fp32×fp32→fp32, int8×int8→int32.
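The CPU fallback described above can be pictured with a minimal sketch of a tiled multiply-accumulate, `D = A × B + C`, for the int8×int8→int32 mode. The function name `mma_i8_tiled`, its parameters, and the row-major layout are assumptions made for illustration; they are not taken from this module's API. The int8 mode is used here only because stable Rust has no built-in fp16 or bf16 type.

```rust
/// Hypothetical helper, NOT part of the documented API.
/// Computes D = A * B + C for row-major matrices:
/// A is m×k (i8), B is k×n (i8), C and D are m×n (i32).
pub fn mma_i8_tiled(
    a: &[i8],
    b: &[i8],
    c: &[i32],
    m: usize,
    n: usize,
    k: usize,
    tile: usize, // assumed square tile edge, e.g. 16
) -> Vec<i32> {
    assert!(tile > 0, "tile size must be non-zero");
    assert_eq!(a.len(), m * k, "A must hold m*k elements");
    assert_eq!(b.len(), k * n, "B must hold k*n elements");
    assert_eq!(c.len(), m * n, "C must hold m*n elements");

    // Start from the accumulator input, as an MMA operation does.
    let mut d = c.to_vec();

    // Walk the output in tile×tile blocks and the reduction
    // dimension in chunks of `tile`; edge tiles are clamped.
    for i0 in (0..m).step_by(tile) {
        let i_end = (i0 + tile).min(m);
        for j0 in (0..n).step_by(tile) {
            let j_end = (j0 + tile).min(n);
            for k0 in (0..k).step_by(tile) {
                let k_end = (k0 + tile).min(k);
                for i in i0..i_end {
                    for j in j0..j_end {
                        let mut acc = d[i * n + j];
                        for p in k0..k_end {
                            // Widen each i8 operand to i32 before multiplying.
                            acc += i32::from(a[i * k + p]) * i32::from(b[p * n + j]);
                        }
                        d[i * n + j] = acc;
                    }
                }
            }
        }
    }
    d
}
```

The result does not depend on the tile size; tiling only changes the order in which partial products are accumulated, which is exact for integer arithmetic. For the floating-point modes, a different accumulation order can change the rounding of the result slightly.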
Structs
- Fragment: Matrix fragment, a tile of a matrix stored in registers. On GPU, these map to warp-distributed register fragments.
- FragmentShape: Fragment shape for WMMA operations. Maps to hardware-supported shapes like 16×16×16, 8×32×16, etc.
- GemmStats: Statistics from a GEMM operation.
- TensorCoreEngine: Tensor Core MMA engine.
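As a rough illustration of what a shape such as 16×16×16 or 8×32×16 means, the sketch below reads the three numbers as M×N×K, the usual WMMA convention: the A tile is M×K, the B tile is K×N, and the accumulator tile is M×N. The type `ShapeSketch` and its fields are hypothetical stand-ins and are not the actual definition of the fragment shape struct documented here.

```rust
/// Hypothetical stand-in for a WMMA tile shape (M×N×K).
/// Field and method names are assumptions, not the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShapeSketch {
    pub m: usize,
    pub n: usize,
    pub k: usize,
}

impl ShapeSketch {
    /// Elements in the A tile (M×K).
    pub const fn a_elems(&self) -> usize {
        self.m * self.k
    }

    /// Elements in the B tile (K×N).
    pub const fn b_elems(&self) -> usize {
        self.k * self.n
    }

    /// Elements in the accumulator tile (M×N).
    pub const fn acc_elems(&self) -> usize {
        self.m * self.n
    }

    /// Multiply-accumulate operations needed for one tile.
    pub const fn macs(&self) -> usize {
        self.m * self.n * self.k
    }
}
```

Under this reading, 16×16×16 and 8×32×16 perform the same number of multiply-accumulates per tile and produce accumulator tiles of the same size; they differ in how the operand elements are split between A and B.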
Enums
- MmaPrecision: Precision mode for tensor core operations.