Module tensor_ops

Tensor Core / Matrix Multiply-Accumulate (MMA) operations

Emulates NVIDIA Tensor Core operations for mixed-precision matrix multiplication. On real hardware (SM 7.0+), these operations map to WMMA/MMA PTX instructions. The CPU fallback provides a functionally correct tiled matrix multiply with the same API semantics.

Supported precision modes (inputs→accumulator): fp16×fp16→fp32, bf16×bf16→fp32, fp32×fp32→fp32, int8×int8→int32.
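
To make the "tiled matrix multiply" and "int8×int8→int32" wording concrete, here is a minimal CPU sketch of a tiled multiply-accumulate, D = A × B + C, with 8-bit inputs widened into a 32-bit accumulator. It is an illustration only and not this module's implementation: the function name `tiled_mma_i8`, its parameters, and the row-major layout are all assumptions.

```rust
/// Hypothetical sketch (not part of `tensor_ops`): computes D = A * B + C
/// tile by tile for row-major matrices.
/// A is m×k (i8), B is k×n (i8), C and the returned D are m×n (i32).
/// `tile` is an assumed tile edge length; edge tiles may be smaller.
pub fn tiled_mma_i8(
    a: &[i8],
    b: &[i8],
    c: &[i32],
    m: usize,
    n: usize,
    k: usize,
    tile: usize,
) -> Vec<i32> {
    assert!(tile > 0, "tile size must be nonzero");
    assert_eq!(a.len(), m * k, "A must be m×k");
    assert_eq!(b.len(), k * n, "B must be k×n");
    assert_eq!(c.len(), m * n, "C must be m×n");

    // Start from C so that each tile step accumulates into the result.
    let mut d = c.to_vec();

    for i0 in (0..m).step_by(tile) {
        let i_end = (i0 + tile).min(m);
        for j0 in (0..n).step_by(tile) {
            let j_end = (j0 + tile).min(n);
            for k0 in (0..k).step_by(tile) {
                let k_end = (k0 + tile).min(k);
                // Multiply one A tile by one B tile and add into the D tile.
                for i in i0..i_end {
                    for j in j0..j_end {
                        let mut acc = d[i * n + j];
                        for p in k0..k_end {
                            // Widen to i32 before multiplying so the product
                            // of two i8 values cannot overflow.
                            acc += i32::from(a[i * k + p]) * i32::from(b[p * n + j]);
                        }
                        d[i * n + j] = acc;
                    }
                }
            }
        }
    }
    d
}
```

For integer inputs the result does not depend on the tile size, so calling the function with `tile = 1` or `tile = 16` gives the same output. With floating-point accumulation, changing the tile size changes the summation order and can change the rounding of the result.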

Structs§

Fragment: Matrix fragment — a tile of a matrix stored in registers. On GPU, these map to warp-distributed register fragments.
FragmentShape: Fragment shape (M×N×K) for WMMA operations. Maps to hardware-supported shapes such as 16×16×16 and 8×32×16.
GemmStats: Statistics from a GEMM operation.
TensorCoreEngine: Tensor Core MMA engine.
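
The fragment and shape concepts listed above can be sketched on the CPU as follows. This is a hedged illustration, not the API of `Fragment`, `FragmentShape`, or `TensorCoreEngine`: the names `SketchShape`, `SketchFragment`, and `sketch_mma`, their fields, and the row-major layout with a leading dimension are all assumptions made for the example.

```rust
/// Hypothetical M×N×K shape: A is m×k, B is k×n, C and D are m×n.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SketchShape {
    pub m: usize,
    pub n: usize,
    pub k: usize,
}

/// Hypothetical fragment: a small row-major tile copied out of a larger matrix.
#[derive(Clone, Debug, PartialEq)]
pub struct SketchFragment {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f32>,
}

impl SketchFragment {
    /// Copies a rows×cols tile starting at (row0, col0) from a row-major
    /// matrix whose rows are `ld` elements apart (the leading dimension).
    pub fn load(
        src: &[f32],
        ld: usize,
        row0: usize,
        col0: usize,
        rows: usize,
        cols: usize,
    ) -> Self {
        assert!(col0 + cols <= ld, "tile must fit within one source row");
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            let start = (row0 + r) * ld + col0;
            data.extend_from_slice(&src[start..start + cols]);
        }
        Self { rows, cols, data }
    }
}

/// Computes D = A * B + C on fragments whose dimensions match `shape`.
pub fn sketch_mma(
    shape: SketchShape,
    a: &SketchFragment,
    b: &SketchFragment,
    c: &SketchFragment,
) -> SketchFragment {
    assert_eq!((a.rows, a.cols), (shape.m, shape.k), "A must be m×k");
    assert_eq!((b.rows, b.cols), (shape.k, shape.n), "B must be k×n");
    assert_eq!((c.rows, c.cols), (shape.m, shape.n), "C must be m×n");

    // Start from C and accumulate the products into it.
    let mut d = c.clone();
    for i in 0..shape.m {
        for j in 0..shape.n {
            for p in 0..shape.k {
                d.data[i * shape.n + j] +=
                    a.data[i * shape.k + p] * b.data[p * shape.n + j];
            }
        }
    }
    d
}
```

On a GPU, the elements of a fragment are spread across the registers of the threads in a warp, and the mapping from element to thread is opaque to the programmer. The sketch keeps the whole tile in one contiguous buffer, which is enough to reproduce the arithmetic but not the register layout.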

Enums§

MmaPrecision: Precision mode for tensor core operations.
