Module tensor_ops

Tensor Core / Matrix Multiply-Accumulate (MMA) operations

Emulates NVIDIA Tensor Core operations for mixed-precision matrix multiplication. On real hardware (SM 7.0+; bf16 requires SM 8.0+ and int8 requires SM 7.2+), these operations map to WMMA/MMA PTX instructions. The CPU fallback provides a functionally correct tiled matrix multiply with the same API semantics.
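The CPU fallback's tiled multiply can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `tiled_mma`, the row-major layout, the 16-element tile edge, and the D = A×B + C accumulate semantics are all assumptions, and every value is a plain `f32` because stable Rust provides no fp16 or bf16 primitive.

```rust
/// Hypothetical tile edge; 16 mirrors the common 16x16x16 WMMA shape.
const TILE: usize = 16;

/// Hypothetical helper: D = A * B + C for row-major matrices with
/// A: m x k, B: k x n, C and D: m x n. The output and the K dimension are
/// walked in TILE-sized blocks, the way a warp steps through fragments.
fn tiled_mma(a: &[f32], b: &[f32], c: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    let mut d = c.to_vec(); // the accumulator starts from C
    for i0 in (0..m).step_by(TILE) {
        for j0 in (0..n).step_by(TILE) {
            for p0 in (0..k).step_by(TILE) {
                // One tile-level MMA; min() clips ragged edge tiles.
                for i in i0..(i0 + TILE).min(m) {
                    for j in j0..(j0 + TILE).min(n) {
                        let mut acc = 0.0f32;
                        for p in p0..(p0 + TILE).min(k) {
                            acc += a[i * k + p] * b[p * n + j];
                        }
                        d[i * n + j] += acc;
                    }
                }
            }
        }
    }
    d
}

fn main() {
    // [[1,2],[3,4]] * [[5,6],[7,8]] + identity
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let c = [1.0, 0.0, 0.0, 1.0];
    println!("{:?}", tiled_mma(&a, &b, &c, 2, 2, 2)); // [20.0, 22.0, 43.0, 51.0]
}
```

Blocking over K as well as the output means each partial product touches only one tile of A and one tile of B at a time, which is the access pattern that fragment-based hardware imposes.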

Supported precisions (inputs → accumulator): fp16×fp16→fp32, bf16×bf16→fp32, fp32×fp32→fp32, int8×int8→int32.
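Each mixed mode pairs narrow inputs with a wider accumulator. A minimal CPU sketch of two of them follows. The helper names `bf16_round`, `dot_bf16_f32`, and `dot_i8_i32` are hypothetical, and the bf16 conversion assumes finite inputs and round-to-nearest-even; it is not taken from the module.

```rust
/// Hypothetical helper: round an f32 to the nearest bf16 value
/// (round-to-nearest-even) and return it widened back to f32.
/// Assumes finite inputs; NaN payloads are not handled specially.
fn bf16_round(x: f32) -> f32 {
    let bits = x.to_bits();
    // bf16 keeps the upper 16 bits of an f32. Adding 0x7FFF plus the lowest
    // kept bit makes exact ties round toward an even mantissa.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb);
    f32::from_bits(rounded & 0xFFFF_0000)
}

/// bf16 x bf16 -> fp32: inputs are quantized to bf16; products and the
/// running sum are computed in f32.
fn dot_bf16_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(&x, &y)| bf16_round(x) * bf16_round(y)).sum()
}

/// int8 x int8 -> int32: each product is widened before accumulating,
/// since sixteen products of 127 * 127 already exceed the i16 range.
fn dot_i8_i32(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| i32::from(x) * i32::from(y)).sum()
}

fn main() {
    let a = [127i8; 16];
    println!("{}", dot_i8_i32(&a, &a)); // 16 * 16129 = 258064
    println!("{}", bf16_round(1.0 + 1.0 / 256.0)); // exact tie, rounds to even: 1
    println!("{}", dot_bf16_f32(&[1.0, 2.0], &[3.0, 4.0])); // 11
}
```

A bf16 significand has 8 bits, so the product of two bf16 values fits exactly in an f32 significand (24 bits); in this sketch only the summation rounds.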

Structs

Fragment
Matrix fragment: a tile of a matrix stored in registers. On the GPU, fragments are distributed across the registers of the threads in a warp.
FragmentShape
Fragment shape for WMMA operations. Maps to hardware-supported M×N×K shapes such as 16×16×16 and 8×32×16.
GemmStats
Statistics from a GEMM operation.
TensorCoreEngine
Tensor Core MMA engine.
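How a fragment shape determines the work a GEMM performs can be sketched with simplified stand-ins. These structs, their fields, and `stats_for` are hypothetical, not the module's actual `FragmentShape` or `GemmStats` definitions; FLOPs follow the common 2·M·N·K convention (one multiply and one add per term).

```rust
/// Hypothetical stand-in for `FragmentShape`: tile extents M x N x K.
#[derive(Clone, Copy, Debug)]
struct FragmentShape {
    m: usize,
    n: usize,
    k: usize,
}

/// Hypothetical stand-in for `GemmStats`; the real fields may differ.
#[derive(Debug, PartialEq)]
struct GemmStats {
    tile_mmas: usize, // fragment-level MMA operations issued
    flops: u64,       // 2 * M * N * K
}

/// Integer ceiling division, so partial edge tiles are counted.
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

impl FragmentShape {
    /// Counts the tile MMAs needed to cover an M x N x K GEMM.
    fn stats_for(&self, m: usize, n: usize, k: usize) -> GemmStats {
        GemmStats {
            tile_mmas: ceil_div(m, self.m) * ceil_div(n, self.n) * ceil_div(k, self.k),
            flops: 2 * (m as u64) * (n as u64) * (k as u64),
        }
    }
}

fn main() {
    let s16 = FragmentShape { m: 16, n: 16, k: 16 };
    let s8x32 = FragmentShape { m: 8, n: 32, k: 16 };
    println!("{:?}", s16.stats_for(64, 64, 64)); // 64 tile MMAs, 524288 FLOPs
    println!("{:?}", s8x32.stats_for(64, 64, 64)); // also 64 tile MMAs
    println!("{:?}", s16.stats_for(17, 16, 16)); // ragged M edge: 2 tile MMAs
}
```

The ragged case shows why shape choice matters: a 17-row GEMM issues two full 16-row tile MMAs, so nearly half of the second tile's work is padding.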

Enums

MmaPrecision
Precision mode for tensor core operations.
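A precision enum such as `MmaPrecision` lends itself to `match`-based dispatch of per-mode properties. The variant names and both methods below are assumptions for illustration; only the accumulator mapping comes from the module's supported-precision list.

```rust
/// Hypothetical mirror of `MmaPrecision`; the variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MmaPrecision {
    Fp16,
    Bf16,
    Fp32,
    Int8,
}

impl MmaPrecision {
    /// Size in bytes of one A or B input element.
    fn input_bytes(self) -> usize {
        match self {
            MmaPrecision::Fp16 | MmaPrecision::Bf16 => 2,
            MmaPrecision::Fp32 => 4,
            MmaPrecision::Int8 => 1,
        }
    }

    /// Accumulator (C/D) type: int32 for int8 inputs, fp32 otherwise.
    fn accumulator(self) -> &'static str {
        match self {
            MmaPrecision::Int8 => "int32",
            _ => "fp32",
        }
    }
}

fn main() {
    // Bytes of A and B input data for one 16x16x16 tile at each precision.
    for p in [MmaPrecision::Fp16, MmaPrecision::Bf16, MmaPrecision::Fp32, MmaPrecision::Int8] {
        let bytes = (16 * 16 + 16 * 16) * p.input_bytes();
        println!("{:?}: {} input bytes, {} accumulator", p, bytes, p.accumulator());
    }
}
```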