Hopper Tensor Memory Accelerator (TMA) descriptors.
CUDA 12.0 introduced cuTensorMapEncodeTiled and cuTensorMapEncodeIm2col,
which produce the CUtensorMap descriptors that TMA instructions in kernels
consume to asynchronously move multi-dimensional tiles between global
and shared memory. Executing TMA instructions requires Hopper or newer
hardware (SM 9.0+), but the descriptor encoding itself is pure host code
and works on any device.
See the TensorMap builder for a typed wrapper around
cuTensorMapEncodeTiled.
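Because encoding is host-only, the arguments can be prepared and sanity-checked without a GPU. The sketch below is not this crate's API: `packed_global_strides` and `check_tiled_args` are hypothetical helpers. The limits they check (rank at most 5, 16-byte aligned base address, dimensions up to 2^32, strides that are multiples of 16 and below 2^40, box dimensions up to 256) are the ones I recall from the CUDA driver documentation for cuTensorMapEncodeTiled with no interleave; verify them against the CUDA version you target.

```rust
/// Hypothetical helper, not part of this crate.
/// Byte strides for dimensions 1..rank of a densely packed tensor, where
/// dims[0] is the innermost (fastest-varying) dimension. The driver takes
/// rank - 1 strides because the stride of dimension 0 is the element size.
pub fn packed_global_strides(dims: &[u64], elem_bytes: u64) -> Vec<u64> {
    let outer = dims.len().saturating_sub(1);
    let mut strides = Vec::with_capacity(outer);
    let mut stride = elem_bytes;
    for &d in dims.iter().take(outer) {
        stride *= d;
        strides.push(stride);
    }
    strides
}

/// Hypothetical pre-flight check mirroring limits assumed from the CUDA
/// driver documentation; the driver performs its own validation as well.
pub fn check_tiled_args(
    base_addr: u64,
    dims: &[u64],
    strides: &[u64],
    box_dims: &[u32],
) -> Result<(), String> {
    let rank = dims.len();
    if rank == 0 || rank > 5 {
        return Err(format!("rank {rank} is outside 1..=5"));
    }
    if strides.len() != rank - 1 || box_dims.len() != rank {
        return Err("stride or box array length does not match the rank".to_string());
    }
    if base_addr % 16 != 0 {
        return Err("global address is not 16-byte aligned".to_string());
    }
    if dims.iter().any(|&d| d == 0 || d > (1u64 << 32)) {
        return Err("each global dimension must be in 1..=2^32".to_string());
    }
    if strides.iter().any(|&s| s % 16 != 0 || s >= (1u64 << 40)) {
        return Err("each stride must be a multiple of 16 and below 2^40".to_string());
    }
    if box_dims.iter().any(|&b| b == 0 || b > 256) {
        return Err("each box dimension must be in 1..=256".to_string());
    }
    Ok(())
}
```

For a 64 x 32 tensor of 4-byte elements, `packed_global_strides(&[64, 32], 4)` yields a single stride of 256 bytes, and `check_tiled_args(0x1000, &[64, 32], &[256], &[32, 16])` passes. A 3 x 5 tensor of 2-byte elements fails because its 6-byte row stride is not a multiple of 16, so such a tensor would need padded rows.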
Modules§
- DataType
CUtensorMapDataType: element type encoding for TMA descriptors.
- Interleave
CUtensorMapInterleave.
- L2Promotion
CUtensorMapL2promotion: L2 prefetch hint.
- OOBFill
CUtensorMapFloatOOBfill: out-of-bounds fill behavior.
- Swizzle
CUtensorMapSwizzle.
Structs§
- TensorMap
A 128-byte Hopper TMA descriptor. Pass to a kernel as a
__grid_constant__ parameter of type CUtensorMap for use with TMA instructions.
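To illustrate the 128-byte size, here is a minimal sketch of an opaque Rust type with the same layout as the C-side CUtensorMap, which cuda.h declares as sixteen 64-bit words with 64-byte alignment. `RawTensorMap` is a hypothetical stand-in; the crate's actual TensorMap definition may differ.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for the raw descriptor, not this crate's type.
/// Sixteen opaque 64-bit words give 128 bytes; the 64-byte alignment
/// matches the C declaration so the value can be passed by value to a
/// kernel parameter of type CUtensorMap.
#[repr(C, align(64))]
#[derive(Clone, Copy)]
pub struct RawTensorMap {
    pub opaque: [u64; 16],
}

/// Size and alignment, computed at compile time.
pub const RAW_TENSOR_MAP_BYTES: usize = size_of::<RawTensorMap>();
pub const RAW_TENSOR_MAP_ALIGN: usize = align_of::<RawTensorMap>();
```

Keeping the contents opaque is deliberate in this sketch: the bit layout inside the descriptor is not documented, so host code should only produce it through the encode functions and copy it around as a whole.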