GPU-accelerated 2D matrix transpose.
Transposes a 2D matrix [rows, cols] to [cols, rows].
Supports F32 and F16 dtypes.
Functions
- permute_021_bf16 - Encode a 3D permutation: [A, B, C] -> [B, A, C] (bf16).
- transpose_2d - Encode a 2D matrix transpose: output[col, row] = input[row, col].
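
The index mappings above can be checked against a plain CPU reference. The sketch below is not part of this module's API: `transpose_2d_reference` and `permute_3d_reference` are hypothetical helper names, and it assumes contiguous row-major buffers, which this page does not state. It may be useful for validating GPU output in tests.

```rust
/// CPU reference for the 2D transpose (hypothetical helper, not this module's API).
/// Assumes `input` is a contiguous row-major buffer of shape [rows, cols].
fn transpose_2d_reference<T: Copy>(input: &[T], rows: usize, cols: usize) -> Vec<T> {
    assert_eq!(input.len(), rows * cols, "input length must equal rows * cols");
    let mut output = Vec::with_capacity(input.len());
    // Fill the output in row-major order; its shape is [cols, rows].
    for col in 0..cols {
        for row in 0..rows {
            // output[col, row] = input[row, col]
            output.push(input[row * cols + col]);
        }
    }
    output
}

/// CPU reference for the 3D permutation [A, B, C] -> [B, A, C]
/// (hypothetical helper; assumes the same row-major layout).
fn permute_3d_reference<T: Copy>(input: &[T], a: usize, b: usize, c: usize) -> Vec<T> {
    assert_eq!(input.len(), a * b * c, "input length must equal a * b * c");
    let mut output = Vec::with_capacity(input.len());
    // Fill the output in row-major order; its shape is [B, A, C].
    for bi in 0..b {
        for ai in 0..a {
            for ci in 0..c {
                // output[bi, ai, ci] = input[ai, bi, ci]
                output.push(input[(ai * b + bi) * c + ci]);
            }
        }
    }
    output
}
```

For a 2x3 input `[1, 2, 3, 4, 5, 6]`, `transpose_2d_reference` yields the 3x2 buffer `[1, 4, 2, 5, 3, 6]`. With `c = 1`, the 3D permutation reduces to the same 2D transpose.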