Module transpose

GPU-accelerated 2D matrix transpose.

Transposes a 2D matrix of shape [rows, cols] into one of shape [cols, rows]. Supports F32 and F16 dtypes.
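As a point of comparison for the GPU path, the index mapping can be sketched on the CPU. This is only an illustrative reference, assuming a dense row-major layout; `transpose_2d_reference` is a hypothetical helper and not part of this module's API.

```rust
/// CPU reference for a 2D transpose (hypothetical helper, not this module's API).
/// Assumes `input` is a dense row-major buffer of shape [rows, cols].
fn transpose_2d_reference<T: Copy>(input: &[T], rows: usize, cols: usize) -> Vec<T> {
    assert_eq!(input.len(), rows * cols, "input length must equal rows * cols");
    let mut output = Vec::with_capacity(input.len());
    // Fill the output in row-major order; its shape is [cols, rows].
    for col in 0..cols {
        for row in 0..rows {
            // output[col, row] = input[row, col]
            output.push(input[row * cols + col]);
        }
    }
    output
}
```

A reference like this can be useful for checking GPU output on small inputs, for example by comparing the two buffers element by element.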

Functions

permute_021_bf16
Encode a 3D permutation: [A, B, C] -> [B, A, C] (bf16).
transpose_2d
Encode a 2D matrix transpose: output[col, row] = input[row, col].
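
The [A, B, C] -> [B, A, C] permutation listed for permute_021_bf16 swaps the two leading axes and leaves the innermost axis contiguous. A minimal CPU sketch of that mapping follows, assuming a dense row-major layout; `permute_021_reference` is a hypothetical helper, and it is generic over the element type so that bf16 values could be passed as raw `u16` bits (also an assumption, not something this module specifies).

```rust
/// CPU reference for the [A, B, C] -> [B, A, C] permutation
/// (hypothetical helper, not this module's API).
/// Assumes `input` is a dense row-major buffer of shape [a, b, c].
fn permute_021_reference<T: Copy>(input: &[T], a: usize, b: usize, c: usize) -> Vec<T> {
    assert_eq!(input.len(), a * b * c, "input length must equal a * b * c");
    let mut output = Vec::with_capacity(input.len());
    // Fill the output in row-major order; its shape is [b, a, c].
    for j in 0..b {
        for i in 0..a {
            // The innermost axis is untouched, so copy the whole run of `c`
            // elements: output[j, i, ..] = input[i, j, ..]
            let start = (i * b + j) * c;
            output.extend_from_slice(&input[start..start + c]);
        }
    }
    output
}
```

With C = 1 this reduces to the 2D transpose of an [A, B] matrix.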