Naive matmul kernel implementation: a non-cooperative matmul without tiling that can be very fast on small matrices.
Each local unit will compute a single element of the output matrix.
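The one-element-per-unit scheme can be illustrated with a minimal CPU sketch in plain Rust. This is not the module's kernel code: `compute_element` and `naive_matmul` are hypothetical names, row-major layout is an assumption, and the nested loop stands in for the grid of units that a GPU launch would run in parallel.

```rust
/// Work done by a single "unit": the dot product of one row of `lhs`
/// with one column of `rhs`. Hypothetical helper, not part of the module.
/// `lhs` is m x k and `rhs` is k x n, both assumed row-major.
fn compute_element(lhs: &[f32], rhs: &[f32], k: usize, n: usize, row: usize, col: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..k {
        acc += lhs[row * k + i] * rhs[i * n + col];
    }
    acc
}

/// CPU stand-in for a launch: visits every (row, col) position once,
/// so each output element is produced by exactly one "unit".
/// No tiling and no data sharing between units.
fn naive_matmul(lhs: &[f32], rhs: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(lhs.len(), m * k, "lhs must hold m * k elements");
    assert_eq!(rhs.len(), k * n, "rhs must hold k * n elements");

    let mut out = vec![0.0f32; m * n];
    for row in 0..m {
        for col in 0..n {
            out[row * n + col] = compute_element(lhs, rhs, k, n, row, col);
        }
    }
    out
}
```

Because every unit reads a full row and a full column on its own, the approach does redundant memory reads on large inputs, which is consistent with the note that it is best suited to small matrices.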
Functions
- launch
- launch_ref - Matrix multiplication using memory coalescing algorithm with custom cube dimensions