cubecl-hip 0.5.0

AMD ROCm HIP runtime for CubeCL
# ROCm HIP runtime

A runtime for AMD GPUs supported by ROCm HIP.

By default, matrix multiplication is accelerated with [rocwmma][]. Note that kernels using [rocwmma][] can be
slow to compile.

For RDNA3 GPUs, a dedicated compiler based on [WMMA intrinsics][] is available behind the `wmma-intrinsics` feature.
It compiles kernels much faster and performs better on some kernels, so it is worth benchmarking against your own
workloads.
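Opting into the intrinsics-based compiler is a matter of enabling the Cargo feature. The fragment below is a minimal sketch, assuming you depend on `cubecl-hip` directly (version taken from this page); if the crate reaches you through another dependency, the feature may need to be enabled there instead.

```toml
[dependencies]
# Enables the RDNA3-only compiler that uses WMMA intrinsics instead of rocwmma.
cubecl-hip = { version = "0.5.0", features = ["wmma-intrinsics"] }
```

Leaving the feature off keeps the default rocwmma path, which makes it easy to build both variants and compare compile times and kernel performance.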

[rocwmma]: https://github.com/ROCm/rocWMMA
[WMMA intrinsics]: https://gpuopen.com/learn/wmma_on_rdna3/