cubecl-hip 0.5.0

# ROCm HIP runtime

A runtime for AMD GPUs supported by ROCm HIP.

By default, matrix multiplication acceleration uses [rocWMMA][rocwmma]. Note that kernel compilation
with [rocWMMA][rocwmma] can be slow.

For RDNA3 GPUs, a dedicated compiler based on [WMMA intrinsics][] is available behind the `wmma-intrinsics` feature.
It compiles kernels much faster and can deliver better performance on some kernels. Benchmark both backends
on your own workloads to decide which one to use.
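A minimal sketch of enabling the feature from a downstream crate's `Cargo.toml`, assuming you depend on `cubecl-hip` directly at version 0.5.0 rather than through an umbrella crate. Only the feature name `wmma-intrinsics` comes from this README; check the crate's manifest for whether any other features (for example, defaults) interact with it.

```toml
[dependencies]
# Enables the dedicated WMMA-intrinsics compiler, intended for RDNA3 GPUs.
# Assumes cubecl-hip is a direct dependency at 0.5.0.
cubecl-hip = { version = "0.5.0", features = ["wmma-intrinsics"] }
```

Keeping the feature opt-in lets GPUs other than RDNA3 stay on the default rocWMMA path.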

[rocwmma]: https://github.com/ROCm/rocWMMA
[WMMA intrinsics]: https://gpuopen.com/learn/wmma_on_rdna3/