hanzo-ml-rocm-kernels
ROCm/HIP kernel support for the Hanzo deep learning framework.
Overview
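This crate provides ROCm (AMD GPU) kernel support for Hanzo. Unlike CUDA, which can embed PTX that the driver JIT-compiles for the installed GPU, ROCm/HIP code objects are specific to one GPU architecture, so this crate compiles its kernels ahead of time (AOT) for each target architecture.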
Architecture
AOT Cache System
We use an ahead-of-time (AOT) compilation cache:
- Source Code: HIP kernels are shipped as source code (.hip files)
- On-Demand Compilation: The first time a kernel is needed, it is compiled using hipcc
- Caching: Compiled binaries are cached for reuse
- Future Runs: Cached binaries are loaded directly (no recompilation)
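The four steps above amount to a get-or-compile lookup. The sketch below shows that flow using only the standard library; it is not the crate's CacheManager. The `compile` closure is a hypothetical hook standing in for the hipcc call, and writing to a temporary file before renaming is an added assumption, not something this README specifies.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the cached code object at `path`, or run `compile` once,
/// store its output at `path`, and return it.
/// `compile` is a hypothetical stand-in for invoking hipcc.
fn get_or_compile<F>(path: &Path, compile: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    // Cache hit: load the previously compiled binary, no recompilation.
    if path.exists() {
        return fs::read(path);
    }
    // Cache miss: compile on demand.
    let bytes = compile()?;
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    // Write to a temporary file, then rename, so a concurrent reader
    // never observes a half-written binary (assumption of this sketch).
    let tmp: PathBuf = path.with_extension("cso.tmp");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, path)?;
    Ok(bytes)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir()
        .join("aot-cache-sketch")
        .join("gfx942-6.2")
        .join("binary_0123456789abcdef.cso");
    let _ = fs::remove_file(&path); // start from a cold cache
    let mut calls = 0;
    let first = get_or_compile(&path, || {
        calls += 1;
        Ok(b"fake code object".to_vec())
    })?;
    let second = get_or_compile(&path, || {
        calls += 1;
        Ok(Vec::new())
    })?;
    assert_eq!(first, second);
    println!("compiled {} time(s)", calls);
    Ok(())
}
```

On the second call the closure is never invoked, which mirrors the "Future Runs" step.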
Cache Location
Compiled binaries are stored at:
~/.cache/hanzo-ml-rocm/{arch}-{rocm_version}/
For example:
~/.cache/hanzo-ml-rocm/gfx908-6.1/binary_a1b2c3d4.cso
~/.cache/hanzo-ml-rocm/gfx942-6.2/binary_a1b2c3d4.cso
Where:
- {arch} = GPU architecture (gfx908, gfx90a, gfx942, etc.)
- {rocm_version} = ROCm version (6.0, 6.1, 6.2, etc.)
- {hash} = SHA256 hash of the source code (first 16 chars)
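A minimal sketch of how a cache path could be derived from these three values. Two assumptions: the `{kernel}_{hash}.cso` file-name pattern is inferred from the `binary_a1b2c3d4.cso` example, and the hash is 64-bit FNV-1a (printed as 16 hex characters), used here as a dependency-free stand-in for the SHA256 digest described above. FNV-1a is not a cryptographic hash.

```rust
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a: a dependency-free stand-in for SHA256 (not cryptographic).
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Build `{root}/{arch}-{rocm_version}/{kernel}_{hash}.cso`.
/// The `{kernel}_{hash}` file-name pattern is an assumption.
fn cache_path(root: &Path, arch: &str, rocm_version: &str, kernel: &str, source: &str) -> PathBuf {
    // 16 hex characters, matching the "first 16 chars" convention.
    let hash = format!("{:016x}", fnv1a_64(source.as_bytes()));
    root.join(format!("{arch}-{rocm_version}"))
        .join(format!("{kernel}_{hash}.cso"))
}

fn main() {
    // Resolve ~/.cache/hanzo-ml-rocm; the HOME lookup is Unix-style.
    let home = std::env::var("HOME").unwrap_or_else(|_| ".".to_string());
    let root = Path::new(&home).join(".cache").join("hanzo-ml-rocm");
    let p = cache_path(&root, "gfx942", "6.2", "binary", "// HIP source text");
    println!("{}", p.display());
}
```

Because the hash covers the source text, editing a kernel yields a new file name, so stale binaries are never reused.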
Key Components
CacheManager (src/cache.rs)
Manages the AOT compilation cache:
- GPU Detection: Automatically detects the GPU architecture using rocminfo, or falls back to the environment variable CANDLE_ROCM_ARCH
- Version Detection: Detects the ROCm version using hipcc --version or the environment variable CANDLE_ROCM_VERSION
- Compilation: Invokes hipcc with the appropriate flags
- Caching: Stores compiled .cso (code object) files, versioned by source hash
Usage:
use CacheManager;
use Device;
let device = new?;
let cache = new?;
let binary = cache.get_or_compile?;
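The detection steps in CacheManager boil down to parsing tool output. The sketch below is hypothetical: it assumes `rocminfo` prints each GPU agent as a `Name: gfxNNN` line and that `hipcc --version` prints a `HIP version: X.Y.Z-...` line. Both formats may differ between ROCm releases, and the function names are illustrative.

```rust
use std::process::Command;

/// First agent name that looks like a GPU target (e.g. "gfx90a").
/// CPU agents and "Marketing Name:" lines are skipped.
fn parse_rocminfo_arch(output: &str) -> Option<String> {
    output.lines().find_map(|line| {
        let value = line.trim().strip_prefix("Name:")?.trim();
        value.starts_with("gfx").then(|| value.to_string())
    })
}

/// "major.minor" from a line like "HIP version: 6.2.41133-dd7f95766".
fn parse_hip_version(output: &str) -> Option<String> {
    output.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("HIP version:")?.trim();
        let mut parts = rest.split('.');
        let major = parts.next()?;
        let minor = parts.next()?;
        Some(format!("{major}.{minor}"))
    })
}

fn main() {
    // Try the real tools; if they are missing, use sample output instead.
    let rocminfo = Command::new("rocminfo")
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_else(|_| "  Name:                    gfx942\n".to_string());
    let hipcc = Command::new("hipcc")
        .arg("--version")
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_else(|_| "HIP version: 6.2.41133-dd7f95766\n".to_string());
    println!(
        "arch={:?} rocm={:?}",
        parse_rocminfo_arch(&rocminfo),
        parse_hip_version(&hipcc)
    );
}
```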
KernelManager (src/manager.rs)
Higher-level manager that:
- Wraps CacheManager
- Loads compiled binaries as rocm_rs::hip::Module
- Returns Arc<Module> for thread-safe sharing
- Maintains an in-memory module cache
Usage:
use KernelManager;
use Source;
let device = new?;
let manager = new?;
let module = manager.get_or_compile_module?;
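KernelManager's in-memory cache can be pictured as a map from kernel name to `Arc<Module>` behind a mutex. In this sketch the module type is generic so it runs without a GPU; `ModuleCache` and `get_or_load` are illustrative names, not the crate's API, and the `load` closure stands in for loading a compiled binary.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory cache; `M` stands in for `rocm_rs::hip::Module`.
struct ModuleCache<M> {
    modules: Mutex<HashMap<String, Arc<M>>>,
}

impl<M> ModuleCache<M> {
    fn new() -> Self {
        Self { modules: Mutex::new(HashMap::new()) }
    }

    /// Return the shared module for `name`, loading it at most once.
    fn get_or_load<E>(&self, name: &str, load: impl FnOnce() -> Result<M, E>) -> Result<Arc<M>, E> {
        // Holding the lock across `load` serializes first loads; simple,
        // and guarantees a single successful load per name.
        let mut modules = self.modules.lock().unwrap();
        if let Some(m) = modules.get(name) {
            return Ok(Arc::clone(m));
        }
        let module = Arc::new(load()?);
        modules.insert(name.to_string(), Arc::clone(&module));
        Ok(module)
    }
}

fn main() {
    // `String` plays the role of a loaded module here.
    let cache: ModuleCache<String> = ModuleCache::new();
    let a = cache.get_or_load("binary", || Ok::<_, ()>("module".to_string())).unwrap();
    let b = cache.get_or_load("binary", || Err(())).unwrap();
    println!("same module shared: {}", Arc::ptr_eq(&a, &b));
}
```

A failed load leaves no entry behind, so a later call can retry.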
Environment Variables
- CANDLE_ROCM_ARCH - Override GPU architecture detection (e.g., "gfx908")
- CANDLE_ROCM_VERSION - Override ROCm version detection (e.g., "6.1")
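The CacheManager description calls these variables a fallback, while this section calls them overrides. The sketch follows the override reading: a non-empty variable wins, and detection runs only when it is unset or empty. `pick` and `resolve` are hypothetical helpers, not part of the crate.

```rust
use std::env;

/// A non-empty override wins; otherwise run auto-detection.
fn pick(override_value: Option<String>, detect: impl FnOnce() -> Option<String>) -> Option<String> {
    match override_value {
        Some(v) if !v.trim().is_empty() => Some(v.trim().to_string()),
        // Unset or empty: fall back to auto-detection.
        _ => detect(),
    }
}

/// Read the override from the named environment variable.
fn resolve(var: &str, detect: impl FnOnce() -> Option<String>) -> Option<String> {
    pick(env::var(var).ok(), detect)
}

fn main() {
    // The closures stand in for rocminfo / hipcc --version detection.
    let arch = resolve("CANDLE_ROCM_ARCH", || Some("gfx942".to_string()));
    let version = resolve("CANDLE_ROCM_VERSION", || Some("6.2".to_string()));
    println!("arch={arch:?} version={version:?}");
}
```

Taking detection as a closure means the external tools are not invoked at all when an override is set.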
Requirements
- ROCm/HIP installed (provides hipcc)
- AMD GPU with a supported architecture
Kernel Types
Currently supports:
- Binary operations: Add, Sub, Mul, Div, Minimum, Maximum
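When checking GPU results in tests, a CPU reference for these six operations is handy. This is an illustrative sketch over equal-length `f32` slices with no broadcasting; the `BinaryOp` enum here is not the crate's type, and NaN handling in the real kernels may differ from Rust's `f32::min`/`f32::max`.

```rust
/// Illustrative CPU reference semantics for the supported binary ops.
#[derive(Clone, Copy, Debug)]
enum BinaryOp {
    Add,
    Sub,
    Mul,
    Div,
    Minimum,
    Maximum,
}

impl BinaryOp {
    fn apply(self, a: f32, b: f32) -> f32 {
        match self {
            BinaryOp::Add => a + b,
            BinaryOp::Sub => a - b,
            BinaryOp::Mul => a * b,
            BinaryOp::Div => a / b,
            BinaryOp::Minimum => a.min(b),
            BinaryOp::Maximum => a.max(b),
        }
    }
}

/// Element-wise application over equal-length slices (no broadcasting).
fn binary_ref(op: BinaryOp, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
    assert_eq!(lhs.len(), rhs.len(), "shape mismatch");
    lhs.iter().zip(rhs).map(|(&a, &b)| op.apply(a, b)).collect()
}

fn main() {
    let out = binary_ref(BinaryOp::Maximum, &[1.0, 4.0], &[2.0, 3.0]);
    println!("{:?}", out);
}
```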
Building
Note: The first build compiles all dependencies. No GPU is required to build the crate, but hipcc must be on your PATH to compile kernels.
Testing
Implementation Notes
Why AOT instead of JIT?
The rocm-rs crate (v0.5) doesn't support runtime compilation. It only supports:
- Module::load(path) - Load from a file
- Module::load_data(bytes) - Load from bytes
This makes JIT compilation (via hiprtc) unavailable, so we compile each kernel with hipcc on first use and cache the resulting binary for later runs.
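Compiling with hipcc on first use means building one command line per kernel and architecture. `--genco` (emit only a device code object) and `--offload-arch` are real hipcc/HIP-Clang options, but whether this crate passes exactly these flags is an assumption. The hypothetical `hipcc_command` helper only builds the command and does not run it.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a hipcc invocation that compiles one .hip file
/// into a code object for a single GPU architecture. Flag set is assumed.
fn hipcc_command(arch: &str, source: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("hipcc");
    cmd.arg("--genco") // device code object only, no host code
        .arg(format!("--offload-arch={arch}")) // target GPU, e.g. gfx942
        .arg("-O3")
        .arg(source)
        .arg("-o")
        .arg(output);
    cmd
}

fn main() {
    let cmd = hipcc_command("gfx942", Path::new("binary.hip"), Path::new("binary.cso"));
    // Running it would be `cmd.status()`; here we only show the command line.
    println!("{:?}", cmd);
}
```

The resulting file could then be read back with Module::load(path) as listed above.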
Supported GPU Architectures
Common AMD GPU architectures:
- CDNA2: gfx90a (MI200 series)
- CDNA3: gfx942 (MI300 series)
- RDNA3: gfx1100, gfx1101, gfx1102 (RX 7000 series)
The system will try to auto-detect, but you can override with CANDLE_ROCM_ARCH.
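A small lookup can map a detected or overridden target to the families listed above, for example to warn when CANDLE_ROCM_ARCH names an unlisted target. `arch_family` is a hypothetical helper that covers only these architectures, so `gfx908` (used in the earlier examples) returns None.

```rust
/// Map a gfx target to the families listed in this README.
/// `None` only means "not in this list", not "unsupported".
fn arch_family(arch: &str) -> Option<&'static str> {
    // Drop feature suffixes such as ":sramecc+:xnack-" if present.
    let base = arch.split(':').next().unwrap_or(arch);
    match base {
        "gfx90a" => Some("CDNA2 (MI200 series)"),
        "gfx942" => Some("CDNA3 (MI300 series)"),
        "gfx1100" | "gfx1101" | "gfx1102" => Some("RDNA3 (RX 7000 series)"),
        _ => None,
    }
}

fn main() {
    for arch in ["gfx90a", "gfx942", "gfx1101", "gfx908"] {
        match arch_family(arch) {
            Some(family) => println!("{arch}: {family}"),
            None => println!("{arch}: not in the listed architectures"),
        }
    }
}
```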
License
MIT OR Apache-2.0