#[gpu_kernel]Expand description
Attribute macro for defining multi-backend GPU kernels.
This macro generates code for multiple GPU backends with compile-time
capability validation. It integrates with the ringkernel-ir crate
to lower Rust DSL to backend-specific shader code.
§Attributes
backends = [cuda, metal, wgpu]- Target backends (default: all)fallback = [cuda, metal, wgpu, cpu]- Fallback order for runtime selectionrequires = [f64, atomic64]- Required capabilities (validated at compile time)id = "kernel_name"- Explicit kernel identifierblock_size = 256- Thread block size
§Example
ⓘ
use ringkernel_derive::gpu_kernel;
#[gpu_kernel(backends = [cuda, metal], requires = [subgroups])]
fn warp_reduce(data: &mut [f32], n: i32) {
let idx = global_thread_id_x();
if idx < n {
// Use warp shuffle for reduction
let val = data[idx as usize];
let reduced = warp_reduce_sum(val);
if lane_id() == 0 {
data[idx as usize] = reduced;
}
}
}§Capability Checking
The macro validates at compile time that all required capabilities are supported by at least one target backend:
| Capability | CUDA | Metal | WebGPU | CPU |
|---|---|---|---|---|
| f64 | Yes | No | No | Yes |
| i64 | Yes | Yes | No | Yes |
| atomic64 | Yes | Yes | No* | Yes |
| cooperative_groups | Yes | No | No | Yes |
| subgroups | Yes | Yes | Opt | Yes |
| shared_memory | Yes | Yes | Yes | Yes |
| f16 | Yes | Yes | Yes | Yes |
*WebGPU emulates 64-bit atomics with 32-bit pairs.
§Generated Code
For each compatible backend, the macro generates:
- Backend-specific source code constant (e.g.,
KERNEL_NAME_CUDA_SOURCE) - Registration entry for runtime discovery
- CPU fallback function (if
cpu_fallback = true)