Proc macro for marking GPU kernel functions.
#[warp_kernel] transforms a function into a PTX kernel entry point when
compiling for nvptx64. When compiling for the host target it emits nothing,
so kernel functions are only compiled for the GPU.
Usage
In your kernel crate (compiled for nvptx64):
use warp_types::prelude::*;
use warp_types_kernel::warp_kernel;
#[warp_kernel]
pub fn butterfly_reduce(data: *mut i32) {
    let warp: Warp<All> = Warp::kernel_entry();
    let tid = warp_types::gpu::thread_id_x();
    let mut val = unsafe { *data.add(tid as usize) };
    val += warp.shuffle_xor(PerLane::new(val), 16).get();
    val += warp.shuffle_xor(PerLane::new(val), 8).get();
    val += warp.shuffle_xor(PerLane::new(val), 4).get();
    val += warp.shuffle_xor(PerLane::new(val), 2).get();
    val += warp.shuffle_xor(PerLane::new(val), 1).get();
    unsafe { *data.add(tid as usize) = val; }
}
The macro emits:
- On nvptx64:
  #[no_mangle] pub unsafe extern "ptx-kernel" fn butterfly_reduce(...)
- On host: nothing (kernel functions are only compiled for GPU)
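
To see what the butterfly_reduce example computes without a GPU, the five shuffle-and-add steps can be mimicked on the host with a plain array standing in for the 32 lanes of a warp. This is only an illustrative sketch: WARP_SIZE, shuffle_xor, and butterfly_reduce_sim are hypothetical names invented here, not part of warp_types, and the sketch assumes a full 32-lane warp in which lane i exchanges values with lane i ^ mask.

```rust
/// Assumed warp width for this sketch (32 lanes).
const WARP_SIZE: usize = 32;

/// Hypothetical host-side stand-in for a warp-wide XOR shuffle:
/// lane `i` receives the value currently held by lane `i ^ mask`.
fn shuffle_xor(lanes: &[i32; WARP_SIZE], mask: usize) -> [i32; WARP_SIZE] {
    let mut out = [0i32; WARP_SIZE];
    for lane in 0..WARP_SIZE {
        // For mask < WARP_SIZE (a power of two) the partner index stays in range.
        out[lane] = lanes[lane ^ mask];
    }
    out
}

/// Simulates the kernel's five shuffle-and-add steps (masks 16, 8, 4, 2, 1).
/// Each step doubles the number of original values folded into every lane,
/// so after the last step every lane holds the sum of all 32 inputs.
fn butterfly_reduce_sim(mut lanes: [i32; WARP_SIZE]) -> [i32; WARP_SIZE] {
    for mask in [16, 8, 4, 2, 1] {
        let partner = shuffle_xor(&lanes, mask);
        for lane in 0..WARP_SIZE {
            lanes[lane] += partner[lane];
        }
    }
    lanes
}
```

Feeding in the values 0 through 31 (one per lane) leaves 496 in every lane, since 0 + 1 + ... + 31 = 496. The real kernel does the same exchange in registers, with no array and no shared memory.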