ringkernel-cuda-codegen
Rust-to-CUDA transpiler for RingKernel GPU kernels.
Overview
This crate enables writing GPU kernels in a restricted Rust DSL and transpiling them to CUDA C code. It supports three kernel types:
- Global Kernels: standard CUDA `__global__` functions
- Stencil Kernels: tile-based kernels with a `GridPos` abstraction
- Ring Kernels: persistent actor kernels with message loops
Installation
```toml
[dependencies]
ringkernel-cuda-codegen = "0.1"
syn = { version = "2.0", features = ["full"] }
```
Global Kernels
For general-purpose CUDA kernels:
```rust
use ringkernel_cuda_codegen::transpile_global_kernel;
use syn::parse_quote;

let func: syn::ItemFn = parse_quote! { ... };
let cuda_code = transpile_global_kernel(...)?;
```
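To make the indexing model of a global kernel concrete, here is a minimal host-side sketch, assuming a 1-D launch, of what a `__global__` kernel does per thread: each (block, thread) pair computes its global index from the block index, block size, and thread index, and skips elements past the end of the data. It runs sequentially in plain Rust and does not use this crate; `saxpy_kernel` and `launch_1d` are hypothetical names, and the indices are passed as parameters where DSL code would call `thread_idx_x()`, `block_idx_x()`, and `block_dim_x()`.

```rust
/// Hypothetical per-thread kernel body (y = a * x + y). The three index
/// parameters stand in for thread_idx_x(), block_idx_x(), block_dim_x().
fn saxpy_kernel(
    thread_idx_x: u32,
    block_idx_x: u32,
    block_dim_x: u32,
    a: f32,
    x: &[f32],
    y: &mut [f32],
) {
    let idx = (block_idx_x * block_dim_x + thread_idx_x) as usize;
    // The last block may contain threads past the end of the data.
    if idx >= y.len() {
        return;
    }
    y[idx] = a * x[idx] + y[idx];
}

/// Hypothetical sequential "launch": enough blocks to cover every element
/// (ceiling division), then every thread of every block runs once.
/// Returns the number of blocks used.
pub fn launch_1d(block_dim_x: u32, a: f32, x: &[f32], y: &mut [f32]) -> u32 {
    assert!(block_dim_x > 0, "block size must be non-zero");
    let n = y.len() as u32;
    let grid_dim_x = (n + block_dim_x - 1) / block_dim_x;
    for block_idx_x in 0..grid_dim_x {
        for thread_idx_x in 0..block_dim_x {
            saxpy_kernel(thread_idx_x, block_idx_x, block_dim_x, a, x, y);
        }
    }
    grid_dim_x
}
```

With 10 elements and a block size of 4, three blocks (12 threads) are launched and the two surplus threads return early at the bounds check.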
Stencil Kernels
For grid-based computations with neighbor access:
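```rust
use ringkernel_cuda_codegen::{transpile_stencil_kernel, ...};

let func: syn::ItemFn = parse_quote! { ... };
let config = ...::new(...)
    .with_tile_size(...)
    .with_halo(...);
let cuda_code = transpile_stencil_kernel(...)?;
```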
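The tile size and halo width together determine how much data each block has to stage. As a hedged sketch, assuming a 2-D tile of `tile_x` by `tile_y` interior cells padded by a halo `halo` cells wide on every side (the README does not spell out the layout the generated code actually uses), the padded dimensions and their size in bytes work out as below. `halo_tile_dims` and `tile_bytes` are hypothetical helpers, not part of this crate.

```rust
/// Hypothetical helper: dimensions of a tile padded by `halo` cells on each side.
pub fn halo_tile_dims(tile_x: usize, tile_y: usize, halo: usize) -> (usize, usize) {
    (tile_x + 2 * halo, tile_y + 2 * halo)
}

/// Hypothetical helper: bytes needed to stage one padded tile of `T` elements,
/// e.g. in shared memory if the generated kernel stages tiles there.
pub fn tile_bytes<T>(tile_x: usize, tile_y: usize, halo: usize) -> usize {
    let (w, h) = halo_tile_dims(tile_x, tile_y, halo);
    w * h * std::mem::size_of::<T>()
}
```

For a 16 by 16 tile with a halo of 1, the staged region is 18 by 18 cells, or 1296 bytes of `f32`.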
Ring Kernels
For persistent actor-model kernels:
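```rust
use ringkernel_cuda_codegen::{transpile_ring_kernel, ...};

let handler: syn::ItemFn = parse_quote! { ... };
let config = ...::new(...)
    .with_block_size(...)
    .with_queue_capacity(...)
    .with_hlc(...) // Hybrid Logical Clocks
    .with_k2k(...); // Kernel-to-kernel messaging
let cuda_code = transpile_ring_kernel(...)?;
```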
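The `.with_hlc` option enables Hybrid Logical Clocks. The README does not describe how the generated kernel maintains them, so the following is only a host-side sketch of the standard HLC update rules: a local or send event advances the clock to at least the physical time, and a receive event merges in the sender's timestamp so causally later events always compare greater. Physical time is passed in explicitly to keep the sketch deterministic; `Hlc` and `HlcTimestamp` are hypothetical names, not types exported by this crate.

```rust
/// Hypothetical HLC timestamp: physical component, then a logical counter
/// that breaks ties when physical time does not advance.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    pub physical: u64,
    pub logical: u32,
}

/// Hypothetical clock holding the last timestamp it issued.
#[derive(Debug, Default)]
pub struct Hlc {
    last: HlcTimestamp,
}

impl Hlc {
    pub fn new() -> Self {
        Self::default()
    }

    /// Local or send event observed at physical time `now`.
    pub fn tick(&mut self, now: u64) -> HlcTimestamp {
        let prev = self.last.physical;
        let physical = prev.max(now);
        // Physical time did not move past the last timestamp: bump the counter.
        let logical = if physical == prev { self.last.logical + 1 } else { 0 };
        self.last = HlcTimestamp { physical, logical };
        self.last
    }

    /// Receive event: merge a remote timestamp observed at physical time `now`.
    pub fn receive(&mut self, remote: HlcTimestamp, now: u64) -> HlcTimestamp {
        let prev = self.last;
        let physical = prev.physical.max(remote.physical).max(now);
        let logical = if physical == prev.physical && physical == remote.physical {
            prev.logical.max(remote.logical) + 1
        } else if physical == prev.physical {
            prev.logical + 1
        } else if physical == remote.physical {
            remote.logical + 1
        } else {
            0 // local physical clock is ahead of both: counter resets
        };
        self.last = HlcTimestamp { physical, logical };
        self.last
    }
}
```

Because `physical` is declared before `logical`, the derived `Ord` compares timestamps physical-first, which is the ordering HLC relies on.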
DSL Reference
Thread/Block Indices
- `thread_idx_x()`, `thread_idx_y()`, `thread_idx_z()`
- `block_idx_x()`, `block_idx_y()`, `block_idx_z()`
- `block_dim_x()`, `block_dim_y()`, `block_dim_z()`
- `grid_dim_x()`, `grid_dim_y()`, `grid_dim_z()`
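The `grid_dim_*()` intrinsics are what make a grid-stride loop possible, where a fixed-size launch covers an array of any length. The host-side sketch below, assuming a 1-D launch, emulates one sequentially: each thread starts at its global index and advances by the total thread count `block_dim_x * grid_dim_x`. `scale_kernel` and `launch_grid_stride` are hypothetical names, and the indices are parameters where DSL code would call the intrinsics.

```rust
/// Hypothetical grid-stride kernel body; the four index parameters stand in for
/// thread_idx_x(), block_idx_x(), block_dim_x(), and grid_dim_x().
fn scale_kernel(
    thread_idx_x: usize,
    block_idx_x: usize,
    block_dim_x: usize,
    grid_dim_x: usize,
    factor: f32,
    data: &mut [f32],
) {
    let stride = block_dim_x * grid_dim_x; // total threads in the grid
    let mut i = block_idx_x * block_dim_x + thread_idx_x;
    while i < data.len() {
        data[i] *= factor;
        i += stride;
    }
}

/// Runs every thread of a fixed-size grid one after another.
pub fn launch_grid_stride(grid_dim_x: usize, block_dim_x: usize, factor: f32, data: &mut [f32]) {
    for block_idx_x in 0..grid_dim_x {
        for thread_idx_x in 0..block_dim_x {
            scale_kernel(thread_idx_x, block_idx_x, block_dim_x, grid_dim_x, factor, data);
        }
    }
}
```

Each element index belongs to exactly one thread modulo the stride, so with 2 blocks of 8 threads a 100-element array is scaled exactly once per element.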
Stencil Intrinsics
- `pos.idx()` - Linear index
- `pos.north(buf)`, `pos.south(buf)`, `pos.east(buf)`, `pos.west(buf)` - Neighbor access
- `pos.at(buf, dx, dy)` - Relative offset access
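The stencil intrinsics read values relative to the current cell. A minimal host-side sketch of those semantics follows, assuming a row-major buffer with rows `width` elements long, that "north" means `y - 1`, and that offsets never leave the buffer (the README confirms none of these). `HostGridPos` is a hypothetical stand-in, not the crate's `GridPos` type, and `laplacian` is a hypothetical five-point stencil written the way a kernel body would use the intrinsics.

```rust
/// Hypothetical host-side stand-in for a grid position: cell (x, y)
/// in a row-major buffer whose rows are `width` elements long.
#[derive(Clone, Copy, Debug)]
pub struct HostGridPos {
    pub x: usize,
    pub y: usize,
    pub width: usize,
}

impl HostGridPos {
    /// Linear (row-major) index of this cell.
    pub fn idx(&self) -> usize {
        self.y * self.width + self.x
    }

    /// Value at relative offset (dx, dy); assumes the target cell is in bounds.
    pub fn at(&self, buf: &[f32], dx: isize, dy: isize) -> f32 {
        let x = (self.x as isize + dx) as usize;
        let y = (self.y as isize + dy) as usize;
        buf[y * self.width + x]
    }

    pub fn north(&self, buf: &[f32]) -> f32 { self.at(buf, 0, -1) }
    pub fn south(&self, buf: &[f32]) -> f32 { self.at(buf, 0, 1) }
    pub fn east(&self, buf: &[f32]) -> f32 { self.at(buf, 1, 0) }
    pub fn west(&self, buf: &[f32]) -> f32 { self.at(buf, -1, 0) }
}

/// Five-point Laplacian: sum of the four neighbors minus four times the center.
pub fn laplacian(pos: HostGridPos, buf: &[f32]) -> f32 {
    pos.north(buf) + pos.south(buf) + pos.east(buf) + pos.west(buf) - 4.0 * buf[pos.idx()]
}
```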
Synchronization
- `sync_threads()` - Block-level barrier
- `thread_fence()` - Device memory fence
- `thread_fence_block()` - Block memory fence
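On the host, `std::sync::Barrier` plays a role similar to `sync_threads()`: no thread in the group proceeds until all of them have reached the barrier. The sketch below is only an analogy, assuming one OS thread per GPU thread, which is not how the generated CUDA runs. Each thread writes its own slot of a shared buffer, waits, and only then reads its right-hand neighbor's slot; without the barrier, a thread could read a slot its neighbor had not written yet. `neighbor_after_barrier` is a hypothetical name.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Each of `n` threads stores tid * tid, waits at the barrier, then reads the
/// value written by thread (tid + 1) % n. Returns the values read, by thread id.
pub fn neighbor_after_barrier(n: usize) -> Vec<usize> {
    let shared = Arc::new(Mutex::new(vec![0usize; n]));
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|tid| {
            let shared = Arc::clone(&shared);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Write phase; the lock guard is dropped at the end of this statement.
                shared.lock().unwrap()[tid] = tid * tid;
                // Analogous to sync_threads(): every write happens before any read below.
                barrier.wait();
                let next = (tid + 1) % n;
                let value = shared.lock().unwrap()[next]; // read phase
                value
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```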
Atomics
- `atomic_add(ptr, val)`, `atomic_sub(ptr, val)`
- `atomic_min(ptr, val)`, `atomic_max(ptr, val)`
- `atomic_exchange(ptr, val)`, `atomic_cas(ptr, compare, val)`
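Compare-and-swap, the operation behind `atomic_cas(ptr, compare, val)`, is the general building block from which other atomic read-modify-write operations can be derived. As a host-side illustration in plain Rust (standard `std::sync::atomic`, not this crate), here is the well-known CAS loop that implements a floating-point add on a 32-bit word that only offers integer atomics; Rust's standard library has no `AtomicF32`, so the value is stored as raw bits. `atomic_add_f32` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Adds `val` to the f32 stored (as raw bits) in `cell` and returns the previous
/// value. The exchange only succeeds if no other thread changed the word since
/// it was read; otherwise the loop retries with the freshly observed value.
pub fn atomic_add_f32(cell: &AtomicU32, val: f32) -> f32 {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        let new = (f32::from_bits(current) + val).to_bits();
        match cell.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(previous) => return f32::from_bits(previous),
            Err(observed) => current = observed, // lost the race (or spurious failure): retry
        }
    }
}
```

Returning the previous value mirrors the usual fetch-and-op convention of atomic read-modify-write operations.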
Math Functions
- `sqrt()`, `abs()`, `floor()`, `ceil()`, `round()`
- `sin()`, `cos()`, `tan()`, `exp()`, `log()`
- `powf()`, `min()`, `max()`, `mul_add()`
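Rust's `f32::mul_add(a, b)` computes `self * a + b` with a single rounding step. A typical use is Horner's rule for polynomial evaluation, where each step is one multiply-add. The host-side sketch below uses the standard library method; the helper name `horner` is hypothetical, and whether the transpiler lowers the DSL's `mul_add()` to a CUDA fused multiply-add is an assumption this README does not state.

```rust
/// Evaluates coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ... by Horner's rule,
/// folding from the highest-degree coefficient down with one mul_add per step.
pub fn horner(coeffs: &[f32], x: f32) -> f32 {
    coeffs.iter().rev().fold(0.0_f32, |acc, &c| acc.mul_add(x, c))
}
```

For example, `horner(&[1.0, 2.0, 3.0], 2.0)` evaluates 1 + 2·2 + 3·4 as ((3·2 + 2)·2 + 1), giving 17.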
Warp Operations
- `warp_shuffle(val, lane)`, `warp_shuffle_up(val, delta)`
- `warp_shuffle_down(val, delta)`, `warp_shuffle_xor(val, mask)`
- `warp_ballot(pred)`, `warp_all(pred)`, `warp_any(pred)`
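A common use of `warp_shuffle_down(val, delta)` is a tree reduction across the 32 lanes of a warp: after halving `delta` from 16 down to 1, lane 0 holds the sum of all lanes. The host-side sketch below emulates a warp as an array, following CUDA's rule that lanes whose source lane `i + delta` is out of range keep their own value; it also emulates `warp_ballot(pred)` as a bitmask with bit `i` set when lane `i`'s predicate holds. `shuffle_down`, `warp_reduce_sum`, and `ballot` are hypothetical names, not functions exported by this crate.

```rust
/// Number of lanes in a CUDA warp.
pub const WARP_SIZE: usize = 32;

/// One emulated shuffle-down step: lane i receives lane i + delta's value;
/// lanes with no in-range source keep their own value.
fn shuffle_down(lanes: &[i32; WARP_SIZE], delta: usize) -> [i32; WARP_SIZE] {
    let mut out = *lanes;
    for i in 0..WARP_SIZE {
        if i + delta < WARP_SIZE {
            out[i] = lanes[i + delta];
        }
    }
    out
}

/// Tree reduction: after log2(32) = 5 steps, lane 0 holds the sum of all lanes.
pub fn warp_reduce_sum(mut lanes: [i32; WARP_SIZE]) -> i32 {
    let mut delta = WARP_SIZE / 2;
    while delta > 0 {
        let shifted = shuffle_down(&lanes, delta);
        for i in 0..WARP_SIZE {
            lanes[i] += shifted[i];
        }
        delta /= 2;
    }
    lanes[0]
}

/// Emulated ballot: bit i of the result is set when lane i's predicate is true.
pub fn ballot(preds: &[bool; WARP_SIZE]) -> u32 {
    preds
        .iter()
        .enumerate()
        .fold(0u32, |mask, (i, &p)| if p { mask | (1u32 << i) } else { mask })
}
```

With all 32 lanes active, "all lanes true" corresponds to a ballot equal to `u32::MAX` and "any lane true" to a non-zero ballot.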
Type Mapping
| Rust Type | CUDA Type |
|---|---|
| `f32` | `float` |
| `f64` | `double` |
| `i32` | `int` |
| `u32` | `unsigned int` |
| `i64` | `long long` |
| `u64` | `unsigned long long` |
| `bool` | `int` |
| `&[T]` | `const T* __restrict__` |
| `&mut [T]` | `T* __restrict__` |
Testing
The crate includes 143 tests covering all kernel types and language features.
License
Apache-2.0