# ringkernel-cuda

NVIDIA CUDA backend for RingKernel.
## Overview

This crate provides GPU compute support for RingKernel using NVIDIA CUDA via the cudarc library (v0.18.2). It implements the `RingKernelRuntime` trait for launching and managing persistent GPU kernels.
## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (Volta, Turing, Ampere, Ada, Hopper)
- CUDA Toolkit 12.x or later
- cudarc 0.18.2 (managed via workspace)
- Linux (native) or Windows (WSL2 with limitations)
## Features
- Persistent kernel execution using cooperative groups
- Lock-free message queues in GPU global memory
- PTX compilation at runtime via NVRTC
- Multi-GPU device enumeration
- Stencil kernel loading and execution
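The lock-free message queues listed above can be illustrated with a minimal CPU-side analogue. This is a sketch only, not the crate's actual memory layout: a fixed-capacity ring buffer with monotonically increasing head/tail indices and power-of-two masking, which is the standard shape for a single-producer/single-consumer GPU mailbox. All names here are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Minimal single-producer/single-consumer ring queue, mirroring the
/// head/tail index scheme a GPU-resident mailbox typically uses.
/// Illustrative only; `&mut self` here sidesteps real cross-device sharing.
struct RingQueue<T> {
    slots: Vec<Option<T>>, // capacity must be a power of two
    head: AtomicUsize,     // next slot to pop (consumer-owned)
    tail: AtomicUsize,     // next slot to push (producer-owned)
}

impl<T> RingQueue<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    fn push(&mut self, value: T) -> Result<(), T> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail - head == self.slots.len() {
            return Err(value); // queue full
        }
        let mask = self.slots.len() - 1;
        self.slots[tail & mask] = Some(value);
        self.tail.store(tail + 1, Ordering::Release); // publish the slot
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None; // queue empty
        }
        let mask = self.slots.len() - 1;
        let value = self.slots[head & mask].take();
        self.head.store(head + 1, Ordering::Release); // release the slot
        value
    }
}

fn main() {
    let mut q = RingQueue::new(4);
    for i in 0..4 {
        q.push(i).unwrap();
    }
    assert!(q.push(99).is_err()); // full: head/tail delta equals capacity
    assert_eq!(q.pop(), Some(0));
    println!("popped head: {:?}", q.pop()); // Some(1)
}
```

Because indices only ever increase and are masked at access time, full/empty are distinguished without a separate count field.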
## Usage

```rust
use ringkernel_cuda::CudaRuntime;

// Runtime construction is async; the body of this example was lost in
// extraction, so the call below is illustrative only:
// let runtime = CudaRuntime::new().await?;
```
## Stencil Kernel Loading

For pre-transpiled CUDA kernels:

```rust
use ringkernel_cuda::StencilKernelLoader;

// Arguments were elided in this excerpt; the calls below are illustrative.
let loader = StencilKernelLoader::new(/* device */);
let kernel = loader.load_from_source(/* name, CUDA source */)?;
let config = LaunchConfig { /* grid/block dimensions */ };
kernel.launch(/* config and kernel arguments */)?;
```
## cudarc 0.18.2 API

This crate uses cudarc 0.18.2 with the builder pattern for kernel launches:

```rust
use cudarc::driver::{LaunchConfig, PushKernelArg};

// Load module and function (argument names below are illustrative)
let module = device.inner.load_module(ptx)?;
let func = module.load_function("kernel_name")?;

// Launch with builder pattern
unsafe {
    stream
        .launch_builder(&func)
        .arg(&buffer)
        .launch(launch_config)?;
}
```
For cooperative kernel launches (grid-wide synchronization):

```rust
use cudarc::driver::result as cuda_result;

// Cooperative launches go through the raw driver bindings
// (cuLaunchCooperativeKernel); the call site was elided in this excerpt.
unsafe {
    // cuda_result::... (cooperative launch call elided)
}
```
## Exports

| Type | Description |
|---|---|
| `CudaRuntime` | Main runtime implementing `RingKernelRuntime` |
| `CudaDevice` | GPU device handle |
| `CudaKernel` | Compiled kernel handle |
| `CudaBuffer` | GPU memory buffer |
| `CudaControlBlock` | GPU-resident kernel state |
| `CudaMessageQueue` | Lock-free queue in GPU memory |
| `StencilKernelLoader` | Loads CUDA stencil kernels |
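`CudaControlBlock` above holds GPU-resident kernel state that the host and a persistent kernel both observe. A CPU-side sketch of such a control-word protocol is below; the state values and method names are hypothetical, not the crate's real layout.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical state encoding; the crate's real control block may differ.
const STATE_IDLE: u32 = 0;
const STATE_RUNNING: u32 = 1;
const STATE_STOP_REQUESTED: u32 = 2;
const STATE_STOPPED: u32 = 3;

/// Shared control word for a persistent kernel.
struct ControlBlock {
    state: AtomicU32,
}

impl ControlBlock {
    fn new() -> Self {
        Self { state: AtomicU32::new(STATE_IDLE) }
    }

    /// Host side: ask a running persistent kernel to exit its poll loop.
    fn request_stop(&self) {
        // Only transition RUNNING -> STOP_REQUESTED; ignore other states.
        let _ = self.state.compare_exchange(
            STATE_RUNNING,
            STATE_STOP_REQUESTED,
            Ordering::AcqRel,
            Ordering::Acquire,
        );
    }

    /// Device side (simulated): observe the stop request and acknowledge.
    fn poll_and_ack(&self) -> bool {
        if self.state.load(Ordering::Acquire) == STATE_STOP_REQUESTED {
            self.state.store(STATE_STOPPED, Ordering::Release);
            return true;
        }
        false
    }
}

fn main() {
    let cb = ControlBlock::new();
    cb.state.store(STATE_RUNNING, Ordering::Release);
    cb.request_stop();
    assert!(cb.poll_and_ack());
    assert_eq!(cb.state.load(Ordering::Acquire), STATE_STOPPED);
    println!("kernel stopped cleanly");
}
```

On a real device the same pattern runs over pinned or unified memory, with the kernel polling the word between queue drains.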
## Platform Notes

- **Native Linux:** Full support for persistent kernels using CUDA cooperative groups.
- **WSL2:** Persistent kernels may not work due to cooperative-group limitations; the runtime falls back to event-driven execution.
- **Windows Native:** Not currently supported; use WSL2.
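The WSL2 fallback described above amounts to a small policy decision. A sketch is below; the predicate is hypothetical — in practice it would come from a CUDA device-attribute query for cooperative-launch support.

```rust
/// Execution strategies for the backend (names illustrative).
#[derive(Debug, PartialEq)]
enum ExecMode {
    /// Persistent kernel with grid-wide sync via cooperative groups.
    Persistent,
    /// Host-driven execution, one launch per batch of messages.
    EventDriven,
}

/// Pick an execution mode from the device's cooperative-launch support.
/// The flag would come from a device-attribute query (hypothetical here);
/// it is typically absent under WSL2.
fn select_mode(supports_cooperative_launch: bool) -> ExecMode {
    if supports_cooperative_launch {
        ExecMode::Persistent
    } else {
        ExecMode::EventDriven
    }
}

fn main() {
    assert_eq!(select_mode(true), ExecMode::Persistent);
    assert_eq!(select_mode(false), ExecMode::EventDriven); // WSL2 path
    println!("mode selection ok");
}
```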
## Testing

```bash
# Requires NVIDIA GPU
cargo test -p ringkernel-cuda
```
## License

Apache-2.0