oxicuda-driver
Dynamic, safe Rust bindings for the NVIDIA CUDA Driver API.
Part of the OxiCUDA project.
Overview
oxicuda-driver is a pure Rust wrapper around the CUDA Driver API (cuda.h).
Unlike traditional approaches that require the CUDA Toolkit at build time,
this crate loads the driver shared library entirely at runtime via
libloading. No cuda.h, no
libcuda.so symlink, no nvcc -- the crate compiles on any standard Rust
toolchain.
The actual GPU driver is discovered the first time you call init() or
try_driver(). A global OnceLock singleton caches the loaded function
pointers for the lifetime of the process, so subsequent calls only read the
cached table instead of reloading the library.
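A minimal sketch of the lazy-singleton pattern described above, using only the standard library. `DriverApi`, `load_driver`, `cached_driver` and `LOAD_COUNT` are hypothetical names, and the load step is a placeholder where a real loader would open the driver library and resolve its symbols:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the table of resolved driver function pointers.
/// A real loader would fill this from libcuda via dynamic symbol lookup.
#[derive(Debug)]
struct DriverApi {
    library_name: &'static str,
}

/// Counts how many times the (expensive) load step actually runs.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Process-wide cache: filled on first access, reused afterwards.
static DRIVER: OnceLock<Result<DriverApi, String>> = OnceLock::new();

/// Placeholder for opening the shared library and resolving symbols.
fn load_driver() -> Result<DriverApi, String> {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Ok(DriverApi { library_name: "libcuda.so.1" })
}

/// Returns the cached table, running `load_driver` only on the first call.
fn cached_driver() -> Result<&'static DriverApi, &'static str> {
    DRIVER
        .get_or_init(load_driver)
        .as_ref()
        .map_err(|e| e.as_str())
}
```

Caching the `Result` rather than only the success value means a failed load is remembered too, so repeated calls report the same error without retrying; this sketch does not claim oxicuda-driver handles failures the same way.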
All public APIs return CudaResult<T> rather than panicking. The
CudaError enum covers roughly 100 CUDA driver error codes as
strongly-typed variants, making match-based error handling straightforward.
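For illustration, here is a trimmed-down, hypothetical mirror of that design: a few variants whose codes follow the `CUresult` values in cuda.h, a `check()`-style mapping function, and a call site that matches on specific variants. The crate's real enum has far more variants and may name them differently.

```rust
/// Hypothetical subset of a driver error enum; comments give the CUresult codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CudaError {
    InvalidValue,   // CUDA_ERROR_INVALID_VALUE = 1
    OutOfMemory,    // CUDA_ERROR_OUT_OF_MEMORY = 2
    NotInitialized, // CUDA_ERROR_NOT_INITIALIZED = 3
    NoDevice,       // CUDA_ERROR_NO_DEVICE = 100
    Unknown(u32),   // any code not modelled in this sketch
}

type CudaResult<T> = Result<T, CudaError>;

/// Maps a raw CUresult code to a CudaResult (0 is CUDA_SUCCESS).
fn check(code: u32) -> CudaResult<()> {
    match code {
        0 => Ok(()),
        1 => Err(CudaError::InvalidValue),
        2 => Err(CudaError::OutOfMemory),
        3 => Err(CudaError::NotInitialized),
        100 => Err(CudaError::NoDevice),
        other => Err(CudaError::Unknown(other)),
    }
}

/// Example call site: handle the cases we care about, fall through for the rest.
fn describe(code: u32) -> &'static str {
    match check(code) {
        Ok(()) => "ok",
        Err(CudaError::NoDevice) => "no CUDA-capable GPU found",
        Err(CudaError::OutOfMemory) => "device out of memory",
        Err(_) => "other driver error",
    }
}
```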
RAII wrappers (Context, Stream, Event, Module) automatically release
GPU resources on Drop.
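The release-on-Drop behaviour can be sketched with a hypothetical `Handle` type that records a log entry where a real wrapper would call the driver's destroy function. It also shows that Rust drops locals in reverse declaration order, so a stream created after its context is released first:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log so the example can observe when each handle is released.
type ReleaseLog = Rc<RefCell<Vec<String>>>;

/// Hypothetical owning handle. A real wrapper would hold a raw CUcontext,
/// CUstream, etc. and call the matching destroy function in `drop`.
struct Handle {
    name: &'static str,
    log: ReleaseLog,
}

impl Handle {
    fn new(name: &'static str, log: &ReleaseLog) -> Self {
        Handle { name, log: Rc::clone(log) }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Stand-in for e.g. cuStreamDestroy / cuCtxDestroy.
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Creates a context and then a stream in an inner scope and returns the
/// order in which they were released when that scope ended.
fn release_order() -> Vec<String> {
    let log: ReleaseLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _ctx = Handle::new("context", &log);
        let _stream = Handle::new("stream", &log);
        // Both handles are dropped here, in reverse declaration order.
    }
    let order = log.borrow().clone();
    order
}
```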
Modules
| Module | Description |
|---|---|
| ffi | Raw C-compatible types (CUdevice, CUcontext, etc.) |
| ffi_constants | CUDA constant definitions and magic values |
| ffi_launch | Launch-related FFI structures (CUlaunchConfig, etc.) |
| ffi_descriptors | Descriptor types (TMA, texture, surface) |
| error | CudaError (~100 variants), CudaResult, check() helper |
| loader | Runtime library loading with OnceLock singleton |
| device | Device enumeration, attribute queries, best_device() |
| context | RAII CUDA context bound to a device |
| context_config | Context flags and device limit configuration |
| stream | Asynchronous command queue within a context |
| event | Timing and synchronisation markers on streams |
| module | PTX/cubin loading, JIT compilation, function lookup |
| occupancy | Occupancy-based launch configuration queries |
| occupancy_ext | Extended occupancy helpers (dynamic shared memory, cluster) |
| primary_context | Primary context management (cuDevicePrimaryCtxRetain) |
| cooperative_launch | Cooperative kernel launch (cuLaunchCooperativeKernel) |
| graph | CUDA Graph API (Graph, GraphNode, GraphExec, StreamCapture) |
| link | Link-time optimization (cuLinkCreate, cuLinkAddData) |
| multi_gpu | Multi-GPU device pool, round-robin scheduling |
| nvlink_topology | NVLink/NVSwitch topology detection and bandwidth queries |
| memory_info | Device memory info queries (cuMemGetInfo) |
| stream_ordered_alloc | Stream-ordered memory allocation (CUDA 11.2+) |
| profiler | Profiler control (cuProfilerStart/cuProfilerStop) |
| debug | GPU debugging, memory leak detection, kernel launch tracing |
| function_attr | CUDA function attribute queries |
| tma | Tensor Memory Accelerator (TMA) descriptor helpers (Hopper+) |
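The multi_gpu module mentions round-robin scheduling over a device pool. A minimal, thread-safe sketch of that idea using an atomic counter; the `DevicePool` type and `next_device` method are hypothetical and not this crate's API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical device pool: stores device ordinals and hands them out in
/// round-robin order. Can be shared between threads (e.g. behind an Arc).
struct DevicePool {
    ordinals: Vec<i32>,
    next: AtomicUsize,
}

impl DevicePool {
    /// Returns None for an empty pool, since there is nothing to schedule onto.
    fn new(ordinals: Vec<i32>) -> Option<Self> {
        if ordinals.is_empty() {
            return None;
        }
        Some(DevicePool { ordinals, next: AtomicUsize::new(0) })
    }

    /// Returns the next device ordinal, wrapping around at the end of the list.
    fn next_device(&self) -> i32 {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        self.ordinals[i % self.ordinals.len()]
    }
}
```

`Ordering::Relaxed` is enough here because only the counter's own atomicity matters; no other data is published through it.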
Quick Start
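```rust
use oxicuda_driver::*;

fn main() -> CudaResult<()> {
    // Initialise the CUDA driver (loads libcuda at runtime).
    init()?;

    // Pick the first available GPU and create a context.
    let dev = Device::get(0)?;
    let _ctx = Context::new(&dev)?;

    // Load a PTX module and look up a kernel.
    // `ptx_src` and "my_kernel" are placeholders: supply real PTX text and kernel name.
    let ptx_src: &str = "...";
    let module = Module::from_ptx(ptx_src)?;
    let kernel = module.get_function("my_kernel")?;
    Ok(())
}
```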
Features
| Feature | Description |
|---|---|
| gpu-tests | Enable tests that require a physical GPU |
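A feature like this is typically used to compile GPU-dependent tests only on demand. A minimal sketch of that pattern with a standard `cfg(feature = ...)` gate; the function, module and test names are hypothetical, and this is not necessarily how this crate organises its tests:

```rust
/// True only when the crate is built with `--features gpu-tests`.
fn gpu_tests_enabled() -> bool {
    cfg!(feature = "gpu-tests")
}

// Compiled only for `cargo test` *and* when the feature is enabled, so
// machines without a GPU never build or run these tests.
#[cfg(all(test, feature = "gpu-tests"))]
mod gpu {
    #[test]
    fn needs_a_real_gpu() {
        // Driver calls that require physical hardware would go here.
    }
}
```

Such tests would then run with `cargo test --features gpu-tests`.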
Runtime Library Resolution
| Platform | Library searched |
|---|---|
| Linux | libcuda.so, libcuda.so.1 |
| Windows | nvcuda.dll |
| macOS | (returns UnsupportedPlatform at runtime -- NVIDIA dropped macOS support) |
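How a loader might select candidates from this table can be sketched with compile-time `cfg!` checks. `driver_library_candidates` is a hypothetical helper that returns the names in the order listed; a real loader would try each name in turn (for example with libloading) and stop at the first one that opens:

```rust
/// Shared-library names to try on the current platform, in table order.
/// Other platforms (including macOS) get an error, mirroring UnsupportedPlatform.
fn driver_library_candidates() -> Result<&'static [&'static str], &'static str> {
    if cfg!(target_os = "linux") {
        Ok(&["libcuda.so", "libcuda.so.1"])
    } else if cfg!(target_os = "windows") {
        Ok(&["nvcuda.dll"])
    } else {
        Err("UnsupportedPlatform")
    }
}
```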
Platform Support
| Platform | Status |
|---|---|
| Linux | Full support (NVIDIA driver 525+) |
| Windows | Full support (NVIDIA driver 525+) |
| macOS | Compile only (UnsupportedPlatform at runtime) |
Status
| Item | Value |
|---|---|
| Version | 0.1.3 (2026-04-17) |
| Tests | 333 passing |
| Warnings | 0 |
| unwrap() calls | 0 |
License
Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)