§OxiCUDA Runtime
Pure-Rust implementation of the CUDA Runtime API (libcudart) surface,
built on top of oxicuda-driver’s dynamic driver loader.
§Coverage
| Module | API functions |
|---|---|
| device | cudaGetDeviceCount, cudaSetDevice, cudaGetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaDeviceReset |
| memory | cudaMalloc, cudaFree, cudaMallocHost, cudaFreeHost, cudaMallocManaged, cudaMallocPitch, cudaMemcpy, cudaMemcpyAsync, cudaMemset, cudaMemGetInfo |
| stream | cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamCreateWithPriority, cudaStreamDestroy, cudaStreamSynchronize, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamGetPriority, cudaStreamGetFlags |
| event | cudaEventCreate, cudaEventCreateWithFlags, cudaEventDestroy, cudaEventRecord, cudaEventSynchronize, cudaEventQuery, cudaEventElapsedTime |
| launch | cudaLaunchKernel (explicit function handle), cudaFuncGetAttributes, cudaFuncSetAttribute, module_load_ptx, module_get_function, module_unload |
| peer | cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaDeviceDisablePeerAccess, cudaMemcpyPeer, cudaMemcpyPeerAsync |
| profiler | cudaProfilerStart, cudaProfilerStop, profiler::ProfilerGuard |
| error | CudaRtError, CudaRtResult |
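The table lists profiler::ProfilerGuard next to cudaProfilerStart and cudaProfilerStop. Assuming it is a scope guard in the usual RAII style (the source does not state this), the general pattern can be sketched as follows. `ScopeGuard`, `EventLog`, `new_log`, and `profiled_region` are hypothetical names invented for this illustration; the guard only writes to an in-memory log and never touches the GPU.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared in-memory log used in place of real profiler calls (illustration only).
pub type EventLog = Rc<RefCell<Vec<&'static str>>>;

/// Creates an empty log.
pub fn new_log() -> EventLog {
    Rc::new(RefCell::new(Vec::new()))
}

/// Hypothetical scope guard: records "start" on creation and "stop" on drop.
pub struct ScopeGuard {
    log: EventLog,
}

impl ScopeGuard {
    pub fn start(log: EventLog) -> Self {
        log.borrow_mut().push("start");
        Self { log }
    }
}

impl Drop for ScopeGuard {
    fn drop(&mut self) {
        // Runs when the guard goes out of scope, including on early return.
        self.log.borrow_mut().push("stop");
    }
}

/// A region whose "profiling" is bounded by the guard's lifetime.
pub fn profiled_region(log: EventLog) {
    let _guard = ScopeGuard::start(Rc::clone(&log));
    log.borrow_mut().push("work");
    // `_guard` is dropped at the end of this function, which records "stop".
}
```

Tying the stop call to `Drop` means the region is closed even when the body returns early with `?`, which is the usual reason to prefer a guard over paired start/stop calls.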
§Design goals
- Zero CUDA SDK build-time dependency: just like oxicuda-driver, the runtime crate only needs the NVIDIA driver (libcuda.so/nvcuda.dll) at run time.
- Ergonomic Rust API: strong types for streams, events, device pointers, and kernel dimensions instead of raw pointers.
- No unwrap: all fallible operations return Result.
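The last two goals can be illustrated with a small, self-contained sketch. Everything below is a hypothetical stand-in written for this example (`DevicePtr`, `Dim3`, `RtError`, `MockHeap`, `allocate_pair`); these are not the crate's actual definitions, and the "heap" is a host-side bump counter rather than GPU memory. The sketch only shows how newtype wrappers and `Result` returns keep raw pointers and panics out of caller code.

```rust
use std::fmt;

/// Hypothetical typed device pointer (not the crate's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DevicePtr(pub u64);

/// Hypothetical kernel launch dimensions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

impl Dim3 {
    /// Total number of elements covered by the three dimensions.
    pub fn total(self) -> u64 {
        self.x as u64 * self.y as u64 * self.z as u64
    }
}

/// Hypothetical error type for the mock allocator.
#[derive(Debug, PartialEq, Eq)]
pub enum RtError {
    InvalidValue,
    OutOfMemory,
}

impl fmt::Display for RtError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RtError::InvalidValue => write!(f, "invalid value"),
            RtError::OutOfMemory => write!(f, "out of memory"),
        }
    }
}

impl std::error::Error for RtError {}

pub type RtResult<T> = Result<T, RtError>;

/// Mock bump allocator standing in for device memory.
pub struct MockHeap {
    next: u64,
    remaining: usize,
}

impl MockHeap {
    pub fn new(capacity: usize) -> Self {
        Self { next: 0x1000, remaining: capacity }
    }

    /// Rejects zero-sized requests and requests larger than the remaining capacity.
    pub fn malloc(&mut self, bytes: usize) -> RtResult<DevicePtr> {
        if bytes == 0 {
            return Err(RtError::InvalidValue);
        }
        if bytes > self.remaining {
            return Err(RtError::OutOfMemory);
        }
        let ptr = DevicePtr(self.next);
        self.next += bytes as u64;
        self.remaining -= bytes;
        Ok(ptr)
    }
}

/// Errors propagate to the caller with `?` instead of `unwrap`.
pub fn allocate_pair(heap: &mut MockHeap, bytes: usize) -> RtResult<(DevicePtr, DevicePtr)> {
    let a = heap.malloc(bytes)?;
    let b = heap.malloc(bytes)?;
    Ok((a, b))
}
```

A caller decides how to handle a failure, for example by matching on `RtError::OutOfMemory` and retrying with a smaller size, instead of the library panicking on its behalf.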
§Quick start
use oxicuda_runtime::{device, memory, stream, event};
use oxicuda_runtime::memory::MemcpyKind;
// Select device 0.
device::set_device(0)?;
// Allocate 1 MiB of device memory.
let d_buf = memory::malloc(1 << 20)?;
// Zero it.
memory::memset(d_buf, 0, 1 << 20)?;
// Create a stream, record an event.
let s = stream::stream_create()?;
let e = event::event_create()?;
event::event_record(e, s)?;
event::event_synchronize(e)?;
// Cleanup.
event::event_destroy(e)?;
stream::stream_destroy(s)?;
memory::free(d_buf)?;
Re-exports§
pub use device::CudaDeviceProp;
pub use error::CudaRtError;
pub use error::CudaRtResult;
pub use event::CudaEvent;
pub use event::EventFlags;
pub use launch::CudaFunction;
pub use launch::CudaModule;
pub use launch::Dim3;
pub use launch::FuncAttribute;
pub use launch::FuncAttributes;
pub use memory::DevicePtr;
pub use stream::CudaStream;
pub use stream::StreamFlags;
pub use texture::AddressMode;
pub use texture::Array3DFlags;
pub use texture::ArrayFormat;
pub use texture::CudaArray;
pub use texture::CudaArray3D;
pub use texture::CudaSurfaceObject;
pub use texture::CudaTextureObject;
pub use texture::FilterMode;
pub use texture::ResourceDesc;
pub use texture::ResourceViewDesc;
pub use texture::TextureDesc;
Modules§
- device: Device management (cudaGetDeviceCount, cudaSetDevice, cudaGetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaDeviceReset).
- error: CUDA Runtime API error types.
- event: CUDA event management.
- launch: Kernel launch API.
- memory: Device and host memory management.
- peer: Peer-to-peer device access.
- profiler: CUDA profiler control.
- stream: CUDA stream management.
- texture: Texture and surface memory (CUDA array allocation and bindless objects).
Functions§
- cuda_free: Free device memory (mirrors cudaFree).
- cuda_malloc: Allocate device memory (mirrors cudaMalloc).
- cuda_memset: Zero device memory (mirrors cudaMemset).
- device_synchronize: Block until all device operations complete (mirrors cudaDeviceSynchronize).
- get_device: Get the current device for this thread (mirrors cudaGetDevice).
- get_device_count: Returns the number of CUDA-capable devices (mirrors cudaGetDeviceCount).
- memcpy_d2d: Copy between device allocations.
- memcpy_d2h: Copy device → host slice (typed helper, no raw pointers).
- memcpy_h2d: Copy host slice → device (typed helper, no raw pointers).
- set_device: Set the current device for this thread (mirrors cudaSetDevice).