Crate oxicuda_runtime


§OxiCUDA Runtime

Pure-Rust implementation of the CUDA Runtime API (libcudart) surface, built on top of oxicuda-driver’s dynamic driver loader.

§Coverage

| Module | API functions |
| --- | --- |
| device | cudaGetDeviceCount, cudaSetDevice, cudaGetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaDeviceReset |
| memory | cudaMalloc, cudaFree, cudaMallocHost, cudaFreeHost, cudaMallocManaged, cudaMallocPitch, cudaMemcpy, cudaMemcpyAsync, cudaMemset, cudaMemGetInfo |
| stream | cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamCreateWithPriority, cudaStreamDestroy, cudaStreamSynchronize, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamGetPriority, cudaStreamGetFlags |
| event | cudaEventCreate, cudaEventCreateWithFlags, cudaEventDestroy, cudaEventRecord, cudaEventSynchronize, cudaEventQuery, cudaEventElapsedTime |
| launch | cudaLaunchKernel (explicit function handle), cudaFuncGetAttributes, cudaFuncSetAttribute, module_load_ptx, module_get_function, module_unload |
| peer | cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaDeviceDisablePeerAccess, cudaMemcpyPeer, cudaMemcpyPeerAsync |
| profiler | cudaProfilerStart, cudaProfilerStop, profiler::ProfilerGuard |
| error | CudaRtError, CudaRtResult |

§Design goals

  • Zero CUDA SDK build-time dependency: just like oxicuda-driver, the runtime crate only needs the NVIDIA driver (libcuda.so / nvcuda.dll) at run time.
  • Ergonomic Rust API: strong types for streams, events, device pointers, and kernel dimensions instead of raw pointers.
  • No unwrap: all fallible operations return Result.
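The last two goals can be illustrated with a small, self-contained sketch. It does not use the crate itself: `RtError`, `RtResult`, `DevPtr`, `check`, and `fake_malloc` are hypothetical stand-ins invented for this example, and the real `CudaRtError`, `CudaRtResult`, and `DevicePtr` may be shaped differently. The only facts borrowed from CUDA are the numeric values of `cudaSuccess` (0), `cudaErrorInvalidValue` (1), and `cudaErrorMemoryAllocation` (2).

```rust
use std::fmt;

/// Hypothetical stand-in for an error type such as `CudaRtError`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RtError {
    InvalidValue,
    MemoryAllocation,
    Unknown(u32),
}

impl fmt::Display for RtError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RtError::InvalidValue => write!(f, "invalid value"),
            RtError::MemoryAllocation => write!(f, "out of memory"),
            RtError::Unknown(code) => write!(f, "unknown error code {}", code),
        }
    }
}

impl std::error::Error for RtError {}

/// Hypothetical alias in the spirit of `CudaRtResult`.
pub type RtResult<T> = Result<T, RtError>;

/// Hypothetical newtype around a raw device address, so that a device
/// pointer cannot be confused with a host pointer or a plain integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DevPtr(u64);

impl DevPtr {
    /// Expose the raw address for interop code that needs it.
    pub fn as_u64(self) -> u64 {
        self.0
    }
}

/// Translate a C-style status code into a `Result`.
pub fn check(status: u32) -> RtResult<()> {
    match status {
        0 => Ok(()),                         // cudaSuccess
        1 => Err(RtError::InvalidValue),     // cudaErrorInvalidValue
        2 => Err(RtError::MemoryAllocation), // cudaErrorMemoryAllocation
        other => Err(RtError::Unknown(other)),
    }
}

/// Fake allocator (no GPU involved): rejects zero-sized requests and
/// otherwise returns a made-up address.
pub fn fake_malloc(bytes: usize) -> RtResult<DevPtr> {
    if bytes == 0 {
        check(1)?; // propagate the error with `?` instead of unwrapping
    }
    Ok(DevPtr(0x1000))
}
```

With this shape, a caller writes `let p = fake_malloc(n)?;` and the compiler forces the error path to be handled somewhere up the call chain.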

§Quick start

use oxicuda_runtime::{device, memory, stream, event};
use oxicuda_runtime::memory::MemcpyKind;
use oxicuda_runtime::CudaRtResult;

// The `?` operator needs an enclosing function that returns a Result.
// Assumption: CudaRtResult<T> is generic over the success type.
fn main() -> CudaRtResult<()> {
    // Select device 0.
    device::set_device(0)?;

    // Allocate 1 MiB of device memory.
    let d_buf = memory::malloc(1 << 20)?;

    // Zero it.
    memory::memset(d_buf, 0, 1 << 20)?;

    // Create a stream, record an event on it, and wait for the event.
    let s = stream::stream_create()?;
    let e = event::event_create()?;
    event::event_record(e, s)?;
    event::event_synchronize(e)?;

    // Cleanup.
    event::event_destroy(e)?;
    stream::stream_destroy(s)?;
    memory::free(d_buf)?;
    Ok(())
}
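The Quick start releases its resources by hand, so an early return through `?` would skip the destroy and free calls, assuming the handles do not clean up on their own. One common way to guard against that is a small drop guard. The sketch below is self-contained and does not call the crate: `Cleanup` and `run` are hypothetical names, and the "resources" are just entries pushed to a log so the order of events is visible.

```rust
use std::cell::RefCell;

/// Runs a cleanup closure when the guard goes out of scope.
pub struct Cleanup<F: FnOnce()> {
    action: Option<F>,
}

impl<F: FnOnce()> Cleanup<F> {
    pub fn new(action: F) -> Self {
        Cleanup { action: Some(action) }
    }
}

impl<F: FnOnce()> Drop for Cleanup<F> {
    fn drop(&mut self) {
        // `take` leaves `None` behind, so the closure runs exactly once.
        if let Some(action) = self.action.take() {
            action();
        }
    }
}

/// Stand-in workflow mirroring the Quick start: acquire two resources,
/// then either fail midway or finish the work.
pub fn run(fail_midway: bool, log: &RefCell<Vec<&'static str>>) -> Result<(), &'static str> {
    log.borrow_mut().push("malloc");
    let _free = Cleanup::new(|| {
        log.borrow_mut().push("free");
    });

    log.borrow_mut().push("stream_create");
    let _destroy = Cleanup::new(|| {
        log.borrow_mut().push("stream_destroy");
    });

    if fail_midway {
        // Early return: both guards still run, in reverse declaration order.
        return Err("simulated failure");
    }

    log.borrow_mut().push("work");
    Ok(())
}
```

Because locals are dropped in reverse declaration order, the stream is destroyed before the buffer is freed on both the success and the failure path. In real code the closures would call the matching destroy and free functions and decide how to report a failure during cleanup.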

Re-exports§

pub use device::CudaDeviceProp;
pub use error::CudaRtError;
pub use error::CudaRtResult;
pub use event::CudaEvent;
pub use event::EventFlags;
pub use launch::CudaFunction;
pub use launch::CudaModule;
pub use launch::Dim3;
pub use launch::FuncAttribute;
pub use launch::FuncAttributes;
pub use memory::DevicePtr;
pub use stream::CudaStream;
pub use stream::StreamFlags;
pub use texture::AddressMode;
pub use texture::Array3DFlags;
pub use texture::ArrayFormat;
pub use texture::CudaArray;
pub use texture::CudaArray3D;
pub use texture::CudaSurfaceObject;
pub use texture::CudaTextureObject;
pub use texture::FilterMode;
pub use texture::ResourceDesc;
pub use texture::ResourceViewDesc;
pub use texture::TextureDesc;

Modules§

device
Device management — cudaGetDeviceCount, cudaSetDevice, cudaGetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaDeviceReset.
error
CUDA Runtime API error types.
event
CUDA event management.
launch
Kernel launch API.
memory
Device and host memory management.
peer
Peer-to-peer device access.
profiler
CUDA profiler control.
stream
CUDA stream management.
texture
Texture and surface memory — CUDA array allocation and bindless objects.
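For the launch module listed above, a typical first step is working out how many blocks are needed to cover a given number of elements. The sketch below shows that arithmetic in isolation. The `Dim3` defined here is a hypothetical stand-in with assumed `x`, `y`, `z` fields and constructors; the crate's own `launch::Dim3` may expose a different interface, and `grid_for` is a helper invented for this example.

```rust
/// Hypothetical stand-in for a kernel-dimension type such as `launch::Dim3`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

impl Dim3 {
    pub const fn new(x: u32, y: u32, z: u32) -> Self {
        Dim3 { x, y, z }
    }

    /// One-dimensional shape: `y` and `z` are fixed at 1.
    pub const fn linear(x: u32) -> Self {
        Dim3 { x, y: 1, z: 1 }
    }

    /// Total number of threads (or blocks) described by this shape.
    pub const fn volume(&self) -> u64 {
        self.x as u64 * self.y as u64 * self.z as u64
    }
}

/// Grid size for a 1D launch that covers `n` elements with blocks of
/// `block.x` threads. Returns `None` when there is nothing to launch
/// or the block size is zero.
pub fn grid_for(n: u32, block: Dim3) -> Option<Dim3> {
    if n == 0 || block.x == 0 {
        return None;
    }
    // Ceiling division, done in u64 so that `n + block.x - 1` cannot overflow.
    let blocks = (n as u64 + block.x as u64 - 1) / block.x as u64;
    Some(Dim3::linear(blocks as u32))
}
```

For example, 1000 elements with 256 threads per block need 4 blocks; the last block is only partly used, so the kernel itself still has to bounds-check its global index.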

Functions§

cuda_free
Free device memory (mirrors cudaFree).
cuda_malloc
Allocate device memory (mirrors cudaMalloc).
cuda_memset
Zero device memory (mirrors cudaMemset).
device_synchronize
Block until all device operations complete (mirrors cudaDeviceSynchronize).
get_device
Get the current device for this thread (mirrors cudaGetDevice).
get_device_count
Returns the number of CUDA-capable devices (mirrors cudaGetDeviceCount).
memcpy_d2d
Copy between device allocations.
memcpy_d2h
Copy device → host slice (typed helper, no raw pointers).
memcpy_h2d
Copy host slice → device (typed helper, no raw pointers).
set_device
Set the current device for this thread (mirrors cudaSetDevice).
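The "typed helper, no raw pointers" wording on memcpy_h2d and memcpy_d2h suggests that the byte count is derived from the slice rather than passed separately. The sketch below illustrates that idea without the crate or a GPU: `FakeDeviceBuf`, `SizeMismatch`, `copy_h2d`, and `copy_d2h` are hypothetical names, the "device" is a host byte vector, and the real helpers' signatures are not shown in this page and may differ.

```rust
use std::mem::size_of_val;

/// Stand-in for a device allocation: a plain host-side byte buffer.
pub struct FakeDeviceBuf {
    bytes: Vec<u8>,
}

impl FakeDeviceBuf {
    pub fn with_capacity_bytes(n: usize) -> Self {
        FakeDeviceBuf { bytes: vec![0; n] }
    }
}

/// Returned when the slice does not fit in the allocation.
#[derive(Debug, PartialEq, Eq)]
pub struct SizeMismatch {
    pub needed: usize,
    pub available: usize,
}

/// Copy a typed host slice into the buffer. The byte count comes from the
/// slice itself, so the caller cannot pass a length that disagrees with it.
pub fn copy_h2d(dst: &mut FakeDeviceBuf, src: &[f32]) -> Result<usize, SizeMismatch> {
    let needed = size_of_val(src); // src.len() * 4 bytes
    if needed > dst.bytes.len() {
        return Err(SizeMismatch { needed, available: dst.bytes.len() });
    }
    for (chunk, value) in dst.bytes.chunks_exact_mut(4).zip(src) {
        chunk.copy_from_slice(&value.to_ne_bytes());
    }
    Ok(needed)
}

/// Copy from the buffer back into a typed host slice.
pub fn copy_d2h(dst: &mut [f32], src: &FakeDeviceBuf) -> Result<usize, SizeMismatch> {
    let needed = size_of_val(&*dst);
    if needed > src.bytes.len() {
        return Err(SizeMismatch { needed, available: src.bytes.len() });
    }
    for (value, chunk) in dst.iter_mut().zip(src.bytes.chunks_exact(4)) {
        *value = f32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Ok(needed)
}
```

The size check happens before any bytes move, so a slice that does not fit is rejected with an error instead of writing past the end of the allocation.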