Crate oxicuda


§OxiCUDA — Pure Rust CUDA Replacement

OxiCUDA is a pure Rust replacement for the libraries in NVIDIA’s CUDA software stack. It loads the NVIDIA driver library (libcuda.so, or nvcuda.dll on Windows) dynamically at runtime, so no CUDA Toolkit is required at build time.

§Architecture

┌──────────────────────────────────────────────┐
│           COOLJAPAN Ecosystem                 │
│  SciRS2 │ oxionnx │ TrustformeRS │ ToRSh     │
│         └────┬────┘              │            │
│              └───────────────────┘            │
│                      │                        │
│              ┌───────▼────────┐               │
│              │    OxiCUDA     │               │
│              ├────────────────┤               │
│              │ Driver (Vol.1) │               │
│              │ Memory (Vol.1) │               │
│              │ Launch (Vol.1) │               │
│              │ PTX    (Vol.2) │               │
│              │ Autotune(Vol.2)│               │
│              │ BLAS   (Vol.3) │               │
│              │ DNN    (Vol.4) │               │
│              │ FFT    (Vol.5) │               │
│              │ Sparse (Vol.5) │               │
│              │ Solver (Vol.5) │               │
│              │ Rand   (Vol.5) │               │
│              └───────┬────────┘               │
│              ┌───────▼────────┐               │
│              │ libcuda.so     │               │
│              │ (NVIDIA Driver)│               │
│              └────────────────┘               │
└──────────────────────────────────────────────┘

§Quick Start

use oxicuda::prelude::*;

fn main() -> CudaResult<()> {
    // Initialize the CUDA driver
    oxicuda::init()?;

    // Enumerate devices
    let device = Device::get(0)?;
    println!("GPU: {}", device.name()?);

    // Create context and stream
    let ctx = Context::new(&device)?;
    let ctx = std::sync::Arc::new(ctx);
    let stream = Stream::new(&ctx)?;

    // Allocate device memory
    let mut buf = DeviceBuffer::<f32>::alloc(1024)?;
    let host_data = vec![1.0f32; 1024];
    buf.copy_from_host(&host_data)?;

    Ok(())
}

§Feature Flags

Feature    Description                       Default
driver     CUDA driver API wrapper           Yes
memory     GPU memory management             Yes
launch     Kernel launch infrastructure      Yes
ptx        PTX code generation DSL           No
autotune   Autotuner engine                  No
blas       cuBLAS equivalent                 No
dnn        cuDNN equivalent                  No
fft        cuFFT equivalent                  No
sparse     cuSPARSE equivalent               No
solver     cuSOLVER equivalent               No
rand       cuRAND equivalent                 No
pool       Stream-ordered memory pool        No
backend    Abstract compute backend trait    No
full       Enable all features               No

(C) 2026 COOLJAPAN OU (Team KitaSan)

Re-exports§

pub use global_init::DeviceSelection;
pub use global_init::OxiCudaRuntime;
pub use global_init::OxiCudaRuntimeBuilder;
pub use oxicuda_driver as driver;
pub use oxicuda_memory as memory;
pub use oxicuda_launch as launch;

Modules§

collective
NCCL-equivalent collective communication primitives for multi-GPU training.
copy
Explicit memory copy operations between host and device.
device_pool
Thread-safe multi-GPU device pool with workload-aware scheduling.
distributed
Multi-node distributed training support (TCP/IP based).
features
Compile-time feature availability.
global_init
Global initialization with device auto-selection.
pipeline_parallel
Pipeline parallelism primitives for multi-GPU model parallelism.
prelude
Convenience re-exports for common usage patterns.
profiling
Profiling and tracing hooks for kernel-level performance analysis.

Macros§

launch
Launch a GPU kernel with a concise syntax.

Structs§

Context
RAII wrapper for a CUDA context.
Device
Represents a CUDA-capable GPU device.
DeviceBuffer
A contiguous buffer of T elements allocated in GPU device memory.
DeviceSlice
A borrowed, non-owning view into a sub-range of a DeviceBuffer.
Dim3
3-dimensional size specification for grids and blocks.
Event
A CUDA event for timing and synchronisation.
Function
A kernel function handle within a loaded module.
JitDiagnostic
A single structured diagnostic emitted by the JIT compiler.
JitLog
Log output from JIT compilation.
JitOptions
Options for JIT compilation of PTX to GPU binary.
Kernel
A launchable GPU kernel with module lifetime management.
LaunchParams
Parameters for a GPU kernel launch.
LaunchParamsBuilder
Builder for LaunchParams.
Module
A loaded CUDA module containing one or more kernel functions.
PinnedBuffer
A contiguous buffer of T elements in page-locked (pinned) host memory.
Stream
A CUDA stream (GPU command queue).
UnifiedBuffer
A contiguous buffer of T elements in CUDA unified (managed) memory.
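
Several of the structs above are described as RAII wrappers or owning buffers. The following is a minimal sketch of that pattern in plain Rust, not OxiCUDA's implementation: FakeContext and run_scope are hypothetical names, and the "driver calls" are simulated by writing to a log so the example runs without a GPU.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a driver-owned resource; not an oxicuda type.
struct FakeContext {
    id: u32,
    log: Rc<RefCell<Vec<String>>>,
}

impl FakeContext {
    /// Acquire the resource. A real wrapper would call into the driver here.
    fn new(id: u32, log: Rc<RefCell<Vec<String>>>) -> Self {
        log.borrow_mut().push(format!("create {}", id));
        FakeContext { id, log }
    }
}

impl Drop for FakeContext {
    /// Release the resource automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("destroy {}", self.id));
    }
}

/// Create two wrappers in a nested scope and return the recorded events.
fn run_scope() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = FakeContext::new(1, Rc::clone(&log));
        let _inner = FakeContext::new(2, Rc::clone(&log));
        // Both values are dropped here, in reverse declaration order.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in run_scope() {
        println!("{}", event);
    }
}
```

The point of the pattern is that release is tied to scope exit, so an early return through `?` still frees the resource. Rust drops locals in reverse declaration order, so in this sketch the wrapper created last is released first.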

Enums§

CudaError
Primary error type for CUDA driver API calls.
DriverLoadError
Errors that can occur while dynamically loading libcuda.so / nvcuda.dll.
JitSeverity
Severity of a JIT compiler diagnostic message.

Constants§

AUTO_SELECT_THRESHOLD_BYTES
Auto-selection threshold for the compute backend.
SUPPORTED_ONNX_OPS
List of ONNX operators supported by the OxiCUDA ONNX backend.

Traits§

KernelArgs
Trait for types that can be passed as kernel arguments.

Functions§

best_device
Find the device with the most total memory.
grid_size_for
Calculate the grid size (number of blocks) needed to cover n elements with block_size threads per block.
init
Initialize the CUDA driver API.
list_devices
List all available CUDA devices.
try_driver
Get a reference to the lazily-loaded CUDA driver API function table.
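
To make the descriptions of grid_size_for and best_device concrete, here is a hedged sketch of the arithmetic and selection logic they describe. It is not the crate's source: grid_size_sketch, best_device_sketch, and DeviceInfo are hypothetical names, and the integer types are assumptions, since this page does not show the real signatures.

```rust
/// Hypothetical device summary; oxicuda's real Device type is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct DeviceInfo {
    ordinal: u32,
    total_memory_bytes: u64,
}

/// Number of blocks needed so that blocks * block_size >= n (ceiling division).
fn grid_size_sketch(n: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block_size must be non-zero");
    // Written as quotient plus a remainder check so that n + block_size - 1
    // is never computed and cannot overflow.
    n / block_size + u32::from(n % block_size != 0)
}

/// Pick the device reporting the most total memory, or None if the list is empty.
fn best_device_sketch(devices: &[DeviceInfo]) -> Option<&DeviceInfo> {
    devices.iter().max_by_key(|d| d.total_memory_bytes)
}

fn main() {
    // 1025 elements with 256 threads per block need a fifth, partly filled block.
    println!("blocks: {}", grid_size_sketch(1025, 256));

    let devices = vec![
        DeviceInfo { ordinal: 0, total_memory_bytes: 8 << 30 },
        DeviceInfo { ordinal: 1, total_memory_bytes: 24 << 30 },
    ];
    println!("best: {:?}", best_device_sketch(&devices));
}
```

Because the last block may be only partly used, a kernel launched with a grid computed this way still needs a bounds check against n inside the kernel body.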

Type Aliases§

CudaResult
Convenience result alias used throughout the crate.