§OxiCUDA — Pure Rust CUDA Replacement
OxiCUDA provides a complete, pure Rust replacement for NVIDIA’s CUDA
software stack. It dynamically loads libcuda.so at runtime, requiring
no CUDA Toolkit at build time.
§Architecture
┌──────────────────────────────────────────────┐
│             COOLJAPAN Ecosystem              │
│  SciRS2 │ oxionnx │ TrustformeRS │ ToRSh     │
└──────────────────────┬───────────────────────┘
                       │
              ┌────────▼───────┐
              │    OxiCUDA     │
              ├────────────────┤
              │ Driver (Vol.1) │
              │ Memory (Vol.1) │
              │ Launch (Vol.1) │
              │ PTX (Vol.2)    │
              │ Autotune(Vol.2)│
              │ BLAS (Vol.3)   │
              │ DNN (Vol.4)    │
              │ FFT (Vol.5)    │
              │ Sparse (Vol.5) │
              │ Solver (Vol.5) │
              │ Rand (Vol.5)   │
              └────────┬───────┘
              ┌────────▼───────┐
              │   libcuda.so   │
              │ (NVIDIA Driver)│
              └────────────────┘

§Quick Start
use oxicuda::prelude::*;

fn main() -> CudaResult<()> {
    // Initialize the CUDA driver
    oxicuda::init()?;

    // Enumerate devices
    let device = Device::get(0)?;
    println!("GPU: {}", device.name()?);

    // Create context and stream
    let ctx = Context::new(&device)?;
    let ctx = std::sync::Arc::new(ctx);
    let stream = Stream::new(&ctx)?;

    // Allocate device memory and copy host data into it
    let mut buf = DeviceBuffer::<f32>::alloc(1024)?;
    let host_data = vec![1.0f32; 1024];
    buf.copy_from_host(&host_data)?;

    Ok(())
}

§Feature Flags
| Feature | Description | Default |
|---|---|---|
| driver | CUDA driver API wrapper | Yes |
| memory | GPU memory management | Yes |
| launch | Kernel launch infrastructure | Yes |
| ptx | PTX code generation DSL | No |
| autotune | Autotuner engine | No |
| blas | cuBLAS equivalent | No |
| dnn | cuDNN equivalent | No |
| fft | cuFFT equivalent | No |
| sparse | cuSPARSE equivalent | No |
| solver | cuSOLVER equivalent | No |
| rand | cuRAND equivalent | No |
| pool | Stream-ordered memory pool | No |
| backend | Abstract compute backend trait | No |
| full | Enable all features | No |
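Optional layers are enabled through Cargo features. A hypothetical `Cargo.toml` entry enabling the BLAS and FFT layers on top of the default `driver`/`memory`/`launch` features (the version number is illustrative, not taken from the crate itself):

```toml
[dependencies]
oxicuda = { version = "0.1", features = ["blas", "fft"] }
```

Alternatively, the `full` feature turns on every optional layer at once.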
(C) 2026 COOLJAPAN OU (Team KitaSan)
Re-exports§
pub use global_init::DeviceSelection;
pub use global_init::OxiCudaRuntime;
pub use global_init::OxiCudaRuntimeBuilder;
pub use oxicuda_driver as driver;
pub use oxicuda_memory as memory;
pub use oxicuda_launch as launch;
Modules§
- collective: NCCL-equivalent collective communication primitives for multi-GPU training.
- copy: Explicit memory copy operations between host and device.
- device_pool: Thread-safe multi-GPU device pool with workload-aware scheduling.
- distributed: Multi-node distributed training support (TCP/IP based).
- features: Compile-time feature availability.
- global_init: Global initialization with device auto-selection.
- pipeline_parallel: Pipeline parallelism primitives for multi-GPU model parallelism.
- prelude: Convenience re-exports for common usage patterns.
- profiling: Profiling and tracing hooks for kernel-level performance analysis.
Macros§
- launch: Launch a GPU kernel with a concise syntax.
Structs§
- Context: RAII wrapper for a CUDA context.
- Device: Represents a CUDA-capable GPU device.
- DeviceBuffer: A contiguous buffer of T elements allocated in GPU device memory.
- DeviceSlice: A borrowed, non-owning view into a sub-range of a DeviceBuffer.
- Dim3: 3-dimensional size specification for grids and blocks.
- Event: A CUDA event for timing and synchronisation.
- Function: A kernel function handle within a loaded module.
- JitDiagnostic: A single structured diagnostic emitted by the JIT compiler.
- JitLog: Log output from JIT compilation.
- JitOptions: Options for JIT compilation of PTX to GPU binary.
- Kernel: A launchable GPU kernel with module lifetime management.
- LaunchParams: Parameters for a GPU kernel launch.
- LaunchParamsBuilder: Builder for LaunchParams.
- Module: A loaded CUDA module containing one or more kernel functions.
- PinnedBuffer: A contiguous buffer of T elements in page-locked (pinned) host memory.
- Stream: A CUDA stream (GPU command queue).
- UnifiedBuffer: A contiguous buffer of T elements in CUDA unified (managed) memory.
Enums§
- CudaError: Primary error type for CUDA driver API calls.
- DriverLoadError: Errors that can occur while dynamically loading libcuda.so/nvcuda.dll.
- JitSeverity: Severity of a JIT compiler diagnostic message.
Constants§
- AUTO_SELECT_THRESHOLD_BYTES: Auto-selection threshold for the compute backend.
- SUPPORTED_ONNX_OPS: List of ONNX operators supported by the OxiCUDA ONNX backend.
Traits§
- KernelArgs: Trait for types that can be passed as kernel arguments.
Functions§
- best_device: Find the device with the most total memory.
- grid_size_for: Calculate the grid size needed to cover n elements with block_size threads.
- init: Initialize the CUDA driver API.
- list_devices: List all available CUDA devices.
- try_driver: Get a reference to the lazily-loaded CUDA driver API function table.
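A helper like grid_size_for typically amounts to a ceiling division: the smallest grid size whose total thread count covers all n elements. A standalone sketch of that arithmetic (this is an illustration of the computation, not the crate's own implementation):

```rust
/// Smallest `grid_size` such that `grid_size * block_size >= n`.
/// Sketch of the arithmetic behind a helper like `grid_size_for`.
fn grid_size_for(n: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    // Ceiling division, stable in std since Rust 1.73.
    n.div_ceil(block_size)
}

fn main() {
    // 1024 elements with 256-thread blocks fit exactly into 4 blocks.
    assert_eq!(grid_size_for(1024, 256), 4);
    // 1000 elements still need 4 blocks; the last block is partially idle,
    // so kernels normally guard with `if idx < n`.
    assert_eq!(grid_size_for(1000, 256), 4);
    println!("grid size for 1000/256 = {}", grid_size_for(1000, 256));
}
```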
Type Aliases§
- CudaResult: Convenience result alias used throughout the crate.