§OxiCUDA — Pure Rust CUDA Replacement
OxiCUDA provides a complete, pure Rust replacement for NVIDIA’s CUDA
software stack. It dynamically loads libcuda.so at runtime, requiring
no CUDA Toolkit at build time.
§Architecture
┌──────────────────────────────────────────────┐
│             COOLJAPAN Ecosystem              │
│  SciRS2 │ oxionnx │ TrustformeRS │ ToRSh     │
└──────────────────────┬───────────────────────┘
                       │
              ┌────────▼───────┐
              │    OxiCUDA     │
              ├────────────────┤
              │ Driver (Vol.1) │
              │ Memory (Vol.1) │
              │ Launch (Vol.1) │
              │ PTX (Vol.2)    │
              │ Autotune(Vol.2)│
              │ BLAS (Vol.3)   │
              │ DNN (Vol.4)    │
              │ FFT (Vol.5)    │
              │ Sparse (Vol.5) │
              │ Solver (Vol.5) │
              │ Rand (Vol.5)   │
              └────────┬───────┘
              ┌────────▼───────┐
              │   libcuda.so   │
              │ (NVIDIA Driver)│
              └────────────────┘

§Quick Start
use oxicuda::prelude::*;

fn main() -> CudaResult<()> {
    // Initialize the CUDA driver
    oxicuda::init()?;

    // Enumerate devices
    let device = Device::get(0)?;
    println!("GPU: {}", device.name()?);

    // Create context and stream
    let ctx = Context::new(&device)?;
    let ctx = std::sync::Arc::new(ctx);
    let stream = Stream::new(&ctx)?;

    // Allocate device memory and copy host data into it
    let mut buf = DeviceBuffer::<f32>::alloc(1024)?;
    let host_data = vec![1.0f32; 1024];
    buf.copy_from_host(&host_data)?;

    Ok(())
}

§Feature Flags
| Feature | Description | Default |
|---|---|---|
| driver | CUDA driver API wrapper | Yes |
| memory | GPU memory management | Yes |
| launch | Kernel launch infrastructure | Yes |
| ptx | PTX code generation DSL | No |
| autotune | Autotuner engine | No |
| blas | cuBLAS equivalent | No |
| dnn | cuDNN equivalent | No |
| fft | cuFFT equivalent | No |
| sparse | cuSPARSE equivalent | No |
| solver | cuSOLVER equivalent | No |
| rand | cuRAND equivalent | No |
| pool | Stream-ordered memory pool | No |
| backend | Abstract compute backend trait | No |
| full | Enable all features | No |
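Optional layers are enabled through Cargo features. A hypothetical `Cargo.toml` entry enabling the BLAS and FFT layers on top of the default `driver`/`memory`/`launch` features (the version number is illustrative, not taken from the crate itself):

```toml
[dependencies]
oxicuda = { version = "0.1", features = ["blas", "fft"] }
```

Alternatively, the `full` feature turns on every optional layer at once.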
(C) 2026 COOLJAPAN OU (Team KitaSan)
Re-exports§
pub use global_init::DeviceSelection;
pub use global_init::OxiCudaRuntime;
pub use global_init::OxiCudaRuntimeBuilder;
pub use oxicuda_driver as driver;
pub use oxicuda_memory as memory;
pub use oxicuda_launch as launch;
Modules§
- collective: NCCL-equivalent collective communication primitives for multi-GPU training.
- copy: Explicit memory copy operations between host and device.
- device_pool: Thread-safe multi-GPU device pool with workload-aware scheduling.
- distributed: Multi-node distributed training support (TCP/IP based).
- features: Compile-time feature availability.
- global_init: Global initialization with device auto-selection.
- pipeline_parallel: Pipeline parallelism primitives for multi-GPU model parallelism.
- prelude: Convenience re-exports for common usage patterns.
- profiling: Profiling and tracing hooks for kernel-level performance analysis.
Macros§
- launch: Launch a GPU kernel with a concise syntax.
Structs§
- Context: RAII wrapper for a CUDA context.
- Device: Represents a CUDA-capable GPU device.
- DeviceBuffer: A contiguous buffer of T elements allocated in GPU device memory.
- DeviceSlice: A borrowed, non-owning view into a sub-range of a DeviceBuffer.
- Dim3: 3-dimensional size specification for grids and blocks.
- Event: A CUDA event for timing and synchronisation.
- Function: A kernel function handle within a loaded module.
- JitDiagnostic: A single structured diagnostic emitted by the JIT compiler.
- JitLog: Log output from JIT compilation.
- JitOptions: Options for JIT compilation of PTX to GPU binary.
- Kernel: A launchable GPU kernel with module lifetime management.
- LaunchParams: Parameters for a GPU kernel launch.
- LaunchParamsBuilder: Builder for LaunchParams.
- Module: A loaded CUDA module containing one or more kernel functions.
- PinnedBuffer: A contiguous buffer of T elements in page-locked (pinned) host memory.
- Stream: A CUDA stream (GPU command queue).
- UnifiedBuffer: A contiguous buffer of T elements in CUDA unified (managed) memory.
Enums§
- CudaError: Primary error type for CUDA driver API calls.
- DriverLoadError: Errors that can occur while dynamically loading libcuda.so/nvcuda.dll.
- JitSeverity: Severity of a JIT compiler diagnostic message.
Constants§
- AUTO_SELECT_THRESHOLD_BYTES: Auto-selection threshold for the compute backend.
- SUPPORTED_ONNX_OPS: List of ONNX operators supported by the OxiCUDA ONNX backend.
Traits§
- KernelArgs: Trait for types that can be passed as kernel arguments.
Functions§
- best_device: Find the device with the most total memory.
- grid_size_for: Calculate the grid size needed to cover n elements with block_size threads.
- init: Initialize the CUDA driver API.
- list_devices: List all available CUDA devices.
- try_driver: Get a reference to the lazily-loaded CUDA driver API function table.
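A helper like grid_size_for typically amounts to a ceiling division: the smallest grid size whose total thread count covers all n elements. A standalone sketch of that arithmetic (this is an illustration of the computation, not the crate's own implementation):

```rust
/// Smallest `grid_size` such that `grid_size * block_size >= n`.
/// Sketch of the arithmetic behind a helper like `grid_size_for`.
fn grid_size_for(n: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    // Ceiling division, stable in std since Rust 1.73.
    n.div_ceil(block_size)
}

fn main() {
    // 1024 elements with 256-thread blocks fit exactly into 4 blocks.
    assert_eq!(grid_size_for(1024, 256), 4);
    // 1000 elements still need 4 blocks; the last block is partially idle,
    // so kernels normally guard with `if idx < n`.
    assert_eq!(grid_size_for(1000, 256), 4);
    println!("grid size for 1000/256 = {}", grid_size_for(1000, 256));
}
```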
Type Aliases§
- CudaResult: Convenience result alias used throughout the crate.