oxicuda

Pure Rust CUDA replacement for the COOLJAPAN ecosystem.

Part of the OxiCUDA project.

Overview

Version: 0.1.3 — 2026-04-17 — 496 tests passing

oxicuda is the umbrella crate that re-exports all OxiCUDA sub-crates behind feature flags. It provides a single dependency entry point for applications that need GPU compute capabilities without installing the CUDA Toolkit -- libcuda.so (or nvcuda.dll) is loaded dynamically at runtime.

The core crates (driver, memory, launch) are enabled by default. Higher-level libraries -- BLAS, DNN, FFT, sparse, solver, and random number generation -- are opt-in via feature flags. Enable full to get everything.

A prelude module and init() function provide convenient imports and one-call CUDA driver initialization. Additional built-in modules cover profiling, multi-GPU device pools, collective communication (NCCL equivalent), pipeline parallelism, and multi-node distributed training.

Architecture

                    oxicuda (umbrella)
     +---------+---------+---------+---------+
     |         |         |         |         |
  driver   memory    launch      ptx    autotune
     |         |         |         |         |
     +----+----+---------+---------+---------+
          |
   +------+------+------+------+------+
   |      |      |      |      |      |
  blas   dnn    fft   sparse solver  rand

Quick Start

use oxicuda::prelude::*;

fn main() -> CudaResult<()> {
    oxicuda::init()?;

    let device = Device::get(0)?;
    let ctx = std::sync::Arc::new(Context::new(&device)?);
    let stream = Stream::new(&ctx)?;

    let mut buf = DeviceBuffer::<f32>::alloc(1024)?;
    let host = vec![1.0f32; 1024];
    buf.copy_from_host(&host)?;

    Ok(())
}

Feature Flags

Feature	Description	Default
`driver`	CUDA driver API wrapper	Yes
`memory`	GPU memory management	Yes
`launch`	Kernel launch infrastructure	Yes
`ptx`	PTX code generation DSL	No
`autotune`	Autotuner engine (implies `ptx`)	No
`blas`	cuBLAS equivalent	No
`dnn`	cuDNN equivalent (implies `blas`)	No
`fft`	cuFFT equivalent	No
`sparse`	cuSPARSE equivalent	No
`solver`	cuSOLVER equivalent	No
`rand`	cuRAND equivalent	No
`pool`	Stream-ordered memory pool	No
`backend`	Abstract compute backend trait	No
`primitives`	CUB-equivalent parallel GPU primitives	No
`vulkan`	Vulkan compute backend (cross-vendor)	No
`metal`	Apple Metal compute backend (macOS/iOS)	No
`webgpu`	WebGPU compute backend (via wgpu)	No
`rocm`	AMD ROCm/HIP backend (Linux + AMD GPU)	No
`level-zero`	Intel Level Zero backend	No
`onnx-backend`	ONNX operator runtime and graph executor	No
`tensor-backend`	ToRSh GPU tensor backend with autograd	No
`transformer-backend`	TrustformeRS transformer inference backend	No
`wasm-backend`	WASM + WebGPU backend for browser environments	No
`full`	Enable all optional features	No

Sub-crates

Crate	Volume	Description
`oxicuda-driver`	Vol.1	CUDA driver API bindings
`oxicuda-memory`	Vol.1	Device, pinned, unified memory
`oxicuda-launch`	Vol.1	Kernel launch and grid configuration
`oxicuda-ptx`	Vol.2	PTX code generation DSL
`oxicuda-autotune`	Vol.2	Autotuner for kernel parameters
`oxicuda-blas`	Vol.3	Dense linear algebra (GEMM, etc.)
`oxicuda-dnn`	Vol.4	Deep learning primitives
`oxicuda-fft`	Vol.5	Fast Fourier Transform
`oxicuda-sparse`	Vol.5	Sparse matrix operations
`oxicuda-solver`	Vol.5	Matrix decompositions and solvers
`oxicuda-rand`	Vol.5	Random number generation
`oxicuda-primitives`	Vol.5	CUB-equivalent warp/block/device primitives
`oxicuda-backend`	—	Abstract compute backend trait
`oxicuda-vulkan`	—	Vulkan compute backend
`oxicuda-metal`	—	Apple Metal compute backend
`oxicuda-webgpu`	—	WebGPU compute backend
`oxicuda-rocm`	—	AMD ROCm/HIP backend
`oxicuda-levelzero`	—	Intel Level Zero backend