oxicuda
Pure Rust CUDA replacement for the COOLJAPAN ecosystem.
Part of the OxiCUDA project.
Overview
Version: 0.1.4 — 2026-04-18 — 496 tests passing
oxicuda is the umbrella crate that re-exports all OxiCUDA sub-crates behind
feature flags. It provides a single dependency entry point for applications that
need GPU compute capabilities without installing the CUDA Toolkit -- libcuda.so
(or nvcuda.dll) is loaded dynamically at runtime.
The core crates (driver, memory, launch) are enabled by default. Higher-level
libraries -- BLAS, DNN, FFT, sparse, solver, and random number generation -- are
opt-in via feature flags. Enable full to get everything.
A prelude module and init() function provide convenient imports and
one-call CUDA driver initialization. Additional built-in modules cover
profiling, multi-GPU device pools, collective communication (NCCL equivalent),
pipeline parallelism, and multi-node distributed training.
Architecture
oxicuda (umbrella)
+---------+---------+---------+---------+
| | | | |
driver memory launch ptx autotune
| | | | |
+----+----+---------+---------+---------+
|
+------+------+------+------+------+
| | | | | |
blas dnn fft sparse solver rand
Quick Start
use *;
Feature Flags
| Feature | Description | Default |
|---|---|---|
driver |
CUDA driver API wrapper | Yes |
memory |
GPU memory management | Yes |
launch |
Kernel launch infrastructure | Yes |
ptx |
PTX code generation DSL | No |
autotune |
Autotuner engine (implies ptx) |
No |
blas |
cuBLAS equivalent | No |
dnn |
cuDNN equivalent (implies blas) |
No |
fft |
cuFFT equivalent | No |
sparse |
cuSPARSE equivalent | No |
solver |
cuSOLVER equivalent | No |
rand |
cuRAND equivalent | No |
pool |
Stream-ordered memory pool | No |
backend |
Abstract compute backend trait | No |
primitives |
CUB-equivalent parallel GPU primitives | No |
vulkan |
Vulkan compute backend (cross-vendor) | No |
metal |
Apple Metal compute backend (macOS/iOS) | No |
webgpu |
WebGPU compute backend (via wgpu) | No |
rocm |
AMD ROCm/HIP backend (Linux + AMD GPU) | No |
level-zero |
Intel Level Zero backend | No |
onnx-backend |
ONNX operator runtime and graph executor | No |
tensor-backend |
ToRSh GPU tensor backend with autograd | No |
transformer-backend |
TrustformeRS transformer inference backend | No |
wasm-backend |
WASM + WebGPU backend for browser environments | No |
full |
Enable all optional features | No |
Sub-crates
| Crate | Volume | Description |
|---|---|---|
oxicuda-driver |
Vol.1 | CUDA driver API bindings |
oxicuda-memory |
Vol.1 | Device, pinned, unified memory |
oxicuda-launch |
Vol.1 | Kernel launch and grid configuration |
oxicuda-ptx |
Vol.2 | PTX code generation DSL |
oxicuda-autotune |
Vol.2 | Autotuner for kernel parameters |
oxicuda-blas |
Vol.3 | Dense linear algebra (GEMM, etc.) |
oxicuda-dnn |
Vol.4 | Deep learning primitives |
oxicuda-fft |
Vol.5 | Fast Fourier Transform |
oxicuda-sparse |
Vol.5 | Sparse matrix operations |
oxicuda-solver |
Vol.5 | Matrix decompositions and solvers |
oxicuda-rand |
Vol.5 | Random number generation |
oxicuda-primitives |
Vol.5 | CUB-equivalent warp/block/device primitives |
oxicuda-backend |
— | Abstract compute backend trait |
oxicuda-vulkan |
— | Vulkan compute backend |
oxicuda-metal |
— | Apple Metal compute backend |
oxicuda-webgpu |
— | WebGPU compute backend |
oxicuda-rocm |
— | AMD ROCm/HIP backend |
oxicuda-levelzero |
— | Intel Level Zero backend |
License
Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)