oxicuda 0.1.0

OxiCUDA - Pure Rust CUDA replacement for the COOLJAPAN ecosystem (95% performance target)
# oxicuda

Pure Rust CUDA replacement for the COOLJAPAN ecosystem.

Part of the [OxiCUDA](https://github.com/cool-japan/oxicuda) project.

## Overview

`oxicuda` is the umbrella crate that re-exports all OxiCUDA sub-crates behind
feature flags. It provides a single dependency entry point for applications that
need GPU compute without installing the CUDA Toolkit: the NVIDIA driver library,
`libcuda.so` (or `nvcuda.dll` on Windows), is loaded dynamically at runtime, so
an installed NVIDIA driver is still required on the target machine.
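
As a rough illustration of what runtime loading implies, the sketch below picks the driver library file names to try on the current platform. The function name `driver_library_candidates` and the fallback to the versioned `libcuda.so.1` are assumptions made for this example; the actual lookup logic in `oxicuda-driver` may differ.

```rust
/// Candidate file names for the NVIDIA driver library on this platform.
/// Hypothetical helper for illustration; not part of the oxicuda API.
fn driver_library_candidates() -> &'static [&'static str] {
    if cfg!(target_os = "windows") {
        &["nvcuda.dll"]
    } else {
        // Assumption: also try the versioned soname, since the unversioned
        // symlink is not present on every Linux installation.
        &["libcuda.so", "libcuda.so.1"]
    }
}

fn main() {
    // A real loader would pass each name to dlopen / LoadLibrary in turn
    // and stop at the first one that resolves.
    for name in driver_library_candidates() {
        println!("would try to load: {name}");
    }
}
```

Because nothing is linked at build time, a missing driver shows up as a runtime error rather than a link failure.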

The core crates (driver, memory, launch) are enabled by default. Higher-level
libraries -- BLAS, DNN, FFT, sparse, solver, and random number generation -- are
opt-in via feature flags. Enable `full` to get everything.
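
A minimal `Cargo.toml` sketch of how these features might be selected. The version is the `0.1.0` release named at the top of this page, and the feature names are taken from the Feature Flags table; the particular combinations are only examples.

```toml
[dependencies]
# Core crates only: driver, memory, and launch are enabled by default.
oxicuda = "0.1.0"

# Alternative: opt in to selected higher-level libraries.
# oxicuda = { version = "0.1.0", features = ["blas", "fft"] }

# Alternative: enable every optional feature.
# oxicuda = { version = "0.1.0", features = ["full"] }
```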

A `prelude` module and `init()` function provide convenient imports and
one-call CUDA driver initialization.

## Architecture

```text
                    oxicuda (umbrella)
     +---------+---------+---------+---------+
     |         |         |         |         |
  driver   memory    launch      ptx    autotune
     |         |         |         |         |
     +----+----+---------+---------+---------+
          |
   +------+------+------+------+------+
   |      |      |      |      |      |
  blas   dnn    fft   sparse solver  rand
```

## Quick Start

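```rust,no_run
use oxicuda::prelude::*;

fn main() -> CudaResult<()> {
    // Load the driver library and initialize the CUDA driver.
    oxicuda::init()?;

    // Select the first GPU and create a context and a stream on it.
    let device = Device::get(0)?;
    let ctx = std::sync::Arc::new(Context::new(&device)?);
    let stream = Stream::new(&ctx)?;

    // Allocate 1024 f32 elements on the device and upload host data.
    let mut buf = DeviceBuffer::<f32>::alloc(1024)?;
    let host = vec![1.0f32; 1024];
    buf.copy_from_host(&host)?;

    Ok(())
}
```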

## Feature Flags

| Feature    | Description                          | Default |
|------------|--------------------------------------|---------|
| `driver`   | CUDA driver API wrapper              | Yes     |
| `memory`   | GPU memory management                | Yes     |
| `launch`   | Kernel launch infrastructure         | Yes     |
| `ptx`      | PTX code generation DSL              | No      |
| `autotune` | Autotuner engine (implies `ptx`)     | No      |
| `blas`     | cuBLAS equivalent                    | No      |
| `dnn`      | cuDNN equivalent (implies `blas`)    | No      |
| `fft`      | cuFFT equivalent                     | No      |
| `sparse`   | cuSPARSE equivalent                  | No      |
| `solver`   | cuSOLVER equivalent                  | No      |
| `rand`     | cuRAND equivalent                    | No      |
| `pool`     | Stream-ordered memory pool           | No      |
| `full`     | Enable all optional features         | No      |
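
The "implies" relationships in the table can be modelled as a small transitive expansion. The sketch below is a standalone illustration, not code from the crate: the helper names `implied` and `resolve` are hypothetical, and the expansion of `full` to every non-default feature is an assumption based on its description.

```rust
use std::collections::BTreeSet;

/// Features that a given feature switches on, per the table above.
/// Assumption: `full` expands to every non-default feature.
fn implied(feature: &str) -> &'static [&'static str] {
    match feature {
        "autotune" => &["ptx"],
        "dnn" => &["blas"],
        "full" => &[
            "ptx", "autotune", "blas", "dnn", "fft", "sparse", "solver", "rand", "pool",
        ],
        _ => &[],
    }
}

/// Default features plus the requested ones, expanded transitively.
fn resolve(requested: &[&'static str]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    // Start from the defaults listed in the table.
    let mut pending: Vec<&'static str> = vec!["driver", "memory", "launch"];
    pending.extend_from_slice(requested);
    while let Some(feature) = pending.pop() {
        // Only expand a feature the first time it is seen.
        if enabled.insert(feature) {
            pending.extend_from_slice(implied(feature));
        }
    }
    enabled
}

fn main() {
    // Requesting `dnn` pulls in `blas` on top of the defaults.
    let enabled = resolve(&["dnn"]);
    println!("{enabled:?}");
    // prints {"blas", "dnn", "driver", "launch", "memory"}
}
```

Cargo performs this expansion itself; the model is only meant to make the resulting feature set easy to see.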

## Sub-crates

| Crate              | Volume | Description                          |
|--------------------|--------|--------------------------------------|
| `oxicuda-driver`   | Vol.1  | CUDA driver API bindings             |
| `oxicuda-memory`   | Vol.1  | Device, pinned, unified memory       |
| `oxicuda-launch`   | Vol.1  | Kernel launch and grid configuration |
| `oxicuda-ptx`      | Vol.2  | PTX code generation DSL              |
| `oxicuda-autotune` | Vol.2  | Autotuner for kernel parameters      |
| `oxicuda-blas`     | Vol.3  | Dense linear algebra (GEMM, etc.)    |
| `oxicuda-dnn`      | Vol.4  | Deep learning primitives             |
| `oxicuda-fft`      | Vol.5  | Fast Fourier Transform               |
| `oxicuda-sparse`   | Vol.5  | Sparse matrix operations             |
| `oxicuda-solver`   | Vol.5  | Matrix decompositions and solvers    |
| `oxicuda-rand`     | Vol.5  | Random number generation             |

## License

Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)