Module cuda_backend

CUDA compute backend for the OxiPhysics GPU acceleration layer.

This module provides CudaBackend, which implements the compute-backend interface on NVIDIA CUDA through the cudarc crate, giving type-safe management of PTX and CUDA kernels.

§Feature flag

This backend is gated behind the cuda-backend Cargo feature:

[dependencies]
oxiphysics-gpu = { version = "*", features = ["cuda-backend"] }  # replace "*" with the release you depend on

When the feature is disabled, the module compiles to a no-op stub whose CudaBackend::try_new always returns CudaInitError::NotAvailable.

When the feature is enabled, cudarc loads the CUDA driver dynamically (via libloading), so the crate compiles on any platform. The driver is opened at runtime, and an error is returned if it is absent (e.g. on macOS, or on headless Linux without an NVIDIA driver).
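
Because try_new can fail both when the feature is off and when no driver is present, callers will usually want a CPU fallback. The sketch below illustrates that pattern with self-contained stand-ins: the CudaInitError and CudaBackend defined here are simplified mocks that only borrow the names used in this module, and ComputePath and select_compute_path are hypothetical helpers. The feature-enabled branch is a placeholder and does not load any driver.

```rust
// Hypothetical stand-ins: these mirror names from the module docs but are
// NOT the crate's real definitions.
#[derive(Debug, PartialEq)]
enum CudaInitError {
    NotAvailable,
}

struct CudaBackend {
    device: usize,
}

impl CudaBackend {
    // With the feature off this behaves like the documented stub. The "on"
    // branch is only a placeholder for real driver loading.
    #[allow(unexpected_cfgs)]
    fn try_new(device: usize) -> Result<Self, CudaInitError> {
        if cfg!(feature = "cuda-backend") {
            Ok(CudaBackend { device })
        } else {
            Err(CudaInitError::NotAvailable)
        }
    }
}

/// Which compute path the caller ended up with (hypothetical type).
#[derive(Debug, PartialEq)]
enum ComputePath {
    Cuda { device: usize },
    Cpu,
}

/// Try the GPU first and fall back to the CPU if initialisation fails.
fn select_compute_path(device: usize) -> ComputePath {
    match CudaBackend::try_new(device) {
        Ok(backend) => ComputePath::Cuda { device: backend.device },
        Err(CudaInitError::NotAvailable) => ComputePath::Cpu,
    }
}
```

Built without the cuda-backend feature, select_compute_path(0) takes the CPU branch. Matching on the error rather than unwrapping keeps the same binary usable on machines with and without an NVIDIA driver.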

§Architecture

 CudaBackend
  ├── cudarc::CudaContext                  ← CUDA device context (Arc)
  ├── cudarc::CudaStream                   ← Default stream for kernel dispatch
  ├── cudarc::CudaSlice<u8>                ← Device-resident buffer slices
  ├── Vec<CudaBufferEntry>                 ← Registered buffer metadata
  └── KernelRegistry                       ← Compiled PTX / NVRTC modules

 Compute pipeline:
   write_buffer [host→device memcpy via stream]
   → launch_kernel(grid, block, args)
   → read_buffer  [device→host memcpy via stream]
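
The write, launch, read sequence above can be sketched on the CPU with a small mock. Everything here is an assumption made for illustration: MockBackend, BufferHandle, the error type (String) and the closure-based launch are hypothetical and are not the crate's API. The mock stores each buffer as raw bytes to echo the CudaSlice<u8> entry in the diagram.

```rust
/// Hypothetical handle: an index into the mock's buffer table.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BufferHandle(usize);

const F64_BYTES: usize = 8;

/// CPU-only mock; each Vec<u8> stands in for a device-resident byte slice.
struct MockBackend {
    buffers: Vec<Vec<u8>>,
}

impl MockBackend {
    fn new() -> Self {
        MockBackend { buffers: Vec::new() }
    }

    /// Allocate zeroed storage for `len` f64 values.
    fn create_buffer(&mut self, len: usize) -> BufferHandle {
        self.buffers.push(vec![0u8; len * F64_BYTES]);
        BufferHandle(self.buffers.len() - 1)
    }

    /// "Host to device": serialise f64 values into the byte buffer.
    fn write_buffer(&mut self, h: BufferHandle, data: &[f64]) -> Result<(), String> {
        let bytes = self
            .buffers
            .get_mut(h.0)
            .ok_or_else(|| "invalid handle".to_string())?;
        if data.len() * F64_BYTES != bytes.len() {
            return Err("size mismatch".to_string());
        }
        for (chunk, value) in bytes.chunks_exact_mut(F64_BYTES).zip(data) {
            chunk.copy_from_slice(&value.to_le_bytes());
        }
        Ok(())
    }

    /// "Device to host": decode the bytes back into f64 values.
    fn read_buffer(&self, h: BufferHandle) -> Result<Vec<f64>, String> {
        let bytes = self
            .buffers
            .get(h.0)
            .ok_or_else(|| "invalid handle".to_string())?;
        Ok(bytes
            .chunks_exact(F64_BYTES)
            .map(|chunk| {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(chunk);
                f64::from_le_bytes(raw)
            })
            .collect())
    }

    /// Stand-in for a kernel launch: apply `kernel` to every element in place.
    fn launch<F: Fn(f64) -> f64>(&mut self, h: BufferHandle, kernel: F) -> Result<(), String> {
        let updated: Vec<f64> = self.read_buffer(h)?.into_iter().map(kernel).collect();
        self.write_buffer(h, &updated)
    }
}

/// Runs the three pipeline stages in order and returns the final contents.
fn run_pipeline() -> Result<Vec<f64>, String> {
    let mut backend = MockBackend::new();
    let buf = backend.create_buffer(4);
    backend.write_buffer(buf, &[1.0, 2.0, 3.0, 4.0])?;
    backend.launch(buf, |x| x * 2.0)?;
    backend.read_buffer(buf)
}
```

Handing out small copyable handles instead of references lets the backend keep ownership of all device memory in one table, which is the role the Vec<CudaBufferEntry> line in the diagram appears to play.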

§Kernels shipped with this backend

Source constant	Description
`PTX_SPH_DENSITY`	SPH density summation (cubic-spline W3), 256 threads/block
`PTX_PARALLEL_SCAN`	Blelloch exclusive prefix scan, warp-shuffle optimised
`PTX_CONSTRAINT_PGS`	Block-PGS constraint solver, 64 threads/block
`CUDA_SPH_DENSITY_SRC`	CUDA C SPH density kernel (compiled at runtime via NVRTC)
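
For readers unfamiliar with the Blelloch exclusive scan named in the table, the following is a sequential CPU reference of its two phases (up-sweep, then down-sweep). It is a sketch for understanding and for checking GPU output, not the shipped PTX: the function name is hypothetical, the input is padded to a power of two as an implementation choice of this sketch, and the warp-shuffle optimisation is not modelled.

```rust
/// Sequential reference for a Blelloch exclusive prefix sum.
/// Output element i holds the sum of input[0..i].
fn blelloch_exclusive_scan(input: &[u64]) -> Vec<u64> {
    let n = input.len();
    if n == 0 {
        return Vec::new();
    }
    // Pad with zeros to a power of two so the tree is complete.
    let m = n.next_power_of_two();
    let mut a = vec![0u64; m];
    a[..n].copy_from_slice(input);

    // Up-sweep (reduce): build partial sums up the tree.
    let mut stride = 1;
    while stride < m {
        let step = stride * 2;
        let mut i = step - 1;
        while i < m {
            a[i] += a[i - stride];
            i += step;
        }
        stride = step;
    }

    // Down-sweep: clear the root, then push prefixes back down the tree.
    a[m - 1] = 0;
    let mut stride = m / 2;
    while stride >= 1 {
        let step = stride * 2;
        let mut i = step - 1;
        while i < m {
            let left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
            i += step;
        }
        stride /= 2;
    }

    a.truncate(n);
    a
}
```

On a GPU, each inner while loop corresponds to one parallel step in which all updates at that tree level happen at once. That gives O(log n) steps with O(n) total work.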

§Example (with the cuda-backend feature enabled)

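use oxiphysics_gpu::compute::cuda_backend::{CudaBackend, CudaInitError};

// `?` needs an enclosing function that returns a Result.
fn run() -> Result<(), CudaInitError> {
    let mut backend = CudaBackend::try_new(0)?;          // open CUDA device 0
    let buf = backend.create_buffer(1024);               // 1024 f64 slots
    backend.write_buffer(buf, &vec![1.0_f64; 1024]);     // host → device copy
    backend.launch("sph_density", &[buf], 16, 256);      // 16 blocks × 256 threads
    let result = backend.read_buffer(buf);               // device → host copy
    Ok(())
}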
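
The launch call in the example uses 16 blocks × 256 threads = 4096 threads for a 1024-slot buffer. This page does not say how threads map to buffer elements. Under the common one-thread-per-element convention (an assumption, not something this module documents), the grid size is a ceiling division and each thread guards against running past the end. The helper names below are hypothetical.

```rust
/// Smallest number of blocks such that blocks * threads_per_block >= n.
fn blocks_for(n: usize, threads_per_block: usize) -> usize {
    assert!(threads_per_block > 0, "block size must be non-zero");
    (n + threads_per_block - 1) / threads_per_block
}

/// Flat index of a thread, mirroring blockIdx.x * blockDim.x + threadIdx.x.
fn global_index(block: usize, thread: usize, threads_per_block: usize) -> usize {
    block * threads_per_block + thread
}

/// Bounds guard a kernel would apply before touching element `idx`.
fn is_active(idx: usize, n: usize) -> bool {
    idx < n
}
```

With this convention, 1000 elements at 256 threads per block need 4 blocks, and the last 24 threads of the final block fail the is_active check and do nothing.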

Structs§

CudaBackend: CUDA compute backend.
CudaBufferHandle: Opaque handle to a CUDA device buffer allocated by CudaBackend.
CudaDeviceInfo: Information about the selected CUDA device.

Enums§

CudaInitError: Errors returned by CudaBackend::try_new.

Constants§

CUDA_SPH_DENSITY_SRC: CUDA C source for SPH density summation kernel, compiled at runtime via NVRTC.
PTX_CONSTRAINT_PGS: PTX source for block-PGS constraint solving.
PTX_PARALLEL_SCAN: PTX source for Blelloch exclusive parallel prefix scan.
PTX_SPH_DENSITY: PTX source for SPH density summation with cubic-spline W3 kernel.
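
As a reference for what an SPH density summation computes, here is a brute-force CPU sketch. It assumes that "cubic-spline W3" means the standard 3D cubic-spline kernel with support radius 2h and normalisation 1/(πh³). This page does not spell out the exact form, so treat the formula as an assumption. The function names are hypothetical, and the O(N²) loop uses no neighbour search.

```rust
use std::f64::consts::PI;

/// 3D cubic-spline smoothing kernel W(r, h) with support radius 2h (assumed form).
fn cubic_spline_w(r: f64, h: f64) -> f64 {
    let q = r / h;
    let sigma = 1.0 / (PI * h * h * h);
    if q < 1.0 {
        sigma * (1.0 - 1.5 * q * q + 0.75 * q * q * q)
    } else if q < 2.0 {
        let t = 2.0 - q;
        sigma * 0.25 * t * t * t
    } else {
        0.0
    }
}

/// Density at each particle: rho_i = sum over j of m_j * W(|r_i - r_j|, h).
/// The sum includes the particle's own contribution (j == i).
fn sph_density(positions: &[[f64; 3]], masses: &[f64], h: f64) -> Vec<f64> {
    assert_eq!(positions.len(), masses.len(), "one mass per particle");
    positions
        .iter()
        .map(|pi| {
            positions
                .iter()
                .zip(masses)
                .map(|(pj, mj)| {
                    let dist2: f64 = (0..3).map(|k| (pi[k] - pj[k]).powi(2)).sum();
                    mj * cubic_spline_w(dist2.sqrt(), h)
                })
                .sum::<f64>()
        })
        .collect()
}
```

A single unit-mass particle with h = 1 has density W(0) = 1/π, which makes a convenient sanity check when comparing against GPU output.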