Skip to main content

Module cuda_backend

Module cuda_backend 

Source
Expand description

CUDA compute backend for the OxiPhysics GPU acceleration layer.

This module provides CudaBackend which implements the compute-backend interface using NVIDIA CUDA via the cudarc crate for type-safe PTX / CUDA kernel management.

§Feature flag

This backend is gated behind the cuda-backend Cargo feature:

[dependencies]
oxiphysics-gpu = { features = ["cuda-backend"] }

When the feature is disabled the module compiles to a no-op stub returning CudaInitError::NotAvailable from CudaBackend::try_new.

When the feature is enabled, cudarc uses dynamic-loading (libloading) so the crate compiles on any platform; the CUDA driver is opened at runtime and an error is returned if it is absent (e.g. macOS, headless Linux without an NVIDIA driver).

§Architecture

 CudaBackend
  ├── cudarc::CudaContext                  ← CUDA device context (Arc)
  ├── cudarc::CudaStream                   ← Default stream for kernel dispatch
  ├── cudarc::CudaSlice<u8>                ← Device-resident buffer slices
  ├── Vec<CudaBufferEntry>                 ← Registered buffer metadata
  └── KernelRegistry                       ← Compiled PTX / NVRTC modules

 Compute pipeline:
   write_buffer [host→device memcpy via stream]
   → launch_kernel(grid, block, args)
   → read_buffer  [device→host memcpy via stream]

§Kernels shipped with this backend

Source constantDescription
PTX_SPH_DENSITYSPH density summation (cubic-spline W3), 256 threads/block
PTX_PARALLEL_SCANBlelloch exclusive prefix scan, warp-shuffle optimised
PTX_CONSTRAINT_PGSBlock-PGS constraint solver, 64 threads/block
CUDA_SPH_DENSITY_SRCCUDA C SPH density kernel (compiled at runtime via NVRTC)

§Example (when cuda-backend feature enabled)

use oxiphysics_gpu::compute::cuda_backend::CudaBackend;

let mut backend = CudaBackend::try_new(0)?;          // device 0
let buf = backend.create_buffer(1024);               // 1024 f64 slots
backend.write_buffer(buf, &vec![1.0_f64; 1024]);
backend.launch("sph_density", &[buf], 16, 256);      // 16 blocks × 256 threads
let result = backend.read_buffer(buf);

Structs§

CudaBackend
CUDA compute backend.
CudaBufferHandle
Opaque handle to a CUDA device buffer allocated by CudaBackend.
CudaDeviceInfo
Information about the selected CUDA device.

Enums§

CudaInitError
Errors returned by CudaBackend::try_new.

Constants§

CUDA_SPH_DENSITY_SRC
CUDA C source for SPH density summation kernel, compiled at runtime via NVRTC.
PTX_CONSTRAINT_PGS
PTX source for block-PGS constraint solving.
PTX_PARALLEL_SCAN
PTX source for Blelloch exclusive parallel prefix scan.
PTX_SPH_DENSITY
PTX source for SPH density summation with cubic-spline W3 kernel.