Module device_cache


Shared host-side scaffolding for every cudarc-backed module under src/gpu/* and src/solver/gpu/*.

Before this module existed, each device backend (bms_flex, survival_flex, polya_gamma, reml_trace, …) carried its own near-identical copy of two patterns:

A power-of-two bucketed free list of reusable f64 device slices (the per-backend DeviceArena).
A OnceLock<Result<{module: Arc<CudaModule>}, GpuError>> that NVRTC-compiled one source string the first time the backend dispatched and cached the resulting module for the process lifetime.

Both are now provided here so every cudarc backend points at the same implementation. The migration is atomic: no per-backend DeviceArena type, no per-backend ad-hoc OnceLock, no transitional shim.
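
The bucketed free-list idea can be illustrated with a minimal, host-only sketch. It is not the crate's actual DeviceArena: the name BucketedArena, its methods, and the use of Vec<f64> as a stand-in for a cudarc device slice are all assumptions made so the example runs without a GPU.

```rust
use std::collections::HashMap;

/// Host-only stand-in for an f64 device slice (assumption: the real arena
/// stores cudarc device allocations, not `Vec<f64>`).
pub type Buffer = Vec<f64>;

/// Hypothetical sketch of a power-of-two bucketed free list.
#[derive(Default)]
pub struct BucketedArena {
    /// Free buffers keyed by bucket size, which is always a power of two.
    free: HashMap<usize, Vec<Buffer>>,
}

impl BucketedArena {
    /// Round a requested length up to its power-of-two bucket (minimum 1).
    pub fn bucket_for(len: usize) -> usize {
        len.max(1).next_power_of_two()
    }

    /// Reuse a free buffer from the matching bucket, or allocate a new one.
    /// A reused buffer keeps whatever contents it had when it was released.
    pub fn acquire(&mut self, len: usize) -> Buffer {
        let bucket = Self::bucket_for(len);
        self.free
            .get_mut(&bucket)
            .and_then(|list| list.pop())
            .unwrap_or_else(|| vec![0.0; bucket])
    }

    /// Return a buffer to its bucket so a later `acquire` can reuse it.
    pub fn release(&mut self, buf: Buffer) {
        debug_assert!(buf.len().is_power_of_two());
        self.free.entry(buf.len()).or_default().push(buf);
    }

    /// Number of free buffers currently parked in the given bucket.
    pub fn free_count(&self, bucket: usize) -> usize {
        self.free.get(&bucket).map_or(0, Vec::len)
    }
}
```

Rounding every request up to a power of two means requests of 5, 7, and 8 elements all share the 8-element bucket, so a released buffer is likely to be reusable at the cost of some over-allocation.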

Structs

DeviceArena: Power-of-two bucketed free list of f64 device slices.
PtxModuleCache: Process-wide NVRTC module cache for a single PTX source string.
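
As a rough illustration of the compile-once pattern behind PtxModuleCache, the sketch below caches the outcome of a single "compile" in a std::sync::OnceLock. It is a sketch under stated assumptions, not the crate's code: ModuleCache, FakeModule, CompileError, and the compile function are hypothetical stand-ins for the real cache, cudarc's CudaModule, GpuError, and the NVRTC compile-and-load step.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a compiled and loaded module.
#[derive(Debug)]
pub struct FakeModule {
    pub source_len: usize,
}

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, Clone, PartialEq)]
pub struct CompileError(pub String);

/// Stand-in for the expensive NVRTC compile plus module load.
fn compile(source: &str) -> Result<Arc<FakeModule>, CompileError> {
    if source.is_empty() {
        return Err(CompileError("empty kernel source".to_string()));
    }
    Ok(Arc::new(FakeModule { source_len: source.len() }))
}

/// Compile-once cache for a single source string.
pub struct ModuleCache {
    source: &'static str,
    /// Holds the first outcome, success or failure, for the process lifetime.
    cell: OnceLock<Result<Arc<FakeModule>, CompileError>>,
    /// Counts compile attempts; present only to make the sketch observable.
    compile_calls: AtomicUsize,
}

impl ModuleCache {
    pub const fn new(source: &'static str) -> Self {
        Self {
            source,
            cell: OnceLock::new(),
            compile_calls: AtomicUsize::new(0),
        }
    }

    /// Compile on first use; later calls clone the cached result.
    pub fn get_or_compile(&self) -> Result<Arc<FakeModule>, CompileError> {
        self.cell
            .get_or_init(|| {
                self.compile_calls.fetch_add(1, Ordering::SeqCst);
                compile(self.source)
            })
            .clone()
    }

    pub fn compile_calls(&self) -> usize {
        self.compile_calls.load(Ordering::SeqCst)
    }
}
```

Because the cell stores a Result, a failed compile is cached as well: in this sketch, later callers see the same error without triggering another compile attempt.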

Functions

compile_ptx_arch: Compile a kernel source string to PTX with the same device-keyed NVRTC options that PtxModuleCache::get_or_compile uses. The crucial one is the --gpu-architecture pin (#1551): without it, NVRTC defaults to an architecture below sm_60 and rejects atomicAdd(double*, double). Call sites whose kernels use double atomics MUST route through this function rather than the bare cudarc::nvrtc::compile_ptx (which passes no options); otherwise the device path silently falls back to the CPU.
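
To make the architecture pin concrete, here is a small std-only sketch of how an NVRTC architecture option might be derived from a device's compute capability. The helper names gpu_architecture_option and supports_double_atomic_add are hypothetical, and the signature of the real compile_ptx_arch is not shown on this page. The only facts relied on are that NVRTC accepts --gpu-architecture=compute_XX and that native atomicAdd on double requires compute capability 6.0 or higher.

```rust
/// Lowest compute capability with native `atomicAdd(double*, double)`.
const DOUBLE_ATOMIC_MIN_CC: (u32, u32) = (6, 0);

/// Build the NVRTC architecture option for a compute capability,
/// e.g. (8, 6) becomes "--gpu-architecture=compute_86".
/// Hypothetical helper, not part of this module's public API.
pub fn gpu_architecture_option(major: u32, minor: u32) -> String {
    format!("--gpu-architecture=compute_{major}{minor}")
}

/// Whether kernels compiled for this capability may use double atomics.
/// Tuples compare lexicographically, so (7, 5) >= (6, 0) holds.
pub fn supports_double_atomic_add(major: u32, minor: u32) -> bool {
    (major, minor) >= DOUBLE_ATOMIC_MIN_CC
}
```

In a real call site, a string like this would be handed to NVRTC together with the kernel source, for example through an options-taking entry point such as cudarc's compile_ptx_with_opts; how compile_ptx_arch actually wires it up is an assumption here.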
