Shared host-side scaffolding for every cudarc-backed module under
src/gpu/* and src/solver/gpu/*.
Before this module existed, each device backend (bms_flex,
survival_flex, polya_gamma, reml_trace, …) carried its own
near-identical copy of two patterns:
- A power-of-two bucketed free list of reusable f64 device slices (the per-backend DeviceArena).
- A OnceLock<Result<{module: Arc<CudaModule>}, GpuError>> that NVRTC-compiled one source string the first time the backend dispatched and cached the resulting module for the process lifetime.
Both are now provided here so every cudarc backend points at the same
implementation. The migration is atomic: no per-backend DeviceArena
type, no per-backend ad-hoc OnceLock, no transitional shim.
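As a rough illustration of the first pattern, the following host-only sketch shows how a power-of-two bucketed free list can work. It is not the DeviceArena implementation: `BucketedFreeList` and all of its methods are hypothetical names, and `Vec<f64>` stands in for a device slice so the example runs without a GPU.

```rust
use std::collections::HashMap;

/// Hypothetical host-side stand-in for a bucketed free list.
/// `Vec<f64>` plays the role of a device slice (assumption).
#[derive(Default)]
pub struct BucketedFreeList {
    // Keyed by bucket capacity, which is always a power of two.
    free: HashMap<usize, Vec<Vec<f64>>>,
}

impl BucketedFreeList {
    /// Round a requested length up to its bucket capacity.
    pub fn bucket_len(len: usize) -> usize {
        len.max(1).next_power_of_two()
    }

    /// Reuse a parked buffer from the matching bucket, or allocate a
    /// fresh one. A reused buffer keeps whatever its last user wrote.
    pub fn acquire(&mut self, len: usize) -> Vec<f64> {
        let cap = Self::bucket_len(len);
        self.free
            .get_mut(&cap)
            .and_then(|bucket| bucket.pop())
            .unwrap_or_else(|| vec![0.0; cap])
    }

    /// Park a buffer for later reuse. Buffers whose length is not a
    /// power of two did not come from this list and are dropped.
    pub fn release(&mut self, buf: Vec<f64>) {
        let cap = buf.len();
        if cap.is_power_of_two() {
            self.free.entry(cap).or_default().push(buf);
        }
    }

    /// Number of buffers currently parked across all buckets.
    pub fn parked(&self) -> usize {
        self.free.values().map(Vec::len).sum()
    }
}
```

Rounding every request up to a power of two means requests for 5 and 7 elements land in the same bucket of 8, so a released buffer can serve either one, at the cost of up to 2x over-allocation.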
Structs§
- DeviceArena - Power-of-two bucketed free list of f64 device slices.
- PtxModuleCache - Process-wide NVRTC module cache for a single PTX source string.
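The compile-once caching idea behind a module cache of this kind can be sketched with the standard library alone. This is an assumption-laden illustration, not the PtxModuleCache API: `OnceCompiled`, `CompileError`, and the closure-based `get_or_compile` signature are hypothetical, and the generic `M` stands in for a compiled module.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical error type standing in for a GPU compile error.
#[derive(Debug, Clone, PartialEq)]
pub struct CompileError(pub String);

/// Caches the outcome (success or failure) of compiling one source
/// string for the lifetime of the value holding it.
pub struct OnceCompiled<M> {
    slot: OnceLock<Result<Arc<M>, CompileError>>,
}

impl<M> OnceCompiled<M> {
    /// `const` so the cache can live in a `static`.
    pub const fn new() -> Self {
        Self { slot: OnceLock::new() }
    }

    /// Run `compile` on the first call only; later calls return a
    /// clone of the cached result and never invoke their closure.
    pub fn get_or_compile<F>(&self, compile: F) -> Result<Arc<M>, CompileError>
    where
        F: FnOnce() -> Result<M, CompileError>,
    {
        self.slot
            .get_or_init(|| compile().map(Arc::new))
            .clone()
    }
}
```

Because the `Result` itself is stored, a failed compilation is cached too: later calls see the same error instead of retrying. Declaring the cache as a `static` gives it process lifetime.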
Functions§
- compile_ptx_arch - Compile a kernel source string to PTX with the SAME device-keyed NVRTC options PtxModuleCache::get_or_compile uses, crucially the --gpu-architecture pin (#1551), without which NVRTC defaults below sm_60 and rejects atomicAdd(double*, double). Call sites that compile via the bare cudarc::nvrtc::compile_ptx (no options) MUST route through this instead when their kernel uses double atomics, or the device path silently falls back to the CPU.
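
To make the architecture pin concrete, here is a minimal sketch of deriving the NVRTC option from a device's compute capability. Both helper names are hypothetical and this is not how compile_ptx_arch is known to be implemented; the real function may query the device differently or use the `compute_XY` form of the option.

```rust
/// Hypothetical helper: build the NVRTC option that pins the target
/// architecture from a compute capability such as (8, 6).
pub fn gpu_architecture_flag(major: u32, minor: u32) -> String {
    format!("--gpu-architecture=sm_{major}{minor}")
}

/// Hypothetical helper: `atomicAdd(double*, double)` is only available
/// when compiling for compute capability 6.0 or newer.
pub fn supports_double_atomic_add(major: u32, _minor: u32) -> bool {
    major >= 6
}
```

For a device reporting compute capability 8.6, the first helper yields `--gpu-architecture=sm_86`, which is high enough for kernels that use double-precision `atomicAdd`.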