
Module device_cache


Shared host-side scaffolding for every cudarc-backed module under src/gpu/* and src/solver/gpu/*.

Before this module existed, each device backend (bms_flex, survival_flex, polya_gamma, reml_trace, …) carried its own near-identical copy of two patterns:

  1. A power-of-two bucketed free list of reusable f64 device slices (the per-backend DeviceArena).
  2. A OnceLock<Result<{module: Arc<CudaModule>}, GpuError>> that NVRTC-compiled one source string the first time the backend dispatched and cached the resulting module for the process lifetime.
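The first pattern, a power-of-two bucketed free list, can be sketched on the host alone. This is a minimal illustration and not the module's actual DeviceArena: the HostArena name, its methods, and the use of Vec<f64> in place of a cudarc device slice are all assumptions made so the example runs without a GPU.

```rust
use std::collections::HashMap;

/// Hypothetical host-side stand-in for a device arena. `Vec<f64>` plays the
/// role of a device slice so the sketch runs without a GPU.
#[derive(Default)]
pub struct HostArena {
    /// Free buffers, keyed by their power-of-two length.
    buckets: HashMap<usize, Vec<Vec<f64>>>,
}

impl HostArena {
    /// Round a requested length up to its bucket size (at least 1).
    pub fn bucket_for(len: usize) -> usize {
        len.max(1).next_power_of_two()
    }

    /// Reuse a free buffer from the matching bucket, or allocate a new
    /// zero-filled one. A reused buffer keeps its previous contents.
    pub fn acquire(&mut self, len: usize) -> Vec<f64> {
        let cap = Self::bucket_for(len);
        self.buckets
            .get_mut(&cap)
            .and_then(|free| free.pop())
            .unwrap_or_else(|| vec![0.0; cap])
    }

    /// Hand a buffer back so a later `acquire` of the same bucket can reuse it.
    pub fn release(&mut self, buf: Vec<f64>) {
        debug_assert!(buf.len().is_power_of_two());
        self.buckets.entry(buf.len()).or_default().push(buf);
    }

    /// Number of free buffers currently held in the bucket serving `len`.
    pub fn free_count(&self, len: usize) -> usize {
        self.buckets
            .get(&Self::bucket_for(len))
            .map_or(0, |free| free.len())
    }
}
```

Because requests for 5 and 7 elements both round up to the 8-element bucket, a buffer released after the first request can serve the second without a fresh allocation. The cost is up to roughly 2x over-allocation per buffer.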

Both are now provided here so every cudarc backend points at the same implementation. The migration is atomic: no per-backend DeviceArena type, no per-backend ad-hoc OnceLock, no transitional shim.

Structs

DeviceArena
Power-of-two bucketed free list of f64 device slices.
PtxModuleCache
Process-wide NVRTC module cache for a single PTX source string.
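The compile-once behaviour behind a cache like PtxModuleCache can be sketched with std::sync::OnceLock alone. This is an illustration under stated assumptions, not the real type: OnceModuleCache, FakeModule, FakeError, and the closure-based get_or_compile signature are hypothetical, and the compile step is a caller-supplied closure rather than NVRTC.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a compiled module handle.
#[derive(Debug)]
pub struct FakeModule {
    pub ptx: String,
}

/// Hypothetical stand-in for the GPU error type.
#[derive(Debug, Clone, PartialEq)]
pub struct FakeError(pub String);

/// Caches the outcome of compiling one source string. The first caller runs
/// the compile step; every later caller gets a clone of the stored result.
pub struct OnceModuleCache {
    source: &'static str,
    slot: OnceLock<Result<Arc<FakeModule>, FakeError>>,
}

impl OnceModuleCache {
    pub const fn new(source: &'static str) -> Self {
        Self { source, slot: OnceLock::new() }
    }

    /// Run `compile` at most once. Note that a failure is cached as well,
    /// so a failed compile is not retried on the next call.
    pub fn get_or_compile<F>(&self, compile: F) -> Result<Arc<FakeModule>, FakeError>
    where
        F: FnOnce(&str) -> Result<FakeModule, FakeError>,
    {
        self.slot
            .get_or_init(|| compile(self.source).map(Arc::new))
            .clone()
    }
}
```

Storing a Result inside the OnceLock means the error path is memoized too, which suits a backend that should fall back to the CPU once rather than retry compilation on every dispatch. Because `new` is a const fn, such a cache can live in a `static`.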

Functions

compile_ptx_arch
Compile a kernel source string to PTX with the same device-keyed NVRTC options that PtxModuleCache::get_or_compile uses. The important one is the --gpu-architecture pin (#1551): without it, NVRTC targets an architecture below sm_60 and rejects atomicAdd(double*, double). Call sites that compile through the bare cudarc::nvrtc::compile_ptx (no options) must route through this function instead when their kernel uses double-precision atomics; otherwise the kernel fails to compile and the device path silently falls back to the CPU.
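The architecture pin amounts to deriving an NVRTC flag from the device's compute capability. The helpers below are a hedged sketch of that derivation only: arch_option and supports_f64_atomic_add are hypothetical names, and how this module actually queries the device or passes the flag to cudarc is not shown in this page.

```rust
/// Build the NVRTC architecture flag for a compute capability, for example
/// (8, 6) becomes "--gpu-architecture=sm_86". Hypothetical helper.
pub fn arch_option(major: u32, minor: u32) -> String {
    format!("--gpu-architecture=sm_{major}{minor}")
}

/// The native `atomicAdd(double*, double)` overload requires compute
/// capability 6.0 or newer. Hypothetical helper.
pub fn supports_f64_atomic_add(major: u32, _minor: u32) -> bool {
    major >= 6
}
```

A call site could check supports_f64_atomic_add before dispatching a kernel that uses double atomics, and surface an explicit error instead of falling back silently.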