pub fn compile_ptx_arch<S: AsRef<str>>(source: S) -> Result<Ptx, GpuError>
Compile a kernel source string to PTX with the same device-keyed NVRTC
options that PtxModuleCache::get_or_compile uses. The crucial one is the
--gpu-architecture pin (#1551): without it, NVRTC defaults to an
architecture below sm_60 and rejects atomicAdd(double*, double), which
requires compute capability 6.0 or higher. Call sites that compile via the
bare cudarc::nvrtc::compile_ptx (no options) MUST route through this
function instead when their kernel uses double atomics; otherwise the
kernel fails to compile and the device path silently falls back to the CPU.
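
As a rough illustration of what pinning the architecture means, the sketch below builds a --gpu-architecture option from a device's compute capability and checks whether that device can compile a kernel using atomicAdd(double*, double). It is not this crate's implementation: the names gpu_architecture_flag, supports_double_atomic_add, and ACCUMULATE_SRC are hypothetical, and the choice of the sm_XY spelling (rather than compute_XY) is an assumption.

```rust
/// Example kernel source that needs the architecture pin: it calls
/// atomicAdd(double*, double), which requires compute capability 6.0+.
/// (Hypothetical kernel, for illustration only.)
pub const ACCUMULATE_SRC: &str = r#"
extern "C" __global__ void accumulate(double* out, const double* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(out, x[i]);
    }
}
"#;

/// Hypothetical helper: format the NVRTC architecture option for a device
/// with compute capability `major.minor`.
/// Assumption: the option is spelled with the `sm_XY` form.
pub fn gpu_architecture_flag(major: u32, minor: u32) -> String {
    format!("--gpu-architecture=sm_{}{}", major, minor)
}

/// Hypothetical helper: true when the device supports the native
/// double-precision atomicAdd overload (compute capability 6.0 or higher).
pub fn supports_double_atomic_add(major: u32, _minor: u32) -> bool {
    major >= 6
}
```

A caller could use such a check to decide up front whether the device path is viable, rather than discovering a failed compile through a silent CPU fallback.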