pub fn softmax_kernel_source(k: usize, p: usize) -> String
Prepend the KK / PP macros to the kernel source so that the NVRTC step is a
plain compile_ptx call, matching sphere_gpu / arrow_schur_nvrtc.
Also prepend an INFINITY definition. The kernel seeds its softmax max
reduction with -INFINITY, but NVRTC does not predefine INFINITY (it is a
<math.h> macro, not a CUDA builtin). Without the definition the whole module
fails to compile and the SAE row-jet path silently falls back to the CPU
(the same class of bug as the M_PI NVRTC fix). __longlong_as_double is an
NVRTC builtin that is always available and needs no header.
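
The prepending described above might look roughly like the sketch below. It is illustrative only, not the crate's actual implementation: the helper names `softmax_prelude` and `with_prelude`, the `#ifndef` guard, and the exact form of the INFINITY macro are assumptions. The bit pattern 0x7ff0000000000000 is the IEEE 754 encoding of double-precision +infinity, which `__longlong_as_double` reinterprets without needing any header.

```rust
/// Hypothetical helper: build the text placed ahead of the kernel body.
/// KK and PP become compile-time constants inside the CUDA source, so no
/// `-D` options have to be passed to NVRTC.
fn softmax_prelude(k: usize, p: usize) -> String {
    format!(
        concat!(
            "#define KK {}\n",
            "#define PP {}\n",
            // Assumed form of the definition: reinterpret the IEEE 754 bit
            // pattern of +inf as a double, using a builtin NVRTC always has.
            "#ifndef INFINITY\n",
            "#define INFINITY __longlong_as_double(0x7ff0000000000000LL)\n",
            "#endif\n",
        ),
        k, p
    )
}

/// Hypothetical helper: prepend the prelude to an arbitrary kernel body.
fn with_prelude(k: usize, p: usize, body: &str) -> String {
    let mut src = softmax_prelude(k, p);
    src.push_str(body);
    src
}
```

With a prelude like this, the returned string is complete on its own, so it can be handed directly to the NVRTC compile call. Because a missing definition only shows up as a silent CPU fallback, it may be worth logging the NVRTC compile error at the call site.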