Shared CUDA backend-probe contract for every cudarc-backed module under
src/gpu/*.
Before this module existed, every GPU backend (bms_flex,
survival_flex, cubic_bspline_moments, cubic_cell, pirls_row,
sphere, …) carried its own near-identical probe_linux prologue:
- Fetch the process-wide [GpuRuntime] or fail with a DriverLibraryUnavailable { reason: "<module> backend: no CUDA runtime available" }.
- Read the runtime’s selected device ordinal.
- Create (or reuse) the per-ordinal [CudaContext] or fail with a DriverCallFailed { reason: "<module> backend: failed to create CUDA context for device N" }.
- Open the context’s default [CudaStream].
- Carry the device’s compute capability alongside the handles.
Those five steps are identical apart from the per-module label that gets
woven into the two error messages. Drift between copies meant error
wording, capability handling, context reuse, and stream choice could
diverge module to module. This module hosts the single contract: each
backend now calls probe_cuda_backend with its label and keeps only
its module caches and optional eager-compilation step.
The migration is atomic: no backend re-implements the prologue, and there is no transitional shim.
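The five-step prologue described above can be sketched roughly as follows. This is an illustration only, not the crate's implementation: MockRuntime, MockContext, MockStream, Parts, ProbeError, and probe_sketch are hypothetical stand-ins for GpuRuntime, CudaContext, CudaStream, CudaBackendParts, the real error type, and probe_cuda_backend, and the runtime is passed in as an Option rather than fetched from a process-wide global so the sketch runs without a GPU.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins; the real module uses cudarc's context and stream types.
#[derive(Debug)]
pub struct MockContext { pub ordinal: usize }
#[derive(Debug)]
pub struct MockStream { pub ctx: Arc<MockContext> }

#[derive(Debug, PartialEq)]
pub enum ProbeError {
    DriverLibraryUnavailable { reason: String },
    DriverCallFailed { reason: String },
}

/// Stand-in for the process-wide runtime (assumed shape, not the real GpuRuntime).
pub struct MockRuntime {
    pub selected_ordinal: usize,
    pub device_count: usize,
    pub compute_capability: (u32, u32),
    contexts: Mutex<HashMap<usize, Arc<MockContext>>>,
}

impl MockRuntime {
    pub fn new(selected_ordinal: usize, device_count: usize, cc: (u32, u32)) -> Self {
        Self { selected_ordinal, device_count, compute_capability: cc,
               contexts: Mutex::new(HashMap::new()) }
    }

    /// Create the per-ordinal context on first use, reuse it afterwards.
    fn context_for(&self, ordinal: usize) -> Option<Arc<MockContext>> {
        if ordinal >= self.device_count {
            return None; // models a failed driver call
        }
        let mut map = self.contexts.lock().unwrap();
        let ctx = map.entry(ordinal).or_insert_with(|| Arc::new(MockContext { ordinal }));
        Some(Arc::clone(ctx))
    }
}

/// The handles shared by every backend after a successful probe.
#[derive(Debug)]
pub struct Parts {
    pub ctx: Arc<MockContext>,
    pub stream: MockStream,
    pub compute_capability: (u32, u32),
}

pub fn probe_sketch(runtime: Option<&MockRuntime>, label: &str) -> Result<Parts, ProbeError> {
    // 1. Fetch the runtime or fail with a label-specific message.
    let rt = runtime.ok_or_else(|| ProbeError::DriverLibraryUnavailable {
        reason: format!("{label} backend: no CUDA runtime available"),
    })?;
    // 2. Read the selected device ordinal.
    let ordinal = rt.selected_ordinal;
    // 3. Create (or reuse) the per-ordinal context.
    let ctx = rt.context_for(ordinal).ok_or_else(|| ProbeError::DriverCallFailed {
        reason: format!("{label} backend: failed to create CUDA context for device {ordinal}"),
    })?;
    // 4. Open the context's default stream.
    let stream = MockStream { ctx: Arc::clone(&ctx) };
    // 5. Carry the compute capability alongside the handles.
    Ok(Parts { ctx, stream, compute_capability: rt.compute_capability })
}
```

In this sketch only the label differs between callers, which is the property the shared contract relies on: two modules probing the same runtime get the same context and identically worded errors.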
Structs
- CudaBackendContext - The process-wide device handles every cudarc backend stores after a successful probe: the CudaContext, its default CudaStream, the lazily NVRTC-compiled PtxModuleCache, and the bucketed DeviceArena of reusable f64 device buffers (held under a Mutex because large-scale fits dispatch from multiple rayon worker threads; the mutex is only held during alloc/release, not across kernel launches). Module-specific backends (bms_flex, survival_flex, …) wrap one of these as their inner context so the host-side scaffolding (arena pooling, module cache, mutex around alloc) is uniform instead of duplicated per backend.
- CudaBackendParts - The handles every cudarc backend shares once the probe succeeds: a context on the runtime’s selected device, that context’s default stream, and the device’s compute capability. Module-specific backends layer their own caches and optional eager compilation on top of these.
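The locking discipline described for the arena (mutex held only during alloc/release, never across a kernel launch) can be illustrated with a host-only sketch. Everything here is an assumption made for illustration: ArenaSketch, Buffer, and run_with_arena are hypothetical names, the power-of-two bucketing rule is a guess rather than the DeviceArena's actual policy, and a plain Vec<f64> stands in for a device buffer.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical host-side stand-in for a reusable f64 device buffer.
pub struct Buffer { pub data: Vec<f64> }

/// Bucketed pool keyed by capacity; capacities are rounded up to a power of
/// two (an assumed bucketing rule, chosen only for this sketch).
#[derive(Default)]
pub struct ArenaSketch { free: HashMap<usize, Vec<Buffer>> }

impl ArenaSketch {
    fn bucket(len: usize) -> usize { len.max(1).next_power_of_two() }

    /// Reuse a pooled buffer from the matching bucket, or allocate a new one.
    pub fn alloc(&mut self, len: usize) -> Buffer {
        let cap = Self::bucket(len);
        self.free.get_mut(&cap)
            .and_then(|bucket| bucket.pop())
            .unwrap_or_else(|| Buffer { data: vec![0.0; cap] })
    }

    /// Return a buffer to the bucket matching its capacity.
    pub fn release(&mut self, buf: Buffer) {
        self.free.entry(buf.data.len()).or_default().push(buf);
    }

    /// Number of buffers currently sitting in the pool.
    pub fn pooled(&self) -> usize { self.free.values().map(Vec::len).sum() }
}

pub fn run_with_arena(arena: &Mutex<ArenaSketch>, len: usize) -> f64 {
    // Lock only for the allocation; the guard is dropped at the end of this statement.
    let mut buf = arena.lock().unwrap().alloc(len);

    // Stand-in for a kernel launch: runs without holding the mutex, so other
    // worker threads can allocate and release concurrently.
    for (i, x) in buf.data.iter_mut().take(len).enumerate() {
        *x = i as f64;
    }
    let sum: f64 = buf.data.iter().take(len).sum();

    // Lock again only for the release.
    arena.lock().unwrap().release(buf);
    sum
}
```

Keeping the critical section this short means contention between worker threads is limited to pool bookkeeping rather than to the duration of the compute step.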
Functions
- probe_backend_with_compile - Probe the CUDA backend for label and run a backend-specific build step on the resolved handles.
- probe_cuda_backend - Probe the process-wide CUDA backend for the calling module.
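The relationship between the two functions can be sketched as "probe first, then hand the resolved handles to a caller-supplied build step". The signature below is an assumption, since this page does not show the real one: probe_with_build, probe_stub, PartsSketch, and SketchError are hypothetical names, and the `available` flag replaces the real runtime lookup so the sketch runs without a GPU.

```rust
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Probe(String),
    Build(String),
}

/// Hypothetical reduced form of the probed handles.
#[derive(Debug, Clone, PartialEq)]
pub struct PartsSketch {
    pub ordinal: usize,
    pub compute_capability: (u32, u32),
}

/// Stand-in for the shared probe; `available` simulates whether a runtime exists.
fn probe_stub(label: &str, available: bool) -> Result<PartsSketch, SketchError> {
    if available {
        Ok(PartsSketch { ordinal: 0, compute_capability: (8, 0) })
    } else {
        Err(SketchError::Probe(format!("{label} backend: no CUDA runtime available")))
    }
}

/// Probe, then run the backend-specific build step on the resolved handles.
/// The build step never runs if the probe fails.
pub fn probe_with_build<T, F>(label: &str, available: bool, build: F) -> Result<T, SketchError>
where
    F: FnOnce(&PartsSketch) -> Result<T, SketchError>,
{
    let parts = probe_stub(label, available)?;
    build(&parts)
}
```

A backend that wants eager compilation would pass its compile routine as the closure; one that compiles lazily would call the plain probe and skip the build step.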