Module sphere_gpu


GPU (NVRTC-compiled) construction of the Wahba intrinsic-S² kernel matrix.

This module owns the device-side construction of the Wahba reproducing kernel basis matrix on the 2-sphere using the finite truncated spectral Legendre series

K_L(γ) = Σ_{ℓ=1..L} c_ℓ · P_ℓ(cos γ),

evaluated entry by entry with the three-term Legendre recurrence, whose running terms are kept in registers. The host CPU parity target is the matching SphereWahbaKernel::SobolevTruncated { lmax } / SphereWahbaKernel::PseudoTruncated { lmax } variant in src/terms/basis.rs (a single source of truth: same recurrence, same c_ℓ).
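As an illustration of the series above, the following is a minimal host-side sketch of evaluating K_L at x = cos γ with the standard three-term (Bonnet) recurrence. It is not the module's kernel source: the function name truncated_legendre_sum and the convention that coeffs[ℓ − 1] holds c_ℓ are assumptions made for this example.

```rust
/// Evaluate K_L(x) = sum_{l=1..L} c_l * P_l(x) at x = cos(gamma).
/// Assumption for this sketch: `coeffs[l - 1]` holds c_l, so L = coeffs.len().
fn truncated_legendre_sum(x: f64, coeffs: &[f64]) -> f64 {
    let mut p_prev = 1.0_f64; // P_0(x)
    let mut p_curr = x; // P_1(x)
    let mut sum = 0.0_f64;
    for (idx, &c) in coeffs.iter().enumerate() {
        let l = (idx + 1) as f64; // degree of the polynomial held in p_curr
        sum += c * p_curr;
        // Bonnet recurrence: (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}
        let p_next = ((2.0 * l + 1.0) * x * p_curr - l * p_prev) / (l + 1.0);
        p_prev = p_curr;
        p_curr = p_next;
    }
    sum
}
```

Only two running values (p_prev and p_curr) are live at any point, which is what makes the recurrence suitable for keeping in registers. The series starts at ℓ = 1, so P_0 seeds the recurrence but never contributes to the sum.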

The device path evaluates the raw column-major kernel matrix with f64 Legendre recurrence math. Host code owns centering, constraints, and solver assembly in basis.rs.

Structs§

DeviceS2KernelMatrix: Device-resident (rows × cols) matrix in column-major layout with leading dimension ld ≥ rows. The slice holds ld * cols f64 elements; entry (i, j) lives at col_major_dev[j * ld + i].
PenalisedLsSolution: Result returned by solve_penalised_ls_device.
S2KernelBuildInputs: Host-side inputs needed to launch s2_wahba_legendre_colmajor.
S2ModuleCacheKey: Module cache key: every distinct (CC, LMAX, kind, layout, kernel flavor) tuple compiles to a different PTX. The precision (f64) and the block shapes, (32, 8, 1) for the raw kernel and (128, 1, 1) for the Householder kernel, are baked into the kernel source, so they are implicit in the flavor tag and do not appear in the key.
SphereGpuBackend: Process-wide sphere GPU backend. Lazy-initialised on first call to SphereGpuBackend::probe.
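The column-major layout with leading dimension described for DeviceS2KernelMatrix can be sketched on the host as follows. Both helpers are hypothetical and exist only for this example; they operate on a host slice, whereas the real struct holds device memory.

```rust
/// Flat offset of entry (i, j) in a column-major buffer with leading dimension `ld`.
fn col_major_index(i: usize, j: usize, ld: usize) -> usize {
    j * ld + i
}

/// Copy a padded column-major buffer into a dense row-major Vec.
/// Hypothetical helper: rows i in ld > i >= rows are padding and are skipped.
fn to_row_major(buf: &[f64], rows: usize, cols: usize, ld: usize) -> Vec<f64> {
    assert!(ld >= rows, "leading dimension must cover all rows");
    assert_eq!(buf.len(), ld * cols, "buffer must hold ld * cols elements");
    let mut out = Vec::with_capacity(rows * cols);
    for i in 0..rows {
        for j in 0..cols {
            out.push(buf[col_major_index(i, j, ld)]);
        }
    }
    out
}
```

For a 2 × 2 matrix stored with ld = 3, each column occupies three slots and the third slot of every column is padding that the indexing never touches.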

Enums§

DeviceMatrixLayout: Layout of the (n,m) kernel design matrix on device. The Wahba pipeline downstream of this kernel (cuBLAS GEMM, cuSOLVER GEQRF) requires column-major.
SphereSpectralKernelKind: Which truncated-spectral Wahba kernel to evaluate on device. Matches the CPU SphereWahbaKernel::{SobolevTruncated, PseudoTruncated} so parity tests are well-defined.

Functions§

build_center_kernel_device: Build the (m × m) center-center kernel matrix C using the same GPU kernel that builds the design. centers_xyz is the unit-vector representation of the centers, length 3 * m. coeffs and kind match the design build.
build_householder_constrained_design_device: Phase-3 fused Householder-constrained kernel. v is the Householder vector (length m), beta the reflector scalar, and the output is the (n × (m-1)) constrained design X_s on device.
build_kernel_matrix_device: Build the raw (n × m) Wahba kernel matrix on device using s2_wahba_legendre_colmajor. Phase 1 entry point.
constrained_penalty_host: Constrained penalty matrix S = Zᵀ C Z for the weighted-sum-to-zero Householder constraint built from w. The returned shape is ((m−1) × (m−1)). C is taken as a host (m × m) array (typically the device-to-host copy of the build_center_kernel_device output).
householder_reflector_from_weights: Build the Householder reflector that maps w onto a multiple of e_1, zeroing every other component. Returns (v, beta) with the LAPACK / Golub-Van Loan convention v[0] = 1. If w has zero norm, returns (0-vector, 0.0), and the caller should treat the reflector as a no-op (no constraint).
latlon_to_xyz_host: Lat/lon (degrees or radians) → unit vector (x, y, z) on S² ⊂ ℝ³. Returns a flat Vec<f64> of length 3 * n in the row-major layout [x_0, y_0, z_0, x_1, y_1, z_1, …], ready for a single host-to-device upload.
solve_penalised_ls_device: Augmented penalised least-squares solve via on-device cuSOLVER QR.
sphere_gpu_compiled: Returns true if this build was compiled with the Linux + cudarc GPU backend that runs the S² Wahba kernels.
sphere_kernel_decision: Decide whether the GPU sphere kernel matrix path is eligible for (n, m, lmax). Heuristic per the math spec: