GPU NVRTC Wahba intrinsic-S2 kernel matrix construction.
This module owns the device-side construction of the Wahba reproducing kernel basis matrix on the 2-sphere using the finite truncated spectral Legendre series
K_L(γ) = Σ_{ℓ=1..L} c_ℓ · P_ℓ(cos γ),
evaluated entry by entry with the three-term Legendre recurrence, whose
running terms are kept in registers. The host CPU parity target is the matching
`SphereWahbaKernel::SobolevTruncated { lmax }` /
`SphereWahbaKernel::PseudoTruncated { lmax }` variant added to
`src/terms/basis.rs` (single source: same recurrence, same c_ℓ).
The device path evaluates the raw column-major kernel matrix with `f64`
Legendre recurrence math. Host code owns centering, constraints, and solver
assembly in `basis.rs`.
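As a rough host-side illustration of the series above, the sketch below evaluates K_L for one pair of unit vectors with the standard Bonnet recurrence (ℓ+1)·P_{ℓ+1}(x) = (2ℓ+1)·x·P_ℓ(x) − ℓ·P_{ℓ−1}(x). The function names, the `coeffs[l - 1] = c_ℓ` storage convention, and the clamp on the dot product are assumptions made for this example; they are not taken from `basis.rs` or from the device kernel.

```rust
/// Evaluate K_L(x) = sum over l = 1..=L of c_l * P_l(x), where x = cos(gamma).
/// Assumption for this sketch: `coeffs[l - 1]` stores c_l, so L = coeffs.len().
fn truncated_legendre_kernel(x: f64, coeffs: &[f64]) -> f64 {
    let mut p_prev = 1.0; // P_0(x)
    let mut p_curr = x; // P_1(x)
    let mut sum = 0.0;
    for (idx, &c) in coeffs.iter().enumerate() {
        let l = (idx + 1) as f64; // degree of the polynomial held in p_curr
        sum += c * p_curr;
        // Bonnet recurrence: (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}
        let p_next = ((2.0 * l + 1.0) * x * p_curr - l * p_prev) / (l + 1.0);
        p_prev = p_curr;
        p_curr = p_next;
    }
    sum
}

/// Kernel entry for two unit vectors a, b on the sphere: cos(gamma) = a . b.
/// The clamp guards against round-off pushing the dot product outside [-1, 1].
fn kernel_entry(a: [f64; 3], b: [f64; 3], coeffs: &[f64]) -> f64 {
    let dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    truncated_legendre_kernel(dot.clamp(-1.0, 1.0), coeffs)
}

fn main() {
    // With c_1 = 1 and c_2 = 2: K(x) = x + 2 * (3x^2 - 1) / 2.
    let k = truncated_legendre_kernel(0.5, &[1.0, 2.0]);
    println!("K(0.5) = {k}");

    // Identical points give cos(gamma) = 1, where every P_l equals 1.
    let north = [0.0, 0.0, 1.0];
    println!("K(north, north) = {}", kernel_entry(north, north, &[1.0, 2.0]));
}
```

Only two running values (`p_prev`, `p_curr`) are live at any time, which is what makes a register-resident evaluation per matrix entry plausible.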
Structs
- DeviceS2KernelMatrix - Device-resident `(rows × cols)` matrix in column-major layout with leading dimension `ld ≥ rows`. The slice holds `ld * cols` `f64` elements; entry `(i, j)` lives at `col_major_dev[j * ld + i]`.
- PenalisedLsSolution - Result returned by `solve_penalised_ls_device`.
- S2KernelBuildInputs - Host-side inputs needed to launch `s2_wahba_legendre_colmajor`.
- S2ModuleCacheKey - Module cache key: every distinct `(CC, LMAX, kind, layout, kernel flavor)` compiles to a different PTX. `precision = f64` and the (32, 8, 1) raw-kernel block / (128, 1, 1) Householder-kernel block shapes are baked into the kernel source so they are implicit in the flavor tag and don’t appear here.
- SphereGpuBackend - Process-wide sphere GPU backend. Lazy-initialised on first call to `SphereGpuBackend::probe`.
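The column-major addressing rule stated for `DeviceS2KernelMatrix` can be illustrated with a small host-side mirror. `HostColMajor` and its methods are hypothetical names invented for this sketch; the real struct wraps a device slice and its fields may differ.

```rust
/// Hypothetical host-side stand-in for a column-major matrix with a leading
/// dimension. Entry (i, j) lives at data[j * ld + i], as in the device layout.
struct HostColMajor {
    rows: usize,
    cols: usize,
    ld: usize,
    data: Vec<f64>,
}

impl HostColMajor {
    fn zeros(rows: usize, cols: usize, ld: usize) -> Self {
        assert!(ld >= rows, "leading dimension must be at least `rows`");
        // The buffer holds ld * cols elements, including any padding rows.
        Self { rows, cols, ld, data: vec![0.0; ld * cols] }
    }

    fn index(&self, i: usize, j: usize) -> usize {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        j * self.ld + i
    }

    fn get(&self, i: usize, j: usize) -> f64 {
        self.data[self.index(i, j)]
    }

    fn set(&mut self, i: usize, j: usize, value: f64) {
        let k = self.index(i, j);
        self.data[k] = value;
    }
}

fn main() {
    // A 3 x 2 matrix padded to ld = 4: each column occupies 4 slots.
    let mut a = HostColMajor::zeros(3, 2, 4);
    a.set(2, 1, 7.0);
    println!("len = {}, flat index of (2, 1) = {}", a.data.len(), a.index(2, 1));
    println!("a(2, 1) = {}", a.get(2, 1));
}
```

Consecutive rows of one column are adjacent in memory, which is the layout the BLAS and LAPACK families of routines expect.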
Enums
- DeviceMatrixLayout - Layout of the (n, m) kernel design matrix on device. The Wahba pipeline downstream of this kernel (cuBLAS GEMM, cuSOLVER GEQRF) requires column-major.
- SphereSpectralKernelKind - Which truncated-spectral Wahba kernel to evaluate on device. Matches the CPU `SphereWahbaKernel::{SobolevTruncated, PseudoTruncated}` so parity tests are well-defined.
Functions
- build_center_kernel_device - Build the (m × m) center-center kernel matrix `C` using the same GPU kernel that builds the design. `centers_xyz` is the unit-vector representation of the centers, length `3 * m`. `coeffs` and `kind` match the design build.
- build_householder_constrained_design_device - Phase-3 fused Householder-constrained kernel. `v` is the Householder vector (length m), `beta` the reflector scalar, and the output is the `(n × (m-1))` constrained design X_s on device.
- build_kernel_matrix_device - Build the raw `(n × m)` Wahba kernel matrix on device using `s2_wahba_legendre_colmajor`. Phase 1 entry point.
- constrained_penalty_host - Constrained penalty matrix `S = Zᵀ C Z` for the weighted-sum-to-zero Householder constraint built from `w`. Returned shape is `((m−1) × (m−1))`. `C` is taken as a host (m × m) array (typically the dtoh of `build_center_kernel_device`).
- householder_reflector_from_weights - Build the Householder reflector that zeroes `w` against `e_1`. Returns `(v, beta)` with the LAPACK / Golub-Van Loan convention `v[0] = 1`. If `w` has zero norm, returns `(0-vector, 0.0)` and the caller should treat the reflector as a no-op (no constraint).
- latlon_to_xyz_host - Lat/lon (degrees or radians) → unit vector `(x, y, z)` on S² ⊂ ℝ³. Returns a flat `Vec<f64>` of length `3 * n` in the row-major layout `[x_0, y_0, z_0, x_1, y_1, z_1, …]`, ready for one `htod` upload.
- solve_penalised_ls_device - Augmented penalised least-squares solve via on-device cuSOLVER QR.
- sphere_gpu_compiled - Returns `true` if this build was compiled with the Linux + cudarc GPU backend that runs the S² Wahba kernels.
- sphere_kernel_decision - Decide whether the GPU sphere kernel matrix path is eligible for `(n, m, lmax)`. Heuristic per the math spec:
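Returning to `householder_reflector_from_weights` in the list above: the `(v, beta)` pair with `v[0] = 1` can be sketched as follows. This is an illustrative reconstruction, not the crate's code. It assumes a `dlarfg`-style sign choice (the image of `w` gets the sign opposite to `w[0]`), and `householder_sketch` and `apply_reflector` are hypothetical names.

```rust
/// Sketch of a reflector H = I - beta * v * v^T with v[0] = 1 that maps w onto
/// a multiple of e_1. Assumption: the target sign is opposite to w[0], which
/// avoids cancellation; the crate's actual sign convention may differ.
fn householder_sketch(w: &[f64]) -> (Vec<f64>, f64) {
    let norm = w.iter().map(|x| x * x).sum::<f64>().sqrt();
    if w.is_empty() || norm == 0.0 {
        // Zero-norm input: return the no-op pair described in the docs.
        return (vec![0.0; w.len()], 0.0);
    }
    let alpha = w[0];
    let target = if alpha >= 0.0 { -norm } else { norm }; // H w = target * e_1
    let beta = (target - alpha) / target;
    let scale = 1.0 / (alpha - target);
    let mut v: Vec<f64> = w.iter().map(|x| x * scale).collect();
    v[0] = 1.0; // normalisation convention
    (v, beta)
}

/// Apply H = I - beta * v * v^T to a vector x of the same length as v.
fn apply_reflector(v: &[f64], beta: f64, x: &[f64]) -> Vec<f64> {
    let dot: f64 = v.iter().zip(x).map(|(a, b)| a * b).sum();
    x.iter().zip(v).map(|(xi, vi)| xi - beta * dot * vi).collect()
}

fn main() {
    let w = [3.0, 4.0];
    let (v, beta) = householder_sketch(&w);
    println!("v = {v:?}, beta = {beta}");
    // Every component of H w after the first should vanish.
    println!("H w = {:?}", apply_reflector(&v, beta, &w));
}
```

Because H is symmetric and H·w is a multiple of e_1, columns 2..m of H are orthogonal to `w`. Taking those columns as Z is one way to obtain a basis for the weighted-sum-to-zero subspace, consistent with the `(m−1)`-column shapes documented above.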