Skip to main content

Module arrow_schur

Module arrow_schur 

Source
Expand description

Fully GPU-resident batched Arrow-Schur dense Cholesky solver.

Implements the square-root Schur form: each local block D_i = L_i L_i^T is factored on device, u_i = L_i^{-1} g_i and Y_i = L_i^{-1} B_i are formed by triangular solves, the reduced shared system S_β = C + ρ_β I − Σ_i Y_i^T Y_i, r_β = -g_β + Σ_i Y_i^T u_i is assembled on device, factored once, and the back-substitution w_i = u_i + Y_i · δβ, L_i^T x_i = w_i, δt_i = -x_i is run on device. Only the final (δt, δβ, log|H|) triple is downloaded.

The current caller (Arrow-Schur Newton step inside PIRLS) feeds uniform local block size d and uniform shared width k, so the entire pipeline is dispatched as a single p-group; per-p grouping for heterogenous blocks is Layer D’s NVRTC fused-kernel concern and lives in this module’s follow-up implementation rather than in policy plumbing.

On non-Linux builds the entire module degrades to a CPU-fallback shim.

Structs§

ArrowSchurGpuSolution
Outcome of a single Arrow-Schur Newton solve.
ResidentArrowFrameHandle
#1017 Phase 3: a device-resident Arrow-Schur frame whose constant Hessian blocks (D = H_tt, B = H_tβ, border H_ββ) and their factors stay on the device across the inner Newton loop. Construct once per frozen gate/basis frame, then call ResidentArrowFrameHandle::solve_gradient once per iterate with the fresh residual gradient — only the O(n·d + p) gradient crosses to the device and only δ crosses back, in contrast to solve_arrow_newton_step which re-uploads and re-factors the full system every call. On a non-CUDA host construction returns ArrowSchurGpuFailure::Unavailable.

Enums§

ArrowSchurGpuFailure
Reason a device path declined to run; lets the host caller decide between CPU fallback and per-row escalation. RidgeBumpRequired carries the estimated diagonal bump needed to clear the failed pivot.

Functions§

gpu_schur_matvec_backend
Build a GPU-backed Schur matvec closure for CPU-driven PCG at K ≥ 5000.
solve_arrow_newton_step
Entry point: attempt the fully device-resident Arrow-Schur Newton solve. Returns Err(ArrowSchurGpuFailure::Unavailable) to indicate “device path declined, fall back to CPU” — never panics.
solve_reduced_beta_pcg
Solve the reduced shared β-system S·δβ = r fully on device with a Jacobi-preconditioned conjugate-gradient (Steihaug truncated-CG) loop.
solve_sae_matrix_free_pcg