Expand description
Fully GPU-resident batched Arrow-Schur dense Cholesky solver.
Implements the square-root Schur form: each local block D_i = L_i L_i^T
is factored on device, u_i = L_i^{-1} g_i and Y_i = L_i^{-1} B_i are
formed by triangular solves, the reduced shared system
S_β = C + ρ_β I − Σ_i Y_i^T Y_i, r_β = -g_β + Σ_i Y_i^T u_i
is assembled on device, factored once, and the back-substitution
w_i = u_i + Y_i · δβ, L_i^T x_i = w_i, δt_i = -x_i
is run on device. Only the final (δt, δβ, log|H|) triple is downloaded.
The current caller (Arrow-Schur Newton step inside PIRLS) feeds uniform
local block size d and uniform shared width k, so the entire pipeline
is dispatched as a single p-group; per-p grouping for heterogenous blocks
is Layer D’s NVRTC fused-kernel concern and lives in this module’s
follow-up implementation rather than in policy plumbing.
On non-Linux builds the entire module degrades to a CPU-fallback shim.
Structs§
- Arrow
Schur GpuSolution - Outcome of a single Arrow-Schur Newton solve.
- Resident
Arrow Frame Handle - #1017 Phase 3: a device-resident Arrow-Schur frame whose constant Hessian
blocks (
D = H_tt,B = H_tβ, borderH_ββ) and their factors stay on the device across the inner Newton loop. Construct once per frozen gate/basis frame, then callResidentArrowFrameHandle::solve_gradientonce per iterate with the fresh residual gradient — only theO(n·d + p)gradient crosses to the device and onlyδcrosses back, in contrast tosolve_arrow_newton_stepwhich re-uploads and re-factors the full system every call. On a non-CUDA host construction returnsArrowSchurGpuFailure::Unavailable.
Enums§
- Arrow
Schur GpuFailure - Reason a device path declined to run; lets the host caller decide between
CPU fallback and per-row escalation.
RidgeBumpRequiredcarries the estimated diagonal bump needed to clear the failed pivot.
Functions§
- gpu_
schur_ matvec_ backend - Build a GPU-backed Schur matvec closure for CPU-driven PCG at K ≥ 5000.
- solve_
arrow_ newton_ step - Entry point: attempt the fully device-resident Arrow-Schur Newton solve.
Returns
Err(ArrowSchurGpuFailure::Unavailable)to indicate “device path declined, fall back to CPU” — never panics. - solve_
reduced_ beta_ pcg - Solve the reduced shared β-system
S·δβ = rfully on device with a Jacobi-preconditioned conjugate-gradient (Steihaug truncated-CG) loop. - solve_
sae_ matrix_ free_ pcg