cuSOLVER-backed dense solver kernels for the GPU HAL.
This module owns CUDA solver functionality that is shared by GPU linear algebra dispatch and higher-level solver code. CPU solves do not live behind these entry points: unavailable CUDA support is reported as an error.
Structs§
- RefinementOutcome - Outcome reported by iterative_refinement_cholesky_solve.
Functions§
- check_deferred_potrf_info - Download the POTRF deferred info scalar and return an error if non-zero.
- check_deferred_potrs_info - Download the POTRS deferred info scalar and return an error if non-zero.
- cholesky_logdet_from_col_major
- cholesky_lower_gpu
- cholesky_solve_gpu
- cholesky_solve_only_gpu - Solution-only mixed-precision solve: like cholesky_solve_gpu but skips the redundant fp64 POTRF when the fp32 + refinement path succeeds, since the caller does not consume the log-determinant. This is the path that delivers the full mixed-precision speedup (expensive O(p³) factor stays fp32) for the PIRLS Newton direction solve, where the logdet is discarded. The solution is full fp64 accuracy via iterative refinement.
- context_and_stream
- iterative_refinement_cholesky_solve - Solve A x = b with fp32 Cholesky factorization + fp64-residual iterative refinement, automatically falling back to fp64 when the policy rejects the attempt or when the fp32 path fails / diverges.
- pinned_htod
- potrf_in_place
- potrf_in_place_reuse - POTRF factorization using pre-allocated workspace and info buffers.
- potrf_query_lwork - Query the cuSOLVER POTRF workspace size for a p×p matrix.
- potrs_in_place
- potrs_in_place_reuse - POTRS triangular solve using a pre-allocated info buffer.
- solver_backend_status
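
The mixed-precision idea behind iterative_refinement_cholesky_solve and cholesky_solve_only_gpu can be illustrated with a small CPU-only sketch. This is not the module's implementation and does not call cuSOLVER: the function names (`cholesky_f32`, `solve_with_factor`, `refine_solve`), the column-major layout, the max-norm stopping rule, and the `None` return used to signal "fall back to fp64" are all assumptions made for illustration. The factor is computed once in f32, while the residual b - A x is accumulated in f64 and used to correct the solution.

```rust
/// Hypothetical helper: lower Cholesky factor of a column-major n×n matrix in f32.
/// Returns None when a pivot is non-positive or non-finite (matrix not SPD in f32).
fn cholesky_f32(a: &[f32], n: usize) -> Option<Vec<f32>> {
    let mut l = vec![0.0f32; n * n];
    for j in 0..n {
        let mut d = a[j + j * n];
        for k in 0..j {
            d -= l[j + k * n] * l[j + k * n];
        }
        if d <= 0.0 || !d.is_finite() {
            return None;
        }
        let d = d.sqrt();
        l[j + j * n] = d;
        for i in (j + 1)..n {
            let mut s = a[i + j * n];
            for k in 0..j {
                s -= l[i + k * n] * l[j + k * n];
            }
            l[i + j * n] = s / d;
        }
    }
    Some(l)
}

/// Hypothetical helper: solve L Lᵀ x = r using the f32 factor (work done in f32).
fn solve_with_factor(l: &[f32], n: usize, r: &[f64]) -> Vec<f64> {
    let mut y: Vec<f32> = r.iter().map(|&v| v as f32).collect();
    // Forward substitution: L y = r.
    for i in 0..n {
        let mut s = y[i];
        for k in 0..i {
            s -= l[i + k * n] * y[k];
        }
        y[i] = s / l[i + i * n];
    }
    // Backward substitution: Lᵀ x = y.
    for i in (0..n).rev() {
        let mut s = y[i];
        for k in (i + 1)..n {
            s -= l[k + i * n] * y[k];
        }
        y[i] = s / l[i + i * n];
    }
    y.iter().map(|&v| v as f64).collect()
}

/// Sketch of f32-factor + f64-residual refinement.
/// Returns the solution and the number of corrections applied, or None when the
/// f32 factorization fails or the residual does not reach `tol` within `max_iter`
/// corrections (a caller would then fall back to a full f64 solve).
fn refine_solve(
    a: &[f64],
    b: &[f64],
    n: usize,
    max_iter: usize,
    tol: f64,
) -> Option<(Vec<f64>, usize)> {
    let a32: Vec<f32> = a.iter().map(|&v| v as f32).collect();
    let l = cholesky_f32(&a32, n)?;
    let mut x = solve_with_factor(&l, n, b);
    let b_norm = b.iter().fold(0.0f64, |m, v| m.max(v.abs()));
    for iter in 0..max_iter {
        // Residual r = b - A x, accumulated in f64.
        let mut r = b.to_vec();
        for j in 0..n {
            for i in 0..n {
                r[i] -= a[i + j * n] * x[j];
            }
        }
        let r_norm = r.iter().fold(0.0f64, |m, v| m.max(v.abs()));
        if r_norm <= tol * b_norm {
            return Some((x, iter));
        }
        // Correction from the cheap f32 factor, applied in f64.
        let dx = solve_with_factor(&l, n, &r);
        for i in 0..n {
            x[i] += dx[i];
        }
    }
    None
}
```

For a well-conditioned matrix, each correction recovers roughly the digits that f32 can represent, so a handful of corrections is typically enough; the expensive O(p³) factorization is never repeated, only the O(p²) residual and triangular solves.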