Module reml_gpu

Expand description

Exact GPU REML evidence + derivative gradient.

Refactor (Block 2.1, math team section 18): the penalized Hessian H is Cholesky-factored exactly once on device, the factor is held resident, and every derivative Hessian H_j is solved through the cached factor with a single batched potrs call (nrhs = d_rho · p). Previously each derivative re-issued the full cholesky_solve_gpu path, which uploaded H, allocated and ran potrf, and downloaded the factor again — turning a p^3 + d·p^3 workload into a (d+1)·p^3 one and serializing d_rho factor passes onto the device.

On the non-Linux fallback the same cholesky_solve_gpu path is exercised via pirls_gpu::cholesky_solve_gpu, so behaviour outside Linux is numerically identical (with the same per-derivative overhead) — the optimisation is Linux-only because that is where CUDA actually runs.

Structs§

RemlGpuEvidence
RemlGpuInput

Functions§

evidence_derivatives_gpu

Module reml_gpu

Module reml_gpu Copy item path

Structs§

Functions§

Module reml_gpu