

Function try_fast_spectral_leverage_diagonal 

pub fn try_fast_spectral_leverage_diagonal(
    x: &DesignMatrix,
    g: ArrayView2<'_, f64>,
) -> Option<Array1<f64>>

GPU-offloaded spectral leverage diagonal h[i] = ‖(X G)_{i,:}‖².

G is the (p × rank) spectral factor with G_ε(H) = G Gᵀ; the per-row leverage is the squared norm of the i-th row of X G. This is the dominant n-dependent cost of every REML outer evaluation at large scale (issue #922), and historically ran only on the CPU while the device pool idled.
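For orientation, the quantity itself can be written as a few loops. The following is a minimal CPU sketch using only the standard library, not the crate's faer path: the function name `leverage_diagonal_reference` and the flat row-major slices are assumptions made for illustration, standing in for `DesignMatrix` and `ArrayView2`.

```rust
/// Hypothetical reference: h[i] = ||(X G)[i, :]||^2.
/// `x` is n x p and `g` is p x rank, both stored row-major in flat slices.
fn leverage_diagonal_reference(
    x: &[f64],
    g: &[f64],
    n: usize,
    p: usize,
    rank: usize,
) -> Vec<f64> {
    assert_eq!(x.len(), n * p, "x must hold n * p entries");
    assert_eq!(g.len(), p * rank, "g must hold p * rank entries");

    let mut h = vec![0.0; n];
    // Scratch buffer for one row of X G, reused across rows.
    let mut row = vec![0.0; rank];
    for i in 0..n {
        row.iter_mut().for_each(|v| *v = 0.0);
        // row = X[i, :] * G
        for k in 0..p {
            let xik = x[i * p + k];
            for j in 0..rank {
                row[j] += xik * g[k * rank + j];
            }
        }
        // Squared Euclidean norm of that row.
        h[i] = row.iter().map(|v| v * v).sum();
    }
    h
}
```

Only one row of X G is live at a time here, so the full n × rank product is never materialised. The cost is O(n · p · rank), which is why this step scales with the number of rows.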

The row dimension is split into byte-balanced chunks that are scattered across the whole device pool via super::pool::scatter_batched, the same whole-solve row-block granularity as Arrow-Schur. Each tile runs one cuBLAS GEMM X_chunk · G on its bound ordinal, then reduces each row to its sum of squares. The arithmetic is f64 throughout and matches the CPU faer path up to IEEE-754 reduction order. If no device is available, the shape is below threshold, or any tile fails, the function returns None and the caller runs its deterministic CPU stream.
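The chunk-then-fall-back contract described above can be sketched without any GPU code. Everything in this block is hypothetical: `row_chunks`, `try_chunked`, the `min_rows` threshold and the fixed `rows_per_chunk` are illustrative stand-ins, and the sketch splits by row count rather than by bytes. It does not reproduce `super::pool::scatter_batched` or its scheduling.

```rust
use std::ops::Range;

/// Split `n` rows into contiguous ranges of at most `rows_per_chunk` rows.
/// (Assumption: fixed row counts; the real chunks are byte-balanced.)
fn row_chunks(n: usize, rows_per_chunk: usize) -> Vec<Range<usize>> {
    let step = rows_per_chunk.max(1);
    (0..n).step_by(step).map(|s| s..(s + step).min(n)).collect()
}

/// Hypothetical fast path: returns `None` when the problem is too small
/// or when any tile fails, so the caller can run its CPU fallback.
fn try_chunked<F>(
    n: usize,
    min_rows: usize,
    rows_per_chunk: usize,
    mut tile: F,
) -> Option<Vec<f64>>
where
    F: FnMut(Range<usize>) -> Option<Vec<f64>>,
{
    if n < min_rows {
        return None; // below threshold: not worth offloading
    }
    let mut h = Vec::with_capacity(n);
    for rows in row_chunks(n, rows_per_chunk) {
        let expected = rows.len();
        let part = tile(rows)?; // any tile failure aborts the fast path
        if part.len() != expected {
            return None; // a malformed tile result is treated as a failure
        }
        h.extend(part);
    }
    Some(h)
}
```

A caller would then write something like `try_chunked(..).unwrap_or_else(|| cpu_path())`, where `cpu_path` is whatever CPU routine it already has. Because a failure in any tile discards the whole result, the output is never a mix of fast-path and fallback rows.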