pub struct ResidentDesignGram { /* private fields */ }
#1017 Phase 3: a device-resident design matrix for repeated Xᵀ·diag(w)·X
Gram evaluations that uploads X to the device ONCE.
The per-call try_fast_xt_diag_x re-uploads the full n×p X on every
call. The SAE / IRLS inner loop holds X fixed and rebuilds the Gram once
per Newton/PIRLS weight update, so the repeated H2D of X is pure waste —
measured on an A100 (#1412) it makes the XtWX GEMM ~98% of the pipeline at
<20% device utilisation (the device is starved by staging, not arithmetic).
This handle uploads X once at construction. Each Self::gram call then moves
only the n-vector w (H2D) and the p×p Gram (D2H), so the per-Gram transfer
drops from n·p + n + p² to n + p² values, approaching a factor of p when n ≫ p².
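The transfer accounting above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes 8-byte f64 values, and it counts just the payloads named in the text (X, w, and the Gram), ignoring any fixed per-transfer overhead.

```rust
/// Bytes in a p×p Gram of f64 values (the D2H payload of one Gram call).
fn gram_bytes(p: u64) -> u64 {
    8 * p * p
}

/// Assumed per-call traffic when X is re-uploaded every time:
/// X (n·p) and w (n) go host-to-device, the Gram (p·p) comes back.
fn per_call_bytes(n: u64, p: u64) -> u64 {
    8 * (n * p + n) + gram_bytes(p)
}

/// Assumed per-call traffic once X is resident:
/// only w (n) goes up and the Gram (p·p) comes back.
fn resident_bytes(n: u64, p: u64) -> u64 {
    8 * n + gram_bytes(p)
}
```

Under these assumptions, n = 1,000,000 and p = 64 give about 520 MB per call without residency and about 8 MB with it, a ratio of roughly 65.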
Admission keys on the same work-based DispatchOp::XtDiagX gate as the
per-call path (so it engages exactly when the Gram is GPU-profitable) and the
numerics are bit-identical to try_fast_xt_diag_x on the same device
(same cublasDdgmm row-scale + gemm reduction order). On a non-CUDA host,
a below-threshold shape, or any device failure, Self::try_new returns
None and the caller keeps its CPU/per-call path — residency never changes
the result, only where (and how often) X is staged.
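The fallback contract described above (a None at any stage means "use the CPU path") can be sketched as follows. This is not the crate's code: DeviceGram is a hypothetical stand-in trait, slices replace the ndarray views, and cpu_gram stands for whatever CPU or per-call routine the caller already has.

```rust
/// Hypothetical stand-in for a handle that may or may not produce a device Gram.
trait DeviceGram {
    fn gram(&self, w: &[f64]) -> Option<Vec<f64>>;
}

/// Use the resident handle when it exists and succeeds; otherwise fall back.
/// `resident` is None when construction was declined (no CUDA, small shape,
/// failed upload), and `gram` returns None on a per-call failure.
fn gram_with_fallback<D: DeviceGram>(
    resident: Option<&D>,
    w: &[f64],
    cpu_gram: impl Fn(&[f64]) -> Vec<f64>,
) -> Vec<f64> {
    resident
        .and_then(|r| r.gram(w))
        .unwrap_or_else(|| cpu_gram(w))
}
```

Because both failure modes collapse into the same Option, the caller needs only one fallback branch, and the result does not depend on which path ran.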
Implementations
impl ResidentDesignGram
pub fn try_new(x: ArrayView2<'_, f64>) -> Option<Self>
Upload x (n×p) to the device once. Returns None when CUDA is
unavailable, the shape is below the GPU Gram threshold, or the upload
fails.
pub fn gram(&self, w: ArrayView1<'_, f64>) -> Option<Array2<f64>>
Compute Xᵀ·diag(w)·X reusing the resident X. w must have one entry
per design row. Returns None on a shape mismatch or device failure.
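For reference, the quantity gram returns can be written as a plain CPU loop. The sketch below is a hypothetical helper, not the device implementation: it uses row-major slices instead of ndarray views, and its summation order differs from the cuBLAS path, so results may differ in the last bits.

```rust
/// CPU reference for G = Xᵀ·diag(w)·X.
/// `x` is row-major n×p; the result is row-major p×p.
/// Returns None on a shape mismatch, mirroring the documented contract.
fn gram_reference(x: &[f64], n: usize, p: usize, w: &[f64]) -> Option<Vec<f64>> {
    if x.len() != n * p || w.len() != n {
        return None;
    }
    let mut g = vec![0.0_f64; p * p];
    for i in 0..n {
        let row = &x[i * p..(i + 1) * p];
        for a in 0..p {
            // Scale row i by its weight once, then accumulate the outer product.
            let scaled = w[i] * row[a];
            for b in 0..p {
                g[a * p + b] += scaled * row[b];
            }
        }
    }
    Some(g)
}
```

A helper like this is useful mainly as a test oracle: compare it against the device result with a tolerance rather than for exact equality.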
pub fn solve_normal_equations(
    &self,
    w: ArrayView1<'_, f64>,
    rhs: ArrayView1<'_, f64>,
    ridge: f64,
) -> Option<Array1<f64>>
Solve the penalized normal equations (Xᵀ·diag(w)·X + ridge·I)·β = rhs
with the Gram, its Cholesky factor, and the RHS all kept DEVICE-RESIDENT —
only w (n), rhs (p), and the solution β (p) cross the bus.
This is the #1017 Phase-3 fix for the next ceiling after Self::gram:
the bare Gram still pays a p×p D2H (134 MB at p=4096), but the SAE/IRLS
inner step only needs β, so chaining row-scale→GEMM→POTRF→TRSM on-device
and returning only the p-vector removes that transfer entirely. Returns
None on a shape mismatch, a non-PD Gram, or any device failure — the
caller then runs the CPU normal-equations solve. The numerics match a
host Cholesky solve of (XᵀWX + ridge·I) up to IEEE-754 reduction order.
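The host-side solve that the text compares against can be sketched as an unblocked Cholesky factorization followed by two triangular solves. This is a minimal illustration with a hypothetical function name and row-major slices; it takes an already formed Gram G rather than X and w, and it is not the crate's CPU fallback.

```rust
/// Solve (G + ridge·I)·β = rhs by Cholesky, with `g` row-major p×p.
/// Returns None on a shape mismatch or when the shifted matrix is not
/// positive definite.
fn solve_ridge_cholesky(g: &[f64], p: usize, rhs: &[f64], ridge: f64) -> Option<Vec<f64>> {
    if g.len() != p * p || rhs.len() != p {
        return None;
    }
    // Factor A = L·Lᵀ column by column, storing L in the lower triangle.
    let mut l = vec![0.0_f64; p * p];
    for j in 0..p {
        let mut d = g[j * p + j] + ridge;
        for k in 0..j {
            d -= l[j * p + k] * l[j * p + k];
        }
        if !(d > 0.0) {
            return None; // non-positive pivot (this also rejects NaN)
        }
        let ljj = d.sqrt();
        l[j * p + j] = ljj;
        for i in (j + 1)..p {
            let mut s = g[i * p + j];
            for k in 0..j {
                s -= l[i * p + k] * l[j * p + k];
            }
            l[i * p + j] = s / ljj;
        }
    }
    // Forward substitution: L·y = rhs.
    let mut y = rhs.to_vec();
    for i in 0..p {
        for k in 0..i {
            y[i] -= l[i * p + k] * y[k];
        }
        y[i] /= l[i * p + i];
    }
    // Back substitution: Lᵀ·β = y (solved in place).
    for i in (0..p).rev() {
        for k in (i + 1)..p {
            y[i] -= l[k * p + i] * y[k];
        }
        y[i] /= l[i * p + i];
    }
    Some(y)
}
```

The None on a non-positive pivot corresponds to the documented "non-PD Gram" case; a larger ridge shifts every diagonal entry up and can make such a factorization succeed.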
Auto Trait Implementations
impl Freeze for ResidentDesignGram
impl RefUnwindSafe for ResidentDesignGram
impl Send for ResidentDesignGram
impl Sync for ResidentDesignGram
impl Unpin for ResidentDesignGram
impl UnsafeUnpin for ResidentDesignGram
impl UnwindSafe for ResidentDesignGram
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.