Struct ResidentDesignGram 

pub struct ResidentDesignGram { /* private fields */ }

#1017 Phase 3: a device-resident design matrix for repeated Xᵀ·diag(w)·X Gram evaluations that uploads X to the device ONCE.

The per-call try_fast_xt_diag_x re-uploads the full n×p X on every call. The SAE / IRLS inner loop holds X fixed and rebuilds the Gram once per Newton/PIRLS weight update, so the repeated host-to-device (H2D) upload of X is pure waste. Measured on an A100 (#1412), it makes the XtWX GEMM ~98% of the pipeline at <20% device utilisation: the device is starved by staging, not arithmetic. This handle uploads X once at construction; each Self::gram call moves only the n-vector w H2D and the p×p Gram device-to-host (D2H), so the per-Gram transfer drops from n·p + n + p² values to n + p² values, roughly a factor of p for tall designs (n ≫ p²).
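The transfer accounting above can be checked with a little arithmetic. The sketch below assumes f64 storage and counts only the buffers named in the text (X, w, and the Gram); `per_call_bytes` and `resident_bytes` are illustrative helper names, not part of this crate.

```rust
/// Bytes crossing the bus for one Gram on the per-call path:
/// X (n*p values) and w (n values) go host-to-device, the p*p Gram comes back.
fn per_call_bytes(n: usize, p: usize) -> usize {
    std::mem::size_of::<f64>() * (n * p + n + p * p)
}

/// Bytes crossing the bus for one Gram once X is resident:
/// only w (n values) goes up and the p*p Gram comes back.
fn resident_bytes(n: usize, p: usize) -> usize {
    std::mem::size_of::<f64>() * (n + p * p)
}
```

With n = 1,000,000 and p = 64 the per-call path moves about 520 MB per Gram and the resident path about 8 MB, a ratio just under 65. As p² approaches n, the Gram download starts to dominate and the ratio falls below p.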

Admission uses the same work-based DispatchOp::XtDiagX gate as the per-call path, so the handle engages exactly when the Gram is GPU-profitable. The numerics are bit-identical to try_fast_xt_diag_x on the same device (same cublasDdgmm row-scale and GEMM reduction order). On a non-CUDA host, for a below-threshold shape, or on any device failure, Self::try_new returns None and the caller keeps its CPU/per-call path. Residency never changes the result, only where (and how often) X is staged.
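The None-means-fall-back contract can be sketched on the caller side as follows. This is an illustration only: `GramHandle` is a hypothetical trait standing in for the one method used here, slices stand in for the ndarray views in the real signatures, and `gram_with_fallback` is not a function in this crate.

```rust
/// Hypothetical stand-in for the part of the handle's API used below.
trait GramHandle {
    /// Returns None on a shape mismatch or device failure.
    fn gram(&self, w: &[f64]) -> Option<Vec<f64>>;
}

/// Use the resident handle when one was admitted and the call succeeds;
/// otherwise run the caller's existing CPU/per-call Gram.
fn gram_with_fallback<H: GramHandle>(
    handle: Option<&H>,
    w: &[f64],
    cpu_gram: impl Fn(&[f64]) -> Vec<f64>,
) -> Vec<f64> {
    handle
        .and_then(|h| h.gram(w))
        .unwrap_or_else(|| cpu_gram(w))
}
```

Here `handle` would be the result of a single try_new call made before the IRLS loop, so a declined admission and a per-call failure both land on the same CPU path.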

Implementations§

impl ResidentDesignGram

pub fn try_new(x: ArrayView2<'_, f64>) -> Option<Self>

Upload x (n×p) to the device once. Returns None when CUDA is unavailable, the shape is below the GPU Gram threshold, or the upload fails.

pub fn gram(&self, w: ArrayView1<'_, f64>) -> Option<Array2<f64>>

Compute Xᵀ·diag(w)·X reusing the resident X. w must have one entry per design row. Returns None on a shape mismatch or device failure.
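For reference, the quantity computed is G = Σᵢ wᵢ·xᵢ·xᵢᵀ over the design rows xᵢ. A minimal CPU sketch of that definition follows, assuming a row-major slice for X instead of the ndarray views in the real signature; `xt_diag_x` is an illustrative name, and this is not the device kernel path (it will not reproduce the device's reduction order bit for bit).

```rust
/// CPU reference for G = Xᵀ·diag(w)·X, with X stored row-major as n rows of p entries.
/// Returns None when the buffer lengths disagree with (n, p).
fn xt_diag_x(x: &[f64], n: usize, p: usize, w: &[f64]) -> Option<Vec<f64>> {
    if x.len() != n * p || w.len() != n {
        return None;
    }
    if p == 0 {
        return Some(Vec::new());
    }
    let mut g = vec![0.0; p * p];
    // Accumulate the upper triangle of sum_i w_i * x_i * x_i^T.
    for (row, &wi) in x.chunks_exact(p).zip(w) {
        for a in 0..p {
            let scaled = wi * row[a];
            for b in a..p {
                g[a * p + b] += scaled * row[b];
            }
        }
    }
    // Mirror the upper triangle into the lower one; G is symmetric.
    for a in 0..p {
        for b in 0..a {
            g[a * p + b] = g[b * p + a];
        }
    }
    Some(g)
}
```

For X = [[1, 2], [3, 4]] and w = [1, 2] this yields [[19, 26], [26, 36]].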

pub fn solve_normal_equations(&self, w: ArrayView1<'_, f64>, rhs: ArrayView1<'_, f64>, ridge: f64) -> Option<Array1<f64>>

Solve the penalized normal equations (Xᵀ·diag(w)·X + ridge·I)·β = rhs with the Gram, its Cholesky factor, and the RHS all kept DEVICE-RESIDENT — only w (n), rhs (p), and the solution β (p) cross the bus.

This is the #1017 Phase-3 fix for the next ceiling after Self::gram. The bare Gram still pays a p×p D2H (134 MB of f64 at p=4096), but the SAE/IRLS inner step only needs β, so chaining row-scale→GEMM→POTRF→TRSM on-device and returning only the p-vector removes that transfer entirely. Returns None on a shape mismatch, a non-PD Gram, or any device failure; the caller then runs the CPU normal-equations solve. The numerics match a host Cholesky solve of (Xᵀ·diag(w)·X + ridge·I) up to IEEE-754 reduction order.
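The host-side solve that a caller falls back to can be sketched as a plain Cholesky factorisation followed by two triangular solves. This is a minimal illustration under stated assumptions: the Gram is passed as a row-major p×p slice, `solve_ridge_cholesky` is a hypothetical name, and the code is neither the crate's CPU path nor its device POTRF/TRSM chain.

```rust
/// Solve (G + ridge·I)·β = rhs by Cholesky, with G a row-major p×p slice.
/// Returns None on a shape mismatch or when G + ridge·I is not positive definite.
fn solve_ridge_cholesky(gram: &[f64], p: usize, rhs: &[f64], ridge: f64) -> Option<Vec<f64>> {
    if gram.len() != p * p || rhs.len() != p {
        return None;
    }
    // Lower-triangular factor L with L·Lᵀ = G + ridge·I.
    let mut l = vec![0.0; p * p];
    for j in 0..p {
        let mut d = gram[j * p + j] + ridge;
        for k in 0..j {
            d -= l[j * p + k] * l[j * p + k];
        }
        // A non-positive (or NaN) pivot means the matrix is not PD.
        if !(d > 0.0) || !d.is_finite() {
            return None;
        }
        let ljj = d.sqrt();
        l[j * p + j] = ljj;
        for i in (j + 1)..p {
            let mut s = gram[i * p + j];
            for k in 0..j {
                s -= l[i * p + k] * l[j * p + k];
            }
            l[i * p + j] = s / ljj;
        }
    }
    // Forward substitution: L·y = rhs.
    let mut y = rhs.to_vec();
    for i in 0..p {
        let mut s = y[i];
        for k in 0..i {
            s -= l[i * p + k] * y[k];
        }
        y[i] = s / l[i * p + i];
    }
    // Back substitution: Lᵀ·β = y (β overwrites y).
    for i in (0..p).rev() {
        let mut s = y[i];
        for k in (i + 1)..p {
            s -= l[k * p + i] * y[k];
        }
        y[i] = s / l[i * p + i];
    }
    Some(y)
}
```

The ridge term is what makes an otherwise indefinite or singular Gram solvable: G = [[1, 2], [2, 1]] is rejected with ridge = 0 but factors once ridge = 4 is added to the diagonal.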

pub fn dims(&self) -> (usize, usize)

(n, p) of the resident design.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.


fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.


unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.


unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.


unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V