Skip to main content

ImplicitHyperOperator

Struct ImplicitHyperOperator 

Source
pub struct ImplicitHyperOperator {
    pub implicit_deriv: Arc<ImplicitDesignPsiDerivative>,
    pub axis: usize,
    pub s_psi: Array2<f64>,
    pub c_x_psi_beta: Option<Arc<Array1<f64>>>,
    /* private fields */
}

Fields§

§implicit_deriv: Arc<ImplicitDesignPsiDerivative>

The implicit design-derivative operator (shared across all axes).

§axis: usize

Which axis this operator is for.

§s_psi: Array2<f64>

Penalty derivative matrix S_{ψ_d} (p × p), dense.

§c_x_psi_beta: Option<Arc<Array1<f64>>>

Non-Gaussian fixed-β third-derivative correction: c ⊙ (X_{ψ_d} β̂), length n. When present, the operator additionally applies Xᵀ diag(c_x_psi_beta) X v so that the full B_d formula B_d v = (∂X/∂ψ_d)ᵀ W X v + Xᵀ W (∂X/∂ψ_d) v + Xᵀ diag(c ⊙ X_{ψ_d} β̂) X v + S_{ψ_d} v is matrix-free for non-Gaussian likelihoods. None for Gaussian identity (c ≡ 0 there).

Implementations§

Source§

impl ImplicitHyperOperator

Source

pub fn bilinear_with_shared_x( &self, x_vec: &Array1<f64>, y_vec: &Array1<f64>, z: &Array1<f64>, u: &Array1<f64>, ) -> f64

Compute the design-part bilinear form u^T (X^T C_d X) z using precomputed shared X-multiplies, avoiding the full B_d matvec.

The design part of B_d is: (∂X/∂ψ_d)^T W X + X^T W (∂X/∂ψ_d)

For vectors z and u, the bilinear form u^T [design_part] z equals: ((∂X/∂ψ_d) u)^T (W (Xz)) + (Xu)^T (W ((∂X/∂ψ_d) z)) = 2 * (w ⊙ y_vec)^T dx_z [when u = u, z = z]

where y_vec = X u, dx_z = (∂X/∂ψ_d) z.

But the full bilinear form is NOT symmetric in its dependence on z vs u through the design derivative, so we compute both cross-terms: dx_z^T (w ⊙ y_vec) + dx_u^T (w ⊙ x_vec)

§Arguments
  • x_vec: X z (precomputed, shared across axes)
  • y_vec: X u (precomputed, shared across axes)
  • z: the probe vector (needed for forward_mul and penalty)
  • u: H⁻¹ z (needed for forward_mul and penalty)
§Returns

The full bilinear form u^T B_d z = design_part + penalty_part.

Source

pub fn matvec_with_shared_xz_into( &self, x_vec: ArrayView1<'_, f64>, z: ArrayView1<'_, f64>, out: ArrayViewMut1<'_, f64>, n_work: ArrayViewMut1<'_, f64>, p_work: ArrayViewMut1<'_, f64>, )

Compute the design-part contribution to A_d z without the X^T step.

Returns the n-vector C_d (X z) where C_d encodes the diagonal weighting. Specifically: (∂X/∂ψ_d)^T maps FROM n-space, but for stochastic trace estimation we need q_d = A_d z = X^T (C_d x_vec) + P_d z.

This method computes q_d = A_d z using the shared x_vec = X z: q_d = (∂X/∂ψ_d)^T (W (X z)) + X^T (W ((∂X/∂ψ_d) z)) + S_psi z which is the standard mul_vec but we can share x_vec across axes.

Trait Implementations§

Source§

impl HyperOperator for ImplicitHyperOperator

Source§

fn trace_projected_factor(&self, factor: &Array2<f64>) -> f64

Compute tr(F^T B F) directly via fused chunked BLAS3 GEMMs on the shared X and the shared raw kernel matrix, bypassing the rank-many separate matvecs the default impl would run through the lazy / operator-backed design.

Why this matters: the default trait impl is let bf = self.mul_mat(F); (F ⊙ bf).sum() which calls mul_vec_into per column of F (rank columns). On a lazy Duchon / Matérn / CTN design each mul_vec_into triggers a full O(n · p · kernel_eval) row-streamed matvec — and with rank ≈ p at large-scale shape (16D-Duchon-aniso 32 ψ-axes, p ≈ 95, n = 320 K) the per-axis trace landed at ~30 s. With 32 axes per outer Hessian eval and ~5 outer iters that’s the ~1 hr large-scale timeout.

Algebra:

  B_d = D_d^T W X + X^T W D_d  + X^T diag(c) X  + S_psi
  D_d = (∂X/∂ψ_d) = K_d · Z_unproject       (raw kernel · unproject)
  tr(F^T B_d F) = 2 · ⟨W ⊙ DXF, XF⟩ + ⟨c ⊙ XF, XF⟩ + tr(F^T S_psi F)

where K_d is the raw (n × n_knots) per-pair kernel scalar matrix for axis d (q · s_combo + c · coeff_sum · φ per (i, j) pair) and Z_unproject is the identifiability/padding back-projection.

We compute U_knot = unproject_matrix(F) once at (n_knots × rank), then for each row chunk do a fused pass:

  • XF_chunk = X_chunk · F (chunk × rank) — shared-X GEMM
  • Kd_chunk = row_chunk_first_raw(chunk × n_knots) — raw kernel
  • DXF_chunk = Kd_chunk · U_knot (chunk × rank) — single GEMM and immediately accumulate ⟨W ⊙ DXF, XF⟩ and ⟨c ⊙ XF, XF⟩ over the chunk, never materialising full XF or DXF.

This replaces the previous rank-many forward_mul apply loop. On the large-scale margslope-aniso-duchon16d shard each per-axis trace drops from ~30 s to a single chunked-GEMM cost.

Source§

fn trace_projected_factor_cached( &self, factor: &Array2<f64>, cache: &ProjectedFactorCache, ) -> f64

Cached variant — the hot-path optimisation for large-scale outer gradient/Hessian sweeps. Every ψ-axis built atop the same x_design (e.g. all 32 ψ-axes of a marginal-slope model, or the same axis hit from g_factor and w_factor traces) shares one chunked X · F design GEMM per (x_design, factor) pair via ProjectedFactorCache. With 32 axes per outer-gradient sweep and O(rank) more cross-axis traces inside the outer-Hessian build, the cache turns 32× redundant O(n · p · rank) GEMMs into a single one per outer iter. At large-scale shape (n = 320 K, p = rank = 95) that is the difference between minutes and seconds of design-GEMM work.

Source§

fn dim(&self) -> usize

Operator dimension p such that B · v consumes a p-vector and produces a p-vector.
Source§

fn mul_vec(&self, v: &Array1<f64>) -> Array1<f64>

Compute B · v (matrix-vector product). v and result are p-vectors.
Source§

fn mul_vec_view(&self, v: ArrayView1<'_, f64>) -> Array1<f64>

Compute B · v from a vector view.
Source§

fn mul_vec_into(&self, v: ArrayView1<'_, f64>, out: ArrayViewMut1<'_, f64>)

Compute B · v into caller-owned storage.
Source§

fn mul_basis_columns_into(&self, start: usize, out: ArrayViewMut2<'_, f64>)

Fill columns [start, start + out.ncols()) of B into out.
Source§

fn bilinear(&self, v: &Array1<f64>, u: &Array1<f64>) -> f64

Compute v^T · B · u (bilinear form).
Source§

fn bilinear_view(&self, v: ArrayView1<'_, f64>, u: ArrayView1<'_, f64>) -> f64

Compute v^T · B · u without requiring owned vector inputs.
Source§

fn is_implicit(&self) -> bool

Whether this operator uses implicit (non-materialized) storage.
Source§

fn as_any(&self) -> &(dyn Any + 'static)

Expose the concrete type for solver-local downcast helpers when the implementor has a 'static concrete type. Borrowing adapters may keep the default, which simply cannot downcast.
Source§

fn mul_mat( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

Compute B · F where F is (p × k). Default dispatches per-column in parallel unless already inside a rayon worker.
Source§

fn projection_design_id(&self) -> Option<usize>

Optional stable identity for this operator’s action B. When Some, the default cached trace / projected-matrix paths memoize the B · F product in the shared ProjectedFactorCache under a (design_id, factor) key, so repeated projections of the same factor against the same operator within one outer iteration build B · F once. None (the default) disables that reuse: an operator with no design factor stable across calls cannot key the cache without risking a stale B · F, so it recomputes every time.
Source§

fn projected_matrix( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

Compute the exact projected matrix F^T B F.
Source§

fn projected_matrix_cached( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, factor_cache: &ProjectedFactorCache, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

Compute the exact projected matrix F^T B F, reusing caller-owned projection caches when the operator has a shared row/design factor.
Source§

fn scaled_add_mul_vec( &self, v: ArrayBase<ViewRepr<&f64>, Dim<[usize; 1]>>, scale: f64, out: ArrayBase<ViewRepr<&mut f64>, Dim<[usize; 1]>>, )

Accumulate scale * B · v into caller-owned storage.
Source§

fn has_fast_bilinear_view(&self) -> bool

Whether bilinear_view is implemented as a direct scalar contraction.
Source§

fn to_dense(&self) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

Full dense materialization.
Source§

fn block_local_data( &self, ) -> Option<(&ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, usize, usize)>

If this operator is block-local, returns the block range and local matrix.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset. Read more
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V