pub struct ImplicitHyperOperator {
pub implicit_deriv: Arc<ImplicitDesignPsiDerivative>,
pub axis: usize,
pub s_psi: Array2<f64>,
pub c_x_psi_beta: Option<Arc<Array1<f64>>>,
/* private fields */
}Fields§
§implicit_deriv: Arc<ImplicitDesignPsiDerivative>The implicit design-derivative operator (shared across all axes).
axis: usizeWhich axis this operator is for.
s_psi: Array2<f64>Penalty derivative matrix S_{ψ_d} (p × p), dense.
c_x_psi_beta: Option<Arc<Array1<f64>>>Non-Gaussian fixed-β third-derivative correction: c ⊙ (X_{ψ_d} β̂),
length n. When present, the operator additionally applies
Xᵀ diag(c_x_psi_beta) X v so that the full B_d formula
B_d v = (∂X/∂ψ_d)ᵀ W X v + Xᵀ W (∂X/∂ψ_d) v + Xᵀ diag(c ⊙ X_{ψ_d} β̂) X v + S_{ψ_d} v
is matrix-free for non-Gaussian likelihoods. None for Gaussian
identity (c ≡ 0 there).
Implementations§
Source§impl ImplicitHyperOperator
impl ImplicitHyperOperator
Compute the design-part bilinear form u^T (X^T C_d X) z using precomputed shared X-multiplies, avoiding the full B_d matvec.
The design part of B_d is: (∂X/∂ψ_d)^T W X + X^T W (∂X/∂ψ_d)
For vectors z and u, the bilinear form u^T [design_part] z equals: ((∂X/∂ψ_d) u)^T (W (Xz)) + (Xu)^T (W ((∂X/∂ψ_d) z)) = 2 * (w ⊙ y_vec)^T dx_z [when u = u, z = z]
where y_vec = X u, dx_z = (∂X/∂ψ_d) z.
But the full bilinear form is NOT symmetric in its dependence on z vs u through the design derivative, so we compute both cross-terms: dx_z^T (w ⊙ y_vec) + dx_u^T (w ⊙ x_vec)
§Arguments
x_vec: X z (precomputed, shared across axes)y_vec: X u (precomputed, shared across axes)z: the probe vector (needed for forward_mul and penalty)u: H⁻¹ z (needed for forward_mul and penalty)
§Returns
The full bilinear form u^T B_d z = design_part + penalty_part.
Compute the design-part contribution to A_d z without the X^T step.
Returns the n-vector C_d (X z) where C_d encodes the diagonal weighting. Specifically: (∂X/∂ψ_d)^T maps FROM n-space, but for stochastic trace estimation we need q_d = A_d z = X^T (C_d x_vec) + P_d z.
This method computes q_d = A_d z using the shared x_vec = X z: q_d = (∂X/∂ψ_d)^T (W (X z)) + X^T (W ((∂X/∂ψ_d) z)) + S_psi z which is the standard mul_vec but we can share x_vec across axes.
Trait Implementations§
Source§impl HyperOperator for ImplicitHyperOperator
impl HyperOperator for ImplicitHyperOperator
Source§fn trace_projected_factor(&self, factor: &Array2<f64>) -> f64
fn trace_projected_factor(&self, factor: &Array2<f64>) -> f64
Compute tr(F^T B F) directly via fused chunked BLAS3 GEMMs on the
shared X and the shared raw kernel matrix, bypassing the rank-many
separate matvecs the default impl would run through the lazy /
operator-backed design.
Why this matters: the default trait impl is
let bf = self.mul_mat(F); (F ⊙ bf).sum()
which calls mul_vec_into per column of F (rank columns). On a
lazy Duchon / Matérn / CTN design each mul_vec_into triggers a
full O(n · p · kernel_eval) row-streamed matvec — and with rank ≈ p
at large-scale shape (16D-Duchon-aniso 32 ψ-axes, p ≈ 95, n = 320 K)
the per-axis trace landed at ~30 s. With 32 axes per outer Hessian
eval and ~5 outer iters that’s the ~1 hr large-scale timeout.
Algebra:
B_d = D_d^T W X + X^T W D_d + X^T diag(c) X + S_psi
D_d = (∂X/∂ψ_d) = K_d · Z_unproject (raw kernel · unproject)
tr(F^T B_d F) = 2 · ⟨W ⊙ DXF, XF⟩ + ⟨c ⊙ XF, XF⟩ + tr(F^T S_psi F)where K_d is the raw (n × n_knots) per-pair kernel scalar matrix
for axis d (q · s_combo + c · coeff_sum · φ per (i, j) pair) and
Z_unproject is the identifiability/padding back-projection.
We compute U_knot = unproject_matrix(F) once at (n_knots × rank),
then for each row chunk do a fused pass:
XF_chunk = X_chunk · F(chunk × rank) — shared-X GEMMKd_chunk = row_chunk_first_raw(chunk × n_knots) — raw kernelDXF_chunk = Kd_chunk · U_knot(chunk × rank) — single GEMM and immediately accumulate⟨W ⊙ DXF, XF⟩and⟨c ⊙ XF, XF⟩over the chunk, never materialising full XF or DXF.
This replaces the previous rank-many forward_mul apply loop. On
the large-scale margslope-aniso-duchon16d shard each per-axis trace
drops from ~30 s to a single chunked-GEMM cost.
Source§fn trace_projected_factor_cached(
&self,
factor: &Array2<f64>,
cache: &ProjectedFactorCache,
) -> f64
fn trace_projected_factor_cached( &self, factor: &Array2<f64>, cache: &ProjectedFactorCache, ) -> f64
Cached variant — the hot-path optimisation for large-scale outer
gradient/Hessian sweeps. Every ψ-axis built atop the same x_design
(e.g. all 32 ψ-axes of a marginal-slope model, or the same axis hit
from g_factor and w_factor traces) shares one chunked
X · F design GEMM per (x_design, factor) pair via
ProjectedFactorCache. With 32 axes per outer-gradient sweep and
O(rank) more cross-axis traces inside the outer-Hessian build, the
cache turns 32× redundant O(n · p · rank) GEMMs into a single one
per outer iter. At large-scale shape (n = 320 K, p = rank = 95) that
is the difference between minutes and seconds of design-GEMM work.
Source§fn dim(&self) -> usize
fn dim(&self) -> usize
p such that B · v consumes a p-vector and
produces a p-vector.Source§fn mul_vec(&self, v: &Array1<f64>) -> Array1<f64>
fn mul_vec(&self, v: &Array1<f64>) -> Array1<f64>
Source§fn mul_vec_view(&self, v: ArrayView1<'_, f64>) -> Array1<f64>
fn mul_vec_view(&self, v: ArrayView1<'_, f64>) -> Array1<f64>
Source§fn mul_vec_into(&self, v: ArrayView1<'_, f64>, out: ArrayViewMut1<'_, f64>)
fn mul_vec_into(&self, v: ArrayView1<'_, f64>, out: ArrayViewMut1<'_, f64>)
Source§fn mul_basis_columns_into(&self, start: usize, out: ArrayViewMut2<'_, f64>)
fn mul_basis_columns_into(&self, start: usize, out: ArrayViewMut2<'_, f64>)
[start, start + out.ncols()) of B into out.Source§fn bilinear(&self, v: &Array1<f64>, u: &Array1<f64>) -> f64
fn bilinear(&self, v: &Array1<f64>, u: &Array1<f64>) -> f64
Source§fn bilinear_view(&self, v: ArrayView1<'_, f64>, u: ArrayView1<'_, f64>) -> f64
fn bilinear_view(&self, v: ArrayView1<'_, f64>, u: ArrayView1<'_, f64>) -> f64
Source§fn is_implicit(&self) -> bool
fn is_implicit(&self) -> bool
Source§fn as_any(&self) -> &(dyn Any + 'static)
fn as_any(&self) -> &(dyn Any + 'static)
'static concrete type. Borrowing adapters may keep
the default, which simply cannot downcast.Source§fn mul_mat(
&self,
factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>,
) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
fn mul_mat( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
Source§fn projection_design_id(&self) -> Option<usize>
fn projection_design_id(&self) -> Option<usize>
B. When Some,
the default cached trace / projected-matrix paths memoize the B · F
product in the shared ProjectedFactorCache under a
(design_id, factor) key, so repeated projections of the same factor
against the same operator within one outer iteration build B · F
once. None (the default) disables that reuse: an operator with no
design factor stable across calls cannot key the cache without risking
a stale B · F, so it recomputes every time.Source§fn projected_matrix(
&self,
factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>,
) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
fn projected_matrix( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
F^T B F.Source§fn projected_matrix_cached(
&self,
factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>,
factor_cache: &ProjectedFactorCache,
) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
fn projected_matrix_cached( &self, factor: &ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>, factor_cache: &ProjectedFactorCache, ) -> ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>
F^T B F, reusing caller-owned
projection caches when the operator has a shared row/design factor.Source§fn scaled_add_mul_vec(
&self,
v: ArrayBase<ViewRepr<&f64>, Dim<[usize; 1]>>,
scale: f64,
out: ArrayBase<ViewRepr<&mut f64>, Dim<[usize; 1]>>,
)
fn scaled_add_mul_vec( &self, v: ArrayBase<ViewRepr<&f64>, Dim<[usize; 1]>>, scale: f64, out: ArrayBase<ViewRepr<&mut f64>, Dim<[usize; 1]>>, )
scale * B · v into caller-owned storage.Source§fn has_fast_bilinear_view(&self) -> bool
fn has_fast_bilinear_view(&self) -> bool
bilinear_view is implemented as a direct scalar contraction.Auto Trait Implementations§
impl !RefUnwindSafe for ImplicitHyperOperator
impl !UnwindSafe for ImplicitHyperOperator
impl Freeze for ImplicitHyperOperator
impl Send for ImplicitHyperOperator
impl Sync for ImplicitHyperOperator
impl Unpin for ImplicitHyperOperator
impl UnsafeUnpin for ImplicitHyperOperator
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
Source§impl<T> DistributionExt for Twhere
T: ?Sized,
impl<T> DistributionExt for Twhere
T: ?Sized,
impl<T, U> Imply<T> for U
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§impl<T> Pointable for T
impl<T> Pointable for T
impl<T> Read<Exclusive, BecauseExclusive> for Twhere
T: ?Sized,
Source§impl<SS, SP> SupersetOf<SS> for SPwhere
SS: SubsetOf<SP>,
impl<SS, SP> SupersetOf<SS> for SPwhere
SS: SubsetOf<SP>,
Source§fn to_subset(&self) -> Option<SS>
fn to_subset(&self) -> Option<SS>
self from the equivalent element of its
superset. Read moreSource§fn is_in_subset(&self) -> bool
fn is_in_subset(&self) -> bool
self is actually part of its subset T (and can be converted to it).Source§fn to_subset_unchecked(&self) -> SS
fn to_subset_unchecked(&self) -> SS
self.to_subset but without any property checks. Always succeeds.Source§fn from_subset(element: &SS) -> SP
fn from_subset(element: &SS) -> SP
self to the equivalent element of its superset.