Struct IsometryPenalty

pub struct IsometryPenalty {
    pub target: PsiSlice,
    pub reference: IsometryReference,
    pub rho_index: usize,
    pub jacobian_cache_slot: RwLock<Option<Arc<Array2<f64>>>>,
    pub jacobian_second_cache_slot: RwLock<Option<Arc<Array2<f64>>>>,
    pub duchon_radial_source: Option<Arc<IsometryDuchonRadialSource>>,
    pub third_decoder_derivative_slot: RwLock<Option<Arc<Array3<f64>>>>,
    pub p_out: usize,
    pub weight: WeightField,
    pub scalar_weight: f64,
    pub weight_schedule: Option<ScalarWeightSchedule>,
}

Isometry-to-reference penalty (canonical-coordinate gauge term).

Lives on ext-coords: the target slice is a row of the LatentCoordValues flat vector (row-major n_obs × d). Owns one ρ-axis (log μ_iso).

Penalizes ½ μ Σ_n ‖g_n(t) − g^ref(t_n)‖²_F, where the pullback metric at row n is

  g_n = J_n^T W_n J_n,    J_n ∈ ℝ^{p × d}

and W_n is a per-row low-rank PSD behavioral metric stored as W_n = U_n U_n^T with U_n ∈ ℝ^{p × r}. The canonical-coordinate statement is “one unit of motion in t ↦ one unit of behavioral change”, so the W_n weighting is load-bearing.

In the SAE objective this is the extension-coordinate gauge fix: it prevents the latent chart from absorbing arbitrary smooth reparameterizations of the decoder manifold. ARD, sparsity, or rank penalties can then select axes or structure in a chart whose metric scale is pinned.

Contraction order invariant. Every place this struct touches W_n, the contraction is (J^T U_n)(U_n^T J), never J^T W_n J with W_n materialized as p × p. Concretely we form M_n = U_n^T J_n ∈ ℝ^{r × d} once and then g_n = M_n^T M_n (d × d). Cost per row: O(p · r · d + r · d²), with no p² term.
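
The contraction order can be illustrated with a minimal sketch on plain row-major slices. The function name `pullback_metric_factored` and the slice layout are assumptions made for this example; the crate itself works on ndarray types and its internal helpers are not shown on this page.

```rust
/// Pullback metric g = (U^T J)^T (U^T J) for one row, without forming W = U U^T.
/// `j` is p x d and `u` is p x r, both row-major; the result is d x d row-major.
/// Hypothetical helper for illustration only.
fn pullback_metric_factored(j: &[f64], u: &[f64], p: usize, d: usize, r: usize) -> Vec<f64> {
    assert_eq!(j.len(), p * d);
    assert_eq!(u.len(), p * r);
    // M = U^T J (r x d), cost O(p * r * d).
    let mut m = vec![0.0; r * d];
    for i in 0..p {
        for k in 0..r {
            let u_ik = u[i * r + k];
            for a in 0..d {
                m[k * d + a] += u_ik * j[i * d + a];
            }
        }
    }
    // g = M^T M (d x d), cost O(r * d^2).
    let mut g = vec![0.0; d * d];
    for k in 0..r {
        for a in 0..d {
            for b in 0..d {
                g[a * d + b] += m[k * d + a] * m[k * d + b];
            }
        }
    }
    g
}
```

For p = 3, d = 2, r = 1 with J = [[1, 0], [0, 1], [1, 1]] and U = [1, 0, 1]^T, M is the single row [2, 1] and g is [[4, 2], [2, 1]]; no 3 × 3 matrix is ever built.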

When to use. Whenever a LatentCoord block is in play without an auxiliary variable (AuxPrior) to break the diffeomorphism gauge. This addresses the audit finding that ARD is not a standalone gauge fix. With a Euclidean reference, the penalty pulls the decoder toward a local isometry, which is enough to make the inner Hessian on t full-rank and the IFT well-defined.

Math. Let J_n ∈ ℝ^{p × d} be the local decoder Jacobian. Then g_n = J_n^T W_n J_n and the penalty is ½ μ Σ_n ‖J_n^T W_n J_n − g^ref_n‖²_F. Analytic gradient w.r.t. t_n:

  ∂P/∂t_{n,c}
    = μ Σ_{a,b} (g_n − g^ref_n)_{ab}
        [ H_{n,:,a,c}^T W_n J_{n,:,b}
          + J_{n,:,a}^T W_n H_{n,:,b,c} ],
  H_{n,i,a,c} = ∂J_{n,i,a}/∂t_{n,c}.
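
As a rough sketch of the gradient formula above, the following evaluates it for a single row under the simplifying assumption W_n = I (the Identity weight). The name `isometry_grad_row` and the flat row-major layouts are hypothetical and chosen only for this example.

```rust
/// Gradient of 0.5 * mu * ||J^T J - g_ref||_F^2 w.r.t. t for one row (length d).
/// Assumes W = identity. `j` is p x d, `h` is p x d x d with
/// h[(i * d + a) * d + c] = dJ[i][a] / dt[c], and `g_ref` is d x d; all row-major.
/// Hypothetical helper for illustration only.
fn isometry_grad_row(
    j: &[f64],
    h: &[f64],
    g_ref: &[f64],
    mu: f64,
    p: usize,
    d: usize,
) -> Vec<f64> {
    // Residual R = J^T J - g_ref.
    let mut resid = vec![0.0; d * d];
    for a in 0..d {
        for b in 0..d {
            let mut g_ab = 0.0;
            for i in 0..p {
                g_ab += j[i * d + a] * j[i * d + b];
            }
            resid[a * d + b] = g_ab - g_ref[a * d + b];
        }
    }
    // grad[c] = mu * sum_ab R_ab * (H_{:,a,c}^T J_{:,b} + J_{:,a}^T H_{:,b,c}).
    let mut grad = vec![0.0; d];
    for c in 0..d {
        let mut acc = 0.0;
        for a in 0..d {
            for b in 0..d {
                let mut dg = 0.0;
                for i in 0..p {
                    dg += h[(i * d + a) * d + c] * j[i * d + b]
                        + j[i * d + a] * h[(i * d + b) * d + c];
                }
                acc += resid[a * d + b] * dg;
            }
        }
        grad[c] = mu * acc;
    }
    grad
}
```

A scalar sanity check: for the decoder f(t) = t² at t = 1 we have J = 2, H = 2, g = 4, so with g^ref = 1 and μ = 0.5 the formula gives 0.5 · 3 · 8 = 12, matching d/dt of ½ μ (4t² − 1)².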

Gotchas:

The value path returns the configured missing-cache default when the first-jet cache is absent; gradient/HVP paths need the first and second decoder jets and return zeros when the analytic jet source is unavailable.
The exact Hessian includes a residual-curvature term requiring the third decoder jet. REML/PIRLS curvature should prefer the Gauss-Newton PSD majorizer when a positive curvature block is required.
W_n is a metric weight, not a scalar confidence. Changing it changes the canonical units of latent motion.
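
The Gauss-Newton idea in the second gotcha can be sketched generically: for a penalty ½ μ ‖r(t)‖² with residual r = vec(g − g^ref), dropping the residual-curvature term leaves μ Dᵀ D, where D = ∂r/∂t. The sketch below is a generic least-squares construction under that assumption, not the body of psd_majorizer_hvp, whose internals are not shown here; `gauss_newton_hvp` is a hypothetical name.

```rust
/// Gauss-Newton Hessian-vector product mu * D^T (D v).
/// `dres` is the (m x d) row-major Jacobian of the residual w.r.t. t.
/// The operator is PSD for mu >= 0 because v^T D^T D v = ||D v||^2.
/// Hypothetical helper for illustration only.
fn gauss_newton_hvp(dres: &[f64], v: &[f64], mu: f64, m: usize, d: usize) -> Vec<f64> {
    assert_eq!(dres.len(), m * d);
    assert_eq!(v.len(), d);
    // w = D v (length m).
    let w: Vec<f64> = (0..m)
        .map(|k| (0..d).map(|c| dres[k * d + c] * v[c]).sum::<f64>())
        .collect();
    // out = mu * D^T w (length d).
    (0..d)
        .map(|c| mu * (0..m).map(|k| dres[k * d + c] * w[k]).sum::<f64>())
        .collect()
}
```

Because the product never forms Dᵀ D explicitly, it fits operator-only call sites; the exact hvp differs from it by the term that needs the third decoder jet.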

The per-row Jacobian J_n is exactly the radial-derivative jet that design_gradient_wrt_t already computes for LatentCoordValues; the second derivative ∂J/∂t is built by the shared [crate::basis::radial_basis_cartesian_derivative] engine from the radial Hessian identity. A finite-difference oracle is to compare the central difference (value(t + h e_j) − value(t − h e_j)) / (2h) against grad_target(t)[j]; the analytic gradient tracks the oracle until finite-difference cancellation dominates at small h. No autograd is needed.
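
A minimal sketch of such a finite-difference check follows. It does not call the real value / grad_target methods; instead it uses a toy separable decoder f(t) = (t₀², sin t₁) with a Euclidean reference, so g = diag(4t₀², cos² t₁). The names `central_diff`, `toy_value`, and `toy_grad` are assumptions made for this example.

```rust
/// Central-difference approximation of d value / d t_j.
fn central_diff<F: Fn(&[f64]) -> f64>(value: F, t: &[f64], j: usize, h: f64) -> f64 {
    let mut plus = t.to_vec();
    let mut minus = t.to_vec();
    plus[j] += h;
    minus[j] -= h;
    (value(&plus) - value(&minus)) / (2.0 * h)
}

/// Toy penalty 0.5 * mu * ||g - I||_F^2 with g = diag(4 t0^2, cos^2 t1).
fn toy_value(t: &[f64], mu: f64) -> f64 {
    let r0 = 4.0 * t[0] * t[0] - 1.0;
    let r1 = t[1].cos().powi(2) - 1.0;
    0.5 * mu * (r0 * r0 + r1 * r1)
}

/// Analytic gradient of `toy_value` w.r.t. t.
fn toy_grad(t: &[f64], mu: f64) -> Vec<f64> {
    let r0 = 4.0 * t[0] * t[0] - 1.0;
    let r1 = t[1].cos().powi(2) - 1.0;
    vec![
        mu * r0 * 8.0 * t[0],
        mu * r1 * (-2.0 * t[1].cos() * t[1].sin()),
    ]
}
```

Comparing `central_diff(|x| toy_value(x, mu), &t, j, 1e-5)` with `toy_grad(&t, mu)[j]` for each j gives agreement to well within 1e-6 at moderate t; shrinking h much further lets cancellation error take over.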

μ = exp(ρ_iso) is REML-selectable as one extra ρ axis.
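
Because μ enters only as exp(ρ_iso), the derivative of ½ μ S with respect to ρ_iso is the penalty value itself, where S stands for the summed squared residual. The sketch below shows just that identity; `penalty_value` and `penalty_grad_rho` are hypothetical names, and any extra factor such as scalar_weight is assumed to be folded into S.

```rust
/// Penalty value 0.5 * exp(rho_iso) * S, with S the summed squared Frobenius residual.
/// Hypothetical helper for illustration only.
fn penalty_value(rho_iso: f64, sum_sq_resid: f64) -> f64 {
    0.5 * rho_iso.exp() * sum_sq_resid
}

/// d value / d rho_iso. Since d exp(rho) / d rho = exp(rho), this equals the value.
fn penalty_grad_rho(rho_iso: f64, sum_sq_resid: f64) -> f64 {
    penalty_value(rho_iso, sum_sq_resid)
}
```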

jacobian_cache_slot and jacobian_second_cache_slot are interior-mutable (RwLock<Option<Arc<…>>>) so the SAE outer loop can refresh them in place each step without needing &mut self on the registry-held penalty (see refresh_caches and [crate::terms::sae::manifold::refresh_isometry_caches_from_atom]). Readers go through the Self::jacobian_cache / Self::jacobian_second_cache accessors, which take the read lock briefly and clone the inner Arc (refcount bump — no payload copy). Writers go through Self::refresh_caches.
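
The interior-mutability pattern described above can be sketched with the standard library alone. `CachedPenalty` is a hypothetical stand-in, and `Vec<f64>` replaces `Array2<f64>` so the example has no external dependencies.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a registry-held penalty with a refreshable cache.
struct CachedPenalty {
    jacobian_slot: RwLock<Option<Arc<Vec<f64>>>>,
}

impl CachedPenalty {
    fn new() -> Self {
        Self { jacobian_slot: RwLock::new(None) }
    }

    /// Reader: hold the read lock only long enough to clone the Arc
    /// (a refcount bump, not a payload copy).
    fn jacobian(&self) -> Option<Arc<Vec<f64>>> {
        self.jacobian_slot
            .read()
            .expect("cache lock poisoned")
            .clone()
    }

    /// Writer: swap the payload in place through `&self`, no `&mut self` needed.
    fn refresh(&self, jac: Option<Arc<Vec<f64>>>) {
        *self.jacobian_slot.write().expect("cache lock poisoned") = jac;
    }
}
```

A reader that cloned the Arc before a refresh keeps its own handle to the old payload, so an in-flight computation is not invalidated when the outer loop swaps in a new Jacobian.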

Fields§

§target: PsiSlice

§reference: IsometryReference

§rho_index: usize

Index of this penalty’s strength log μ_iso inside the local rho view this penalty receives. Always 0 for now (single owned axis).

§jacobian_cache_slot: RwLock<Option<Arc<Array2<f64>>>>

Cached Jacobian J ∈ ℝ^{n_obs × p × d}, flattened row-major (n_obs, p*d). The owning driver refreshes this each IFT outer step before invoking value / grad_target; in operator-only call sites (Hessian-vector products) the cache must be live. Access through Self::jacobian_cache / Self::set_jacobian_cache.

§jacobian_second_cache_slot: RwLock<Option<Arc<Array2<f64>>>>

Optional cached per-row Jacobian second derivative H_n ∈ ℝ^{p × d × d}, flattened row-major as (n_obs, p*d*d). H_n[i, a, c] = ∂J_n[i, a] / ∂t_{n, c}. Either this cache or duchon_radial_source must be present for exact isometry gradient/HVP calls. Access through Self::jacobian_second_cache / Self::set_jacobian_second_cache.

§duchon_radial_source: Option<Arc<IsometryDuchonRadialSource>>

Optional radial-Duchon source used to build jacobian_second_cache analytically from φ'(r) and the public φ''(r) jet helper. This is the exact chain-rule path for callers that do not pre-cache ∂J/∂t.

§third_decoder_derivative_slot: RwLock<Option<Arc<Array3<f64>>>>

Optional cached per-row Jacobian third derivative K_n ∈ ℝ^{p × d × d × d}, stored as an Array3 with shape (n_obs, p, d * d * d). The third axis packs the index triple (a, c, dd) in row-major order at offset ((a * d) + c) * d + dd; the last index is written dd to keep it distinct from the latent dimension d. hvp uses the full residual-curvature Hessian (proposal §4(b)): B_{ab,cd} = K_{a,cd}^T W J_b + H_{a,c}^T W H_{b,d} + H_{a,d}^T W H_{b,c} + J_a^T W K_{b,cd}, where the subscripts c and d on B, K, and H are latent-coordinate indices. Either this cache or duchon_radial_source must be present for analytic hvp calls. Interior-mutable (mirrors jacobian_second_cache_slot) so the SAE outer loop can refresh K in place each step. Access through Self::third_decoder_derivative / Self::set_third_decoder_derivative.
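
The packing rule for the third axis can be written as a pair of small helpers. The names `pack_third_axis` and `unpack_third_axis` are hypothetical; only the offset formula ((a * d) + c) * d + dd comes from the field description.

```rust
/// Flat offset of K[a][c][dd] on the packed third axis (length d^3).
/// Hypothetical helper for illustration only.
fn pack_third_axis(a: usize, c: usize, dd: usize, d: usize) -> usize {
    debug_assert!(a < d && c < d && dd < d);
    ((a * d) + c) * d + dd
}

/// Inverse of `pack_third_axis`: recover (a, c, dd) from the flat offset.
fn unpack_third_axis(flat: usize, d: usize) -> (usize, usize, usize) {
    debug_assert!(flat < d * d * d);
    (flat / (d * d), (flat / d) % d, flat % d)
}
```

With d = 3, the triple (1, 2, 0) lands at offset (1 · 3 + 2) · 3 + 0 = 15, and unpacking 15 returns (1, 2, 0).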

§p_out: usize

Output dimensionality p (row count of each per-row Jacobian J_n ∈ ℝ^{p × d}).

§weight: WeightField

Per-row behavioral metric in low-rank factored form. Defaults to Identity (the unweighted J^T J pullback). When Factored, all g_n contractions go through M_n = U_n^T J_n (r × d), so per-row work is O(p · r · d + r · d²) and the p × p matrix W_n is never materialized.

§scalar_weight: f64

§weight_schedule: Option<ScalarWeightSchedule>

Implementations§

impl IsometryPenalty

pub const DEFAULT_VALUE_ON_MISSING_CACHE: f64 = 0.0

pub fn new_euclidean(target: PsiSlice, p_out: usize) -> Self

pub fn jacobian_cache(&self) -> Option<Arc<Array2<f64>>>

pub fn jacobian_second_cache(&self) -> Option<Arc<Array2<f64>>>

pub fn refresh_caches(&self, jac: Option<Arc<Array2<f64>>>, jac2: Option<Arc<Array2<f64>>>)

pub fn set_jacobian_cache(&self, jac: Option<Arc<Array2<f64>>>)

pub fn set_jacobian_second_cache(&self, jac2: Option<Arc<Array2<f64>>>)

pub fn third_decoder_derivative(&self) -> Option<Arc<Array3<f64>>>

pub fn set_third_decoder_derivative(&self, jac3: Option<Arc<Array3<f64>>>)

impl IsometryPenalty

pub fn with_third_decoder_derivative(self, k: Arc<Array3<f64>>) -> Self

pub fn with_reference(self, reference: IsometryReference) -> Self

pub fn with_jacobian_cache(self, j: Arc<Array2<f64>>) -> Self

pub fn with_jacobian_second_cache(self, h: Arc<Array2<f64>>) -> Self

pub fn with_duchon_radial_source(self, source: Arc<IsometryDuchonRadialSource>) -> Self

pub fn with_row_metric(self, metric: &RowMetric) -> Self

pub fn with_weight_schedule(self, schedule: ScalarWeightSchedule) -> Self

pub fn pullback_metric(&self, latent_dim: usize) -> Option<Array2<f64>>

pub fn grad_jacobian(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> Array2<f64>

Trait Implementations§

impl AnalyticPenalty for IsometryPenalty

fn hvp(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, v: ArrayView1<'_, f64>) -> Array1<f64>

fn psd_majorizer_hvp(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, v: ArrayView1<'_, f64>) -> Array1<f64>

fn tier(&self) -> PenaltyTier

fn value(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> f64

fn grad_target(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> Array1<f64>

fn grad_rho(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> Array1<f64>

fn rho_count(&self) -> usize

fn name(&self) -> &str

fn apply_schedule(&mut self, iter: usize)

fn hessian_diag(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> Option<Array1<f64>>

fn psd_majorizer_diag(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> Option<Array1<f64>>

impl Clone for IsometryPenalty

fn clone(&self) -> Self

fn clone_from(&mut self, source: &Self)

impl Debug for IsometryPenalty

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl PenaltyManifest for IsometryPenalty

const KIND_TAG: &'static str = "isometry"

const PYTHON_WRAPPER: &'static str = "IsometryPenalty"

const ROW_BLOCK_DIAGONAL: bool = false

fn dispatch_tier(&self) -> PenaltyTier

Auto Trait Implementations§

impl !Freeze for IsometryPenalty

impl RefUnwindSafe for IsometryPenalty

impl Send for IsometryPenalty

impl Sync for IsometryPenalty

impl Unpin for IsometryPenalty

impl UnsafeUnpin for IsometryPenalty

impl UnwindSafe for IsometryPenalty

Blanket Implementations§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> Pointable for T

const ALIGN: usize

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,