Struct NestedPrefixPenalty 

Source
pub struct NestedPrefixPenalty {
    pub target: PsiSlice,
    pub target_tier: PenaltyTier,
    pub prefix_sizes: Vec<usize>,
    pub shell_weights: Vec<f64>,
    pub eps: f64,
    pub rho_indices: Vec<usize>,
    pub weight_schedule: Option<ScalarWeightSchedule>,
}
Nested-prefix sparsity penalty used by the Matryoshka SAE (Bussmann/Nabeshima/Karvonen/Nanda, ICML 2025, arXiv:2503.17547).

Given K nested prefix sizes m_1 < m_2 < ... < m_K ≤ F over the latent dimension F, and per-shell weights λ_k = w_k · exp(ρ_k), the penalty is

  P(t; ρ) = Σ_k λ_k · Σ_{i=0}^{m_k - 1} sqrt(t_i² + ε²)

summed over all rows of the latent target. Equivalently, coordinate i contributes with effective weight W_i = Σ_{k: m_k > i} λ_k. The earliest atoms (i < m_1) are penalized by every shell and so carry the strongest smoothed-L¹ weight; atoms with m_{K-1} ≤ i < m_K are penalized only by the outermost shell; coordinates i ≥ m_K are not penalized at all. This is the mask-weighted sum of smoothed-L¹ terms over the K prefixes used to enforce shell-wise reconstruction during Matryoshka training.
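The equivalence between the double sum over shells and the per-coordinate weights W_i can be checked numerically. The following is a minimal sketch for a single row using plain slices; the function names are hypothetical helpers for illustration and are not part of this crate's API.

```rust
/// Shell strengths lambda_k = w_k * exp(rho_k).
/// (Hypothetical helper; not a function of this crate.)
fn shell_strengths(shell_weights: &[f64], rho: &[f64]) -> Vec<f64> {
    shell_weights.iter().zip(rho).map(|(w, r)| w * r.exp()).collect()
}

/// Per-coordinate weights W_i = sum of lambda_k over shells with m_k > i.
fn effective_weights(prefix_sizes: &[usize], lambda: &[f64], latent_dim: usize) -> Vec<f64> {
    let mut w = vec![0.0; latent_dim];
    for (&m, &l) in prefix_sizes.iter().zip(lambda) {
        // Shell k covers coordinates 0..m_k.
        for wi in w.iter_mut().take(m) {
            *wi += l;
        }
    }
    w
}

/// Penalty for one row, written as the double sum over shells and prefixes.
fn penalty_by_shells(t: &[f64], prefix_sizes: &[usize], lambda: &[f64], eps: f64) -> f64 {
    prefix_sizes
        .iter()
        .zip(lambda)
        .map(|(&m, &l)| l * t[..m].iter().map(|x| (x * x + eps * eps).sqrt()).sum::<f64>())
        .sum()
}

/// The same penalty, written with the per-coordinate weights W_i.
fn penalty_by_weights(t: &[f64], w: &[f64], eps: f64) -> f64 {
    t.iter().zip(w).map(|(x, wi)| wi * (x * x + eps * eps).sqrt()).sum()
}
```

With prefix sizes [2, 4] and λ = [1, 2], the weights come out as W = [3, 3, 2, 2], and both penalty functions agree up to floating-point rounding.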

Closed forms (per row, summed across all rows):

  ∂P/∂t_i      = W_i · t_i / sqrt(t_i² + ε²)
  Hess_diag(i) = W_i · ε² / (t_i² + ε²)^{3/2}           (PSD)
  ∂P/∂ρ_k      = λ_k · Σ_{i < m_k} sqrt(t_i² + ε²)
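
As a sanity check on the closed forms above, here is a minimal single-row sketch of the three derivatives on plain slices. The names are hypothetical and the code does not reproduce the crate's `AnalyticPenalty` implementation; `w` is assumed to hold the effective weights W_i and `lambda` the shell strengths λ_k.

```rust
/// Smoothed absolute value sqrt(x^2 + eps^2).
fn smooth_abs(x: f64, eps: f64) -> f64 {
    (x * x + eps * eps).sqrt()
}

/// dP/dt_i = W_i * t_i / sqrt(t_i^2 + eps^2).
fn grad_target(t: &[f64], w: &[f64], eps: f64) -> Vec<f64> {
    t.iter().zip(w).map(|(&x, &wi)| wi * x / smooth_abs(x, eps)).collect()
}

/// Hessian diagonal W_i * eps^2 / (t_i^2 + eps^2)^{3/2}.
/// Every entry is non-negative when W_i >= 0, which is the PSD claim.
fn hessian_diag(t: &[f64], w: &[f64], eps: f64) -> Vec<f64> {
    t.iter()
        .zip(w)
        .map(|(&x, &wi)| wi * eps * eps / smooth_abs(x, eps).powi(3))
        .collect()
}

/// dP/drho_k = lambda_k * sum_{i < m_k} sqrt(t_i^2 + eps^2).
fn grad_rho(t: &[f64], prefix_sizes: &[usize], lambda: &[f64], eps: f64) -> Vec<f64> {
    prefix_sizes
        .iter()
        .zip(lambda)
        .map(|(&m, &l)| l * t[..m].iter().map(|&x| smooth_abs(x, eps)).sum::<f64>())
        .collect()
}
```

At t_i = 0 the Hessian entry reduces to W_i / ε, so a smaller ε gives a sharper approximation of |x| at the cost of larger curvature near zero.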

target lays out n_rows × latent_dim in row-major order (row * F + col). latent_dim is taken from PsiSlice::latent_dim; if absent we fall back to the maximum prefix size, which is the standard Matryoshka convention.
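
The layout and fallback described above can be sketched as follows. This is an illustration on a flat `&[f64]` buffer with hypothetical helper names; it does not use `PsiSlice`, whose API is not shown here.

```rust
/// Latent width F: the explicit value when known, otherwise the largest
/// prefix size. Returns None only if both are missing.
/// (Hypothetical helper mirroring the fallback described in the text.)
fn resolve_latent_dim(latent_dim: Option<usize>, prefix_sizes: &[usize]) -> Option<usize> {
    latent_dim.or_else(|| prefix_sizes.iter().copied().max())
}

/// Flat index of element (row, col) in a row-major n_rows x f buffer.
fn flat_index(row: usize, col: usize, f: usize) -> usize {
    row * f + col
}

/// Sum the per-row penalty over every row of a flat row-major buffer.
/// `w` holds the per-coordinate weights W_i; `f` must be non-zero because
/// `chunks_exact` panics on a chunk size of 0. Any trailing partial row is
/// ignored by `chunks_exact`.
fn penalty_all_rows(flat: &[f64], f: usize, w: &[f64], eps: f64) -> f64 {
    flat.chunks_exact(f)
        .map(|row| {
            row.iter()
                .zip(w)
                .map(|(&x, &wi)| wi * (x * x + eps * eps).sqrt())
                .sum::<f64>()
        })
        .sum()
}
```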

Fields§

§target: PsiSlice

§target_tier: PenaltyTier

§prefix_sizes: Vec<usize>

Sorted strictly-increasing prefix sizes m_1 < m_2 < ... < m_K.

§shell_weights: Vec<f64>

Per-shell base weights w_k. The effective strength is λ_k = w_k · exp(ρ_k).

§eps: f64

Smoothing parameter ε > 0 for the smoothed-L¹ surrogate sqrt(x² + ε²). With ε > 0 the surrogate is twice differentiable at x = 0, where its curvature is 1/ε; at ε = 0 it reduces to |x|, which has no derivative there.

§rho_indices: Vec<usize>

Local ρ indices for the K per-shell log-strengths.

§weight_schedule: Option<ScalarWeightSchedule>

Implementations§

Source§

impl NestedPrefixPenalty

Source

pub fn new( target: PsiSlice, target_tier: PenaltyTier, prefix_sizes: Vec<usize>, shell_weights: Vec<f64>, eps: f64, ) -> Result<Self, String>

Build a new nested-prefix penalty.

Errors when:

  • prefix_sizes is empty.
  • prefix_sizes is not strictly increasing.
  • any prefix exceeds the latent dimension (when known).
  • shell_weights.len() != prefix_sizes.len().
  • eps <= 0 (the smoothed-L¹ gradient x/sqrt(x²+ε²) and Hessian ε²/(x²+ε²)^{3/2} are both undefined at x = 0 unless ε > 0).
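
The checks listed above could look roughly like the following standalone function. This is a sketch under assumptions, not the body of `new`: the function name, the argument shapes, and the error strings are all hypothetical, and the order in which the real constructor tests its conditions is not documented here.

```rust
/// Validate nested-prefix arguments against the documented error conditions.
/// (Hypothetical helper; error messages are illustrative only.)
fn validate_nested_prefix(
    prefix_sizes: &[usize],
    shell_weights: &[f64],
    latent_dim: Option<usize>,
    eps: f64,
) -> Result<(), String> {
    if prefix_sizes.is_empty() {
        return Err("prefix_sizes is empty".to_string());
    }
    // Strictly increasing: every adjacent pair must satisfy a < b.
    if prefix_sizes.windows(2).any(|p| p[0] >= p[1]) {
        return Err("prefix_sizes is not strictly increasing".to_string());
    }
    // Sizes are increasing, so only the last one needs a bound check,
    // and only when the latent dimension is known.
    if let (Some(f), Some(&m)) = (latent_dim, prefix_sizes.last()) {
        if m > f {
            return Err(format!("prefix size {m} exceeds latent dimension {f}"));
        }
    }
    if shell_weights.len() != prefix_sizes.len() {
        return Err("shell_weights and prefix_sizes differ in length".to_string());
    }
    // Matches the documented `eps <= 0` condition. NaN handling is not
    // specified in the source, so this sketch does not address it.
    if eps <= 0.0 {
        return Err("eps must be strictly positive".to_string());
    }
    Ok(())
}
```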
Source

pub fn with_weight_schedule(self, schedule: ScalarWeightSchedule) -> Self

Attach a global annealing schedule shared by all shell weights. The REML loop still picks per-shell ρ_k on top of this baseline.

Trait Implementations§

Source§

impl AnalyticPenalty for NestedPrefixPenalty

Source§

fn tier(&self) -> PenaltyTier

Tier the target lives in (β or ext-coord).
Source§

fn value(&self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>) -> f64

Scalar penalty contribution P(target; ρ). The strength factor exp(ρ) (or whatever parameterization the penalty uses) is folded in.
Source§

fn grad_target( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, ) -> Array1<f64>

Gradient ∂P/∂target, same length as target.
Source§

fn hessian_diag( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, ) -> Option<Array1<f64>>

Diagonal of the Hessian diag(∂²P/∂target²) when the Hessian is block-diagonal. Returns None for penalties whose Hessian is dense (Isometry); those implement Self::hvp instead. The default signals “no closed-form diagonal” by returning None for any non-empty target — concrete penalties either override with their own analytic diagonal or rely on the matrix-free hvp path.
Source§

fn grad_rho( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, ) -> Array1<f64>

Gradient of the penalty value w.r.t. each owned ρ-axis. Length equals Self::rho_count.
Source§

fn rho_count(&self) -> usize

Number of REML-selectable hyperparameter axes this penalty contributes to the outer ρ vector.
Source§

fn name(&self) -> &str

Human-readable identifier for diagnostics / logging.
Source§

fn apply_schedule(&mut self, iter: usize)

Update any attached scalar weight schedule at the given REML outer iteration. Penalties without schedules keep their stored weight.
Source§

fn hvp( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, v: ArrayView1<'_, f64>, ) -> Array1<f64>

Hessian-vector product H v = (∂²P/∂target²) v, in closed form.
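
Because the nested-prefix penalty is a separable sum over coordinates, its Hessian is diagonal, so the product H v reduces to an element-wise multiply by the Hessian diagonal. The sketch below shows that reduction for one row on plain slices; the function name is hypothetical, and whether the crate's `hvp` for this type is implemented this way is an assumption.

```rust
/// H v for a diagonal Hessian:
/// (H v)_i = W_i * eps^2 / (t_i^2 + eps^2)^{3/2} * v_i.
/// `w` holds the per-coordinate weights W_i. (Hypothetical helper.)
fn hvp_diagonal(t: &[f64], w: &[f64], eps: f64, v: &[f64]) -> Vec<f64> {
    t.iter()
        .zip(w)
        .zip(v)
        .map(|((&x, &wi), &vi)| {
            let s = (x * x + eps * eps).sqrt();
            wi * eps * eps / (s * s * s) * vi
        })
        .collect()
}
```

Since every diagonal entry is non-negative for W_i ≥ 0, the quadratic form vᵀ H v is non-negative, which is consistent with the penalty being convex and with the PSD majorizer coinciding with the exact Hessian.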
Source§

fn psd_majorizer_diag( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, ) -> Option<Array1<f64>>

Diagonal of a PSD majorizer of the Hessian — the positive re-weighted-ℓ₂ / MM surrogate diag(B(target; ρ)) with B ⪰ ∂²P/∂target² everywhere and B ⪰ 0. This is a different operator from Self::hessian_diag: for nonconvex penalties (log sparsity, JumpReLU) the exact Hessian is indefinite, but the inner Newton / PIRLS solve and the log-det / preconditioner pipeline require a PSD curvature block. For convex penalties the majorizer coincides with the exact Hessian, so the default simply delegates to Self::hessian_diag; nonconvex penalties override.
Source§

fn psd_majorizer_hvp( &self, target: ArrayView1<'_, f64>, rho: ArrayView1<'_, f64>, v: ArrayView1<'_, f64>, ) -> Array1<f64>

Matrix-vector product against the PSD majorizer B(target; ρ) v (see Self::psd_majorizer_diag). For convex penalties this is the exact Hessian-vector product, so the default delegates to Self::hvp; nonconvex penalties override to return their PSD surrogate instead of the indefinite true Hessian.
Source§

impl Clone for NestedPrefixPenalty

Source§

fn clone(&self) -> NestedPrefixPenalty

Returns a duplicate of the value. Read more
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for NestedPrefixPenalty

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl PenaltyManifest for NestedPrefixPenalty

Source§

const KIND_TAG: &'static str = "nested_prefix"

Source§

const PYTHON_WRAPPER: &'static str = "NestedPrefixPenalty"

Source§

const ROW_BLOCK_DIAGONAL: bool = true

Source§

fn dispatch_tier(&self) -> PenaltyTier

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset. Read more
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V