pub struct SaeReconstructionRowProgram {
pub atoms: Vec<AtomRowBasisJet>,
pub gate_value: Vec<f64>,
pub logits: Vec<f64>,
pub gate_scale: Vec<f64>,
pub gate_shift: Vec<f64>,
pub gate: RowGate,
pub logit_slot: Vec<Option<usize>>,
pub coord_slot: Vec<Vec<usize>>,
pub n_primaries: usize,
}
One row of the SAE reconstruction as a jet program: the per-atom basis jets,
the gate, the current gate-logit values, and the primary layout that maps
(atom logit, atom latent axis) to a seeded tower variable slot.
Fields§
atoms: Vec<AtomRowBasisJet>
Per-atom basis jets at the current row.
gate_value: Vec<f64>
Current gate activations ζ_k at the row (softmax/sigmoid values).
logits: Vec<f64>
Current gate logits ℓ_k at the row.
gate_scale: Vec<f64>
Per-atom multiplicative scale for independent logistic gates. This is
the IBP stick-breaking prior π_k for IBP-MAP, 1 for active JumpReLU,
and 0 for JumpReLU rows at/below the hard threshold. Unused for
softmax.
gate_shift: Vec<f64>
Per-atom logistic shift (IBP offset / JumpReLU threshold); unused for softmax.
gate: RowGate
The gate nonlinearity.
logit_slot: Vec<Option<usize>>
Tower slot of atom k’s gate logit primary, or None if the gate logit
is not a free primary for this atom (softmax K==1).
coord_slot: Vec<Vec<usize>>
Tower slot of atom k’s latent axis j primary (coord_slot[k][j]).
n_primaries: usize
Total number of seeded primaries (= K of the tower).
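The slot fields above describe a mapping from (atom logit, atom latent axis) to tower variable slots. The following is a minimal sketch of how such a layout could be built. The `Layout` struct, the `build_layout` helper, and the slot ordering (all logits first, then coordinates) are assumptions made for illustration; the source does not state the actual ordering.

```rust
/// Hypothetical mirror of the three layout fields of the row program.
pub struct Layout {
    pub logit_slot: Vec<Option<usize>>,
    pub coord_slot: Vec<Vec<usize>>,
    pub n_primaries: usize,
}

/// Assigns consecutive slots: one per free gate logit, then one per latent axis.
/// `latent_dims[k]` is the number of latent axes of atom k.
/// ASSUMPTION: logits-first ordering is illustrative, not taken from the crate.
pub fn build_layout(latent_dims: &[usize], free_logits: bool) -> Layout {
    let mut next = 0usize;
    let mut logit_slot = Vec::with_capacity(latent_dims.len());
    for _ in latent_dims {
        if free_logits {
            logit_slot.push(Some(next));
            next += 1;
        } else {
            // The gate logit is not a free primary for this atom.
            logit_slot.push(None);
        }
    }
    let mut coord_slot = Vec::with_capacity(latent_dims.len());
    for &d in latent_dims {
        coord_slot.push((next..next + d).collect::<Vec<usize>>());
        next += d;
    }
    Layout { logit_slot, coord_slot, n_primaries: next }
}
```

With two atoms of latent dimension 2 and 1 and free logits, this sketch yields five primaries, which would correspond to K = 5 in the const-generic jet types.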
Implementations§
impl SaeReconstructionRowProgram
pub fn reconstruction_column_packed<const K: usize>(
    &self,
    out_col: usize,
) -> Order2<K>
The reconstruction output column c as the PACKED order-2 jet
Order2<K>: value .value(),
gradient .g()[a] = ∂ẑ_c/∂p_a, Hessian .h()[a][b] = ∂²ẑ_c/∂p_a∂p_b.
This is the production path (#932): the arrow-Schur logdet consumer reads
ONLY the order-≤2 channels of the reconstruction, so it builds the packed
Order2<K> scalar — value/gradient/Hessian only — instead of the dense
Tower4<K> (which materialises the entire K⁴ t3/t4 tensor every row
only to discard it). For K up to 16 the dense tower’s tensor build is
~19× the instruction count of the order-2 channels alone; this collapses
it to the channels actually read. The packed (v, g, H) is BIT-IDENTICAL
to the order-≤2 channels of Self::reconstruction_column (the
Order2 newtype delegates to the same Tower2 arithmetic the dense
tower’s order-≤2 channels use); the t3/t4 oracle pins the dense path.
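To make the value/gradient/Hessian channels concrete, here is a minimal sketch of an order-2 jet with a Leibniz product truncated at second order. `Jet2` is a hypothetical stand-in written for this example; it is not the crate's `Order2<K>` and makes no claim about its internals.

```rust
/// Hypothetical packed order-2 jet over K primaries:
/// value, gradient, and Hessian only (no t3/t4 tensors).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet2<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
}

impl<const K: usize> Jet2<K> {
    /// Seeds primary `slot` at value `x`: unit gradient in that slot, zero Hessian.
    pub fn var(x: f64, slot: usize) -> Self {
        let mut g = [0.0; K];
        g[slot] = 1.0;
        Jet2 { v: x, g, h: [[0.0; K]; K] }
    }

    /// Product rule truncated at order 2:
    /// (fg)_a  = f_a g + f g_a
    /// (fg)_ab = f_ab g + f_a g_b + f_b g_a + f g_ab
    pub fn mul(&self, o: &Self) -> Self {
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for a in 0..K {
            g[a] = self.g[a] * o.v + self.v * o.g[a];
            for b in 0..K {
                h[a][b] = self.h[a][b] * o.v
                    + self.g[a] * o.g[b]
                    + self.g[b] * o.g[a]
                    + self.v * o.h[a][b];
            }
        }
        Jet2 { v: self.v * o.v, g, h }
    }
}
```

For example, seeding p0 = 3 and p1 = 5 and forming p0·p1·p0 gives the value 45, gradient [30, 9], and Hessian [[10, 6], [6, 0]], matching the derivatives of p0²·p1 by hand. Storage is O(K²) per jet, versus the O(K⁴) tensor a dense order-4 tower would carry.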
pub fn reconstruction_all_columns_packed<const K: usize>(
    &self,
) -> Vec<Order2<K>>
All out_dim reconstruction columns as packed Order2<K> jets, with
the per-row redundant sub-jets HOISTED out of the output-column loop
(#932 perf). reconstruction_column_packed(c) rebuilds, for every output
column c, both the per-atom softmax gate jet ζ_k (K exps + a recip
+ a K×K Hessian, the dominant cost) AND each per-atom basis jet Φ_{k,b},
yet neither depends on c: the gate is a function of the logits only, and
the basis jet is the local Taylor model of Φ_b in the coords, the decoder
coefficient B_{b,c} being the only c-dependent factor. The consumer
(fill_reconstruction_channels_from_program) calls it once per c, so the
gate and basis jets are recomputed out_dim× redundantly.
This builds each atom’s gate jet ONCE (K total) and each atom’s basis
jets ONCE (n_basis per atom), then assembles every column by the cheap
reductions decoded_{k,c} = Σ_b Φ_{k,b}·B_{b,c} and
ẑ_c = Σ_k ζ_k·decoded_{k,c}. The result is bit-identical to calling
Self::reconstruction_column_packed per column (same Leibniz products in
the same order) — only the redundant recomputation is removed — measured
~9× faster at K=8, out_dim=16 on the per-row hot path.
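The hoisting described above can be sketched on the value channel alone. Everything in this example is hypothetical: plain `f64` values stand in for jets, `phi[k][b]` stands in for the basis values Φ_{k,b}, and `dec[k][b][c]` for the decoder coefficients B_{b,c} of atom k. The point is only that moving the gate computation out of the column loop leaves every floating-point operation, and its order, unchanged.

```rust
/// Softmax gate values: K exps and one shared reciprocal.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let exps: Vec<f64> = logits.iter().map(|l| l.exp()).collect();
    let mut denom = 0.0_f64;
    for e in &exps {
        denom += *e;
    }
    let inv = 1.0 / denom;
    exps.iter().map(|e| *e * inv).collect()
}

/// z_c = sum_k zeta_k * (sum_b phi[k][b] * dec[k][b][c])
fn assemble(zeta: &[f64], phi: &[Vec<f64>], dec: &[Vec<Vec<f64>>], c: usize) -> f64 {
    let mut z = 0.0_f64;
    for k in 0..zeta.len() {
        let mut decoded = 0.0_f64;
        for b in 0..phi[k].len() {
            decoded += phi[k][b] * dec[k][b][c];
        }
        z += zeta[k] * decoded;
    }
    z
}

/// Per-column path: the gate is rebuilt on every call.
pub fn column_naive(logits: &[f64], phi: &[Vec<f64>], dec: &[Vec<Vec<f64>>], c: usize) -> f64 {
    let zeta = softmax(logits);
    assemble(&zeta, phi, dec, c)
}

/// Hoisted path: the gate is built once and reused for every column.
pub fn all_columns_hoisted(
    logits: &[f64],
    phi: &[Vec<f64>],
    dec: &[Vec<Vec<f64>>],
    out_dim: usize,
) -> Vec<f64> {
    let zeta = softmax(logits);
    (0..out_dim).map(|c| assemble(&zeta, phi, dec, c)).collect()
}
```

Because both paths run the same arithmetic in the same order, comparing the results with `f64::to_bits` is a reasonable way to check bit-identity in a test.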
pub fn reconstruction_column<const K: usize>(&self, out_col: usize) -> Tower4<K>
The reconstruction output column as the full dense Tower4<K> carrying
every value/gradient/Hessian/t3/t4 channel. This is the #932 oracle
ground truth: the production Self::reconstruction_column_packed
order-2 path is pinned against its order-≤2 channels, and the FD-witness
tests use its t3/t4. Not on the per-row hot path.
pub fn beta_border_tower_packed<const K: usize>(
    &self,
    atom: usize,
    basis_col: usize,
) -> Order2<K>
The β border-channel local-variable sub-jet as the PACKED order-2 jet
Order2<K>. The consumer reads only
.value() (the beta channel) and .g()[a] (the beta_deriv /
beta_l_deriv mixed channel — the reconstruction is linear in β so the
Hessian-in-β vanishes and only value+gradient are needed). Built from the
SAME packed gate / basis primitives as Self::reconstruction_column_packed, so
the dense t3/t4 tensor is never materialised on this per-row hot path
(#932 Tower4→Order2 cutover).
pub fn beta_border_tower<const K: usize>(
    &self,
    atom: usize,
    basis_col: usize,
) -> Tower4<K>
The β border-channel sub-jet as the full dense Tower4<K> — the #932
oracle ground truth the packed Self::beta_border_tower_packed is
pinned against. Not on the per-row hot path.
pub fn beta_border_towers_packed<const K: usize>(
    &self,
    channels: &[(usize, usize)],
) -> Vec<Order2<K>>
Packed β border-channel sub-jets for a batch of (atom, basis_col)
channels, with the per-atom gate jets HOISTED and the softmax denominator
SHARED across atoms (#932 perf): the gate jet ζ_k (the dominant K-exp
/ K×K-Hessian cost) is a function of the row’s logits only, not of
basis_col, and every atom’s gate shares one softmax denominator /
reciprocal. Self::all_gates builds all K gates once (K exps + 1
recip per row); each channel then just multiplies its atom’s cached gate
by its basis jet. Each result is bit-identical to
Self::beta_border_tower_packed for the same (atom, basis_col) (same
gate.mul(basis) product), in the input order.
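As a rough illustration of building every atom's gate from one shared softmax denominator, the sketch below returns the gate values together with their first derivatives in the logits, using the standard softmax identity ∂ζ_k/∂ℓ_j = ζ_k(δ_kj − ζ_j). The function name mirrors `all_gates` for readability only; its signature, return type, and the max-subtraction step are assumptions, not the crate's implementation.

```rust
/// Hypothetical first-order softmax gates for all K atoms.
/// Returns (zeta, dzeta) with dzeta[k][j] = d zeta_k / d logit_j.
pub fn all_gates(logits: &[f64]) -> (Vec<f64>, Vec<Vec<f64>>) {
    // ASSUMPTION: subtract the max logit for numerical stability.
    let m = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // K exponentials per row.
    let exps: Vec<f64> = logits.iter().map(|l| (*l - m).exp()).collect();
    let mut denom = 0.0_f64;
    for e in &exps {
        denom += *e;
    }
    // The single reciprocal shared by every atom's gate.
    let inv = 1.0 / denom;
    let zeta: Vec<f64> = exps.iter().map(|e| *e * inv).collect();

    let k = logits.len();
    let mut dzeta = vec![vec![0.0_f64; k]; k];
    for a in 0..k {
        for j in 0..k {
            let delta = if a == j { 1.0 } else { 0.0 };
            dzeta[a][j] = zeta[a] * (delta - zeta[j]);
        }
    }
    (zeta, dzeta)
}
```

A batch of (atom, basis_col) channels would then index into the cached gates rather than recomputing them per channel.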
pub fn beta_border_order1_packed<const K: usize>(
    &self,
    channels: &[(usize, usize)],
) -> Vec<Order1<K>>
Packed β border-channel sub-jets for a batch of channels as the
FIRST-order jet Order1<K> — value +
gradient ONLY, no Hessian. The β-border consumer
(fill_beta_border_channels_from_program) reads exactly .value() (the
beta channel) and .g()[a] (the mixed beta_deriv / beta_l_deriv
channel); the reconstruction is linear in β so the Hessian-in-β vanishes
and the K×K Hessian that Self::beta_border_towers_packed’s Order2
builds is computed-and-discarded every call. This method drops that work:
Order1’s value/gradient are BIT-IDENTICAL to Order2’s (the order-≤1
channels never read a Hessian), proven by the order1_* oracle, while the
per-channel gate.mul(basis) skips the K² Hessian product.
Same hoisting as Self::beta_border_towers_packed: gate jets built once
via Self::all_gates, each channel multiplies its atom’s gate by its
basis jet.
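The claim that the order-≤1 channels never read a Hessian can be checked on a small model. In the sketch below, `Jet1` and `Jet2` are hypothetical stand-ins for `Order1<K>` and `Order2<K>`; the value and gradient formulas of the two products are written identically, so the results agree bit for bit while `Jet1::mul` skips the K² Hessian loop.

```rust
/// Hypothetical first-order jet: value + gradient only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet1<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
}

/// Hypothetical second-order jet: value + gradient + Hessian.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet2<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
}

impl<const K: usize> Jet1<K> {
    /// First-order product rule: O(K) work.
    pub fn mul(&self, o: &Self) -> Self {
        let mut g = [0.0; K];
        for a in 0..K {
            g[a] = self.g[a] * o.v + self.v * o.g[a];
        }
        Jet1 { v: self.v * o.v, g }
    }
}

impl<const K: usize> Jet2<K> {
    /// Drops the Hessian channel.
    pub fn truncate(&self) -> Jet1<K> {
        Jet1 { v: self.v, g: self.g }
    }

    /// Second-order product rule: O(K^2) work for the Hessian.
    pub fn mul(&self, o: &Self) -> Self {
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for a in 0..K {
            g[a] = self.g[a] * o.v + self.v * o.g[a];
            for b in 0..K {
                h[a][b] = self.h[a][b] * o.v
                    + self.g[a] * o.g[b]
                    + self.g[b] * o.g[a]
                    + self.v * o.h[a][b];
            }
        }
        Jet2 { v: self.v * o.v, g, h }
    }
}
```

Multiplying a gate jet by a basis jet in both representations and comparing `v` and `g` with `to_bits` gives the kind of check an order-1 oracle could perform.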
impl SaeReconstructionRowProgram
Structural layout signature of a row program: the part that MUST be identical
across rows for them to share one SIMD op graph (slot mapping, per-atom
basis/latent/decoder shape, primary count). The per-row numeric data
(phi/d_phi/d2_phi/decoder VALUES, logits) is what differs between
lanes; the layout is what is shared.
pub fn reconstruction_all_columns_batch4<const K: usize>(
    rows: [&Self; 4],
) -> Option<[Vec<Order2<K>>; 4]>
All out_dim reconstruction columns for FOUR softmax-aligned rows at once,
returned per row. Each row’s column vector is BIT-IDENTICAL to
Self::reconstruction_all_columns_packed on that row (same hoisting,
same Leibniz products in the same order — lane i mirrors the scalar
row-i path). Returns None if the four rows are not softmax-aligned, so
the caller can fall back to the scalar per-row path.
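The `Option` return suggests a pattern of "check that the four rows share a layout, else let the caller fall back". The sketch below shows that pattern for the gate values only. The `Layout` and `Row` types, the `aligned` predicate, and the reading of "softmax-aligned" as "all four rows use a softmax gate and have equal layouts" are assumptions; the lane loop is plain scalar code standing in for real SIMD.

```rust
/// Hypothetical structural layout shared across lanes.
#[derive(Clone, Debug, PartialEq)]
pub struct Layout {
    pub logit_slot: Vec<Option<usize>>,
    pub coord_slot: Vec<Vec<usize>>,
    pub n_primaries: usize,
}

/// Hypothetical row: shared layout plus per-row numeric data.
pub struct Row {
    pub layout: Layout,
    pub is_softmax: bool,
    pub logits: Vec<f64>,
}

/// ASSUMPTION: rows are "aligned" when all use softmax and layouts match.
pub fn aligned(rows: [&Row; 4]) -> bool {
    rows.iter().all(|r| {
        r.is_softmax && r.layout == rows[0].layout && r.logits.len() == rows[0].logits.len()
    })
}

/// Scalar per-row path used as the fallback and as the reference.
pub fn gates_scalar(row: &Row) -> Vec<f64> {
    let mut out: Vec<f64> = row.logits.iter().map(|l| l.exp()).collect();
    let mut denom = 0.0_f64;
    for e in &out {
        denom += *e;
    }
    let inv = 1.0 / denom;
    for e in &mut out {
        *e *= inv;
    }
    out
}

/// Four rows at once; lane i performs the same operations, in the same
/// order, as the scalar path on row i. Returns None if not aligned.
pub fn gates_batch4(rows: [&Row; 4]) -> Option<[Vec<f64>; 4]> {
    if !aligned(rows) {
        return None;
    }
    let k = rows[0].logits.len();
    let mut out: [Vec<f64>; 4] = std::array::from_fn(|_| vec![0.0; k]);
    let mut denom = [0.0_f64; 4];
    for a in 0..k {
        for lane in 0..4 {
            let e = rows[lane].logits[a].exp();
            out[lane][a] = e;
            denom[lane] += e;
        }
    }
    for lane in 0..4 {
        let inv = 1.0 / denom[lane];
        for a in 0..k {
            out[lane][a] *= inv;
        }
    }
    Some(out)
}
```

A caller would typically write `gates_batch4(rows).unwrap_or_else(|| rows.map(gates_scalar))` or an equivalent `match`, so misaligned rows still get processed one at a time.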
pub fn beta_border_order1_batch4<const K: usize>(
    rows: [&Self; 4],
    channels: &[(usize, usize)],
) -> Option<[Vec<Order1<K>>; 4]>
Packed β-border FIRST-order jets for a batch of (atom, basis_col)
channels, for FOUR softmax-aligned rows at once, returned per row. Each
row’s channel vector is BIT-IDENTICAL to
Self::beta_border_order1_packed on that row. Returns None if the
rows are not softmax-aligned.
Trait Implementations§
impl Clone for SaeReconstructionRowProgram
fn clone(&self) -> SaeReconstructionRowProgram
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for SaeReconstructionRowProgram
impl RefUnwindSafe for SaeReconstructionRowProgram
impl Send for SaeReconstructionRowProgram
impl Sync for SaeReconstructionRowProgram
impl Unpin for SaeReconstructionRowProgram
impl UnsafeUnpin for SaeReconstructionRowProgram
impl UnwindSafe for SaeReconstructionRowProgram
Blanket Implementations§
impl<T> Allocation for T
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> Pointable for T
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its
superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.