Struct EncodeAtlas

pub struct EncodeAtlas {
    pub atoms: Vec<AtomEncodeAtlas>,
    pub config: AtlasConfig,
}

The encode atlas: per-atom certified charts plus the online certified-encode driver (issue #1010).

Fields

atoms: Vec<AtomEncodeAtlas>

config: AtlasConfig

Implementations

impl EncodeAtlas

pub fn build(
    atoms: &[SaeManifoldAtom],
    amplitude_bound: &[f64],
    target_norm_bound: f64,
    config: AtlasConfig,
) -> Result<Self, String>

Build the offline atlas over a frozen dictionary: for each atom, lay down chart centers on the atom’s coordinate grid and certify a Newton radius from the Kantorovich inequality at the worst-case in-chart start.

amplitude_bound[k] is the per-atom bound on |z_k| used to scale the reconstruction jets (the offline L must hold for the largest amplitude the encode can produce); target_norm_bound bounds ‖x‖ over the data.
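As a rough illustration of the Kantorovich test mentioned above, the sketch below uses the textbook Newton-Kantorovich quantities; it is not the crate's implementation. The function name and the three parameters (`beta`, `eta`, `lipschitz`) are hypothetical, and how the atlas actually derives them from `amplitude_bound` and `target_norm_bound` is not shown here.

```rust
/// Textbook Newton-Kantorovich test at a start point (illustrative only).
///
/// `beta`      bounds the norm of the inverse derivative at the start,
/// `eta`       bounds the length of the first Newton step,
/// `lipschitz` bounds the Lipschitz constant of the derivative.
///
/// Returns the radius of a ball around the start that contains a root,
/// or `None` when h = beta * lipschitz * eta exceeds 1/2 (no certificate).
fn kantorovich_radius(beta: f64, eta: f64, lipschitz: f64) -> Option<f64> {
    let h = beta * lipschitz * eta;
    if !h.is_finite() || h < 0.0 || h > 0.5 {
        return None;
    }
    if h == 0.0 {
        // Affine map or exact start: the first Newton step already lands on the root.
        return Some(eta);
    }
    Some((1.0 - (1.0 - 2.0 * h).sqrt()) / (beta * lipschitz))
}
```

For small h the radius is close to `eta`, so a well-conditioned start certifies a root roughly one Newton step away.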

pub fn certified_encode_row(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    x: ArrayView1<'_, f64>,
    amplitude: f64,
) -> Result<(Array1<f64>, RowCertificate), String>

Online certified encode of one target row x against one atom k with fixed amplitude z. Routes to the nearest chart, starts from that chart’s distilled implicit-function-theorem (IFT) warm start, runs config.newton_steps Newton steps, and returns the encoded coordinate with its certificate. An uncertified start (no chart, no distilled Jacobian, non-positive amplitude, or a Kantorovich quantity h > ½) flags the row as uncertified so the caller can fall back to the exact multi-start solve.
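The route-then-refine flow can be sketched on a toy atom. Everything here is an assumption made for illustration: the atom is the unit circle m(t) = (cos t, sin t), charts are represented only by their center coordinates, routing picks the center whose decoded point is closest to x, and refinement is a plain Gauss-Newton iteration with no certificate attached.

```rust
/// Picks the chart whose decoded center z * m(t_c) is closest to the target x.
fn nearest_chart(centers: &[f64], x: [f64; 2], z: f64) -> Option<usize> {
    let dist2 = |t: f64| (x[0] - z * t.cos()).powi(2) + (x[1] - z * t.sin()).powi(2);
    (0..centers.len()).min_by(|&a, &b| dist2(centers[a]).total_cmp(&dist2(centers[b])))
}

/// Runs `steps` Gauss-Newton steps on 0.5 * |x - z * m(t)|^2 starting from `t0`,
/// for the toy atom m(t) = (cos t, sin t).
fn refine_on_circle(x: [f64; 2], z: f64, t0: f64, steps: usize) -> f64 {
    let mut t = t0;
    for _ in 0..steps {
        let m = [t.cos(), t.sin()];
        let dm = [-t.sin(), t.cos()]; // m'(t), unit length on the circle
        let r = [x[0] - z * m[0], x[1] - z * m[1]];
        // Gauss-Newton step (J^T J)^{-1} J^T r with J = z * m'(t), so J^T J = z^2.
        t += (dm[0] * r[0] + dm[1] * r[1]) / z;
    }
    t
}
```

With centers at 0.0, 0.5 and 1.5 and a target generated at t = 0.7, routing selects the center at 0.5, and a few refinement steps from there recover 0.7 to near machine precision.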

pub fn amortized_encode_row(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    x: ArrayView1<'_, f64>,
    amplitude: f64,
) -> Result<(Array1<f64>, RowCertificate), String>

Amortized (distilled) encode of one target row x against one atom k with fixed amplitude z (#1026 ladder item 3).

Routes to the nearest chart, then predicts the latent coordinate in CLOSED FORM from that chart’s precomputed implicit-function-theorem Jacobian:

t̂ = t_c + (1/z) · A₁ · (x − z · m₁(t_c)),

a single O(d·p) mat-vec with no per-row Hessian factorization or eigendecomposition, which is the amortization. The Kantorovich certificate is then evaluated at the predicted start t̂ with the chart’s closed-form Lipschitz constant. A prediction is accepted only when three conditions hold: that certificate holds, an independent cold probe started from the chart center also certifies, and the two refined coordinates agree to within the two probes’ final Kantorovich root-radius bounds. This keeps the distilled path honest without letting the exact probe reuse the distilled warm start it is auditing. A chart without a distilled Jacobian (singular Gauss–Newton block) flags the row as uncertified.
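The closed-form predictor above is a single mat-vec, which a minimal sketch on plain slices makes concrete. The function name and the storage layout (A₁ as d rows of length p) are assumptions for illustration; the certificate and the cold cross-check are deliberately left out.

```rust
/// Evaluates t_hat = t_c + (1/z) * A1 * (x - z * m1) for one row.
///
/// `a1` holds d rows of length p, `m1` and `x` have length p, `t_c` has length d.
/// Returns `None` for a non-finite or non-positive amplitude, or mismatched shapes.
fn amortized_predict(
    t_c: &[f64],
    a1: &[Vec<f64>],
    m1: &[f64],
    x: &[f64],
    z: f64,
) -> Option<Vec<f64>> {
    if !z.is_finite() || z <= 0.0 {
        return None;
    }
    if a1.len() != t_c.len() || m1.len() != x.len() || a1.iter().any(|r| r.len() != x.len()) {
        return None;
    }
    // Residual of the target against the decoded chart center.
    let resid: Vec<f64> = x.iter().zip(m1).map(|(xi, mi)| xi - z * mi).collect();
    // One dot product per latent coordinate: O(d * p) in total.
    Some(
        t_c.iter()
            .zip(a1)
            .map(|(tc, row)| {
                let dot: f64 = row.iter().zip(&resid).map(|(a, r)| a * r).sum();
                tc + dot / z
            })
            .collect(),
    )
}
```

On the unit-circle toy atom with center t_c = 0.5, taking A₁ = m'(t_c)ᵀ and a target generated at t = 0.7, this predicts 0.5 + sin(0.2), already within about 1.4e-3 of the true coordinate before any refinement.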

pub fn amortized_encode_batch(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
) -> Result<EncodeResult, String>

Batched amortized (distilled) encode over many rows against one atom (#1026 ladder item 3, corpus-rate). Each row uses the closed-form per-chart Jacobian predictor and carries its own Kantorovich certificate; uncertified rows are flagged in EncodeResult::encode_uncertified_count for the exact multi-start fallback. Rows are independent given the frozen dictionary, so the batch fans out over rows (deterministic row-order assembly, bit-identical run-to-run), staying sequential inside a rayon worker to avoid nested oversubscription.
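The row fan-out with deterministic assembly can be illustrated with std scoped threads standing in for rayon. This is a sketch under stated assumptions: `toy_encode` is a hypothetical placeholder for the per-row encode, `None` marks an uncertified row, and each worker writes only into its own disjoint slice of the output so the result does not depend on scheduling.

```rust
use std::thread;

/// Hypothetical per-row encode: `None` marks a row that could not be certified.
fn toy_encode(x: f64, z: f64) -> Option<f64> {
    if z.is_finite() && z > 0.0 { Some(x / z) } else { None }
}

/// Fans rows out over `workers` threads; output order always matches input order.
fn encode_rows_parallel(targets: &[f64], amplitudes: &[f64], workers: usize) -> Vec<Option<f64>> {
    assert_eq!(targets.len(), amplitudes.len());
    let n = targets.len();
    let workers = workers.max(1);
    let chunk = ((n + workers - 1) / workers).max(1);
    let mut out: Vec<Option<f64>> = vec![None; n];
    thread::scope(|s| {
        let jobs = out
            .chunks_mut(chunk)
            .zip(targets.chunks(chunk))
            .zip(amplitudes.chunks(chunk));
        for ((out_c, x_c), z_c) in jobs {
            // Each thread owns a disjoint output slice and is sequential inside.
            s.spawn(move || {
                for ((o, &x), &z) in out_c.iter_mut().zip(x_c).zip(z_c) {
                    *o = toy_encode(x, z);
                }
            });
        }
    });
    out
}

/// Number of rows a caller would hand to a slower fallback.
fn uncertified_count(rows: &[Option<f64>]) -> usize {
    rows.iter().filter(|r| r.is_none()).count()
}
```

Because rows never share state, running with one worker or several produces identical output, which is the property the bit-identical claim relies on.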

pub fn certified_encode_batch(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
) -> Result<EncodeResult, String>

Batched certified encode over many rows against one atom (the #988 throughput consumer). Each row carries its own certificate; uncertified rows are flagged in EncodeResult::encode_uncertified_count for the exact multi-start fallback.

pub fn amortized_encode_batch_fast(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    x: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
) -> Result<(Array2<f64>, Vec<bool>), String>

Batched GEMM “fast” amortized encode: the traditional-encoder forward pass, with manifolds. For every row this applies the same closed-form affine predictor as amortized_warm_start (t̂ = t_c + (1/z)·A₁·(x − z·m₁)), but routes and applies it as batched matrix products instead of a per-row loop wrapped in the Kantorovich certificate and basin warmup. No per-row certificate is taken: this is the speed mode (the certified *_encode_* paths remain the accuracy mode).

Cost is GEMM-bound: one (n × p)·(p × d) decode-distance product for nearest-chart routing (skipped for single-chart atoms) plus, per chart, one (n_c × p)·(p × d) predictor product — i.e. ≈ X·Wᵀ, exactly a dense SAE encoder’s forward map.

Degenerate rows are the same rows for which amortized_warm_start returns None: a missing basis evaluator, a chart whose Gauss–Newton block was singular (amortized_jacobian == None), or a non-finite / non-positive amplitude. Their coordinates are zeroed (never a panic, never a silent wrong encode) and they are flagged false in the valid mask, so the caller can route them to the exact path if desired.

Returns (coords, valid) where coords is n × d and valid[row] is true iff the amortized predictor fired for that row.
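A caller typically has to split the returned rows by the mask before doing anything else. The helpers below are a hypothetical sketch of that step, using nested `Vec`s in place of the `Array2` the method actually returns.

```rust
/// Splits row indices by the valid mask: (rows the predictor fired on,
/// rows to hand to a slower exact path).
fn split_by_valid(valid: &[bool]) -> (Vec<usize>, Vec<usize>) {
    (0..valid.len()).partition(|&i| valid[i])
}

/// Checks the documented contract that every invalid row has zeroed coordinates.
fn invalid_rows_are_zeroed(coords: &[Vec<f64>], valid: &[bool]) -> bool {
    coords
        .iter()
        .zip(valid)
        .all(|(row, &ok)| ok || row.iter().all(|&c| c == 0.0))
}
```

A zero coordinate alone does not identify a failed row, since a valid encode can legitimately be zero; the mask is the only reliable signal.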

pub fn amortized_reconstruct_batch_fast(
    &self,
    atom: &SaeManifoldAtom,
    atom_index: usize,
    x: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
) -> Result<(Array2<f64>, Vec<bool>), String>

Fast batched FULL forward pass against one atom: encode → decode, the manifold analogue of a traditional SAE’s x̂ = z·D (decoder D, code z).

A traditional SAE decodes with one GEMM. The manifold SAE’s reconstruction is m(t̂) = z·Φ(t̂)·B (module header) — the SAME GEMM Φ·B, but the code Φ(t̂) is the curved chart basis evaluated at the encoded latent coordinate rather than a flat one-hot. So the fast forward is exactly:

1. amortized_encode_batch_fast → per-row latent coords t̂ (one routing GEMM + one affine GEMM per chart, i.e. a traditional W·x+b);
2. one batched basis evaluation Φ(t̂) (the manifold-curvature step a flat SAE doesn’t have; n×m);
3. one GEMM recon = Φ(t̂)·B ((n×m)·(m×p), i.e. a traditional decoder z·D), then the per-row amplitude scale z.

Rows the encoder could not predict (no evaluator / singular Gauss–Newton block / non-finite or non-positive amplitude) are returned as a zero reconstruction and flagged false in the valid mask, never a silent wrong decode. The reconstruction of a valid row equals z·(Φ(t̂_row)·B), with t̂ from the per-row predictor, up to floating-point reassociation inside the GEMM.
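The decode half of the forward pass can be sketched as follows. The polynomial basis Φ(t) = [1, t, t²] and the scalar latent are assumptions chosen only to keep the example small; the real chart basis is not specified here, and the loops stand in for the single batched GEMM.

```rust
/// Hypothetical basis for a scalar latent: Phi(t) = [1, t, t^2], so m = 3.
fn basis(t: f64) -> [f64; 3] {
    [1.0, t, t * t]
}

/// Computes recon_row = z * (Phi(t_hat) * B) for each row, with B given as
/// m = 3 rows of length p. Rows flagged invalid come back as all zeros.
fn reconstruct_rows(
    t_hat: &[f64],
    amplitudes: &[f64],
    valid: &[bool],
    b: &[Vec<f64>; 3],
) -> Vec<Vec<f64>> {
    let p = b[0].len();
    t_hat
        .iter()
        .zip(amplitudes)
        .zip(valid)
        .map(|((&t, &z), &ok)| {
            let mut row = vec![0.0; p];
            if ok {
                let phi = basis(t);
                // Accumulate z * sum_j Phi_j(t) * B[j][k] into each output column k.
                for (phi_j, b_j) in phi.iter().zip(b.iter()) {
                    for (r, b_jk) in row.iter_mut().zip(b_j) {
                        *r += z * phi_j * b_jk;
                    }
                }
            }
            row
        })
        .collect()
}
```

With B = [[1, 0], [0, 1], [1, 1]], t̂ = 2 and z = 3, Φ(t̂)·B is [5, 6] and the reconstruction is [15, 18]; an invalid row yields [0, 0] regardless of its inputs.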

pub fn certified_encode_with_index<S: AtomFrameSketch + Sync>(
    &self,
    atoms: &[SaeManifoldAtom],
    index: &SaeCandidateIndex,
    sketch: &S,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
    latent_dim: usize,
) -> Result<EncodeResult, String>

LSH-routed certified encode (issue #1010 step 2 + 3): for each target row, the existing SaeCandidateIndex (#985/#994) proposes the best-aligned atom by frame alignment to the row direction; the row is then encoded against THAT atom’s certified chart atlas. This is the production routing path — the LSH does sublinear atom selection, the atlas does the in-atom nearest-chart routing and the per-row Kantorovich certificate.

atoms[id] must be aligned with the atlas’s atoms[id] (same dictionary order the atlas was built from and the sketch/index were built over). A row with no LSH proposal (empty bucket) is flagged uncertified — it routes to the exact multi-start fallback, never a silent wrong encode.

pub fn amortized_encode_with_index<S: AtomFrameSketch + Sync>(
    &self,
    atoms: &[SaeManifoldAtom],
    index: &SaeCandidateIndex,
    sketch: &S,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
    latent_dim: usize,
) -> Result<EncodeResult, String>

LSH-routed amortized (distilled) encode: the production token-rate encoder of #1026 ladder item 3. Routing is identical to Self::certified_encode_with_index (LSH proposes the best-aligned atom, the atlas routes to the in-atom nearest chart), but the in-atom encode is the closed-form per-chart Jacobian predictor plus certificate gate of Self::amortized_encode_row rather than the certified Newton-refinement path. This is the deployment path: the distilled affine map produces the encode in one mat-vec, the Kantorovich certificate decides trust-or-fallback per row, and uncertified rows (the adversarial tail the thread expects to concentrate on rare tokens) are flagged for the exact multi-start solve, so compute goes where the questions are. Rows are independent given the frozen dictionary, so the batch fans out over rows with deterministic row-order assembly (bit-identical run-to-run).

pub fn amortized_encode_with_index_fast<S: AtomFrameSketch + Sync>(
    &self,
    atoms: &[SaeManifoldAtom],
    index: &SaeCandidateIndex,
    sketch: &S,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
    latent_dim: usize,
) -> Result<(Array2<f64>, Vec<bool>), String>

LSH-routed FAST amortized encode over the WHOLE dictionary — the multi-atom, corpus-rate analogue of Self::amortized_encode_with_index.

amortized_encode_with_index routes per row, then runs the per-row closed-form predictor + Kantorovich certificate + cold cross-check on each row independently. This fast variant keeps the SAME sublinear per-row LSH routing (cheap — index.propose + the alignment gate), but replaces the per-row predictor with the GEMM-batched Self::amortized_encode_batch_fast: it GROUPS rows by their proposed atom and runs one batched affine-predictor pass per atom-group (a routing GEMM + a predictor GEMM each), reproducing a traditional SAE’s whole-dictionary W·x+b throughput. No per-row certificate — this is the speed mode validated as accuracy-parity with the certified solve (fast_forward_is_accuracy_parity_with_certified).

Returns the per-row latent coords and a valid mask. A row is flagged false when it has no LSH proposal, when its routing alignment is sub-threshold or NaN, or when the batched predictor could not fire on it (no evaluator / singular Gauss–Newton block / non-finite or non-positive amplitude). Each row is written exactly once (disjoint per-atom groups), so the result is independent of group iteration order.
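The group-then-scatter pattern described above can be sketched with std collections. The names are hypothetical: `proposals` stands in for the per-row LSH result (`None` for no proposal or a rejected alignment), and `encode_group` stands in for the batched per-atom predictor, returning one optional scalar coordinate per row of its group.

```rust
use std::collections::BTreeMap;

/// Groups row indices by proposed atom; rows with no proposal are left out.
fn group_rows_by_atom(proposals: &[Option<usize>]) -> BTreeMap<usize, Vec<usize>> {
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for (row, proposal) in proposals.iter().enumerate() {
        if let Some(atom) = proposal {
            groups.entry(*atom).or_default().push(row);
        }
    }
    groups
}

/// Runs one batched call per atom group and scatters results back by row index.
/// Rows that were never grouped, or that the group encoder rejected, stay
/// zeroed and flagged false.
fn encode_grouped<F>(proposals: &[Option<usize>], mut encode_group: F) -> (Vec<f64>, Vec<bool>)
where
    F: FnMut(usize, &[usize]) -> Vec<Option<f64>>,
{
    let mut coords = vec![0.0; proposals.len()];
    let mut valid = vec![false; proposals.len()];
    for (atom, rows) in group_rows_by_atom(proposals) {
        let results = encode_group(atom, &rows);
        for (&row, result) in rows.iter().zip(results) {
            if let Some(t) = result {
                coords[row] = t;
                valid[row] = true;
            }
        }
    }
    (coords, valid)
}
```

Since every row index appears in at most one group, each output slot is written at most once, so visiting the groups in any order gives the same result.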

pub fn amortized_reconstruct_with_index_fast<S: AtomFrameSketch + Sync>(
    &self,
    atoms: &[SaeManifoldAtom],
    index: &SaeCandidateIndex,
    sketch: &S,
    targets: ArrayView2<'_, f64>,
    amplitudes: ArrayView1<'_, f64>,
) -> Result<(Array2<f64>, Vec<bool>), String>

LSH-routed FAST full forward over the WHOLE dictionary: encode → decode, the multi-atom analogue of Self::amortized_reconstruct_batch_fast. Same sublinear per-row routing + per-atom grouping as Self::amortized_encode_with_index_fast, but each group is run through the batched reconstruct (m(t̂) = z·Φ(t̂)·B) so the result is the per-row reconstruction in the ambient space. Rows that do not route/predict decode to an exact zero reconstruction and are flagged false.

Trait Implementations

impl Clone for EncodeAtlas

fn clone(&self) -> EncodeAtlas

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for EncodeAtlas

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl UnwindSafe for EncodeAtlas

Blanket Implementations

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.