use *;
pub const SAE_MANIFOLD_ARMIJO_C1: f64 = 1.0e-4;
pub const SAE_MANIFOLD_MAX_LINESEARCH_HALVINGS: usize = 12;
/// Relative Cholesky-pivot floor for the analytic SAE outer-rho gradient.
///
/// The evidence value can still be honest below this threshold because it only
/// sums `log(diag(L))`. The analytic gradient is different: selected-inverse
/// traces and `ArrowFactorCache::full_inverse_apply` divide by those pivots.
/// Once `min_pivot / max_pivot` is below this floor, the gradient lane must
/// either identify a closed-form gauge orbit and stiffen only that quotient
/// direction, or reject the trial rho as numerically singular.
pub const SAE_OUTER_GRADIENT_PIVOT_RATIO_FLOOR: f64 = 1.0e-12;
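/*
Illustrative sketch only, not part of this module: one way the pivot-ratio gate
documented above could be evaluated from the diagonal of a Cholesky factor. The
names `pivot_ratio` and `gradient_lane_is_safe` are hypothetical, and the sketch
assumes the caller already holds `diag(L)` as a slice. The real gradient lane may
first try to stiffen a closed-form gauge orbit before rejecting a trial rho; that
step is not modelled here.

```rust
// Value copied from `SAE_OUTER_GRADIENT_PIVOT_RATIO_FLOOR` so the sketch is self-contained.
const PIVOT_RATIO_FLOOR: f64 = 1.0e-12;

// Hypothetical helper: returns `min_pivot / max_pivot` over the Cholesky
// diagonal, or `None` when the diagonal is empty or holds a non-finite or
// non-positive pivot.
fn pivot_ratio(chol_diag: &[f64]) -> Option<f64> {
    if chol_diag.is_empty() || chol_diag.iter().any(|d| !d.is_finite() || *d <= 0.0) {
        return None;
    }
    let min = chol_diag.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = chol_diag.iter().cloned().fold(0.0_f64, f64::max);
    Some(min / max)
}

// Hypothetical gate: the analytic gradient is trusted only while the ratio
// stays at or above the floor.
fn gradient_lane_is_safe(chol_diag: &[f64]) -> bool {
    matches!(pivot_ratio(chol_diag), Some(r) if r >= PIVOT_RATIO_FLOOR)
}
```

For example, `gradient_lane_is_safe(&[1.0, 1.0e-13])` is false because the ratio
`1e-13` falls below the floor.
*/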
pub const SAE_OUTER_GRADIENT_GAUGE_RAYLEIGH_FACTOR: f64 = 1.0e-8;
/// Relative spectral cutoff below which a penalised decoder β-curvature
/// eigenvalue (`G_k + λ_smooth·S_k`) is treated as a genuine flat direction of
/// the joint inner Hessian — the rank-deficient-decoder null quotiented out of
/// the inner convergence measure and deflated in the outer gradient (#1051).
/// Matches the `1e-9` relative rank cutoff used across the codebase.
pub const SAE_DECODER_BETA_NULL_RELATIVE_FLOOR: f64 = 1.0e-9;
/// Largest decoder (`β`) block dimension for which the outer-gradient
/// conditioning path may additionally probe the β coordinate basis for a
/// near-null subspace of the joint Hessian (issue #1051, #1095).
///
/// The closed-form gauge orbit ([`SaeManifoldTerm::dense_step_gauge_vectors`])
/// only covers the *chart* reparametrisation freedom (constant + linear
/// coordinate fields). It does NOT cover a **rank-deficient decoder design** —
/// e.g. a euclidean-1D atom fit to a straight line in a `p = 2` ambient leaves
/// the decoder column space rank-1, so one decoder direction is unidentified by
/// the data and the joint Hessian acquires a near-null direction that lives in
/// the β block, not the gauge orbit. That direction is exactly a Faddeev-Popov
/// gauge of the *same* kind (a flat direction of the evidence quotient), so it
/// is deflated identically — but only after the β basis is admitted as a
/// deflation candidate. The dense `k×k` Rayleigh eigendecomposition that
/// resolves it is `O(k³)`, so it is gated to moderate β blocks. Large-`p`
/// LLM-scale fits keep the pure gauge-orbit path untouched: they reach low
/// decoder rank through the Grassmann frame, which reduces the border width
/// from `M·p` to `M·r` where `r ≪ p`, so `k ≤ M·r` stays small. PCA-reduced
/// fits (p ≈ 32–128) with the Grassmann frame active can have `k = M·r` up to
/// ~512 (e.g. with p=32, r=8: m=8 basis fns → k=64, m=16 → k=128,
/// m=32 → k=256). A cap of 512 covers all typical small-atom PCA cases while
/// keeping the O(k³) cost at 512³ ≈ 0.13B ops, negligible next to the solve.
pub const SAE_OUTER_GRADIENT_BETA_NULL_PROBE_MAX_DIM: usize = 512;
/// Nominal curvature-homotopy `η` step (#1007): the tracker covers `η ∈ [0, 1]`
/// in this many equal predictor-corrector waypoints when the branch is clean.
/// Five waypoints is a few corrector solves — far cheaper than the multi-seed
/// cascade it replaces — and the step is halved adaptively when the arrow-factor
/// min pivot shrinks, so a near-bifurcation stretch is resolved at finer
/// granularity without a separate knob.
pub const CURVATURE_WALK_INITIAL_ETA_STEP: f64 = 0.2;
/// Smallest curvature-homotopy `η` step (#1007). A pivot collapse (or corrector
/// failure) that persists at this step is a DETECTED branch bifurcation, not a
/// step-size artifact: the walk records it and defers to the seed cascade.
pub const CURVATURE_WALK_MIN_ETA_STEP: f64 = 1.0 / 256.0;
/// Hard ceiling on accepted corrector solves in one curvature-homotopy walk
/// (#1007). Bounds the walk's cost under repeated halving; reaching it is a
/// structural-termination signal (the branch is not cleanly trackable) that
/// defers to the cascade, never a spin.
pub const CURVATURE_WALK_MAX_CORRECTORS: usize = 32;
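/*
Illustrative sketch only, not part of this module: how the three curvature-walk
constants above could interact in an adaptive `η` walk. The `corrector_ok`
closure is a hypothetical stand-in for "the corrector solve at this `η`
converged and the arrow-factor min pivot stayed healthy"; `WalkOutcome` and
`walk_eta` are likewise assumed names. The predictor step and the hand-off to
the seed cascade are left out.

```rust
// Values copied from the constants above so the sketch is self-contained.
const INITIAL_ETA_STEP: f64 = 0.2;
const MIN_ETA_STEP: f64 = 1.0 / 256.0;
const MAX_CORRECTORS: usize = 32;

#[derive(Debug, PartialEq)]
enum WalkOutcome {
    Arrived { correctors: usize },
    Bifurcation { eta: f64 },
    BudgetExhausted { eta: f64 },
}

fn walk_eta(mut corrector_ok: impl FnMut(f64) -> bool) -> WalkOutcome {
    let mut eta = 0.0_f64;
    let mut step = INITIAL_ETA_STEP;
    let mut accepted = 0usize;
    while eta < 1.0 {
        if accepted >= MAX_CORRECTORS {
            // Structural termination: the branch is not cleanly trackable.
            return WalkOutcome::BudgetExhausted { eta };
        }
        let target = (eta + step).min(1.0);
        if corrector_ok(target) {
            eta = target;
            accepted += 1;
        } else if step > MIN_ETA_STEP {
            // Halve the step, but never go below the minimum step.
            step = (step * 0.5).max(MIN_ETA_STEP);
        } else {
            // The failure persists at the minimum step: record a bifurcation.
            return WalkOutcome::Bifurcation { eta };
        }
    }
    WalkOutcome::Arrived { correctors: accepted }
}
```

On a clean branch (`corrector_ok` always true) the walk arrives after five
accepted correctors, matching the five nominal waypoints.
*/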
/// Minimum η = 1 reconstruction explained-variance for the curvature-homotopy
/// walk to certify "arrived" (#1117). The predictor-corrector walk from the
/// Eckart-Young LINEAR anchor can converge (legitimately, on the gauge/decoder-
/// null quotient) into a degenerate basin whose reconstruction is worse than the
/// data mean (a NEGATIVE EV). When the post-polish EV is below this floor the
/// walk runs a bounded joint Newton recovery from the pristine seed, and on
/// failure demotes to a recorded bifurcation so the documented seed cascade
/// recovers the good branch. The floor sits well below a genuine recovery (a
/// real circle reconstructs at EV ≳ 0.9) but firmly above the worse-than-trivial
/// basins, so a clean arrival is never touched while a garbage basin always is.
pub const CURVATURE_WALK_ARRIVAL_EV_FLOOR: f64 = 0.5;
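/*
Illustrative sketch only, not part of this module: the arrival check above,
assuming explained variance is defined as `1 - SS_res / SS_tot` with the column
means as the trivial baseline. That definition, the row-major `Vec<Vec<f64>>`
layout, and the names `explained_variance` and `arrival_is_certified` are
assumptions made for the example. Under this definition a reconstruction worse
than the data mean yields a negative EV, as the comment above describes.

```rust
// Value copied from `CURVATURE_WALK_ARRIVAL_EV_FLOOR` so the sketch is self-contained.
const ARRIVAL_EV_FLOOR: f64 = 0.5;

// Hypothetical helper: rows are samples, columns are features. Returns `None`
// for empty, ragged, or zero-variance input.
fn explained_variance(data: &[Vec<f64>], recon: &[Vec<f64>]) -> Option<f64> {
    let n = data.len();
    if n == 0 || recon.len() != n {
        return None;
    }
    let p = data[0].len();
    if p == 0 || data.iter().chain(recon.iter()).any(|r| r.len() != p) {
        return None;
    }
    // Column means of the data act as the trivial reconstruction.
    let mut mean = vec![0.0; p];
    for row in data {
        for (m, x) in mean.iter_mut().zip(row) {
            *m += x / n as f64;
        }
    }
    let (mut ss_res, mut ss_tot) = (0.0, 0.0);
    for (x_row, r_row) in data.iter().zip(recon) {
        for j in 0..p {
            ss_res += (x_row[j] - r_row[j]).powi(2);
            ss_tot += (x_row[j] - mean[j]).powi(2);
        }
    }
    if ss_tot > 0.0 { Some(1.0 - ss_res / ss_tot) } else { None }
}

// Hypothetical gate: an EV below the floor would trigger the recovery path.
fn arrival_is_certified(ev: f64) -> bool {
    ev >= ARRIVAL_EV_FLOOR
}
```
*/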
/// Joint Newton iteration budget for the curvature-walk degenerate-basin
/// recovery (#1117). When the walk lands on a reconstruction below
/// `CURVATURE_WALK_ARRIVAL_EV_FLOOR`, the recovery runs a REAL joint fit from
/// the pristine (circle-aware) seed with at least this many inner iterations,
/// independent of the outer objective's possibly-frozen `inner_max_iter = 0`;
/// otherwise the recovery (and the fallback cascade) would re-freeze at the
/// cold seed and never escape the linear basin. Matches the cold-start budget
/// that recovers EV ≈ 0.94 on the K = 1 periodic circle.
pub const CURVATURE_WALK_RECOVERY_INNER_ITERS: usize = 25;
/// Relative floor on the Newton directional decrease, expressed as a tiny
/// multiple of `‖g‖·‖Δ‖`. A predicted decrease below this is at the level of
/// f64 round-off in the quadratic model and is treated as no progress (the step
/// is rejected). Scaling by the gradient/step norms makes the floor invariant
/// to the problem's overall magnitude.
pub const SAE_MANIFOLD_DIRECTIONAL_DECREASE_REL_FLOOR: f64 = 1.0e-14;
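/*
Illustrative sketch only, not part of this module: the scale-invariant
progress test described above. It assumes the predicted decrease is the
first-order term `-gᵀΔ`; the real solver may use a different quadratic-model
quantity. `norm2` and `step_makes_progress` are hypothetical names.

```rust
// Value copied from `SAE_MANIFOLD_DIRECTIONAL_DECREASE_REL_FLOOR`.
const DIRECTIONAL_DECREASE_REL_FLOOR: f64 = 1.0e-14;

// Euclidean norm of a slice.
fn norm2(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

// Hypothetical check: the step counts as progress only when the predicted
// decrease exceeds `floor * ‖g‖ * ‖Δ‖`.
fn step_makes_progress(grad: &[f64], step: &[f64]) -> bool {
    assert_eq!(grad.len(), step.len());
    let predicted_decrease: f64 = -grad.iter().zip(step).map(|(g, d)| g * d).sum::<f64>();
    predicted_decrease > DIRECTIONAL_DECREASE_REL_FLOOR * norm2(grad) * norm2(step)
}
```

Because both sides scale with `‖g‖·‖Δ‖`, rescaling the gradient by `1e6` and the
step by `1e-6` leaves the verdict unchanged.
*/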
/// Row count at or above which the fused SAE reconstruction data-fit
/// (`loss_scaled`) fans its per-row decode + residual reduction out over
/// rayon. Below this the single-threaded fused pass is cheaper than the
/// fan-out; matched in spirit to the arrow-Schur `SCHUR_MATVEC_PARALLEL_ROW_MIN`
/// gate so short batches inside an outer fan-out stay sequential (#1017).
pub const SAE_LOSS_PARALLEL_ROW_MIN: usize = 64;
/// Relative tolerance on the undamped Newton step norm (scaled by the iterate
/// scale) for accepting inner-solve convergence.
pub const SAE_MANIFOLD_INNER_STEP_REL_TOL: f64 = 1.0e-4;
/// Relative tolerance on the KKT gradient norm (scaled by the iterate scale) for
/// accepting inner-solve convergence.
pub const SAE_MANIFOLD_INNER_GRAD_REL_TOL: f64 = 1.0e-5;
/// Relative per-refine-round penalised-objective decrease below which the inner
/// solve is treated as having reached its numerical fixed point (#1051). On an
/// ill-conditioned penalised bilinear fit the KKT gradient and undamped step
/// stay above tolerance while the objective stops moving; this `√εmach`-scale
/// floor recognises that stalled iterate as the converged inner optimum instead
/// of grinding the refine budget to the `1e12` infeasible sentinel.
pub const SAE_MANIFOLD_INNER_OBJECTIVE_STALL_REL_TOL: f64 = 1.0e-8;
/// Fraction of the total since-entry objective reduction below which a refine
/// round's contribution is treated as cosmetic flat-valley crawl (#1051), so the
/// inner solve is accepted as numerically converged. At `1e-4` the inner fit has
/// captured ≥ 99.99% of the achievable penalised-objective reduction before the
/// criterion is ranked — far past the point where further crawl can change the
/// Laplace evidence, yet strict enough that a materially-improving fit refines on.
pub const SAE_MANIFOLD_INNER_OBJECTIVE_STALL_FRACTION: f64 = 1.0e-4;
/// Minimum completed refine rounds before the objective-stagnation fixed point
/// may be accepted (#1051). Enough rounds to establish a meaningful
/// total-improvement baseline for the fraction test, but far below the full
/// refine budget — terminating the ill-conditioned crawl early is the goal.
pub const SAE_MANIFOLD_INNER_OBJECTIVE_STALL_MIN_ROUNDS: usize = 3;
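/*
Illustrative sketch only, not part of this module: one plausible way the three
objective-stall constants above could combine. How the real inner solve combines
them is not stated here, so the "either criterion, after the minimum number of
rounds" rule, the `max(1.0)` scaling of the relative test, and the name
`objective_has_stalled` are all assumptions.

```rust
// Values copied from the constants above so the sketch is self-contained.
const STALL_REL_TOL: f64 = 1.0e-8;
const STALL_FRACTION: f64 = 1.0e-4;
const STALL_MIN_ROUNDS: usize = 3;

// `objectives[0]` is the value at entry; each later element is the penalised
// objective after one completed refine round.
fn objective_has_stalled(objectives: &[f64]) -> bool {
    let rounds = objectives.len().saturating_sub(1);
    if rounds < STALL_MIN_ROUNDS {
        return false;
    }
    let entry = objectives[0];
    let prev = objectives[rounds - 1];
    let last = objectives[rounds];
    let round_decrease = prev - last;
    let total_decrease = entry - last;
    // Relative per-round decrease at round-off scale (assumed scaling).
    let relative_stall = round_decrease.abs() <= STALL_REL_TOL * prev.abs().max(1.0);
    // The round contributed only a cosmetic fraction of the total reduction.
    let fraction_stall = total_decrease > 0.0
        && round_decrease >= 0.0
        && round_decrease <= STALL_FRACTION * total_decrease;
    relative_stall || fraction_stall
}
```
*/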
/// Above this full-`B` β width, dense beta-penalty curvature is never
/// materialized when Grassmann frames are engaged; exact curvature is probed
/// directly in the factored coordinate space instead.
pub const SAE_DENSE_BETA_PENALTY_PROBE_MAX_DIM: usize = 4096;
/// Relative spectral cutoff for counting the numerical rank / nullity of a
/// symmetric penalty Gram: eigenvalues at or below `cutoff · λ_max` are treated
/// as zero. Shared by [`SaeManifoldTerm::symmetric_rank`] and
/// [`smooth_penalty_nullity`] so the two stay in lockstep.
pub const SAE_MANIFOLD_SPECTRAL_RANK_CUTOFF: f64 = 1.0e-9;
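/*
Illustrative sketch only, not part of this module: the relative cutoff rule
above applied to a precomputed eigenvalue list. It assumes the eigenvalues of
the symmetric penalty Gram are already available; `rank_and_nullity` is a
hypothetical name and is not the signature of `SaeManifoldTerm::symmetric_rank`
or `smooth_penalty_nullity`.

```rust
// Value copied from `SAE_MANIFOLD_SPECTRAL_RANK_CUTOFF`.
const SPECTRAL_RANK_CUTOFF: f64 = 1.0e-9;

// Returns `(rank, nullity)`: eigenvalues at or below `cutoff * lambda_max`
// count as zero.
fn rank_and_nullity(eigenvalues: &[f64]) -> (usize, usize) {
    let lambda_max = eigenvalues.iter().cloned().fold(0.0_f64, f64::max);
    let threshold = SPECTRAL_RANK_CUTOFF * lambda_max;
    let rank = eigenvalues.iter().filter(|&&ev| ev > threshold).count();
    (rank, eigenvalues.len() - rank)
}
```

Deriving both counts from one pass is a simple way to keep rank and nullity in
lockstep, which is the stated purpose of sharing the cutoff.
*/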
/// Floor on the Levenberg-Marquardt ridge added to a per-row Hessian before
/// Cholesky, so the first attempt is always strictly positive even when the
/// caller passes a zero base ridge.
pub const SAE_MANIFOLD_ROW_RIDGE_FLOOR: f64 = 1.0e-12;
/// Multiplicative factor by which the LM ridge is escalated after a failed
/// Cholesky factorisation of a per-row Hessian.
pub const SAE_MANIFOLD_ROW_RIDGE_GROWTH: f64 = 10.0;
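/*
Illustrative sketch only, not part of this module: a ridge-escalation retry
loop using the floor and growth factor above. The dense `Vec<Vec<f64>>`
Cholesky, the attempt cap `MAX_ATTEMPTS`, and the function names are
assumptions made so the example is self-contained; the real per-row
factorisation is not shown in this file.

```rust
// Values copied from the two constants above.
const ROW_RIDGE_FLOOR: f64 = 1.0e-12;
const ROW_RIDGE_GROWTH: f64 = 10.0;
// Hypothetical retry cap, not taken from the source.
const MAX_ATTEMPTS: usize = 20;

// Lower Cholesky factor of `h + ridge * I`; `None` on a non-positive or
// non-finite pivot. Only the lower triangle of `h` is read.
fn cholesky_with_ridge(h: &[Vec<f64>], ridge: f64) -> Option<Vec<Vec<f64>>> {
    let n = h.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut s = h[i][j] + if i == j { ridge } else { 0.0 };
            for k in 0..j {
                s -= l[i][k] * l[j][k];
            }
            if i == j {
                if !(s > 0.0) || !s.is_finite() {
                    return None;
                }
                l[i][j] = s.sqrt();
            } else {
                l[i][j] = s / l[j][j];
            }
        }
    }
    Some(l)
}

// Starts at `max(base_ridge, floor)` and multiplies the ridge by the growth
// factor after each failed factorisation. Returns the factor and the ridge used.
fn factor_with_escalating_ridge(h: &[Vec<f64>], base_ridge: f64) -> Option<(Vec<Vec<f64>>, f64)> {
    let mut ridge = base_ridge.max(ROW_RIDGE_FLOOR);
    for _ in 0..MAX_ATTEMPTS {
        if let Some(l) = cholesky_with_ridge(h, ridge) {
            return Some((l, ridge));
        }
        ridge *= ROW_RIDGE_GROWTH;
    }
    None
}
```
*/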
pub
/// Final fitted-data explained-variance floor for the reconstruction-collapse
/// guard (#1023). This is deliberately an effectively-zero threshold: ordinary
/// under-fitting is a model-quality issue, but returning a K>=1 active SAE whose
/// fitted matrix is indistinguishable from the column mean is a structural
/// collapse and must enter the #976 CollapseEvent ledger.
pub const SAE_FIT_DATA_COLLAPSE_EV_FLOOR: f64 = 0.10;
pub const SAE_FIT_DATA_COLLAPSE_COST: f64 = 1.0e12;
pub const SAE_PRISTINE_SEED_EV_RETAIN_FLOOR: f64 = 0.95;
pub const SAE_FINAL_EV_DEGRADATION_TOL: f64 = 1.0e-3;
pub const SAE_SEED_DISPERSION_FLOOR: f64 = 1.0e-12;
/// Full SAE-manifold term.
/// Snapshot of exactly the mutable term state that an `apply_newton_step` +
/// `loss` line-search trial perturbs: per-atom decoder coefficients, the
/// `refresh_basis`-rebuilt basis evaluations (`basis_values`, `basis_jacobian`),
/// and the live intrinsic smoothness Gram read by the objective, plus the
/// assignment logits and latent coordinates.
///
/// Static fields (atom names, basis kinds, basis-evaluator `Arc`s, assignment
/// mode, temperature schedule) are *not* snapshotted: they are invariant across
/// an inner Newton line search, so the previous `self.clone()` per halving
/// re-copied them needlessly. Cloning only the line-search state keeps the
/// `O(N·M·d)` `basis_jacobian` copy off the per-halving hot path (one snapshot
/// before the search, one restore per rejected trial) instead of firing it on
/// every Armijo backtrack.
///
/// The canonical `smooth_penalty_raw` / `smooth_penalty_order` are static, but
/// the live intrinsic roughness Gram `smooth_penalty` is mutable state: it is
/// refreshed by assembly from the current decoder and basis Jacobian, and the
/// line-search objective reads it directly. Restoring it with the decoder and
/// basis caches keeps every rejected trial's baseline and nonlinear objective
/// on the same lagged-diffusivity quadratic.
pub