Certified termination (#968): ONE exhaustion/stagnation policy for every damped inner loop.
§The bug genus this kills
Every hang in the tracker’s history (#874, #789, #683, #744, the survival-AFT cluster, #826’s 42-minute frozen-residual stall) traces to the same structural flaw: termination safety was a per-branch, hand-replicated convention. The #874 postmortem is the canonical specimen — the LM gain-reject branch lacked the exhaustion guard its sibling screening-reject branch in the SAME file already had. Guard drift between sibling branches is the control-flow twin of the objective↔gradient desync class, and the cure is the same: a single source of truth that branches consume and cannot locally re-derive.
§The policy pieces
madsen_can_retry / madsen_retry_exhausted own the damped-retry
exhaustion question for Madsen-style Levenberg–Marquardt loops: a retry
is alive while the damping is finite and below MADSEN_DAMPING_CAP,
and dead once attempts run out or damping leaves that window. Both
engines (reweight.rs Madsen-LM and the custom_family.rs spectral
Newton) must answer this question through these functions — never
through a local predicate.
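As a minimal sketch of the alive/dead rule described above: every name below is a hypothetical stand-in, the real signatures of madsen_can_retry / madsen_retry_exhausted are not shown on this page, and the cap value is an assumed placeholder rather than the actual MADSEN_DAMPING_CAP.

```rust
/// ASSUMPTION: placeholder ceiling; the real MADSEN_DAMPING_CAP value is not stated here.
pub const DAMPING_CAP_SKETCH: f64 = 1e24;

/// A retry is alive while the damping is finite and strictly below the cap.
/// NaN and infinities fail `is_finite`, so they count as dead.
pub fn can_retry_sketch(damping: f64) -> bool {
    damping.is_finite() && damping < DAMPING_CAP_SKETCH
}

/// The chain is dead once attempts run out or the damping leaves the window.
/// Hypothetical signature: the attempt budget is passed in by the caller.
pub fn retry_exhausted_sketch(attempts: usize, max_attempts: usize, damping: f64) -> bool {
    attempts >= max_attempts || !can_retry_sketch(damping)
}
```

The point of the shape is that "exhausted" is defined in terms of "can retry", so the two answers cannot disagree about where the damping window ends.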
IterationBound and RejectEscalator are the two distinct
safety mechanisms of an unbounded damped-retry loop, kept as two types
on purpose. The bound owns the per-iteration hard count: it ticks once
at the top of EVERY pass — including continue paths that neither
accept a step nor reach a reject ritual (Fisher fallback, special
cases) — and is the net that makes an unbounded loop {} safe. The
escalator owns the geometric damping discipline applied on REJECTS
only. A single type coupling “count++” to “reject” would either
double-count iterations or silently assume every non-accepting pass
reaches a reject ritual — the exact unbounded-loop hole the guard
exists to close (see the #968 thread’s design note).
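The split between the two mechanisms can be illustrated with a toy loop. This is a sketch under stated assumptions, not the module's implementation: BoundSketch, EscalatorSketch, Pass, Outcome and run_sketch are all hypothetical names, and the scripted outcomes stand in for a real solver step.

```rust
/// Hypothetical stand-in for IterationBound: the per-pass hard count.
pub struct BoundSketch {
    ticks: usize,
    max_passes: usize,
}

impl BoundSketch {
    pub fn new(max_passes: usize) -> Self {
        Self { ticks: 0, max_passes }
    }
    /// Call once at the top of EVERY pass, whatever the pass goes on to do.
    pub fn tick(&mut self) {
        self.ticks += 1;
    }
    /// True once more passes have started than the budget allows.
    pub fn exhausted(&self) -> bool {
        self.ticks > self.max_passes
    }
}

/// Hypothetical stand-in for RejectEscalator: factor and reject count move together.
pub struct EscalatorSketch {
    factor: f64,
    rejects: usize,
}

impl EscalatorSketch {
    pub fn new() -> Self {
        Self { factor: 2.0, rejects: 0 }
    }
    /// Bump the damping and advance the schedule in one indivisible step.
    pub fn escalate(&mut self, damping: &mut f64) {
        *damping *= self.factor;
        self.factor *= 2.0;
        self.rejects += 1;
    }
}

/// Scripted pass outcomes standing in for a real step evaluation.
pub enum Pass {
    Accept,
    Reject,
    Fallback,
}

pub struct Outcome {
    pub accepted: bool,
    pub rejects: usize,
    pub damping: f64,
}

/// Runs the script; once the script is spent, every pass is a bare `continue`.
pub fn run_sketch(script: &[Pass], max_passes: usize) -> Outcome {
    let mut bound = BoundSketch::new(max_passes);
    let mut escalator = EscalatorSketch::new();
    let mut damping = 1.0_f64;
    let mut passes = script.iter();
    loop {
        bound.tick(); // counted even when the pass neither accepts nor rejects
        if bound.exhausted() {
            return Outcome { accepted: false, rejects: escalator.rejects, damping };
        }
        match passes.next() {
            Some(Pass::Accept) => {
                return Outcome { accepted: true, rejects: escalator.rejects, damping };
            }
            Some(Pass::Reject) => escalator.escalate(&mut damping),
            Some(Pass::Fallback) | None => continue, // no reject ritual reached
        }
    }
}
```

In this sketch an empty script never reaches a reject, yet the loop still terminates, because the bound ticks independently of the escalator. Under the ×2, ×4, ×8 schedule, twelve consecutive rejections starting from a damping of 1 multiply it by 2^(1+2+…+12) = 2^78, roughly 3.0e23.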
FlatStreak owns the consecutive-window discipline every stagnation
detector shares: a streak that grows on “flat” readings, resets on
recovery, and fires once it spans the window. Loops that own a
scale-aware flatness predicate of their own (the custom_family
joint-Newton objective-flat counter, the blockwise frozen-loglik
divergence detector) consume it directly — they answer the question
attempt caps cannot see: a loop that still “makes progress” every
iteration but whose MERIT is frozen. #744 ran to cycle 1199/1200 at a
flat residual; #826 burned a CI timeout on a frozen joint residual. The
caller feeds its descent quantity (penalized NLL, residual norm, |g|)
through its own flatness predicate once per iteration; the streak
reports a plateau once flat readings span a consecutive window — long
before any iteration cap.
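The grow/reset/fire discipline might look roughly like the following. This is an illustrative sketch only: StreakSketch, is_flat_sketch and first_plateau_sketch are hypothetical names, and the relative-tolerance predicate is just one possible caller-owned flatness test, not the one any in-tree loop uses.

```rust
/// Hypothetical stand-in for FlatStreak: owns only the window discipline.
pub struct StreakSketch {
    window: usize,
    run: usize,
}

impl StreakSketch {
    pub fn new(window: usize) -> Self {
        Self { window, run: 0 }
    }
    /// Feed one reading per iteration. Grows on flat, resets on recovery,
    /// and returns true once the streak spans the window (and while it persists).
    pub fn observe(&mut self, is_flat: bool) -> bool {
        self.run = if is_flat { self.run.saturating_add(1) } else { 0 };
        self.run >= self.window
    }
}

/// ASSUMPTION: an example scale-aware predicate owned by the caller.
pub fn is_flat_sketch(prev: f64, curr: f64, rel_tol: f64) -> bool {
    (prev - curr).abs() <= rel_tol * prev.abs().max(1.0)
}

/// Index of the first iteration at which a plateau is reported, if any.
pub fn first_plateau_sketch(merits: &[f64], window: usize, rel_tol: f64) -> Option<usize> {
    let mut streak = StreakSketch::new(window);
    for i in 1..merits.len() {
        if streak.observe(is_flat_sketch(merits[i - 1], merits[i], rel_tol)) {
            return Some(i);
        }
    }
    None
}
```

Keeping the predicate outside the streak type mirrors the division of labor described above: the loop decides what counts as flat, and the streak decides how many flat readings in a row amount to a plateau.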
§Verdicts, not panics
Exhaustion is an escalation event: the consuming loop converts
LoopVerdict::Plateaued / LoopVerdict::Exhausted into its
honest terminal status (StalledAtValidMinimum,
LmStepSearchExhausted, …) and unwinds. Never a hang, never a panic,
never a silent wrong answer.
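A consumer-side conversion could be sketched as below. The enum shapes are hypothetical mirrors of the names mentioned above, and the particular pairing of verdict to status is an assumption made for illustration; each real loop chooses its own terminal status.

```rust
/// Hypothetical mirror of LoopVerdict.
#[derive(Debug, PartialEq)]
pub enum VerdictSketch {
    Continue,
    Plateaued,
    Exhausted,
}

/// Hypothetical mirror of a consuming loop's terminal statuses.
#[derive(Debug, PartialEq)]
pub enum StatusSketch {
    StalledAtValidMinimum,
    LmStepSearchExhausted,
}

/// Returns None to keep iterating, or Some(status) to unwind with.
/// ASSUMPTION: this verdict-to-status pairing is illustrative only.
pub fn terminal_status_sketch(verdict: VerdictSketch) -> Option<StatusSketch> {
    match verdict {
        VerdictSketch::Continue => None,
        VerdictSketch::Plateaued => Some(StatusSketch::StalledAtValidMinimum),
        VerdictSketch::Exhausted => Some(StatusSketch::LmStepSearchExhausted),
    }
}
```

Because the match is exhaustive with no wildcard arm, a newly added terminal verdict would fail to compile here until the consumer decides how to report it, which is one way to make swallowing a verdict hard.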
§Migration map (each step deleted a hand-rolled guard)
- (done) reweight.rs: the lm_can_retry / lm_retry_exhausted local fns and the local LM_MAX_LAMBDA const are deleted; call sites consume this module’s policy.
- (done) The 7 copies of the reweight.rs reject ritual (loop_lambda *= factor; factor *= 2.0; continue) collapsed onto RejectEscalator::escalate, and the per-iteration hard count moved into IterationBound, so neither discipline can drift per-branch.
- (done) custom_family.rs: the joint-Newton objective-flat counter and the blockwise frozen-loglik divergence streak both ride FlatStreak, so the #826-class exit discipline now lives here, not in per-loop counters. The richer certificate machinery those loops layer on top (geometric-tail bound, clamped-step side condition) stays local: it is policy about what counts as flat, which the loops rightly own; the streak/window discipline is what must not fork.
- (dropped) Terminal-verdict reporting into heartbeat scopes: the [JN-EXIT] / [PIRLS] per-exit log lines already name why a loop ended; a parallel verdict channel in the process monitor would be redundant global state.
Structs§
- FlatStreak
  Consecutive-flatness streak: the window discipline shared by every stagnation detector in the tree. The caller owns the flatness predicate (scale-aware objective tolerance, frozen log-likelihood, sub-tolerance relative improvement, …); this type owns the part that historically forked per loop: grow on flat, reset on recovery, fire once the streak spans the window, and keep firing while it persists.
- IterationBound
  Per-iteration hard bound for a damped retry loop: the net that makes an unbounded loop {} safe. Tick it once at the top of EVERY pass (accepted, rejected, or any continue path that reaches neither) and ask IterationBound::exhausted_at wherever the loop’s exhaustion question is posed. Created fresh per outer iteration.
- RejectEscalator
  Geometric damping escalator for one reject chain (Madsen–Nielsen–Tingleff eq 3.16: the multiplier starts at 2 and doubles on every rejection, so successive bumps are ×2, ×4, ×8, …). Owns the factor and the reject count as one indivisible discipline: no branch can bump the damping without advancing the schedule, the drift mode behind #874. Deliberately does NOT own the per-iteration count; that is IterationBound’s job (see module docs for why the two must not be one type).
Enums§
- LoopVerdict
  Terminal verdict of a guarded loop. Continue is the only non-terminal answer; the two terminal verdicts are ESCALATION events the consumer must convert into an honest status, never swallow.
Constants§
- MADSEN_DAMPING_CAP
  Damping ceiling for Madsen-style LM retries. Beyond this the proposed step is numerically a zero step, so retrying cannot make progress and the retry chain is declared dead. (Moved verbatim from reweight.rs, where it was a file-local convention; see module docs for why it must be shared.)
- MADSEN_INITIAL_REJECT_FACTOR
  Initial damping multiplier on the first rejection of an iteration. Doubles on every further rejection (geometric escalation), reaching MADSEN_DAMPING_CAP from λ = 1 in ~12 rejections. This is the established reweight.rs schedule, now owned here.
- PLATEAU_DEFAULT_WINDOW
  Default consecutive-window length for a FlatStreak stagnation detector: how many successive flat readings must accumulate before the loop is declared plateaued. Two is the established in-tree streak convention (reweight.rs soft-acceptance: one noisy reading can fake a plateau, two consecutive cannot), plus one for the headroom a merit that is genuinely creeping (not frozen) needs to escape.
Functions§
- inner_convergence_is_truthful
  Convergence-truthfulness invariant for an inner-solve terminal verdict (gam#1040).
- madsen_can_retry
  Is a damped retry still alive at this damping level?
- madsen_retry_exhausted
  Has the retry chain exhausted its budget, whether by attempt count or by the damping leaving the productive window?
- slow_geometric_rate_exceeds_projection_cap
  Deterministic slow-geometric-rate stall predicate (gam#979 survival marginal-slope hang).