Module structure_search


#976 — evidence-guarded dictionary structure search: atom birth / death / fission / fusion as anytime-valid hypothesis tests, with a deterministic, serializable SearchLedger as the honesty surface.

§What this is

The two documented SAE pathologies, restated statistically:

  • Feature absorption (an A⇒B hierarchy makes sparsity fold B’s content into A’s direction): an absorbing atom’s code distribution carries substructure — detectable misspecification, found by a within-atom audit and corrected by a FISSION move.
  • Feature shattering (one curved family smeared across many near-duplicate flat atoms): shattered atoms have dependent codes (gam_sae::atom_codes::CoactivationStats::dependence) and joint structure when refit together — corrected by a FUSION move.

This module owns the MOVE ENGINE: canonical deterministic proposal order, structural-hash deduplication, e-process-gated acceptance, and the ledger. It is generic over the fitter — the caller supplies the state type and four closures (apply / evaluate / null-sup / refit), exactly the surface run_atom_birth_gate already pins down. Warm structure inheritance is enforced by construction: a candidate state is built FROM the parent state (apply_move(&parent, &mv)), never from scratch — cold restarts after structure moves are both slow and collapse-prone, so the API gives them no entry point.
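The warm-inheritance contract can be illustrated with a minimal sketch. Everything here is hypothetical: `toy_round` is not this module's `search`, and the four caller closures (apply / evaluate / null-sup / refit) are collapsed into two, an `apply` closure and a boolean `certified` closure standing in for the evidence gate. The point it tries to show is only that the loop has no way to build a candidate except from the live state.

```rust
/// Hypothetical one-round driver, generic over the caller's state `S` and move `M`.
/// `apply` derives a candidate from the current state; `certified` is a stand-in
/// for the evidence gate (the real engine uses e-process gates, not a bool closure).
fn toy_round<S, M>(
    parent: S,
    moves: &[M],
    apply: impl Fn(&S, &M) -> S,
    mut certified: impl FnMut(&S, &S) -> bool,
) -> (S, Vec<bool>) {
    let mut state = parent;
    let mut applied = Vec::with_capacity(moves.len());
    for mv in moves {
        // Warm inheritance: the candidate is always built from the live state.
        let candidate = apply(&state, mv);
        let ok = certified(&state, &candidate);
        if ok {
            // Accepted: the candidate becomes the parent of the next proposal.
            state = candidate;
        }
        // Rejected moves leave the state untouched but are still recorded.
        applied.push(ok);
    }
    (state, applied)
}
```

Called with a `Vec<f64>` state, an `apply` that appends one weight, and a gate that accepts only positive weights, the moves `[3.0, -1.0, 4.0]` applied to `[1.0, 2.0]` would leave `[1.0, 2.0, 3.0, 4.0]` with one rejected move recorded in between.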

§Acceptance is a hypothesis test, not a threshold (#984)

The original #976 design accepted a move when Δ(neg log evidence) < −margin under the Laplace normalizer. That is the K vs K+1 boundary / Davies-regime comparison where likelihood-ratio thresholds are invalid (the null sits on the boundary of the alternative; the new atom’s parameters vanish under the null). Acceptance here is therefore routed through the universal-inference e-process gates of gam_terms::inference::structure_evidence:

  • Birth / fission / fusion each assert structure BEYOND what the current dictionary class expresses, so each runs an [AtomBirthGate] (the mechanics are claim-generic: predictable alternative, honest null sup, Ville threshold at the α fixed in MoveBudget). A move is applied only when its claim is Certified; otherwise the structure is unchanged and the claim stays Contested in the StructureLedger with its banked evidence — the input to the #984 probe-design loop.
  • Death is never certifiable, by construction. The K−1 class is nested inside the current class, so the split-likelihood e-value satisfies E ≤ 1 pointwise (the null sup dominates any sub-model fit): no amount of data can prove an atom unnecessary; it can only fail to prove it necessary. The demote-never-reject philosophy is therefore not a policy choice here; it is what the math leaves. A death proposal DEMOTES an atom whose AtomExists claim has never certified (trigger: diverged ARD precision) and is VETOED for a certified atom, because a Ville crossing is permanent: a later retreat of the evidence cannot un-prove existence.
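The gate mechanics in the two bullets above can be sketched as a running log e-value compared against the Ville threshold ln(1/α). This is an illustration under assumptions, not the AtomBirthGate implementation: `ToyEProcess`, `ToyVerdict`, and the per-batch update are hypothetical names, and the caller is assumed to supply, for each held-out batch, the log-likelihood under an alternative fitted on earlier data only and the supremum of the null log-likelihood on that same batch.

```rust
/// Toy verdicts borrowing the Certified / Contested vocabulary from the text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyVerdict {
    Certified,
    Contested,
}

/// Hypothetical e-process: a running sum of log e-values with a one-way latch.
struct ToyEProcess {
    log_e: f64,         // banked evidence, log scale
    log_threshold: f64, // Ville threshold ln(1/alpha)
    certified: bool,    // latched once the threshold is crossed
}

impl ToyEProcess {
    fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
        Self { log_e: 0.0, log_threshold: -alpha.ln(), certified: false }
    }

    /// Bank one batch of evidence.
    /// `loglik_alt`: held-out log-likelihood under the predictable alternative.
    /// `loglik_null_sup`: sup of the null log-likelihood on the same batch.
    fn update(&mut self, loglik_alt: f64, loglik_null_sup: f64) -> ToyVerdict {
        self.log_e += loglik_alt - loglik_null_sup;
        if self.log_e >= self.log_threshold {
            // A crossing is permanent: later retreat does not clear the latch.
            self.certified = true;
        }
        self.verdict()
    }

    fn verdict(&self) -> ToyVerdict {
        if self.certified { ToyVerdict::Certified } else { ToyVerdict::Contested }
    }
}
```

In the death case the claimed class is nested in the current one, so `loglik_alt` can never exceed `loglik_null_sup`; every increment is non-positive, `log_e` stays at or below zero, and the sketch can never return `Certified`.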

§Determinism

No RNG, no clock. Proposals are sorted by the canonical order (deaths by ARD precision descending, fissions by audit significance ascending, fusions by code dependence descending, births last by proposal mass descending; ties broken by structural hash), deduplicated by the caller-computed structural hash (the TermCollectionSpec hash machinery, #869), and processed sequentially. Identical inputs ⇒ identical serialized SearchLedger — which is what keeps replicate-null comparisons (#910/#943) valid across structure changes.
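As a rough sketch of the ordering just described, the comparator below sorts by kind rank, then by the kind's trigger direction, then by structural hash, and drops repeated hashes afterwards. The types `ToyKind` and `ToyProposal` and the function `toy_canonical_order` are hypothetical stand-ins, not the module's `MoveProposal` or `canonical_order`; a single `f64` trigger per proposal is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;

/// Declaration order doubles as kind rank: deaths, fissions, fusions, births.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ToyKind {
    Death,
    Fission,
    Fusion,
    Birth,
}

#[derive(Clone, Debug, PartialEq)]
struct ToyProposal {
    kind: ToyKind,
    trigger: f64,         // kind-specific statistic
    structural_hash: u64, // caller-computed hash of the post-move spec
}

fn toy_canonical_cmp(a: &ToyProposal, b: &ToyProposal) -> Ordering {
    a.kind
        .cmp(&b.kind)
        .then_with(|| match a.kind {
            // Fissions sort ascending by audit significance.
            ToyKind::Fission => a.trigger.total_cmp(&b.trigger),
            // Deaths, fusions, and births sort descending by their trigger.
            _ => b.trigger.total_cmp(&a.trigger),
        })
        // Ties fall through to the structural hash.
        .then_with(|| a.structural_hash.cmp(&b.structural_hash))
}

fn toy_canonical_order(mut proposals: Vec<ToyProposal>) -> Vec<ToyProposal> {
    proposals.sort_by(toy_canonical_cmp);
    // Keep the first proposal per structural hash, in canonical order.
    let mut seen = HashSet::new();
    proposals.retain(|p| seen.insert(p.structural_hash));
    proposals
}
```

`f64::total_cmp` gives a total order even in the presence of NaN, so the result depends only on the multiset of inputs and not on their arrival order, which is the property the serialized ledger needs.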

The ledger reports a certified local mode: the moves explored, the evidence for accepted ones, and the evidence gaps to rejected alternatives. No global-optimality theater.

Structs§

CollapseEvent
An assignment-collapse event from the joint fit (#976 Layer-1 guard): an atom’s support fell below the active-mass floor and was either re-seeded (bounded budget) or recorded as terminally collapsed — an observable event, never a silent death and never a fit error. Terminal collapses are the natural death-proposal feed for the next search round.
MoveBudget
The search round’s budget and error level.
MoveProposal
One proposal: the move, its trigger statistic (the canonical-order key, kind-specific — see StructureMove docs), the caller-computed structural hash of the POST-move specification (dedup key), and the structural claim the move asserts (registered in the StructureLedger so the dictionary certificate covers it).
MoveRecord
One ledger line: the proposal exactly as ranked, plus its verdict.
SearchLedger
The serialized honesty surface of one search round: every proposal in canonical order with its verdict, plus any collapse events the joint fit recorded. Identical inputs produce a byte-identical serialization.
SearchOutcome
Result of one search round: the (possibly restructured) state and the ledger.

Enums§

CollapseAction
The guard’s response to an active-mass breach.
MoveVerdict
The per-proposal outcome. Every proposal handed to search gets exactly one record — the no-silent-caps rule.
StructureMove
One proposed structural move. Atom indices are STABLE IDENTIFIERS for the duration of one search round: the caller’s apply_move must not reindex surviving atoms (mark dead atoms inactive, append born atoms) — the engine relies on this to detect conflicting proposals.
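The stable-identifier convention for StructureMove can be illustrated with a small sketch. The variant payloads, `toy_apply`, and `toy_conflicts` are assumptions made for this example (the real enum's fields and the engine's conflict rule are not shown on this page): consumed atoms are flagged inactive in place, born atoms are appended, and a later proposal conflicts when it names an atom that an applied move already consumed.

```rust
use std::collections::BTreeSet;

/// Hypothetical move shapes; payloads are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyMove {
    Birth,
    Death { atom: usize },
    Fission { atom: usize },
    Fusion { a: usize, b: usize },
}

impl ToyMove {
    /// Existing atoms this move consumes.
    fn touched(&self) -> Vec<usize> {
        match *self {
            ToyMove::Birth => vec![],
            ToyMove::Death { atom } | ToyMove::Fission { atom } => vec![atom],
            ToyMove::Fusion { a, b } => vec![a, b],
        }
    }
}

/// Apply without reindexing: surviving atoms keep their indices.
fn toy_apply(active: &[bool], mv: &ToyMove) -> Vec<bool> {
    let mut next = active.to_vec();
    for atom in mv.touched() {
        // Mark consumed atoms inactive in place; out-of-range indices are ignored.
        if let Some(flag) = next.get_mut(atom) {
            *flag = false;
        }
    }
    let born = match mv {
        ToyMove::Death { .. } => 0,
        ToyMove::Birth | ToyMove::Fusion { .. } => 1,
        ToyMove::Fission { .. } => 2,
    };
    // Born atoms are appended, so no existing index shifts.
    next.extend(std::iter::repeat(true).take(born));
    next
}

/// A proposal conflicts if it touches an atom already consumed this round.
fn toy_conflicts(consumed: &BTreeSet<usize>, mv: &ToyMove) -> bool {
    mv.touched().iter().any(|atom| consumed.contains(atom))
}
```

Because indices never shift, a set of consumed indices collected at any point in the round remains meaningful for every later proposal; with reindexing, the same check would silently compare different atoms.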

Functions§

canonical_order
Sort proposals into the canonical deterministic order: kind rank (deaths, fissions, fusions, births), then the kind’s trigger direction, then structural hash. Pure — no RNG, no clock — so the search path, and with it the ledger, is a function of the inputs alone.
search
Run one evidence-guarded structure-search round.