gam_terms/inference/structure_evidence.rs
1//! Anytime-valid structure discovery: e-process gates, universal-inference
2//! atom tests, e-BH error control, and KL-optimal steering probes.
3//!
4//! Interpretability as sequential experimental design.
5//!
6//! # The thesis
7//!
8//! The dictionary-learning stack (#974–#981) DISCOVERS structure: atom
9//! birth/death/fission/fusion (#976), geometry adjudication — circle vs
10//! clusters vs line (#907), feature binding (#975). Today those decisions
11//! are made by evidence heuristics (likelihood-ratio ladders, BIC-flavored
12//! gates). Three facts, never previously combined, say that is not merely
13//! informal but WRONG in a specific, fixable way — and that fixing it
14//! upgrades the whole capstone from observational description to
15//! error-controlled experimental science:
16//!
17//! 1. **Atom existence is a NON-REGULAR testing problem.** "Does a K+1-th
18//! dictionary atom exist?" is testing K vs K+1 mixture components — the
19//! textbook boundary/loss-of-identifiability case where the classical
20//! likelihood-ratio χ² asymptotics FAIL (the null sits on the boundary
21//! of the alternative; the nuisance parameters of the new atom vanish
22//! under the null — Davies' problem). Every SAE/dictionary paper that
23//! thresholds a likelihood improvement is running this broken test.
24//! **Universal inference** (Wasserman–Ramdas–Balakrishnan 2020) is the
25//! modern resolution: a split-likelihood-ratio e-value that is valid in
26//! finite samples with NO regularity conditions whatsoever — exactly
27//! the irregular regime atom birth lives in.
28//!
29//! 2. **Discovery happens on streams with optional stopping.** Dictionaries
30//! are trained until the features "look right" — data-dependent
31//! stopping that p-hacks any fixed-sample test by construction.
32//! **E-processes** (nonnegative supermartingales under the null,
33//! `E[E_τ] ≤ 1` at every stopping time) are immune: by Ville's
34//! inequality `P(sup_t E_t ≥ 1/α) ≤ α`, the guarantee survives stopping
35//! whenever you like, peeking included, streaming corpora (#973)
36//! included. Evidence is a RUNNING PRODUCT, resumable across shards.
37//!
38//! 3. **This laboratory is INTERVENTIONAL.** The steering primitive with
39//! output dosimetry and a validity radius is landed
40//! (`crate::inference::steering`); the per-token output-Fisher harvest
41//! (#980) gives the local information geometry of the model's output.
42//! So "which probe next?" is not a vibe — it is OPTIMAL EXPERIMENTAL
43//! DESIGN: choose the steering intervention that maximizes the expected
44//! log-growth of the e-process deciding a contested structural claim.
45//! Under the local Gaussian output-Fisher model, that growth rate IS a
46//! KL divergence with a closed form (below). The model chooses its own
47//! next experiment, optimally, inside its certified validity radius.
48//!
49//! The deliverable shape: **every discovered atom ships an e-value; every
50//! dictionary ships an e-BH FDR certificate over its claimed structure;
51//! contested claims get design-optimal probes until evidence resolves.**
52//! No other interpretability stack has finite-sample, optional-stopping-
53//! safe error control over its discovered structure. This module is the
54//! statistical substrate; the SAE structure search plugs its gates in.
55//!
56//! The instruments, bottom-up: [`EProcess`] (running evidence, Ville
57//! semantics) → [`PredictablePluginEProcess`] (streaming universal
58//! inference) → [`AtomBirthGate`] + [`run_atom_birth_gate`] (the K vs K+1
59//! gate with demote-never-reject [`GateVerdict`]s; the runner enforces
60//! the predictability contract by call order) → [`StructureLedger`] (one
61//! e-process per claim, serializable across #973 shards) →
62//! [`StructureLedger::certify`] (the e-BH [`StructureCertificate`],
63//! shipped beside the gauge report via
64//! `crate::terms::sae::identifiability::dictionary_report`) →
65//! [`plan_probe_for_contested_claim`] (the design loop: contested claims
66//! get a [`ProbePlan`] whose δ runs through
67//! `crate::inference::steering::steer_delta` and whose per-hypothesis
68//! μ₀/μ₁ come from `crate::inference::steering::predicted_response`).
69//!
70//! # The math, fixed here so implementations cannot drift
71//!
72//! **E-value / e-process.** A nonnegative statistic `E` with `E_{H0}[E] ≤ 1`.
73//! Calibration to tests: reject at level α when `E ≥ 1/α` (Markov). An
74//! e-PROCESS compounds multiplicatively: `E_t = Π_{s≤t} e_s` where each
75//! `e_s` is conditionally valid given the past (`E[e_s | F_{s−1}] ≤ 1`
76//! under H0). Ville: `P_{H0}(∃t: E_t ≥ 1/α) ≤ α` — anytime validity.
77//! Always accumulate in log space; evidence products underflow doubles.
78//!
79//! **Universal inference (the atom-birth test).** Split the data (or the
80//! token stream) into D₀ (evaluation) and D₁ (estimation). Fit the
81//! K+1-atom alternative on D₁ by ANY method (the production fitter, warm
82//! starts, GPU, anything — no conditions). Fit the K-atom null by
83//! CONSTRAINED MLE ON D₀ (this is the one side that must be honest). Then
84//!
85//! ```text
86//! E = L(θ̂₁ ; D₀) / sup_{θ ∈ H0} L(θ ; D₀)
87//! ```
88//!
89//! satisfies `E_{H0}[E] ≤ 1` in finite samples, mixtures and boundaries
90//! and all (the proof is three lines of Markov + tower; no asymptotics).
91//! Sequential version: at each batch t, the alternative plug-in is fit on
92//! data BEFORE t (predictable), the null sup is over the batch; the
93//! product is an e-process. This is `SplitLikelihoodEValue` /
94//! `PredictablePluginEProcess` below, generic over log-likelihood
95//! closures so the SAE stack passes its own (manifold likelihoods,
96//! superposition-aware residual models #974, whatever exists then).
97//!
98//! **Bayes factors are e-values (the #907 bridge).** A Bayes factor
99//! `BF = ∫ L(θ) dΠ₁(θ) / L_{H0}` with a FIXED (data-independent) prior Π₁
100//! and a SIMPLE (or sup-dominated) null has `E_{H0}[BF] ≤ 1` — the #907
101//! geometry-adjudication harness (circle vs clusters vs line, with its
102//! discrete-mixture null) is therefore ONE PRIOR-FREEZE away from anytime
103//! validity. The integration contract: route its per-batch BFs through
104//! [`EProcess::absorb`] instead of comparing a final BF to a threshold,
105//! and geometry claims inherit optional-stopping safety for free.
106//!
107//! **e-BH (the dictionary certificate).** Given e-values e_1..e_m for m
108//! structural claims (one per atom/edge/binding), sort descending and
109//! reject the top k* where `k* = max{ k : e_(k) ≥ m/(α·k) }`. This
110//! controls FDR ≤ α under ARBITRARY dependence between the e-values
111//! (Wang–Ramdas 2022) — no independence assumptions about atoms sharing
112//! tokens, which is good because they all share every token. That
113//! arbitrary-dependence robustness is WHY the certificate uses e-BH and
114//! not a p-value BH: the p-version needs PRDS, which atom statistics
115//! flagrantly violate.
116//!
117//! **Design-optimal probing.** For a contested claim with competing
118//! structural hypotheses H₀/H₁ (e.g. "feature f is one curved atom" vs
119//! "two flat atoms"), a candidate steering intervention δ (within its
120//! certified validity radius, `steering.rs`) produces predicted output
121//! distributions P₀^δ, P₁^δ. The expected per-observation log-growth of
122//! the likelihood-ratio e-process under H₁ is exactly `KL(P₁^δ ‖ P₀^δ)`,
123//! so the optimal next experiment is
124//!
125//! ```text
126//! δ* = argmax_{δ : ‖δ‖ ≤ r_valid} KL(P₁^δ ‖ P₀^δ)
127//! ≈ argmax_δ ½ (μ₁(δ) − μ₀(δ))ᵀ F (μ₁(δ) − μ₀(δ))
128//! ```
129//!
130//! under the local Gaussian model with output-Fisher metric F (#980
131//! harvest) — the SAME quadratic form the steering dosimetry already
132//! computes, repurposed: dosimetry measures nats delivered, design
133//! maximizes nats of DISCRIMINATION. Probes that maximize raw effect are
134//! not optimal; probes that maximize the *disagreement between the
135//! hypotheses' predictions*, weighted by output information, are. (Full
136//! KL-optimal design over the probe manifold is a research arc; the greedy
137//! quadratic rule below is the correct first instrument and is exact in
138//! the local model.)
139//!
140//! # What this kills
141//!
142//! - Birth/death gates that are threshold heuristics → replaced by
143//! anytime-valid tests with declared error rates (#976's "detectable,
144//! correctable misspecification" becomes a literal hypothesis test).
145//! - "We trained until the features looked interpretable" → optional
146//! stopping is SAFE; the certificate survives it.
147//! - "We found N features" → "we found N features at FDR ≤ α, certificate
148//! attached, reproducible from the e-value ledger."
149//! - Probe selection by intuition → probe selection by information.
150
151use ndarray::{Array1, Array2};
152use serde::{Deserialize, Serialize};
153
154/// Running anytime-valid evidence against one null hypothesis, in log
155/// space. Multiplicative absorption of conditionally-valid e-values;
156/// Ville's inequality converts the running product into a sequential test
157/// that survives optional stopping. Serializable so evidence is resumable
158/// across corpus shards (#973): persist, reload, keep absorbing.
159#[derive(Clone, Debug, Serialize, Deserialize)]
160pub struct EProcess {
161 /// log E_t — the running log-evidence. Starts at 0 (E_0 = 1).
162 log_e: f64,
163 /// Number of absorbed batches (the ledger length).
164 steps: usize,
165 /// Running maximum of log E_t — Ville's inequality applies to the
166 /// supremum, so a claim once proven at level α stays proven even if
167 /// later evidence retreats (evidence is not p-hackable in reverse).
168 log_e_max: f64,
169}
170
171impl EProcess {
172 pub fn new() -> Self {
173 Self {
174 log_e: 0.0,
175 steps: 0,
176 log_e_max: 0.0,
177 }
178 }
179
180 /// Absorb one conditionally-valid e-value (NOT in log space; must be
181 /// ≥ 0; `E[e | past] ≤ 1` under H0 is the caller's contract — e.g. a
182 /// universal-inference batch ratio or a fixed-prior Bayes factor).
183 pub fn absorb(&mut self, e_value: f64) -> Result<(), String> {
184 if e_value.is_nan() || e_value < 0.0 {
185 return Err(format!("e-value must be in [0, ∞], got {e_value}"));
186 }
187 self.absorb_log(e_value.ln())
188 }
189
190 /// Absorb a batch e-value supplied in log space (the only numerically
191 /// honest interface for long streams).
192 pub fn absorb_log(&mut self, log_e_value: f64) -> Result<(), String> {
193 let next_log_e = checked_log_e_sum(self.log_e, log_e_value)?;
194 self.log_e = next_log_e;
195 self.steps += 1;
196 if self.log_e > self.log_e_max {
197 self.log_e_max = self.log_e;
198 }
199 Ok(())
200 }
201
202 pub fn log_evidence(&self) -> f64 {
203 self.log_e
204 }
205
206 pub fn steps(&self) -> usize {
207 self.steps
208 }
209
210 /// Anytime-valid rejection at level α: by Ville,
211 /// `P_{H0}(sup_t E_t ≥ 1/α) ≤ α`, so crossing 1/α at ANY time —
212 /// including data-dependent stopping times — proves the claim with
213 /// type-I error ≤ α. Uses the running supremum: once crossed, always
214 /// rejected.
215 pub fn rejects_at(&self, alpha: f64) -> bool {
216 alpha > 0.0 && self.log_e_max >= -(alpha.ln())
217 }
218
219 /// The e-value to hand to [`e_benjamini_hochberg`] for the
220 /// dictionary-level FDR certificate (current evidence, not the sup —
221 /// e-BH's guarantee is stated for e-values at the chosen stopping
222 /// time).
223 pub fn current_e_value_log(&self) -> f64 {
224 self.log_e
225 }
226}
227
228impl Default for EProcess {
229 fn default() -> Self {
230 Self::new()
231 }
232}
233
234fn checked_log_e_sum(current: f64, increment: f64) -> Result<f64, String> {
235 if current.is_nan() {
236 return Err("EProcess invariant violation: current log evidence is NaN".to_string());
237 }
238 if increment.is_nan() {
239 return Err("log e-value must not be NaN".to_string());
240 }
241 if current.is_infinite()
242 && increment.is_infinite()
243 && current.is_sign_positive() != increment.is_sign_positive()
244 {
245 return Err(format!(
246 "cannot combine opposing infinite log e-values: current {current}, increment {increment}"
247 ));
248 }
249 Ok(current + increment)
250}
251
252/// One universal-inference (split-likelihood-ratio) e-value: finite-sample
253/// valid with NO regularity conditions — the correct instrument for atom
254/// birth (K vs K+1 components, boundary/Davies regime where χ² fails).
255///
256/// `log_lik_alternative_on_eval`: log-likelihood of the EVALUATION fold
257/// under the alternative fitted on the ESTIMATION fold (any fitter — the
258/// production manifold/SAE fit, warm-started, GPU; zero conditions on it).
259/// `log_lik_null_sup_on_eval`: the SUPREMUM of the evaluation-fold
260/// log-likelihood over the NULL model class (the honest side: a real
261/// constrained fit on the eval fold, e.g. the K-atom dictionary refit on
262/// D₀). Then `log E = ℓ_alt(D₀) − sup_{H0} ℓ(D₀)` and `E_{H0}[E] ≤ 1`
263/// exactly.
264pub fn split_likelihood_log_e_value(
265 log_lik_alternative_on_eval: f64,
266 log_lik_null_sup_on_eval: f64,
267) -> f64 {
268 // Degenerate halves yield a well-defined, conservative e-value of 1
269 // (`log E = 0`, no evidence for the alternative) rather than a NaN that
270 // would poison the e-process product and the downstream FDR certificate:
271 // * Either log-likelihood is NaN — the model could not be evaluated on
272 // this shard (a numerically degenerate fit). The honest reading is
273 // "no information", i.e. `E = 1`.
274 // * Both halves are the SAME signed infinity — e.g. a shard/outcome
275 // with zero density under both the alternative and the null
276 // (`−∞ − (−∞)`), or both `+∞`. The ratio is undefined; again `E = 1`.
277 // This keeps `log E` finite so `absorb_log` never banks a NaN and the
278 // e-BH certificate sees a contributing-nothing claim instead of panicking.
279 if log_lik_alternative_on_eval.is_nan() || log_lik_null_sup_on_eval.is_nan() {
280 return 0.0;
281 }
282 if log_lik_alternative_on_eval.is_infinite()
283 && log_lik_null_sup_on_eval.is_infinite()
284 && log_lik_alternative_on_eval.is_sign_positive()
285 == log_lik_null_sup_on_eval.is_sign_positive()
286 {
287 return 0.0;
288 }
289 log_lik_alternative_on_eval - log_lik_null_sup_on_eval
290}
291
292/// Sequential universal inference over a stream of batches with a
293/// PREDICTABLE plug-in: at batch t the alternative parameters were fit
294/// using only data before t, the null sup is taken on batch t, and the
295/// per-batch ratios compound into an e-process. This is the streaming /
296/// optional-stopping form the corpus-scale pipeline (#973 shards) needs —
297/// evidence is resumable: serialize `EProcess`, keep absorbing on the next
298/// shard.
299#[derive(Clone, Debug, Serialize, Deserialize)]
300pub struct PredictablePluginEProcess {
301 pub process: EProcess,
302}
303
304impl PredictablePluginEProcess {
305 pub fn new() -> Self {
306 Self {
307 process: EProcess::new(),
308 }
309 }
310
311 /// Absorb one batch. The caller guarantees the alternative was fit
312 /// WITHOUT batch-t data (predictability — this is what makes the
313 /// product a supermartingale; violating it voids the guarantee, which
314 /// is why the SAE integration must hand this function the PREVIOUS
315 /// shard's fitted dictionary, never the current one).
316 pub fn try_absorb_batch(
317 &mut self,
318 log_lik_alternative_prefit: f64,
319 log_lik_null_sup_on_batch: f64,
320 ) -> Result<(), String> {
321 self.process.absorb_log(split_likelihood_log_e_value(
322 log_lik_alternative_prefit,
323 log_lik_null_sup_on_batch,
324 ))
325 }
326}
327
328impl Default for PredictablePluginEProcess {
329 fn default() -> Self {
330 Self::new()
331 }
332}
333
334/// The anytime-valid verdict on one structural claim. Deliberately
335/// two-valued — there is NO "rejected" arm. Demote-never-reject (#969
336/// philosophy): an e-process that has not crossed 1/α has failed to prove
337/// the claim, not disproven it; the claim stays contested, keeps its
338/// evidence, and earns a design-optimal probe budget instead of being
339/// silently dropped (or worse, silently accepted the way a threshold gate
340/// accepts whatever clears it).
341#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
342pub enum GateVerdict {
343 /// The running supremum crossed 1/α: the claim is proven with type-I
344 /// error ≤ α, permanently (Ville applies to the sup, so later evidence
345 /// retreat cannot un-prove it).
346 Certified { log_e: f64 },
347 /// Not (yet) proven. Carries the CURRENT log-evidence — the value the
348 /// dictionary certificate's e-BH consumes, and the state a probe loop
349 /// resumes from.
350 Contested { log_e: f64 },
351}
352
353/// The atom-birth gate (#976's threshold comparison, replaced): a
354/// universal-inference e-process over corpus shards deciding "does the
355/// K+1-th atom exist?", the boundary/Davies-regime question where the χ²
356/// gate every dictionary paper runs is broken.
357///
358/// Per shard t the integration contract is exactly the work plan's:
359/// - `log_lik_alternative_prefit`: the K+1-atom dictionary fit on shards
360/// BEFORE t (the PREVIOUS shard's fit — predictability is the one rule;
361/// handing in the current shard's fit voids the guarantee), evaluated on
362/// shard t. Any fitter, warm starts, GPU — no conditions.
363/// - `log_lik_null_sup_on_shard`: the K-atom dictionary REFIT on shard t
364/// (the honest constrained sup on the evaluation data).
365///
366/// The gate never rejects: [`GateVerdict::Contested`] is the only
367/// alternative to certification, and a contested atom's next move is a
368/// probe plan ([`plan_probe_for_contested_claim`]), not deletion.
369#[derive(Clone, Debug, Serialize, Deserialize)]
370pub struct AtomBirthGate {
371 pub test: PredictablePluginEProcess,
372 /// The level the certificate is claimed at; fixed at construction so a
373 /// verdict can never be shopped across α after seeing the evidence.
374 alpha: f64,
375}
376
377impl AtomBirthGate {
378 pub fn new(alpha: f64) -> Result<Self, String> {
379 if !(alpha > 0.0 && alpha < 1.0) {
380 return Err(format!(
381 "AtomBirthGate: alpha must be in (0,1), got {alpha}"
382 ));
383 }
384 Ok(Self {
385 test: PredictablePluginEProcess::new(),
386 alpha,
387 })
388 }
389
390 pub fn alpha(&self) -> f64 {
391 self.alpha
392 }
393
394 /// Absorb one shard's split-likelihood ratio (see type-level contract).
395 pub fn try_absorb_shard(
396 &mut self,
397 log_lik_alternative_prefit: f64,
398 log_lik_null_sup_on_shard: f64,
399 ) -> Result<(), String> {
400 self.test
401 .try_absorb_batch(log_lik_alternative_prefit, log_lik_null_sup_on_shard)
402 }
403
404 pub fn absorb_shard(
405 &mut self,
406 log_lik_alternative_prefit: f64,
407 log_lik_null_sup_on_shard: f64,
408 ) {
409 self.try_absorb_shard(log_lik_alternative_prefit, log_lik_null_sup_on_shard)
410 .expect("AtomBirthGate received invalid log evidence");
411 }
412
413 pub fn verdict(&self) -> GateVerdict {
414 if self.test.process.rejects_at(self.alpha) {
415 GateVerdict::Certified {
416 log_e: self.test.process.log_evidence(),
417 }
418 } else {
419 GateVerdict::Contested {
420 log_e: self.test.process.log_evidence(),
421 }
422 }
423 }
424}
425
426/// Run the atom-birth gate over a shard stream with the predictability
427/// contract enforced BY CONSTRUCTION: on each shard the alternative is
428/// evaluated strictly before it is refit with that shard, so the plug-in
429/// is always predictable and the product is always a supermartingale
430/// under H0. This is the orchestration the SAE structure search calls;
431/// the closures are the only fitter-specific surface.
432///
433/// - `alternative_log_lik(alt, shard)`: evaluation-fold log-likelihood of
434/// shard under the CURRENT alternative state (fit on prior shards only —
435/// guaranteed here by call order).
436/// - `null_sup_log_lik(shard)`: the honest constrained sup — the K-atom
437/// null REFIT on this shard (any fitter; this side must genuinely
438/// maximize over H0 on the shard, an under-maximized null inflates the
439/// e-value and voids validity).
440/// - `refit_alternative(alt, shard)`: fold the shard into the alternative
441/// (warm-started production fit, GPU, anything — zero conditions).
442///
443/// `initial_alternative` is the K+1 fit from data BEFORE the stream (or a
444/// prior-driven init; validity never depends on its quality — a bad init
445/// only costs power). Stops absorbing early once certified (the crossing
446/// is permanent; further shards only cost compute), but still folds the
447/// remaining shards into the alternative so the returned state has seen
448/// the whole stream. Returns the gate (verdict + resumable evidence) and
449/// the final alternative state.
450pub fn run_atom_birth_gate<S, A>(
451 alpha: f64,
452 initial_alternative: A,
453 shards: impl IntoIterator<Item = S>,
454 mut alternative_log_lik: impl FnMut(&A, &S) -> f64,
455 mut null_sup_log_lik: impl FnMut(&S) -> f64,
456 mut refit_alternative: impl FnMut(A, &S) -> A,
457) -> Result<(AtomBirthGate, A), String> {
458 let mut gate = AtomBirthGate::new(alpha)?;
459 let mut alt = initial_alternative;
460 for shard in shards {
461 if !matches!(gate.verdict(), GateVerdict::Certified { .. }) {
462 let log_lik_alt = alternative_log_lik(&alt, &shard);
463 let log_lik_null = null_sup_log_lik(&shard);
464 gate.try_absorb_shard(log_lik_alt, log_lik_null)?;
465 }
466 alt = refit_alternative(alt, &shard);
467 }
468 Ok((gate, alt))
469}
470
471/// e-BH: FDR control over m structural claims under ARBITRARY dependence
472/// (Wang–Ramdas). Input: per-claim log-e-values. Output: indices of
473/// rejected (i.e. CONFIRMED-STRUCTURE) claims, FDR ≤ α.
474///
475/// Sort e-values descending; reject the top k* where
476/// `k* = max{ k : e_(k) ≥ m/(α·k) }`.
477///
478/// This is the "dictionary certificate": run one e-process per claimed
479/// atom (and per claimed binding edge, #975), call this at the chosen
480/// stopping time, and the dictionary ships with an FDR-controlled list of
481/// which of its claimed structures are statistically real. No
482/// independence assumptions — atoms sharing every token is fine; that is
483/// exactly the case p-value BH cannot legally handle (PRDS violation) and
484/// e-BH can.
485pub fn e_benjamini_hochberg(log_e_values: &[f64], alpha: f64) -> Vec<usize> {
486 let m = log_e_values.len();
487 if m == 0 || !(alpha.is_finite() && alpha > 0.0) {
488 return Vec::new();
489 }
490 // Robustness: a degenerate claim can bank a non-finite log e-value (a
491 // NaN from an upstream `(−∞) − (−∞)` split-LR that escaped the source
492 // guard). This is the documented honest FDR surface, so it must NEVER
493 // panic on such input. Treat any NaN as least-evidence (`−∞`): an
494 // undefined/contested claim contributes no evidence, sorts last, and can
495 // never be among the rejections. Finite/±∞ values pass through unchanged;
496 // `total_cmp` then gives a total order on the sanitized keys.
497 let sanitized: Vec<f64> = log_e_values
498 .iter()
499 .map(|&v| if v.is_nan() { f64::NEG_INFINITY } else { v })
500 .collect();
501 let mut order: Vec<usize> = (0..m).collect();
502 order.sort_by(|&a, &b| sanitized[b].total_cmp(&sanitized[a]));
503 let m_f = m as f64;
504 let mut k_star = 0usize;
505 for (rank0, &idx) in order.iter().enumerate() {
506 let k = (rank0 + 1) as f64;
507 // e_(k) ≥ m / (α k) ⟺ log e_(k) ≥ log m − log α − log k
508 if sanitized[idx] >= m_f.ln() - alpha.ln() - k.ln() {
509 k_star = rank0 + 1;
510 }
511 }
512 order.truncate(k_star);
513 order
514}
515
516/// What one structural claim asserts about the dictionary. One e-process
517/// runs per claim; the kinds mirror the discovery stack's claim surface:
518/// atom existence (#976 birth), binding edges (#975), geometry
519/// adjudication (#907). `Custom` keeps the ledger open to claim types
520/// that do not exist yet without an enum churn per new discovery gate.
521#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)]
522pub enum ClaimKind {
523 /// "Atom `atom` is statistically real" — the K vs K+1 birth claim.
524 AtomExists { atom: usize },
525 /// "Atoms `a` and `b` are bound" — a #975 binding edge.
526 BindingEdge { a: usize, b: usize },
527 /// "Atom `atom`'s latent geometry is `kind`" (e.g. "circle",
528 /// "clusters", "line") — a #907 adjudication claim.
529 GeometryKind { atom: usize, kind: String },
530 /// Any other structural claim, labeled.
531 Custom { label: String },
532}
533
534/// One claim plus its running evidence.
535#[derive(Clone, Debug, Serialize, Deserialize)]
536pub struct StructuralClaim {
537 pub kind: ClaimKind,
538 pub evidence: EProcess,
539}
540
541/// The dictionary's claim ledger: every structural claim the discovery
542/// stack makes, each with its own e-process. Serializable — evidence
543/// resumes across corpus shards (#973) by persisting the ledger, not by
544/// refitting. Calling [`StructureLedger::certify`] at ANY data-dependent
545/// stopping time yields a valid certificate; that is the entire point.
546#[derive(Clone, Debug, Default, Serialize, Deserialize)]
547pub struct StructureLedger {
548 claims: Vec<StructuralClaim>,
549}
550
551impl StructureLedger {
552 pub fn new() -> Self {
553 Self { claims: Vec::new() }
554 }
555
556 /// Register a claim and return its ledger index. Idempotent on the
557 /// claim kind: re-registering an existing claim (a resumed shard loop
558 /// re-announcing its claim surface) returns the existing index and
559 /// PRESERVES its accumulated evidence — a fresh e-process here would
560 /// silently discard the stream's history.
561 pub fn register(&mut self, kind: ClaimKind) -> usize {
562 if let Some(idx) = self.claims.iter().position(|c| c.kind == kind) {
563 return idx;
564 }
565 self.claims.push(StructuralClaim {
566 kind,
567 evidence: EProcess::new(),
568 });
569 self.claims.len() - 1
570 }
571
572 /// Absorb one conditionally-valid log e-value for claim `idx` (a
573 /// universal-inference shard ratio, a frozen-prior log-BF — the
574 /// caller's contract is per-source, documented on the producing gate).
575 pub fn absorb_log(&mut self, idx: usize, log_e_value: f64) -> Result<(), String> {
576 let n = self.claims.len();
577 let claim = self.claims.get_mut(idx).ok_or_else(|| {
578 format!("StructureLedger: claim index {idx} out of range ({n} claims)")
579 })?;
580 claim.evidence.absorb_log(log_e_value)
581 }
582
583 pub fn claims(&self) -> &[StructuralClaim] {
584 &self.claims
585 }
586
587 /// The likelihood half of the probe-design loop (work-plan step 4):
588 /// after running a planned probe ([`ProbePlan`] →
589 /// `crate::inference::steering::steer_delta`), evaluate the REALIZED
590 /// outcomes under both hypotheses' predictive densities and absorb the
591 /// log-ratio into the contested claim's e-process.
592 ///
593 /// Validity contract: both predictive densities must be FROZEN before the
594 /// probe outcome is observed — which the design loop satisfies by
595 /// construction, since both hypotheses' dictionaries were fitted before
596 /// the probe was even chosen. For a composite null, the null density must
597 /// be the honest constrained fit (the same rule as
598 /// [`split_likelihood_log_e_value`], which this delegates to); for a
599 /// simple null the predictive density is the sup. Probe outcomes are new
600 /// data by construction (the model was steered to produce them), so they
601 /// compound validly with the claim's prior shard evidence.
602 pub fn absorb_probe_outcome(
603 &mut self,
604 idx: usize,
605 log_lik_alt_on_outcome: f64,
606 log_lik_null_on_outcome: f64,
607 ) -> Result<(), String> {
608 self.absorb_log(
609 idx,
610 split_likelihood_log_e_value(log_lik_alt_on_outcome, log_lik_null_on_outcome),
611 )
612 }
613
614 /// The dictionary certificate: e-BH over the ledger's CURRENT
615 /// e-values at level α. FDR ≤ α over the confirmed set under arbitrary
616 /// dependence — atoms sharing every token is fine — and valid at any
617 /// stopping time because each entry is an e-process. Claims not
618 /// confirmed are CONTESTED, never rejected (demote-never-reject); they
619 /// keep their evidence and are the inputs to the probe-design loop.
620 pub fn certify(&self, alpha: f64) -> StructureCertificate {
621 let log_e: Vec<f64> = self
622 .claims
623 .iter()
624 .map(|c| c.evidence.current_e_value_log())
625 .collect();
626 let confirmed_idx = e_benjamini_hochberg(&log_e, alpha);
627 let mut entries: Vec<CertificateEntry> = self
628 .claims
629 .iter()
630 .zip(&log_e)
631 .map(|(c, &le)| CertificateEntry {
632 kind: c.kind.clone(),
633 log_e: le,
634 steps: c.evidence.steps(),
635 confirmed: false,
636 })
637 .collect();
638 for &i in &confirmed_idx {
639 entries[i].confirmed = true;
640 }
641 StructureCertificate { alpha, entries }
642 }
643}
644
645/// One line of the certificate's e-value ledger: the claim, its
646/// log-evidence at the stop, how many batches produced it, and the e-BH
647/// outcome. The full entry list IS the reproducibility artifact: anyone
648/// holding it can re-run [`e_benjamini_hochberg`] and re-derive the
649/// confirmed set.
650#[derive(Clone, Debug, Serialize, Deserialize)]
651pub struct CertificateEntry {
652 pub kind: ClaimKind,
653 pub log_e: f64,
654 pub steps: usize,
655 pub confirmed: bool,
656}
657
658/// The deliverable: "we found N structures at FDR ≤ α, certificate
659/// attached". Ships next to the identifiability certificate
660/// ([`crate::terms::sae::identifiability::residual_gauge`], #981) — that one says
661/// what the GAUGE cannot distinguish, this one says what the DATA can.
662#[derive(Clone, Debug, Serialize, Deserialize)]
663pub struct StructureCertificate {
664 pub alpha: f64,
665 pub entries: Vec<CertificateEntry>,
666}
667
668impl StructureCertificate {
669 pub fn confirmed(&self) -> impl Iterator<Item = &CertificateEntry> {
670 self.entries.iter().filter(|e| e.confirmed)
671 }
672
673 pub fn contested(&self) -> impl Iterator<Item = &CertificateEntry> {
674 self.entries.iter().filter(|e| !e.confirmed)
675 }
676}
677
678/// Calibrate one (super)uniform p-value into a single e-value, in log
679/// space: `e(p) = ½ p^{−1/2}` (the κ = ½ member of the calibrator family
680/// `e_κ(p) = κ p^{κ−1}`; `∫₀¹ e_κ(p) dp = 1`, so `E_{H0}[e(P)] ≤ 1` for
681/// any valid p — superuniformity only, no other conditions).
682///
683/// This is the bridge from p-value-shaped instruments into the ledger —
684/// e.g. the feature-binding Wald test (`terms::structure::anova_atom::carve`'s
685/// `edge_p_value` → a [`ClaimKind::BindingEdge`] entry). It spends
686/// calibration slack (a p of 0.01 becomes e = 5, not 100), which is the
687/// honest price of converting a fixed-sample test into anytime-valid
688/// currency; instruments that can produce e-values natively should.
689/// CONTRACT: one calibrated e-value per INDEPENDENT data batch — feeding
690/// repeated tests of the same accumulating data into one e-process is the
691/// p-hacking this module exists to kill.
692pub fn log_e_from_p_calibrator(p_value: f64) -> Result<f64, String> {
693 if !(p_value > 0.0) || p_value > 1.0 {
694 return Err(format!("p-value must be in (0, 1], got {p_value}"));
695 }
696 Ok(0.5f64.ln() - 0.5 * p_value.ln())
697}
698
699/// A candidate steering probe for resolving one contested structural
700/// claim: the intervention direction (in the steering primitive's
701/// coordinates), and the two hypotheses' PREDICTED output-mean responses
702/// to it.
703pub struct CandidateProbe {
704 /// Steering displacement δ, to be applied via
705 /// `crate::inference::steering` (which enforces its own validity
706 /// radius and reports realized dosimetry).
707 pub delta: Array1<f64>,
708 /// Predicted output-mean response under the null structure, μ₀(δ).
709 pub predicted_mean_null: Array1<f64>,
710 /// Predicted output-mean response under the alternative, μ₁(δ).
711 pub predicted_mean_alt: Array1<f64>,
712}
713
714/// Greedy KL-optimal experimental design under the local Gaussian
715/// output-Fisher model: pick the probe maximizing
716/// `½ (μ₁(δ) − μ₀(δ))ᵀ F (μ₁(δ) − μ₀(δ))` — the expected per-observation
717/// log-growth of the deciding e-process under the alternative.
718///
719/// `fisher` is the output-Fisher metric at the operating point (#980
720/// harvest; the same object steering dosimetry contracts against). Probes
721/// whose hypotheses predict the SAME response score zero no matter how
722/// large their raw effect — the design rule selects for DISCRIMINATION,
723/// not impact, which is the entire point: a maximally-steered output that
724/// both hypotheses predict identically teaches nothing.
725///
726/// Returns the index of the best probe and its expected log-growth (nats
727/// per observation), or None if no probe discriminates.
728pub fn select_probe_by_expected_evidence(
729 probes: &[CandidateProbe],
730 fisher: &Array2<f64>,
731) -> Option<(usize, f64)> {
732 let mut best: Option<(usize, f64)> = None;
733 for (idx, probe) in probes.iter().enumerate() {
734 let diff = &probe.predicted_mean_alt - &probe.predicted_mean_null;
735 if diff.len() != fisher.nrows() {
736 continue;
737 }
738 let f_diff = fisher.dot(&diff);
739 let growth = 0.5 * diff.dot(&f_diff);
740 if growth.is_finite() && growth > 0.0 {
741 match best {
742 Some((_, g)) if g >= growth => {}
743 _ => best = Some((idx, growth)),
744 }
745 }
746 }
747 best
748}
749
750/// Expected number of observations for the chosen probe to push a claim's
751/// e-process across the 1/α Ville threshold, under the alternative: the
752/// design-time budget `log(1/α) / growth_rate`. This is what turns the
753/// abstract guarantee into an experiment plan ("this probe should resolve
754/// the claim in ~N tokens; if it hasn't, the alternative is weaker than
755/// hypothesized — itself evidence").
756pub fn expected_resolution_budget(alpha: f64, growth_nats_per_obs: f64) -> Option<f64> {
757 if alpha <= 0.0 || alpha >= 1.0 || growth_nats_per_obs <= 0.0 {
758 return None;
759 }
760 Some(-(alpha.ln()) / growth_nats_per_obs)
761}
762
763/// The experiment plan for one contested claim: which probe to run, the
764/// expected per-observation evidence growth under the alternative, and the
765/// design-time resolution budget. This is the loop's actionable output —
766/// hand `probes[probe]`'s δ to `crate::inference::steering::steer_delta`
767/// (which enforces the validity radius and reports realized dosimetry),
768/// evaluate both hypotheses' likelihoods on the realized outputs, absorb
769/// the log-ratio into the claim's e-process, re-certify.
770#[derive(Clone, Debug, PartialEq)]
771pub struct ProbePlan {
772 /// Index into the candidate probe list.
773 pub probe: usize,
774 /// Expected log-growth of the deciding e-process, nats/observation,
775 /// under the alternative (the KL of the hypotheses' predicted
776 /// responses in the output-Fisher metric).
777 pub expected_log_growth: f64,
778 /// Expected observations to cross 1/α from ZERO evidence — the
779 /// conservative from-scratch budget.
780 pub budget_from_scratch: f64,
781 /// Expected observations to cross 1/α from the claim's CURRENT
782 /// log-evidence — the remaining budget; 0 when already across.
783 pub budget_remaining: f64,
784}
785
786/// Close the design loop for one contested claim: pick the probe whose
787/// predicted hypothesis-disagreement (not raw effect) buys evidence
788/// fastest, and convert the claim's current evidence into a remaining
789/// budget — "this probe should resolve the claim in ~N more observations
790/// at level α; if it does not, the alternative is weaker than
791/// hypothesized, which is itself evidence."
792///
793/// `current_log_e` is the contested claim's running log-evidence (from its
794/// [`StructuralClaim`] / [`GateVerdict::Contested`]). Returns `None` when
795/// no probe discriminates (all candidates score zero growth: the
796/// hypotheses agree on everything reachable inside the validity radius —
797/// the claim is undecidable by steering and needs a different instrument,
798/// which is a finding, not a failure).
799pub fn plan_probe_for_contested_claim(
800 probes: &[CandidateProbe],
801 fisher: &Array2<f64>,
802 alpha: f64,
803 current_log_e: f64,
804) -> Option<ProbePlan> {
805 let (probe, expected_log_growth) = select_probe_by_expected_evidence(probes, fisher)?;
806 let budget_from_scratch = expected_resolution_budget(alpha, expected_log_growth)?;
807 let nats_remaining = (-(alpha.ln()) - current_log_e).max(0.0);
808 Some(ProbePlan {
809 probe,
810 expected_log_growth,
811 budget_from_scratch,
812 budget_remaining: nats_remaining / expected_log_growth,
813 })
814}
815
816#[cfg(test)]
817mod tests {
818 use super::*;
819 use ndarray::array;
820
821 /// e-BH on a hand-checkable configuration.
822 #[test]
823 fn e_bh_rejects_exactly_the_qualifying_prefix() {
824 // m = 4, α = 0.1 → thresholds m/(αk) = 40, 20, 13.33, 10.
825 let log_e = [45.0f64.ln(), 21.0f64.ln(), 12.0f64.ln(), 1.0f64.ln()];
826 let rejected = e_benjamini_hochberg(&log_e, 0.1);
827 // e_(1)=45 ≥ 40 ✓, e_(2)=21 ≥ 20 ✓, e_(3)=12 < 13.33 ✗ → k* = 2.
828 assert_eq!(rejected, vec![0, 1]);
829
830 // A weaker tail cannot drag in a stronger prefix decision.
831 let log_e2 = [45.0f64.ln(), 5.0f64.ln(), 2.0f64.ln(), 1.0f64.ln()];
832 assert_eq!(e_benjamini_hochberg(&log_e2, 0.1), vec![0]);
833 }
834
835 #[test]
836 fn split_likelihood_equal_impossibility_is_neutral_log_evidence() {
837 let log_e = split_likelihood_log_e_value(f64::NEG_INFINITY, f64::NEG_INFINITY);
838 assert_eq!(log_e, 0.0);
839 assert!(log_e.is_finite());
840
841 let mut proc = EProcess::new();
842 proc.absorb_log(log_e).unwrap();
843 assert_eq!(proc.log_evidence(), 0.0);
844 assert_eq!(proc.steps(), 1);
845 }
846
847 #[test]
848 fn e_bh_orders_infinite_log_e_values_without_comparator_panic() {
849 let log_e = [f64::NEG_INFINITY, f64::INFINITY, 45.0f64.ln(), 1.0f64.ln()];
850 assert_eq!(e_benjamini_hochberg(&log_e, 0.1), vec![1, 2]);
851 }
852
853 /// A NaN log e-value (a degenerate claim whose `(−∞) − (−∞)` split-LR
854 /// escaped the source guard) must NOT panic the e-BH comparator — the
855 /// honest FDR surface stays robust. The NaN claim is treated as
856 /// least-evidence (`−∞`): it is never rejected, and it cannot disturb the
857 /// rejection of a genuinely strong claim.
858 #[test]
859 fn e_bh_treats_nan_as_least_evidence_without_panicking() {
860 // m = 2, α = 0.1 → threshold for the top claim is m/(αk) = 2/0.1 = 20.
861 // Claim 0 has e = 45 ≥ 20 (rejected); claim 1 is NaN (no evidence).
862 let log_e = [45.0f64.ln(), f64::NAN];
863 let rejected = e_benjamini_hochberg(&log_e, 0.1);
864 assert_eq!(
865 rejected,
866 vec![0],
867 "strong claim survives; NaN claim never rejected"
868 );
869
870 // An all-NaN ledger yields an empty (no-rejection) certificate, not a
871 // panic.
872 let all_nan = [f64::NAN, f64::NAN, f64::NAN];
873 assert!(e_benjamini_hochberg(&all_nan, 0.1).is_empty());
874 }
875
876 /// The full source→consumer chain: a shard with zero density under both
877 /// the alternative and the null produces `(−∞) − (−∞)`, which the split-LR
878 /// resolves to neutral `log E = 0` rather than NaN; banking it and
879 /// certifying must not panic. A genuinely-NaN log-likelihood is likewise
880 /// resolved to neutral evidence at the source.
881 #[test]
882 fn degenerate_split_lr_flows_through_certify_without_nan_panic() {
883 // (−∞) − (−∞): zero density under both hypotheses → neutral.
884 let neutral = split_likelihood_log_e_value(f64::NEG_INFINITY, f64::NEG_INFINITY);
885 assert_eq!(neutral, 0.0);
886 // NaN log-likelihood (un-evaluable degenerate fit) → neutral, finite.
887 let from_nan = split_likelihood_log_e_value(f64::NAN, -3.0);
888 assert!(from_nan.is_finite());
889 assert_eq!(from_nan, 0.0);
890
891 let mut ledger = StructureLedger::new();
892 let degenerate = ledger.register(ClaimKind::AtomExists { atom: 0 });
893 let strong = ledger.register(ClaimKind::AtomExists { atom: 1 });
894 // Bank the neutral split-LR on the degenerate claim — no NaN reaches
895 // the e-process.
896 ledger.absorb_log(degenerate, neutral).unwrap();
897 ledger.absorb_log(strong, 45.0f64.ln()).unwrap();
898 // certify() runs e_benjamini_hochberg internally; must not panic.
899 let certificate = ledger.certify(0.1);
900 let degenerate_entry = certificate
901 .entries
902 .iter()
903 .find(|e| e.kind == ClaimKind::AtomExists { atom: 0 })
904 .expect("degenerate claim present");
905 // Neutral evidence (log_e = 0) never qualifies → contested, not confirmed.
906 assert!(!degenerate_entry.confirmed);
907 assert_eq!(degenerate_entry.log_e, 0.0);
908 }
909
910 #[test]
911 fn e_process_absorb_log_rejects_undefined_log_products() {
912 let mut proc = EProcess::new();
913 assert!(proc.absorb_log(f64::NAN).is_err());
914
915 proc.absorb_log(f64::INFINITY).unwrap();
916 assert!(proc.absorb_log(f64::NEG_INFINITY).is_err());
917 assert_eq!(proc.log_evidence(), f64::INFINITY);
918 assert_eq!(proc.steps(), 1);
919 }
920
921 /// Ville-style sanity: under H0 (simulated fair e-values from a
922 /// likelihood ratio of identical Gaussians), the e-process crosses
923 /// 1/α rarely; under a true alternative it crosses fast and the
924 /// crossing is PERMANENT (running-sup semantics).
925 #[test]
926 fn e_process_crossing_is_permanent_and_directional() {
927 // Deterministic "stream": per-batch log-LR of N(μ,1) vs N(0,1)
928 // evaluated at x drawn from the alternative: log e = μ x − μ²/2.
929 // Use a fixed quasi-random sequence; no RNG state needed.
930 let mu = 0.6f64;
931 let mut proc_alt = EProcess::new();
932 let mut crossed_at: Option<usize> = None;
933 for t in 0..200 {
934 // x_t ~ alternative-ish deterministic surrogate around μ
935 let x = mu + 0.9 * ((t as f64 * 0.7321).sin());
936 proc_alt.absorb_log(mu * x - 0.5 * mu * mu).unwrap();
937 if proc_alt.rejects_at(0.05) && crossed_at.is_none() {
938 crossed_at = Some(t);
939 }
940 }
941 let t_cross = crossed_at.expect("true alternative must cross 1/α");
942 assert!(t_cross < 100, "evidence should accumulate quickly");
943 // Permanence: rejection holds at the end even if late evidence dips.
944 assert!(proc_alt.rejects_at(0.05));
945
946 // Null stream: x centered at 0 → expected log e = −μ²/2 < 0.
947 let mut proc_null = EProcess::new();
948 for t in 0..200 {
949 let x = 0.9 * ((t as f64 * 0.7321).sin());
950 proc_null.absorb_log(mu * x - 0.5 * mu * mu).unwrap();
951 }
952 assert!(
953 !proc_null.rejects_at(0.05),
954 "null stream must not accumulate evidence (log E = {:.3})",
955 proc_null.log_evidence()
956 );
957 }
958
959 /// The design rule selects discrimination, not raw effect.
960 #[test]
961 fn probe_selection_prefers_discrimination_over_impact() {
962 let fisher = array![[2.0, 0.0], [0.0, 0.5]];
963 let probes = vec![
964 // Huge effect, but both hypotheses predict it identically.
965 CandidateProbe {
966 delta: array![1.0, 0.0],
967 predicted_mean_null: array![10.0, 10.0],
968 predicted_mean_alt: array![10.0, 10.0],
969 },
970 // Modest effect, hypotheses disagree along the informative axis.
971 CandidateProbe {
972 delta: array![0.0, 1.0],
973 predicted_mean_null: array![0.0, 0.0],
974 predicted_mean_alt: array![1.0, 0.2],
975 },
976 ];
977 let (idx, growth) =
978 select_probe_by_expected_evidence(&probes, &fisher).expect("a probe discriminates");
979 assert_eq!(idx, 1);
980 // ½·(1,0.2)ᵀ diag(2,0.5) (1,0.2) = ½·(2 + 0.02) = 1.01 nats/obs.
981 assert!((growth - 1.01).abs() < 1e-12);
982 // Budget: ~3 observations to certify at α=0.05.
983 let budget = expected_resolution_budget(0.05, growth).expect("budget");
984 assert!(budget > 2.0 && budget < 4.0);
985 }
986
987 /// The birth gate certifies under a true alternative, stays contested
988 /// under the null, and never emits anything but those two verdicts.
989 #[test]
990 fn birth_gate_certifies_alternative_and_demotes_never_rejects() {
991 let mut gate = AtomBirthGate::new(0.05).expect("valid alpha");
992 // Strong shards: alternative beats the honest null sup by 1 nat each.
993 for _ in 0..5 {
994 gate.absorb_shard(-100.0, -101.0);
995 }
996 match gate.verdict() {
997 GateVerdict::Certified { log_e } => assert!((log_e - 5.0).abs() < 1e-12),
998 v => panic!("5 nats must certify at α=0.05, got {v:?}"),
999 }
1000 // Permanence: a later evidence retreat cannot un-certify.
1001 gate.absorb_shard(-110.0, -100.0);
1002 assert!(matches!(gate.verdict(), GateVerdict::Certified { .. }));
1003
1004 // Null-ish stream: the prefit alternative loses to the on-shard sup
1005 // (it must, on average — the sup is fit on the eval shard itself).
1006 let mut null_gate = AtomBirthGate::new(0.05).expect("valid alpha");
1007 for _ in 0..50 {
1008 null_gate.absorb_shard(-100.3, -100.0);
1009 }
1010 match null_gate.verdict() {
1011 GateVerdict::Contested { log_e } => assert!(log_e < 0.0),
1012 v => panic!("null stream must stay contested, got {v:?}"),
1013 }
1014 assert!(AtomBirthGate::new(0.0).is_err());
1015 assert!(AtomBirthGate::new(1.0).is_err());
1016 }
1017
1018 /// Ledger: idempotent registration preserves evidence; the certificate
1019 /// splits confirmed/contested by e-BH and the entry list reproduces it.
1020 #[test]
1021 fn ledger_certificate_splits_confirmed_and_contested() {
1022 let mut ledger = StructureLedger::new();
1023 let a0 = ledger.register(ClaimKind::AtomExists { atom: 0 });
1024 let a1 = ledger.register(ClaimKind::AtomExists { atom: 1 });
1025 let edge = ledger.register(ClaimKind::BindingEdge { a: 0, b: 1 });
1026
1027 // m = 3, α = 0.1 → e-BH thresholds m/(αk) = 30, 15, 10.
1028 ledger.absorb_log(a0, 40.0f64.ln()).unwrap();
1029 ledger.absorb_log(a1, 20.0f64.ln()).unwrap();
1030 ledger.absorb_log(edge, 2.0f64.ln()).unwrap();
1031
1032 // Re-registering must return the same slot with evidence intact.
1033 let a0_again = ledger.register(ClaimKind::AtomExists { atom: 0 });
1034 assert_eq!(a0_again, a0);
1035 assert_eq!(ledger.claims()[a0].evidence.steps(), 1);
1036
1037 let cert = ledger.certify(0.1);
1038 // e_(1)=40 ≥ 30 ✓, e_(2)=20 ≥ 15 ✓, e_(3)=2 < 10 ✗ → atoms confirmed,
1039 // the binding edge stays contested.
1040 let confirmed: Vec<&ClaimKind> = cert.confirmed().map(|e| &e.kind).collect();
1041 assert_eq!(confirmed.len(), 2);
1042 assert!(confirmed.contains(&&ClaimKind::AtomExists { atom: 0 }));
1043 assert!(confirmed.contains(&&ClaimKind::AtomExists { atom: 1 }));
1044 let contested: Vec<&CertificateEntry> = cert.contested().collect();
1045 assert_eq!(contested.len(), 1);
1046 assert_eq!(contested[0].kind, ClaimKind::BindingEdge { a: 0, b: 1 });
1047
1048 assert!(ledger.absorb_log(99, 0.0).is_err());
1049 }
1050
1051 /// Resumability: a serialized ledger reloads with its evidence and
1052 /// keeps absorbing — the #973 shard contract.
1053 #[test]
1054 fn ledger_evidence_resumes_across_serialization() {
1055 let mut ledger = StructureLedger::new();
1056 let idx = ledger.register(ClaimKind::GeometryKind {
1057 atom: 3,
1058 kind: "circle".to_string(),
1059 });
1060 ledger.absorb_log(idx, 1.25).unwrap();
1061
1062 let persisted = serde_json::to_string(&ledger).expect("serialize ledger");
1063 let mut resumed: StructureLedger =
1064 serde_json::from_str(&persisted).expect("deserialize ledger");
1065 assert_eq!(resumed.claims()[idx].evidence.steps(), 1);
1066
1067 resumed.absorb_log(idx, 0.75).unwrap();
1068 let log_e = resumed.claims()[idx].evidence.log_evidence();
1069 assert!((log_e - 2.0).abs() < 1e-12);
1070 }
1071
1072 /// The probe plan discounts the remaining budget by evidence already
1073 /// banked, and floors at zero once the claim is across the line.
1074 #[test]
1075 fn probe_plan_discounts_remaining_budget_by_current_evidence() {
1076 let fisher = array![[2.0, 0.0], [0.0, 0.5]];
1077 let probes = vec![CandidateProbe {
1078 delta: array![0.0, 1.0],
1079 predicted_mean_null: array![0.0, 0.0],
1080 predicted_mean_alt: array![1.0, 0.2],
1081 }];
1082 // growth = 1.01 nats/obs (checked above); α=0.05 → need ln(20) ≈ 3.0 nats.
1083 let from_zero = plan_probe_for_contested_claim(&probes, &fisher, 0.05, 0.0).expect("plan");
1084 assert_eq!(from_zero.probe, 0);
1085 assert!((from_zero.budget_remaining - from_zero.budget_from_scratch).abs() < 1e-12);
1086
1087 let halfway = plan_probe_for_contested_claim(&probes, &fisher, 0.05, 1.5).expect("plan");
1088 assert!(halfway.budget_remaining < from_zero.budget_remaining);
1089 assert!((halfway.budget_remaining - (-(0.05f64.ln()) - 1.5) / 1.01).abs() < 1e-12);
1090
1091 let across = plan_probe_for_contested_claim(&probes, &fisher, 0.05, 10.0).expect("plan");
1092 assert_eq!(across.budget_remaining, 0.0);
1093
1094 // No discriminating probe → no plan (undecidable by steering).
1095 let blind = vec![CandidateProbe {
1096 delta: array![1.0, 0.0],
1097 predicted_mean_null: array![5.0, 5.0],
1098 predicted_mean_alt: array![5.0, 5.0],
1099 }];
1100 assert!(plan_probe_for_contested_claim(&blind, &fisher, 0.05, 0.0).is_none());
1101 }
1102
1103 /// The p→e calibrator on hand-checkable values, including its edges.
1104 #[test]
1105 fn p_to_e_calibrator_hand_values() {
1106 // e(p) = ½ p^{−1/2}: p = 1 → e = 0.5; p = 0.04 → e = 2.5; p = 1e-4 → e = 50.
1107 assert!((log_e_from_p_calibrator(1.0).unwrap() - 0.5f64.ln()).abs() < 1e-12);
1108 assert!((log_e_from_p_calibrator(0.04).unwrap() - 2.5f64.ln()).abs() < 1e-12);
1109 assert!((log_e_from_p_calibrator(1e-4).unwrap() - 50.0f64.ln()).abs() < 1e-12);
1110 assert!(log_e_from_p_calibrator(0.0).is_err());
1111 assert!(log_e_from_p_calibrator(1.5).is_err());
1112 assert!(log_e_from_p_calibrator(f64::NAN).is_err());
1113 }
1114
1115 /// The e-value validity condition: under the null `P ~ Uniform(0, 1]`,
1116 /// the calibrated e-value must satisfy `E_{H0}[e(P)] = ∫₀¹ e(p) dp ≤ 1`
1117 /// (Wang–Ramdas e-BH controls FDR ONLY for genuine e-values). The κ = ½
1118 /// member `e(p) = ½ p^{−1/2}` integrates to exactly 1, the boundary of
1119 /// admissibility. We verify this numerically with a midpoint Riemann sum
1120 /// over the unit interval (which UNDER-estimates `∫ p^{−1/2}` slightly
1121 /// because the integrand is convex, so the analytic value 1 sits just
1122 /// above the quadrature estimate — both safely ≤ 1 + tolerance).
1123 ///
1124 /// This is the property `e = 1/p` VIOLATES — `∫₀¹ (1/p) dp = ∞` — which
1125 /// is why the behavioral-head and anova-atom paths route through this
1126 /// calibrator rather than `−ln p`.
1127 #[test]
1128 fn p_to_e_calibrator_null_expectation_at_most_one() {
1129 let n = 2_000_000usize;
1130 let h = 1.0 / n as f64;
1131 let mut mean_e = 0.0_f64;
1132 for i in 0..n {
1133 // Midpoint of cell i: p = (i + 0.5)/n, always in (0, 1).
1134 let p = (i as f64 + 0.5) * h;
1135 let e = log_e_from_p_calibrator(p).unwrap().exp();
1136 mean_e += e * h;
1137 }
1138 // Analytic E_{H0}[e(P)] = 1; allow a small quadrature tolerance, but
1139 // it MUST NOT exceed 1 by more than that (an invalid calibrator like
1140 // 1/p would diverge here, not land near 1).
1141 assert!(
1142 mean_e <= 1.0 + 1e-3,
1143 "calibrated e-value null expectation {mean_e} exceeds 1 — not a valid e-value"
1144 );
1145 assert!(
1146 mean_e > 0.99,
1147 "calibrated e-value null expectation {mean_e} far below the analytic 1.0"
1148 );
1149 }
1150
1151 /// POWER STUDY, null side: the heuristic gate every dictionary paper
1152 /// runs — "accept the K+1-th atom the first time the cumulative
1153 /// likelihood ratio shows improvement" — versus the e-gate, on a
1154 /// family of NULL streams peeked at after every shard. The per-shard
1155 /// log-LR is `μ x_t − μ²/2` with `x_t = A sin(ω t + φ)` (a
1156 /// deterministic null surrogate: mean drift −μ²/2 < 0, bounded
1157 /// fluctuation). The naive gate's false-accept mechanism is exactly
1158 /// optional stopping: any phase whose partial sums wander above zero
1159 /// at ANY peek accepts a nonexistent atom. The e-gate needs
1160 /// log(1/α) ≈ 3.0 nats, and the partial-sum fluctuation is bounded by
1161 /// `μ·A/sin(ω/2) ≈ 1.51` nats (Dirichlet-kernel bound) BEFORE the
1162 /// negative drift — so it can never certify on any phase, which is
1163 /// Ville's inequality made concrete.
1164 #[test]
1165 fn power_study_null_naive_peeking_gate_false_accepts_e_gate_never() {
1166 let mu = 0.6f64;
1167 let amp = 0.9f64;
1168 let omega = 0.7321f64;
1169 let n_phases = 60usize;
1170 let n_shards = 200usize;
1171
1172 let mut naive_false_accepts = 0usize;
1173 let mut e_gate_false_accepts = 0usize;
1174 for k in 0..n_phases {
1175 let phase = 2.0 * std::f64::consts::PI * (k as f64) / (n_phases as f64);
1176 let mut gate = AtomBirthGate::new(0.05).expect("alpha");
1177 let mut cum_log_lr = 0.0f64;
1178 let mut naive_accepted = false;
1179 for t in 0..n_shards {
1180 let x = amp * ((t as f64) * omega + phase).sin();
1181 let log_lr = mu * x - 0.5 * mu * mu;
1182 cum_log_lr += log_lr;
1183 // The broken test: peek, accept on any improvement.
1184 if cum_log_lr > 0.0 {
1185 naive_accepted = true;
1186 }
1187 gate.absorb_shard(log_lr, 0.0);
1188 }
1189 if naive_accepted {
1190 naive_false_accepts += 1;
1191 }
1192 if matches!(gate.verdict(), GateVerdict::Certified { .. }) {
1193 e_gate_false_accepts += 1;
1194 }
1195 }
1196 // The naive gate false-accepts on a large fraction of null phases
1197 // (any phase with early-positive partial sums); the e-gate on none.
1198 assert!(
1199 naive_false_accepts >= n_phases / 3,
1200 "the peeking gate should false-accept often under the null \
1201 (got {naive_false_accepts}/{n_phases})"
1202 );
1203 assert_eq!(
1204 e_gate_false_accepts, 0,
1205 "the e-gate must never certify under the null"
1206 );
1207 }
1208
1209 /// POWER STUDY, alternative side, through the orchestration harness:
1210 /// a planted K+1-th atom worth 0.5 nats/shard certifies in
1211 /// ⌈log(1/α)/0.5⌉ = 6 shards — matching the design-time
1212 /// `expected_resolution_budget` — after which the gate stops absorbing
1213 /// (the crossing is permanent) while the alternative keeps refitting
1214 /// on the remaining shards.
1215 #[test]
1216 fn power_study_planted_atom_certifies_at_the_predicted_budget() {
1217 let growth = 0.5f64;
1218 let (gate, alt_state) = run_atom_birth_gate(
1219 0.05,
1220 0usize, // alt state = number of shards folded into the fit
1221 0..20usize,
1222 |_, _| -99.5, // prefit alternative log-lik on the shard
1223 |_| -100.0, // honest null sup on the shard
1224 |folded, _| folded + 1,
1225 )
1226 .expect("valid alpha");
1227
1228 match gate.verdict() {
1229 GateVerdict::Certified { log_e } => assert!((log_e - 3.0).abs() < 1e-12),
1230 v => panic!("planted atom must certify, got {v:?}"),
1231 }
1232 // Realized time-to-certification == the design-time budget, rounded up.
1233 let budget = expected_resolution_budget(0.05, growth).expect("budget");
1234 assert_eq!(gate.test.process.steps(), budget.ceil() as usize);
1235 assert_eq!(gate.test.process.steps(), 6);
1236 // The alternative state saw the whole stream despite early stopping.
1237 assert_eq!(alt_state, 20);
1238 }
1239
1240 /// Work-plan step 4, closed end-to-end: a contested claim gets a probe
1241 /// plan, the probe's realized outcomes are scored under both FROZEN
1242 /// hypotheses via [`StructureLedger::absorb_probe_outcome`], and the
1243 /// banked evidence flips the claim to confirmed within a small multiple
1244 /// of the plan's predicted resolution budget. Outcome noise is a
1245 /// deterministic bounded surrogate (zero-mean sinusoid), so under the
1246 /// true alternative each probe's expected log-growth is exactly the
1247 /// design value.
1248 #[test]
1249 fn design_loop_resolves_contested_claim_within_predicted_budget() {
1250 let mut ledger = StructureLedger::new();
1251 let idx = ledger.register(ClaimKind::GeometryKind {
1252 atom: 0,
1253 kind: "circle".to_string(),
1254 });
1255
1256 // Local Gaussian output model, unit-isotropic noise in the
1257 // Fisher-whitened coordinates: per-observation expected log-growth
1258 // under H1 is exactly the planned ½‖μ₁−μ₀‖²_F.
1259 let fisher = array![[1.0, 0.0], [0.0, 1.0]];
1260 let mu0 = array![0.0, 0.0];
1261 let mu1 = array![1.2, 0.5];
1262 let probes = vec![CandidateProbe {
1263 delta: array![0.0, 1.0],
1264 predicted_mean_null: mu0.clone(),
1265 predicted_mean_alt: mu1.clone(),
1266 }];
1267 let alpha = 0.05;
1268 let plan = plan_probe_for_contested_claim(&probes, &fisher, alpha, 0.0).expect("plan");
1269 assert_eq!(plan.probe, 0);
1270 // ½‖μ₁−μ₀‖² = ½(1.44 + 0.25) = 0.845 nats/obs; ln 20 ≈ 3.0 ⇒ ~3.6 obs.
1271 assert!((plan.expected_log_growth - 0.845).abs() < 1e-12);
1272 let budget = plan.budget_remaining.ceil().max(1.0) as usize;
1273
1274 // Run the probe loop: outcomes realized under the TRUE alternative
1275 // (mean μ₁ plus bounded zero-mean fluctuation); both hypotheses'
1276 // densities were frozen above, before any outcome existed.
1277 let mut observations = 0usize;
1278 while !ledger.claims()[idx].evidence.rejects_at(alpha) {
1279 observations += 1;
1280 assert!(
1281 observations <= 4 * budget,
1282 "claim must resolve within a small multiple of the predicted \
1283 budget {budget}; still contested after {observations} probes"
1284 );
1285 let t = observations as f64;
1286 let eps0 = 0.8 * (t * 0.7321).sin();
1287 let eps1 = 0.8 * (t * 1.1173).cos();
1288 let y = array![mu1[0] + eps0, mu1[1] + eps1];
1289 // Unit-Gaussian log-densities under each frozen hypothesis; the
1290 // shared normalizer cancels in the ratio.
1291 let d1 = &y - &mu1;
1292 let d0 = &y - &mu0;
1293 ledger
1294 .absorb_probe_outcome(idx, -0.5 * d1.dot(&d1), -0.5 * d0.dot(&d0))
1295 .expect("absorb");
1296 }
1297 let cert = ledger.certify(alpha);
1298 assert!(
1299 cert.confirmed()
1300 .any(|e| matches!(e.kind, ClaimKind::GeometryKind { atom: 0, .. }))
1301 );
1302 }
1303}