Skip to main content

gam_terms/decoders/
behavioral_head.rs

1//! Behaviorally-anchored SAE head (issue #912).
2//!
3//! # What this is
4//!
5//! An unsupervised SAE dictionary — manifold or linear — pins a concept's
6//! *shape* and its coordinate up to isometry. Both are functionals of `p(x)`.
7//! It cannot pin the map from coordinate to behavior, `p(y|x)`, which is
8//! formally independent of `p(x)`: two models with identical activation
9//! geometry can read out differently. The behavioral coefficient an atom
10//! "gets" from a post-hoc probe is therefore not a refinement of the
11//! dictionary — it is the *only* labeled source of the one thing the
12//! dictionary structurally lacks.
13//!
14//! The fix is to make the behavior part of the model. Instead of treating the
15//! auxiliary `u` as a fixed covariate in a conditional Gaussian *prior* on the
16//! latent codes (`LatentIdMode::AuxPrior`, the iVAE gauge), this module
17//! promotes the auxiliary signal to a *modeled outcome*: a GLM behavioral head
18//!
19//! ```text
20//!   g(E[y_n | t_n]) = a + t_n · w
21//! ```
22//!
23//! whose design columns are the latent codes `t_n` themselves. The head's
24//! coefficients `(a, w)` live in the β tier and the head log-likelihood enters
25//! the *same* Laplace/REML objective as the reconstruction channel, so REML
26//! balances reconstruction vs. behavioral fit automatically — no hand-tuned
27//! trade-off scalar (magic by default). Because the design depends on `ψ` (the
28//! latent codes move during the joint fit), the head couples to the latent
29//! block exactly the way the arrow-Schur border already hosts a β-border
30//! coupled to per-row latent blocks.
31//!
32//! # The three pieces
33//!
34//! 1. [`BehavioralHead`] — the head GLM itself: value + gradient of the head
35//!    log-likelihood w.r.t. the head coefficients `(a, w)` AND w.r.t. the
36//!    latent codes `t` (the cross-channel coupling), under a [`RowSubsampleMask`]
37//!    weighting so unlabeled rows carry zero head weight (semi-supervised).
38//!
39//! 2. [`LeakageAbsorber`] — the #461 Neyman-orthogonal device. Joint fitting
40//!    can sculpt the dictionary to *encode the label* (rediscover your own
41//!    probe). The absorber widens the reconstruction design with the head's
42//!    score-influence directions so the dictionary update is orthogonalized
43//!    against the label channel. The boundary it enforces is precisely
44//!    "orient what `p(x)` put there" vs. "hallucinate geometry from the label"
45//!    — the novel statistical content of the whole construction.
46//!
47//! 3. [`head_feature_significance`] — per-feature (per-atom) significance of
48//!    the behavioral loading via [`wood_smooth_test`], converted to an
49//!    FDR-controlled report through the e-BH multiplicity machinery
50//!    ([`e_benjamini_hochberg`]). Features whose behavioral signal survives
51//!    orthogonalization AND the multiplicity correction are the reportable
52//!    ones.
53
54use crate::inference::smooth_test::{SmoothTestInput, SmoothTestScale, wood_smooth_test};
55use crate::inference::structure_evidence::{e_benjamini_hochberg, log_e_from_p_calibrator};
56use gam_problem::RowSubsampleMask;
57use gam_linalg::faer_ndarray::FaerSvd;
58use ndarray::{Array1, Array2, ArrayView1, ArrayView2};
59
60/// Outcome family for the behavioral head.
61///
62/// `Binomial` is the canonical safety-probe family (a binary label —
63/// deception, harmfulness — read out from the latent codes). `Multinomial`
64/// covers a categorical behavioral label with `n_classes` levels via a
65/// shared-design softmax head; class 0 is the reference. Both are the same
66/// families already in `src/families/`; this enum only selects the head's
67/// link + log-likelihood, it does not re-implement the families.
68#[derive(Debug, Clone, Copy, PartialEq, Eq)]
69pub enum AuxOutcomeFamily {
70    /// Logistic head: `logit P(y=1 | t) = a + t·w`. A single binary label
71    /// pins roughly one gauge dimension, which is why `AuxOutcome` composes
72    /// with `DimSelection` ARD + the isometry pin rather than replacing them.
73    Binomial,
74    /// Softmax head over `n_classes` levels (class 0 reference). The behavioral
75    /// subspace it can orient has dimension at most `n_classes − 1`, still
76    /// low-dimensional — the geometry does the shape work.
77    Multinomial { n_classes: usize },
78}
79
80impl AuxOutcomeFamily {
81    /// Number of linear-predictor channels the head produces per row.
82    /// Binomial = 1; Multinomial with `K` classes = `K − 1` (reference-coded).
83    pub fn n_eta_channels(&self) -> usize {
84        match self {
85            AuxOutcomeFamily::Binomial => 1,
86            AuxOutcomeFamily::Multinomial { n_classes } => n_classes.saturating_sub(1),
87        }
88    }
89}
90
91/// The behavioral head GLM.
92///
93/// Stores the labels `y` (length `n`), the per-row head weights `w_row`
94/// (length `n`; 0 ⇒ unlabeled, the semi-supervised seam), and the family.
95/// The design is *not* stored: it is the live latent-code matrix `t`
96/// (`n × d`) passed at evaluation time, because `t` moves during the joint
97/// fit. The head coefficients are `(intercept, loadings)` with one
98/// `(1 + d)`-vector per η-channel.
99#[derive(Debug, Clone)]
100pub struct BehavioralHead {
101    family: AuxOutcomeFamily,
102    /// For `Binomial`: the 0/1 label per row. For `Multinomial`: the class
103    /// index in `0..n_classes` per row (stored as `f64`, integral-valued).
104    y: Array1<f64>,
105    /// Per-row head-channel weight. `0.0` on unlabeled rows (semi-supervised);
106    /// `1.0` on labeled rows by default. Derived from a [`RowSubsampleMask`] via
107    /// [`BehavioralHead::with_row_measure`].
108    w_row: Array1<f64>,
109}
110
111impl BehavioralHead {
112    /// Build a head from labels and an explicit per-row weight vector.
113    ///
114    /// `family.n_eta_channels()` channels each get a `(1 + d)` coefficient
115    /// block at fit time. Validates label range against the family.
116    pub fn new(
117        family: AuxOutcomeFamily,
118        y: Array1<f64>,
119        w_row: Array1<f64>,
120    ) -> Result<Self, String> {
121        let n = y.len();
122        if w_row.len() != n {
123            return Err(format!(
124                "BehavioralHead: w_row length {} != labels length {n}",
125                w_row.len()
126            ));
127        }
128        for &v in w_row.iter() {
129            if !(v.is_finite() && v >= 0.0) {
130                return Err(format!(
131                    "BehavioralHead: row weights must be finite and ≥ 0, got {v}"
132                ));
133            }
134        }
135        match family {
136            AuxOutcomeFamily::Binomial => {
137                for (i, &label) in y.iter().enumerate() {
138                    if label != 0.0 && label != 1.0 {
139                        return Err(format!(
140                            "BehavioralHead(Binomial): label[{i}] = {label} is not 0/1"
141                        ));
142                    }
143                }
144            }
145            AuxOutcomeFamily::Multinomial { n_classes } => {
146                if n_classes < 2 {
147                    return Err(format!(
148                        "BehavioralHead(Multinomial): need ≥ 2 classes, got {n_classes}"
149                    ));
150                }
151                for (i, &label) in y.iter().enumerate() {
152                    let k = label as usize;
153                    if k as f64 != label || k >= n_classes {
154                        return Err(format!(
155                            "BehavioralHead(Multinomial): label[{i}] = {label} not an \
156                             integer class index in 0..{n_classes}"
157                        ));
158                    }
159                }
160            }
161        }
162        Ok(Self { family, y, w_row })
163    }
164
165    /// Build a head where every row carries unit head weight (fully supervised).
166    pub fn fully_supervised(family: AuxOutcomeFamily, y: Array1<f64>) -> Result<Self, String> {
167        let n = y.len();
168        Self::new(family, y, Array1::from_elem(n, 1.0))
169    }
170
171    /// Build a head whose per-row weights come from a [`RowSubsampleMask`]: rows
172    /// outside the measure (unlabeled) get weight 0, rows inside get the
173    /// measure's weight. This is the semi-supervised seam — no new mechanism,
174    /// the same row-weighting the inner solver already enforces for ρ-coherence.
175    pub fn with_row_measure(
176        family: AuxOutcomeFamily,
177        y: Array1<f64>,
178        measure: &RowSubsampleMask,
179    ) -> Result<Self, String> {
180        let n = y.len();
181        let (indices, weights) = measure.indices_and_weights(n);
182        let mut w_row = Array1::<f64>::zeros(n);
183        for &idx in &indices {
184            if idx < n {
185                w_row[idx] = weights[idx];
186            }
187        }
188        Self::new(family, y, w_row)
189    }
190
191    pub fn family(&self) -> AuxOutcomeFamily {
192        self.family
193    }
194
195    pub fn n_obs(&self) -> usize {
196        self.y.len()
197    }
198
199    /// Number of head coefficients given latent dimension `d`: one
200    /// `(1 + d)` block (intercept + per-axis loading) per η-channel.
201    pub fn n_coeffs(&self, latent_dim: usize) -> usize {
202        self.family.n_eta_channels() * (1 + latent_dim)
203    }
204
205    /// Total effective head-channel sample size `Σ_n w_row[n]` — the number of
206    /// labeled rows (weighted). Zero ⇒ a vacuous head (every row unlabeled),
207    /// which the validator rejects: a head with no labels cannot anchor a gauge.
208    pub fn effective_labeled_count(&self) -> f64 {
209        self.w_row.iter().sum()
210    }
211
212    /// Per-row, per-channel linear predictor `η[n, c] = a_c + t_n · w_c`.
213    fn eta(&self, t: ArrayView2<'_, f64>, coeffs: ArrayView1<'_, f64>) -> Array2<f64> {
214        let (n, d) = t.dim();
215        let n_eta = self.family.n_eta_channels();
216        let mut eta = Array2::<f64>::zeros((n, n_eta));
217        for c in 0..n_eta {
218            let base = c * (1 + d);
219            let a = coeffs[base];
220            for row in 0..n {
221                let mut acc = a;
222                for axis in 0..d {
223                    acc += t[[row, axis]] * coeffs[base + 1 + axis];
224                }
225                eta[[row, c]] = acc;
226            }
227        }
228        eta
229    }
230
231    /// Negative head log-likelihood and its gradient w.r.t. **both** the head
232    /// coefficients and the latent codes `t`.
233    ///
234    /// Returns `(nll, grad_coeffs, grad_t)` where:
235    /// * `nll = −Σ_n w_row[n] · log p(y_n | η_n)` (weighted),
236    /// * `grad_coeffs` has length `n_coeffs(d)` — the head-coefficient gradient
237    ///   that drives the β-tier update,
238    /// * `grad_t` is `(n, d)` — the cross-channel coupling that flows into the
239    ///   latent-block gradient (the arrow-Schur border coupling).
240    ///
241    /// Convention matches the rest of the latent objective: this is the
242    /// *negative* log-likelihood, so it adds to the joint cost and its gradient
243    /// adds to the joint gradient.
244    pub fn neg_loglik_and_grad(
245        &self,
246        t: ArrayView2<'_, f64>,
247        coeffs: ArrayView1<'_, f64>,
248    ) -> Result<(f64, Array1<f64>, Array2<f64>), String> {
249        let (n, d) = t.dim();
250        if n != self.y.len() {
251            return Err(format!(
252                "BehavioralHead: latent rows {n} != labels {}",
253                self.y.len()
254            ));
255        }
256        let n_eta = self.family.n_eta_channels();
257        if coeffs.len() != n_eta * (1 + d) {
258            return Err(format!(
259                "BehavioralHead: coeffs length {} != n_eta·(1+d) = {}",
260                coeffs.len(),
261                n_eta * (1 + d)
262            ));
263        }
264        let eta = self.eta(t, coeffs);
265        let mut nll = 0.0_f64;
266        let mut grad_coeffs = Array1::<f64>::zeros(n_eta * (1 + d));
267        let mut grad_t = Array2::<f64>::zeros((n, d));
268
269        match self.family {
270            AuxOutcomeFamily::Binomial => {
271                for row in 0..n {
272                    let w = self.w_row[row];
273                    if w == 0.0 {
274                        continue;
275                    }
276                    let e = eta[[row, 0]];
277                    // Numerically-stable logistic NLL: log(1+exp(η)) − y·η.
278                    let log1p = if e > 0.0 {
279                        e + (-e).exp().ln_1p()
280                    } else {
281                        e.exp().ln_1p()
282                    };
283                    let y = self.y[row];
284                    nll += w * (log1p - y * e);
285                    // dNLL/dη = p − y, p = σ(η).
286                    let p = 1.0 / (1.0 + (-e).exp());
287                    let r = w * (p - y);
288                    grad_coeffs[0] += r;
289                    for axis in 0..d {
290                        grad_coeffs[1 + axis] += r * t[[row, axis]];
291                        grad_t[[row, axis]] += r * coeffs[1 + axis];
292                    }
293                }
294            }
295            AuxOutcomeFamily::Multinomial { .. } => {
296                for row in 0..n {
297                    let w = self.w_row[row];
298                    if w == 0.0 {
299                        continue;
300                    }
301                    // Softmax over the K−1 free channels plus the implicit
302                    // reference channel (η_0 ≡ 0). LSE includes the 0 term.
303                    let mut max_eta = 0.0_f64;
304                    for c in 0..n_eta {
305                        if eta[[row, c]] > max_eta {
306                            max_eta = eta[[row, c]];
307                        }
308                    }
309                    let mut denom = (0.0 - max_eta).exp();
310                    for c in 0..n_eta {
311                        denom += (eta[[row, c]] - max_eta).exp();
312                    }
313                    let lse = max_eta + denom.ln();
314                    let label = self.y[row] as usize;
315                    // log p(y) = η_y − lse, with η_0 = 0 for the reference class.
316                    let eta_y = if label == 0 {
317                        0.0
318                    } else {
319                        eta[[row, label - 1]]
320                    };
321                    nll += w * (lse - eta_y);
322                    // dNLL/dη_c = p_c − 1{y = c+1}, for free channel c (class c+1).
323                    for c in 0..n_eta {
324                        let p_c = (eta[[row, c]] - lse).exp();
325                        let indicator = if label == c + 1 { 1.0 } else { 0.0 };
326                        let r = w * (p_c - indicator);
327                        let base = c * (1 + d);
328                        grad_coeffs[base] += r;
329                        for axis in 0..d {
330                            grad_coeffs[base + 1 + axis] += r * t[[row, axis]];
331                            grad_t[[row, axis]] += r * coeffs[base + 1 + axis];
332                        }
333                    }
334                }
335            }
336        }
337        Ok((nll, grad_coeffs, grad_t))
338    }
339
340    /// Per-row, per-channel head working weights `s[n, c]` — the diagonal of the
341    /// Fisher information in η-space (`p(1−p)` logistic, `p_c(1−p_c)` softmax
342    /// diagonal). These weight the score-influence directions the leakage
343    /// absorber appends to the reconstruction design (the realized label-channel
344    /// leverage per row), times the row's head weight so unlabeled rows
345    /// contribute nothing to the absorber, exactly as they contribute nothing
346    /// to the head likelihood.
347    pub fn head_working_weights(
348        &self,
349        t: ArrayView2<'_, f64>,
350        coeffs: ArrayView1<'_, f64>,
351    ) -> Result<Array2<f64>, String> {
352        let (n, d) = t.dim();
353        let n_eta = self.family.n_eta_channels();
354        if coeffs.len() != n_eta * (1 + d) {
355            return Err("BehavioralHead::head_working_weights: coeff length mismatch".to_string());
356        }
357        let eta = self.eta(t, coeffs);
358        let mut s = Array2::<f64>::zeros((n, n_eta));
359        match self.family {
360            AuxOutcomeFamily::Binomial => {
361                for row in 0..n {
362                    let p = 1.0 / (1.0 + (-eta[[row, 0]]).exp());
363                    s[[row, 0]] = self.w_row[row] * p * (1.0 - p);
364                }
365            }
366            AuxOutcomeFamily::Multinomial { .. } => {
367                for row in 0..n {
368                    let mut max_eta = 0.0_f64;
369                    for c in 0..n_eta {
370                        if eta[[row, c]] > max_eta {
371                            max_eta = eta[[row, c]];
372                        }
373                    }
374                    let mut denom = (0.0 - max_eta).exp();
375                    for c in 0..n_eta {
376                        denom += (eta[[row, c]] - max_eta).exp();
377                    }
378                    let lse = max_eta + denom.ln();
379                    for c in 0..n_eta {
380                        let p_c = (eta[[row, c]] - lse).exp();
381                        s[[row, c]] = self.w_row[row] * p_c * (1.0 - p_c);
382                    }
383                }
384            }
385        }
386        Ok(s)
387    }
388}
389
390/// The #461 Neyman-orthogonal leakage absorber for the behavioral head.
391///
392/// # The boundary it enforces
393///
394/// You want the behavioral channel to *orient* the existing manifold (fix the
395/// frame) but *not* to *invent* geometry absent from `p(x)` (sculpt a manifold
396/// to fit the label). The orthogonalization is precisely the boundary between
397/// those two — between "orient what's there" and "hallucinate structure from
398/// the label." Getting it exactly right is the single most important statistical
399/// content of the whole construction.
400///
401/// # Mechanism (mirrors the survival/BMS install)
402///
403/// The head's score-influence directions in latent-code space are the rows of
404/// the *score-influence Jacobian*
405///
406/// ```text
407///   Z[n, :] = √s_n · ∂η_n/∂t_n = √s_n · w   (per η-channel)
408/// ```
409///
410/// i.e. the realized, per-row, Fisher-weighted directions along which a change
411/// in the latent codes moves the label-channel linear predictor. We
412/// orthonormalize their span (thin QR) to obtain the *label-channel subspace*
413/// `Q` in latent-code space. The dictionary (reconstruction) update is then
414/// projected onto the orthogonal complement of `Q`:
415///
416/// ```text
417///   Δt_recon  ←  (I − Q Qᵀ) Δt_recon
418/// ```
419///
420/// so the reconstruction channel can only move the codes in directions the
421/// label channel does *not* already explain. Equivalently, the reconstruction
422/// design is widened with `Q` as a null-penalized absorbed block, making the
423/// dictionary's estimating equation orthogonal to `span(Q)` — the label channel
424/// orients the frame, but cannot drag the dictionary toward encoding the label.
425#[derive(Debug, Clone)]
426pub struct LeakageAbsorber {
427    /// Orthonormal basis `Q ∈ ℝ^{d × r}` of the label-channel subspace in
428    /// latent-code space (`r ≤ min(d, n_eta)`). The reconstruction update is
429    /// projected onto `range(Q)^⊥`.
430    q: Array2<f64>,
431}
432
433impl LeakageAbsorber {
434    /// Build the absorber from the head's score-influence Jacobian.
435    ///
436    /// `score_influence` is `(n × (d · n_eta))`: for each row, the stacked
437    /// per-channel Fisher-weighted direction `√s_{n,c} · w_c` flattened over
438    /// channels. We take the *column* span (the directions in latent-code space
439    /// the label channel is sensitive to), reduced to an orthonormal basis via
440    /// thin SVD with a relative tolerance, so a rank-deficient or vacuous label
441    /// channel yields a low-rank (possibly empty) `Q` and the absorber is a
442    /// no-op — the honest behavior when the label pins little gauge.
443    pub fn from_score_influence(
444        score_influence: ArrayView2<'_, f64>,
445        latent_dim: usize,
446    ) -> Result<Self, String> {
447        let (n, cols) = score_influence.dim();
448        if latent_dim == 0 {
449            return Ok(Self {
450                q: Array2::<f64>::zeros((0, 0)),
451            });
452        }
453        if cols % latent_dim != 0 {
454            return Err(format!(
455                "LeakageAbsorber: score_influence has {cols} columns, not a multiple of \
456                 latent_dim {latent_dim}"
457            ));
458        }
459        let n_eta = cols / latent_dim;
460        // Accumulate the (d × d) latent-space Gram of the label-channel
461        // directions: G = Σ_n Σ_c v_{n,c} v_{n,c}ᵀ where v_{n,c} ∈ ℝ^d is the
462        // c-th channel's score-influence direction for row n.
463        let mut gram = Array2::<f64>::zeros((latent_dim, latent_dim));
464        for row in 0..n {
465            for c in 0..n_eta {
466                let base = c * latent_dim;
467                for i in 0..latent_dim {
468                    let vi = score_influence[[row, base + i]];
469                    for j in 0..latent_dim {
470                        gram[[i, j]] += vi * score_influence[[row, base + j]];
471                    }
472                }
473            }
474        }
475        // Eigenvectors of G via SVD (G symmetric PSD) give the principal
476        // label-channel directions; keep those above a relative tolerance.
477        let (u_opt, sv, _vt) = gram
478            .svd(true, false)
479            .map_err(|e| format!("LeakageAbsorber: SVD of label-channel Gram failed: {e}"))?;
480        let u = u_opt.ok_or_else(|| "LeakageAbsorber: SVD did not return U".to_string())?;
481        let max_sv = sv.iter().cloned().fold(0.0_f64, f64::max);
482        let tol = max_sv * (latent_dim as f64) * f64::EPSILON;
483        let rank = sv.iter().filter(|&&s| s > tol).count();
484        let mut q = Array2::<f64>::zeros((latent_dim, rank));
485        for col in 0..rank {
486            for r in 0..latent_dim {
487                q[[r, col]] = u[[r, col]];
488            }
489        }
490        Ok(Self { q })
491    }
492
493    /// Rank of the absorbed label-channel subspace (`r`). Zero ⇒ the absorber
494    /// is a no-op (the label channel pins no direction the dictionary must be
495    /// orthogonalized against).
496    pub fn rank(&self) -> usize {
497        self.q.ncols()
498    }
499
500    /// Orthonormal basis `Q` of the absorbed subspace (`d × r`).
501    pub fn basis(&self) -> ArrayView2<'_, f64> {
502        self.q.view()
503    }
504
505    /// Project a reconstruction-channel latent update `Δt` (`n × d`) onto the
506    /// orthogonal complement of the label-channel subspace, in-place semantics
507    /// returned as a fresh array. This is the operative orthogonalization: the
508    /// dictionary update keeps only the component the label channel does not
509    /// already explain.
510    pub fn orthogonalize_recon_update(&self, delta_t: ArrayView2<'_, f64>) -> Array2<f64> {
511        let (_, d) = delta_t.dim();
512        if self.q.ncols() == 0 || self.q.nrows() != d {
513            return delta_t.to_owned();
514        }
515        // Δt − (Δt Q) Qᵀ.
516        let proj_coords = delta_t.dot(&self.q); // (n × r)
517        let proj = proj_coords.dot(&self.q.t()); // (n × d)
518        let mut out = delta_t.to_owned();
519        out -= &proj;
520        out
521    }
522
523    /// The absorbed block to *append* to the reconstruction design: per-row
524    /// rows `Z_infl[n, :] = Δ`-direction in latent-code space, here the
525    /// row's latent code projected onto `Q` — the realized label-channel
526    /// leverage. Width `r`. This mirrors the survival/BMS install where the
527    /// realized Stage-1 leakage directions are appended as a null-penalized
528    /// absorbed block, making the β estimating equation orthogonal to its span.
529    pub fn absorbed_design_block(&self, t: ArrayView2<'_, f64>) -> Array2<f64> {
530        if self.q.ncols() == 0 {
531            return Array2::<f64>::zeros((t.nrows(), 0));
532        }
533        t.dot(&self.q)
534    }
535}
536
537/// Per-feature (per-atom) behavioral-significance report for the head.
538#[derive(Debug, Clone)]
539pub struct HeadFeatureSignificance {
540    /// Wald statistic per latent axis (feature).
541    pub statistic: Vec<f64>,
542    /// Raw p-value per latent axis.
543    pub p_value: Vec<f64>,
544    /// Indices of features rejected by e-BH at the chosen FDR level — the
545    /// features whose behavioral loading is statistically real after the
546    /// multiplicity correction. These are the reportable behaviorally-anchored
547    /// atoms.
548    pub fdr_rejected: Vec<usize>,
549    /// The FDR level the rejection set was computed at.
550    pub alpha: f64,
551}
552
553/// Per-feature significance of the head's behavioral loadings, with FDR control.
554///
555/// For each latent axis (feature) `k`, the behavioral loading is the head
556/// coefficient `w_{c,k}` across η-channels. We test the null `w_{·,k} = 0`
557/// (the atom carries no behavioral signal) with [`wood_smooth_test`] on the
558/// coefficient block for that axis, using the head's posterior covariance.
559/// Raw p-values are first Bonferroni-combined across η-channels for each axis,
560/// then converted to calibrated e-values with
561/// [`log_e_from_p_calibrator`] and fed to [`e_benjamini_hochberg`] for an
562/// FDR-controlled rejection set that needs no independence assumption across
563/// atoms — exactly the case BH cannot legally handle when atoms share tokens.
564///
565/// `coeffs` is the fitted head coefficient vector (`n_eta · (1 + d)`),
566/// `covariance` its posterior covariance of the same size, `latent_dim` is `d`,
567/// `n_eta` the channel count, and `alpha` the target FDR.
568pub fn head_feature_significance(
569    coeffs: ArrayView1<'_, f64>,
570    covariance: &Array2<f64>,
571    latent_dim: usize,
572    n_eta: usize,
573    residual_df: f64,
574    alpha: f64,
575) -> Result<HeadFeatureSignificance, String> {
576    let block = 1 + latent_dim;
577    if coeffs.len() != n_eta * block {
578        return Err(format!(
579            "head_feature_significance: coeffs length {} != n_eta·(1+d) = {}",
580            coeffs.len(),
581            n_eta * block
582        ));
583    }
584    if covariance.nrows() != coeffs.len() || covariance.ncols() != coeffs.len() {
585        return Err(format!(
586            "head_feature_significance: covariance must be {0}×{0}, got {1}×{2}",
587            coeffs.len(),
588            covariance.nrows(),
589            covariance.ncols()
590        ));
591    }
592    let beta = coeffs.to_owned();
593    let mut statistic = Vec::with_capacity(latent_dim);
594    let mut p_value = Vec::with_capacity(latent_dim);
595    let mut log_e_values = Vec::with_capacity(latent_dim);
596    for axis in 0..latent_dim {
597        // Gather the coefficient indices for this axis across all channels:
598        // for channel c the loading is at `c·block + 1 + axis`. wood_smooth_test
599        // tests a contiguous range, so for the common Binomial (n_eta = 1) case
600        // the range is a single coefficient; for multinomial we test each
601        // channel's loading and combine the minimum p-value with Bonferroni.
602        let mut best_p = 1.0_f64;
603        let mut best_stat = 0.0_f64;
604        let mut tested_channels = 0_usize;
605        for c in 0..n_eta {
606            let idx = c * block + 1 + axis;
607            let input = SmoothTestInput {
608                beta: beta.view(),
609                covariance,
610                influence_matrix: None,
611                coeff_range: idx..idx + 1,
612                edf: 1.0,
613                nullspace_dim: 1,
614                residual_df,
615                scale: SmoothTestScale::Estimated,
616            };
617            if let Some(res) = wood_smooth_test(input) {
618                tested_channels += 1;
619                if res.p_value < best_p {
620                    best_p = res.p_value;
621                    best_stat = res.statistic;
622                }
623            }
624        }
625        let axis_p = if tested_channels > 0 {
626            (best_p * tested_channels as f64).min(1.0)
627        } else {
628            1.0
629        };
630        statistic.push(best_stat);
631        p_value.push(axis_p);
632        // Numerical tail routines can underflow a mathematically positive
633        // p-value to exactly zero; calibrate at the smallest positive f64 while
634        // still reporting the Bonferroni-adjusted p-value above.
635        let calibration_p = axis_p.max(f64::MIN_POSITIVE);
636        log_e_values.push(log_e_from_p_calibrator(calibration_p).map_err(|err| {
637            format!("head_feature_significance: invalid calibrated axis p-value: {err}")
638        })?);
639    }
640    let fdr_rejected = e_benjamini_hochberg(&log_e_values, alpha);
641    Ok(HeadFeatureSignificance {
642        statistic,
643        p_value,
644        fdr_rejected,
645        alpha,
646    })
647}
648
649/// Reduce a per-row score-influence Jacobian `(n × d)` to a rank-truncated
650/// orthonormal basis of its latent-code column span. Convenience used by the
651/// absorber install site when the caller already has the `(n × d)` Jacobian
652/// for a single η-channel (the common Binomial case). Returns `Q ∈ ℝ^{d × r}`,
653/// `r` = numerical rank of the column span (rank-deficient inputs yield a
654/// low-rank `Q`, so a vacuous label channel orthogonalizes against nothing).
655pub fn orthonormal_span(jacobian: ArrayView2<'_, f64>) -> Result<Array2<f64>, String> {
656    let (n, d) = jacobian.dim();
657    if d == 0 || n == 0 {
658        return Ok(Array2::<f64>::zeros((d, 0)));
659    }
660    // The right singular vectors of `J` span its row space = the column span
661    // of `Jᵀ` in ℝ^d. Equivalently the eigenvectors of the d × d Gram `JᵀJ`;
662    // route through the SVD bridge for a stable, rank-truncated basis.
663    let gram = jacobian.t().dot(&jacobian);
664    let (u_opt, sv, _vt) = gram
665        .svd(true, false)
666        .map_err(|e| format!("orthonormal_span: SVD failed: {e}"))?;
667    let u = u_opt.ok_or_else(|| "orthonormal_span: SVD did not return U".to_string())?;
668    let max_sv = sv.iter().cloned().fold(0.0_f64, f64::max);
669    let tol = max_sv * (d as f64) * f64::EPSILON;
670    let rank = sv.iter().filter(|&&s| s > tol).count();
671    let mut q = Array2::<f64>::zeros((d, rank));
672    for col in 0..rank {
673        for r in 0..d {
674            q[[r, col]] = u[[r, col]];
675        }
676    }
677    Ok(q)
678}