gam_terms/decoders/behavioral_head.rs
1//! Behaviorally-anchored SAE head (issue #912).
2//!
3//! # What this is
4//!
5//! An unsupervised SAE dictionary — manifold or linear — pins a concept's
6//! *shape* and its coordinate up to isometry. Both are functionals of `p(x)`.
7//! It cannot pin the map from coordinate to behavior, `p(y|x)`, which is
8//! formally independent of `p(x)`: two models with identical activation
9//! geometry can read out differently. The behavioral coefficient an atom
10//! "gets" from a post-hoc probe is therefore not a refinement of the
11//! dictionary — it is the *only* labeled source of the one thing the
12//! dictionary structurally lacks.
13//!
14//! The fix is to make the behavior part of the model. Instead of treating the
15//! auxiliary `u` as a fixed covariate in a conditional Gaussian *prior* on the
16//! latent codes (`LatentIdMode::AuxPrior`, the iVAE gauge), this module
17//! promotes the auxiliary signal to a *modeled outcome*: a GLM behavioral head
18//!
19//! ```text
20//! g(E[y_n | t_n]) = a + t_n · w
21//! ```
22//!
23//! whose design columns are the latent codes `t_n` themselves. The head's
24//! coefficients `(a, w)` live in the β tier and the head log-likelihood enters
25//! the *same* Laplace/REML objective as the reconstruction channel, so REML
26//! balances reconstruction vs. behavioral fit automatically — no hand-tuned
27//! trade-off scalar (magic by default). Because the design depends on `ψ` (the
28//! latent codes move during the joint fit), the head couples to the latent
29//! block exactly the way the arrow-Schur border already hosts a β-border
30//! coupled to per-row latent blocks.
31//!
32//! # The three pieces
33//!
34//! 1. [`BehavioralHead`] — the head GLM itself: value + gradient of the head
35//! log-likelihood w.r.t. the head coefficients `(a, w)` AND w.r.t. the
36//! latent codes `t` (the cross-channel coupling), under a [`RowSubsampleMask`]
37//! weighting so unlabeled rows carry zero head weight (semi-supervised).
38//!
39//! 2. [`LeakageAbsorber`] — the #461 Neyman-orthogonal device. Joint fitting
40//! can sculpt the dictionary to *encode the label* (rediscover your own
41//! probe). The absorber widens the reconstruction design with the head's
42//! score-influence directions so the dictionary update is orthogonalized
43//! against the label channel. The boundary it enforces is precisely
44//! "orient what `p(x)` put there" vs. "hallucinate geometry from the label"
45//! — the novel statistical content of the whole construction.
46//!
47//! 3. [`head_feature_significance`] — per-feature (per-atom) significance of
48//! the behavioral loading via [`wood_smooth_test`], converted to an
49//! FDR-controlled report through the e-BH multiplicity machinery
50//! ([`e_benjamini_hochberg`]). Features whose behavioral signal survives
51//! orthogonalization AND the multiplicity correction are the reportable
52//! ones.
53
54use crate::inference::smooth_test::{SmoothTestInput, SmoothTestScale, wood_smooth_test};
55use crate::inference::structure_evidence::{e_benjamini_hochberg, log_e_from_p_calibrator};
56use gam_problem::RowSubsampleMask;
57use gam_linalg::faer_ndarray::FaerSvd;
58use ndarray::{Array1, Array2, ArrayView1, ArrayView2};
59
60/// Outcome family for the behavioral head.
61///
62/// `Binomial` is the canonical safety-probe family (a binary label —
63/// deception, harmfulness — read out from the latent codes). `Multinomial`
64/// covers a categorical behavioral label with `n_classes` levels via a
65/// shared-design softmax head; class 0 is the reference. Both are the same
66/// families already in `src/families/`; this enum only selects the head's
67/// link + log-likelihood, it does not re-implement the families.
68#[derive(Debug, Clone, Copy, PartialEq, Eq)]
69pub enum AuxOutcomeFamily {
70 /// Logistic head: `logit P(y=1 | t) = a + t·w`. A single binary label
71 /// pins roughly one gauge dimension, which is why `AuxOutcome` composes
72 /// with `DimSelection` ARD + the isometry pin rather than replacing them.
73 Binomial,
74 /// Softmax head over `n_classes` levels (class 0 reference). The behavioral
75 /// subspace it can orient has dimension at most `n_classes − 1`, still
76 /// low-dimensional — the geometry does the shape work.
77 Multinomial { n_classes: usize },
78}
79
80impl AuxOutcomeFamily {
81 /// Number of linear-predictor channels the head produces per row.
82 /// Binomial = 1; Multinomial with `K` classes = `K − 1` (reference-coded).
83 pub fn n_eta_channels(&self) -> usize {
84 match self {
85 AuxOutcomeFamily::Binomial => 1,
86 AuxOutcomeFamily::Multinomial { n_classes } => n_classes.saturating_sub(1),
87 }
88 }
89}
90
91/// The behavioral head GLM.
92///
93/// Stores the labels `y` (length `n`), the per-row head weights `w_row`
94/// (length `n`; 0 ⇒ unlabeled, the semi-supervised seam), and the family.
95/// The design is *not* stored: it is the live latent-code matrix `t`
96/// (`n × d`) passed at evaluation time, because `t` moves during the joint
97/// fit. The head coefficients are `(intercept, loadings)` with one
98/// `(1 + d)`-vector per η-channel.
99#[derive(Debug, Clone)]
100pub struct BehavioralHead {
101 family: AuxOutcomeFamily,
102 /// For `Binomial`: the 0/1 label per row. For `Multinomial`: the class
103 /// index in `0..n_classes` per row (stored as `f64`, integral-valued).
104 y: Array1<f64>,
105 /// Per-row head-channel weight. `0.0` on unlabeled rows (semi-supervised);
106 /// `1.0` on labeled rows by default. Derived from a [`RowSubsampleMask`] via
107 /// [`BehavioralHead::with_row_measure`].
108 w_row: Array1<f64>,
109}
110
111impl BehavioralHead {
112 /// Build a head from labels and an explicit per-row weight vector.
113 ///
114 /// `family.n_eta_channels()` channels each get a `(1 + d)` coefficient
115 /// block at fit time. Validates label range against the family.
116 pub fn new(
117 family: AuxOutcomeFamily,
118 y: Array1<f64>,
119 w_row: Array1<f64>,
120 ) -> Result<Self, String> {
121 let n = y.len();
122 if w_row.len() != n {
123 return Err(format!(
124 "BehavioralHead: w_row length {} != labels length {n}",
125 w_row.len()
126 ));
127 }
128 for &v in w_row.iter() {
129 if !(v.is_finite() && v >= 0.0) {
130 return Err(format!(
131 "BehavioralHead: row weights must be finite and ≥ 0, got {v}"
132 ));
133 }
134 }
135 match family {
136 AuxOutcomeFamily::Binomial => {
137 for (i, &label) in y.iter().enumerate() {
138 if label != 0.0 && label != 1.0 {
139 return Err(format!(
140 "BehavioralHead(Binomial): label[{i}] = {label} is not 0/1"
141 ));
142 }
143 }
144 }
145 AuxOutcomeFamily::Multinomial { n_classes } => {
146 if n_classes < 2 {
147 return Err(format!(
148 "BehavioralHead(Multinomial): need ≥ 2 classes, got {n_classes}"
149 ));
150 }
151 for (i, &label) in y.iter().enumerate() {
152 let k = label as usize;
153 if k as f64 != label || k >= n_classes {
154 return Err(format!(
155 "BehavioralHead(Multinomial): label[{i}] = {label} not an \
156 integer class index in 0..{n_classes}"
157 ));
158 }
159 }
160 }
161 }
162 Ok(Self { family, y, w_row })
163 }
164
165 /// Build a head where every row carries unit head weight (fully supervised).
166 pub fn fully_supervised(family: AuxOutcomeFamily, y: Array1<f64>) -> Result<Self, String> {
167 let n = y.len();
168 Self::new(family, y, Array1::from_elem(n, 1.0))
169 }
170
171 /// Build a head whose per-row weights come from a [`RowSubsampleMask`]: rows
172 /// outside the measure (unlabeled) get weight 0, rows inside get the
173 /// measure's weight. This is the semi-supervised seam — no new mechanism,
174 /// the same row-weighting the inner solver already enforces for ρ-coherence.
175 pub fn with_row_measure(
176 family: AuxOutcomeFamily,
177 y: Array1<f64>,
178 measure: &RowSubsampleMask,
179 ) -> Result<Self, String> {
180 let n = y.len();
181 let (indices, weights) = measure.indices_and_weights(n);
182 let mut w_row = Array1::<f64>::zeros(n);
183 for &idx in &indices {
184 if idx < n {
185 w_row[idx] = weights[idx];
186 }
187 }
188 Self::new(family, y, w_row)
189 }
190
191 pub fn family(&self) -> AuxOutcomeFamily {
192 self.family
193 }
194
195 pub fn n_obs(&self) -> usize {
196 self.y.len()
197 }
198
199 /// Number of head coefficients given latent dimension `d`: one
200 /// `(1 + d)` block (intercept + per-axis loading) per η-channel.
201 pub fn n_coeffs(&self, latent_dim: usize) -> usize {
202 self.family.n_eta_channels() * (1 + latent_dim)
203 }
204
205 /// Total effective head-channel sample size `Σ_n w_row[n]` — the number of
206 /// labeled rows (weighted). Zero ⇒ a vacuous head (every row unlabeled),
207 /// which the validator rejects: a head with no labels cannot anchor a gauge.
208 pub fn effective_labeled_count(&self) -> f64 {
209 self.w_row.iter().sum()
210 }
211
212 /// Per-row, per-channel linear predictor `η[n, c] = a_c + t_n · w_c`.
213 fn eta(&self, t: ArrayView2<'_, f64>, coeffs: ArrayView1<'_, f64>) -> Array2<f64> {
214 let (n, d) = t.dim();
215 let n_eta = self.family.n_eta_channels();
216 let mut eta = Array2::<f64>::zeros((n, n_eta));
217 for c in 0..n_eta {
218 let base = c * (1 + d);
219 let a = coeffs[base];
220 for row in 0..n {
221 let mut acc = a;
222 for axis in 0..d {
223 acc += t[[row, axis]] * coeffs[base + 1 + axis];
224 }
225 eta[[row, c]] = acc;
226 }
227 }
228 eta
229 }
230
231 /// Negative head log-likelihood and its gradient w.r.t. **both** the head
232 /// coefficients and the latent codes `t`.
233 ///
234 /// Returns `(nll, grad_coeffs, grad_t)` where:
235 /// * `nll = −Σ_n w_row[n] · log p(y_n | η_n)` (weighted),
236 /// * `grad_coeffs` has length `n_coeffs(d)` — the head-coefficient gradient
237 /// that drives the β-tier update,
238 /// * `grad_t` is `(n, d)` — the cross-channel coupling that flows into the
239 /// latent-block gradient (the arrow-Schur border coupling).
240 ///
241 /// Convention matches the rest of the latent objective: this is the
242 /// *negative* log-likelihood, so it adds to the joint cost and its gradient
243 /// adds to the joint gradient.
244 pub fn neg_loglik_and_grad(
245 &self,
246 t: ArrayView2<'_, f64>,
247 coeffs: ArrayView1<'_, f64>,
248 ) -> Result<(f64, Array1<f64>, Array2<f64>), String> {
249 let (n, d) = t.dim();
250 if n != self.y.len() {
251 return Err(format!(
252 "BehavioralHead: latent rows {n} != labels {}",
253 self.y.len()
254 ));
255 }
256 let n_eta = self.family.n_eta_channels();
257 if coeffs.len() != n_eta * (1 + d) {
258 return Err(format!(
259 "BehavioralHead: coeffs length {} != n_eta·(1+d) = {}",
260 coeffs.len(),
261 n_eta * (1 + d)
262 ));
263 }
264 let eta = self.eta(t, coeffs);
265 let mut nll = 0.0_f64;
266 let mut grad_coeffs = Array1::<f64>::zeros(n_eta * (1 + d));
267 let mut grad_t = Array2::<f64>::zeros((n, d));
268
269 match self.family {
270 AuxOutcomeFamily::Binomial => {
271 for row in 0..n {
272 let w = self.w_row[row];
273 if w == 0.0 {
274 continue;
275 }
276 let e = eta[[row, 0]];
277 // Numerically-stable logistic NLL: log(1+exp(η)) − y·η.
278 let log1p = if e > 0.0 {
279 e + (-e).exp().ln_1p()
280 } else {
281 e.exp().ln_1p()
282 };
283 let y = self.y[row];
284 nll += w * (log1p - y * e);
285 // dNLL/dη = p − y, p = σ(η).
286 let p = 1.0 / (1.0 + (-e).exp());
287 let r = w * (p - y);
288 grad_coeffs[0] += r;
289 for axis in 0..d {
290 grad_coeffs[1 + axis] += r * t[[row, axis]];
291 grad_t[[row, axis]] += r * coeffs[1 + axis];
292 }
293 }
294 }
295 AuxOutcomeFamily::Multinomial { .. } => {
296 for row in 0..n {
297 let w = self.w_row[row];
298 if w == 0.0 {
299 continue;
300 }
301 // Softmax over the K−1 free channels plus the implicit
302 // reference channel (η_0 ≡ 0). LSE includes the 0 term.
303 let mut max_eta = 0.0_f64;
304 for c in 0..n_eta {
305 if eta[[row, c]] > max_eta {
306 max_eta = eta[[row, c]];
307 }
308 }
309 let mut denom = (0.0 - max_eta).exp();
310 for c in 0..n_eta {
311 denom += (eta[[row, c]] - max_eta).exp();
312 }
313 let lse = max_eta + denom.ln();
314 let label = self.y[row] as usize;
315 // log p(y) = η_y − lse, with η_0 = 0 for the reference class.
316 let eta_y = if label == 0 {
317 0.0
318 } else {
319 eta[[row, label - 1]]
320 };
321 nll += w * (lse - eta_y);
322 // dNLL/dη_c = p_c − 1{y = c+1}, for free channel c (class c+1).
323 for c in 0..n_eta {
324 let p_c = (eta[[row, c]] - lse).exp();
325 let indicator = if label == c + 1 { 1.0 } else { 0.0 };
326 let r = w * (p_c - indicator);
327 let base = c * (1 + d);
328 grad_coeffs[base] += r;
329 for axis in 0..d {
330 grad_coeffs[base + 1 + axis] += r * t[[row, axis]];
331 grad_t[[row, axis]] += r * coeffs[base + 1 + axis];
332 }
333 }
334 }
335 }
336 }
337 Ok((nll, grad_coeffs, grad_t))
338 }
339
340 /// Per-row, per-channel head working weights `s[n, c]` — the diagonal of the
341 /// Fisher information in η-space (`p(1−p)` logistic, `p_c(1−p_c)` softmax
342 /// diagonal). These weight the score-influence directions the leakage
343 /// absorber appends to the reconstruction design (the realized label-channel
344 /// leverage per row), times the row's head weight so unlabeled rows
345 /// contribute nothing to the absorber, exactly as they contribute nothing
346 /// to the head likelihood.
347 pub fn head_working_weights(
348 &self,
349 t: ArrayView2<'_, f64>,
350 coeffs: ArrayView1<'_, f64>,
351 ) -> Result<Array2<f64>, String> {
352 let (n, d) = t.dim();
353 let n_eta = self.family.n_eta_channels();
354 if coeffs.len() != n_eta * (1 + d) {
355 return Err("BehavioralHead::head_working_weights: coeff length mismatch".to_string());
356 }
357 let eta = self.eta(t, coeffs);
358 let mut s = Array2::<f64>::zeros((n, n_eta));
359 match self.family {
360 AuxOutcomeFamily::Binomial => {
361 for row in 0..n {
362 let p = 1.0 / (1.0 + (-eta[[row, 0]]).exp());
363 s[[row, 0]] = self.w_row[row] * p * (1.0 - p);
364 }
365 }
366 AuxOutcomeFamily::Multinomial { .. } => {
367 for row in 0..n {
368 let mut max_eta = 0.0_f64;
369 for c in 0..n_eta {
370 if eta[[row, c]] > max_eta {
371 max_eta = eta[[row, c]];
372 }
373 }
374 let mut denom = (0.0 - max_eta).exp();
375 for c in 0..n_eta {
376 denom += (eta[[row, c]] - max_eta).exp();
377 }
378 let lse = max_eta + denom.ln();
379 for c in 0..n_eta {
380 let p_c = (eta[[row, c]] - lse).exp();
381 s[[row, c]] = self.w_row[row] * p_c * (1.0 - p_c);
382 }
383 }
384 }
385 }
386 Ok(s)
387 }
388}
389
390/// The #461 Neyman-orthogonal leakage absorber for the behavioral head.
391///
392/// # The boundary it enforces
393///
394/// You want the behavioral channel to *orient* the existing manifold (fix the
395/// frame) but *not* to *invent* geometry absent from `p(x)` (sculpt a manifold
396/// to fit the label). The orthogonalization is precisely the boundary between
397/// those two — between "orient what's there" and "hallucinate structure from
398/// the label." Getting it exactly right is the single most important statistical
399/// content of the whole construction.
400///
401/// # Mechanism (mirrors the survival/BMS install)
402///
403/// The head's score-influence directions in latent-code space are the rows of
404/// the *score-influence Jacobian*
405///
406/// ```text
407/// Z[n, :] = √s_n · ∂η_n/∂t_n = √s_n · w (per η-channel)
408/// ```
409///
410/// i.e. the realized, per-row, Fisher-weighted directions along which a change
411/// in the latent codes moves the label-channel linear predictor. We
412/// orthonormalize their span (thin QR) to obtain the *label-channel subspace*
413/// `Q` in latent-code space. The dictionary (reconstruction) update is then
414/// projected onto the orthogonal complement of `Q`:
415///
416/// ```text
417/// Δt_recon ← (I − Q Qᵀ) Δt_recon
418/// ```
419///
420/// so the reconstruction channel can only move the codes in directions the
421/// label channel does *not* already explain. Equivalently, the reconstruction
422/// design is widened with `Q` as a null-penalized absorbed block, making the
423/// dictionary's estimating equation orthogonal to `span(Q)` — the label channel
424/// orients the frame, but cannot drag the dictionary toward encoding the label.
425#[derive(Debug, Clone)]
426pub struct LeakageAbsorber {
427 /// Orthonormal basis `Q ∈ ℝ^{d × r}` of the label-channel subspace in
428 /// latent-code space (`r ≤ min(d, n_eta)`). The reconstruction update is
429 /// projected onto `range(Q)^⊥`.
430 q: Array2<f64>,
431}
432
433impl LeakageAbsorber {
434 /// Build the absorber from the head's score-influence Jacobian.
435 ///
436 /// `score_influence` is `(n × (d · n_eta))`: for each row, the stacked
437 /// per-channel Fisher-weighted direction `√s_{n,c} · w_c` flattened over
438 /// channels. We take the *column* span (the directions in latent-code space
439 /// the label channel is sensitive to), reduced to an orthonormal basis via
440 /// thin SVD with a relative tolerance, so a rank-deficient or vacuous label
441 /// channel yields a low-rank (possibly empty) `Q` and the absorber is a
442 /// no-op — the honest behavior when the label pins little gauge.
443 pub fn from_score_influence(
444 score_influence: ArrayView2<'_, f64>,
445 latent_dim: usize,
446 ) -> Result<Self, String> {
447 let (n, cols) = score_influence.dim();
448 if latent_dim == 0 {
449 return Ok(Self {
450 q: Array2::<f64>::zeros((0, 0)),
451 });
452 }
453 if cols % latent_dim != 0 {
454 return Err(format!(
455 "LeakageAbsorber: score_influence has {cols} columns, not a multiple of \
456 latent_dim {latent_dim}"
457 ));
458 }
459 let n_eta = cols / latent_dim;
460 // Accumulate the (d × d) latent-space Gram of the label-channel
461 // directions: G = Σ_n Σ_c v_{n,c} v_{n,c}ᵀ where v_{n,c} ∈ ℝ^d is the
462 // c-th channel's score-influence direction for row n.
463 let mut gram = Array2::<f64>::zeros((latent_dim, latent_dim));
464 for row in 0..n {
465 for c in 0..n_eta {
466 let base = c * latent_dim;
467 for i in 0..latent_dim {
468 let vi = score_influence[[row, base + i]];
469 for j in 0..latent_dim {
470 gram[[i, j]] += vi * score_influence[[row, base + j]];
471 }
472 }
473 }
474 }
475 // Eigenvectors of G via SVD (G symmetric PSD) give the principal
476 // label-channel directions; keep those above a relative tolerance.
477 let (u_opt, sv, _vt) = gram
478 .svd(true, false)
479 .map_err(|e| format!("LeakageAbsorber: SVD of label-channel Gram failed: {e}"))?;
480 let u = u_opt.ok_or_else(|| "LeakageAbsorber: SVD did not return U".to_string())?;
481 let max_sv = sv.iter().cloned().fold(0.0_f64, f64::max);
482 let tol = max_sv * (latent_dim as f64) * f64::EPSILON;
483 let rank = sv.iter().filter(|&&s| s > tol).count();
484 let mut q = Array2::<f64>::zeros((latent_dim, rank));
485 for col in 0..rank {
486 for r in 0..latent_dim {
487 q[[r, col]] = u[[r, col]];
488 }
489 }
490 Ok(Self { q })
491 }
492
493 /// Rank of the absorbed label-channel subspace (`r`). Zero ⇒ the absorber
494 /// is a no-op (the label channel pins no direction the dictionary must be
495 /// orthogonalized against).
496 pub fn rank(&self) -> usize {
497 self.q.ncols()
498 }
499
500 /// Orthonormal basis `Q` of the absorbed subspace (`d × r`).
501 pub fn basis(&self) -> ArrayView2<'_, f64> {
502 self.q.view()
503 }
504
505 /// Project a reconstruction-channel latent update `Δt` (`n × d`) onto the
506 /// orthogonal complement of the label-channel subspace, in-place semantics
507 /// returned as a fresh array. This is the operative orthogonalization: the
508 /// dictionary update keeps only the component the label channel does not
509 /// already explain.
510 pub fn orthogonalize_recon_update(&self, delta_t: ArrayView2<'_, f64>) -> Array2<f64> {
511 let (_, d) = delta_t.dim();
512 if self.q.ncols() == 0 || self.q.nrows() != d {
513 return delta_t.to_owned();
514 }
515 // Δt − (Δt Q) Qᵀ.
516 let proj_coords = delta_t.dot(&self.q); // (n × r)
517 let proj = proj_coords.dot(&self.q.t()); // (n × d)
518 let mut out = delta_t.to_owned();
519 out -= &proj;
520 out
521 }
522
523 /// The absorbed block to *append* to the reconstruction design: per-row
524 /// rows `Z_infl[n, :] = Δ`-direction in latent-code space, here the
525 /// row's latent code projected onto `Q` — the realized label-channel
526 /// leverage. Width `r`. This mirrors the survival/BMS install where the
527 /// realized Stage-1 leakage directions are appended as a null-penalized
528 /// absorbed block, making the β estimating equation orthogonal to its span.
529 pub fn absorbed_design_block(&self, t: ArrayView2<'_, f64>) -> Array2<f64> {
530 if self.q.ncols() == 0 {
531 return Array2::<f64>::zeros((t.nrows(), 0));
532 }
533 t.dot(&self.q)
534 }
535}
536
537/// Per-feature (per-atom) behavioral-significance report for the head.
538#[derive(Debug, Clone)]
539pub struct HeadFeatureSignificance {
540 /// Wald statistic per latent axis (feature).
541 pub statistic: Vec<f64>,
542 /// Raw p-value per latent axis.
543 pub p_value: Vec<f64>,
544 /// Indices of features rejected by e-BH at the chosen FDR level — the
545 /// features whose behavioral loading is statistically real after the
546 /// multiplicity correction. These are the reportable behaviorally-anchored
547 /// atoms.
548 pub fdr_rejected: Vec<usize>,
549 /// The FDR level the rejection set was computed at.
550 pub alpha: f64,
551}
552
553/// Per-feature significance of the head's behavioral loadings, with FDR control.
554///
555/// For each latent axis (feature) `k`, the behavioral loading is the head
556/// coefficient `w_{c,k}` across η-channels. We test the null `w_{·,k} = 0`
557/// (the atom carries no behavioral signal) with [`wood_smooth_test`] on the
558/// coefficient block for that axis, using the head's posterior covariance.
559/// Raw p-values are first Bonferroni-combined across η-channels for each axis,
560/// then converted to calibrated e-values with
561/// [`log_e_from_p_calibrator`] and fed to [`e_benjamini_hochberg`] for an
562/// FDR-controlled rejection set that needs no independence assumption across
563/// atoms — exactly the case BH cannot legally handle when atoms share tokens.
564///
565/// `coeffs` is the fitted head coefficient vector (`n_eta · (1 + d)`),
566/// `covariance` its posterior covariance of the same size, `latent_dim` is `d`,
567/// `n_eta` the channel count, and `alpha` the target FDR.
568pub fn head_feature_significance(
569 coeffs: ArrayView1<'_, f64>,
570 covariance: &Array2<f64>,
571 latent_dim: usize,
572 n_eta: usize,
573 residual_df: f64,
574 alpha: f64,
575) -> Result<HeadFeatureSignificance, String> {
576 let block = 1 + latent_dim;
577 if coeffs.len() != n_eta * block {
578 return Err(format!(
579 "head_feature_significance: coeffs length {} != n_eta·(1+d) = {}",
580 coeffs.len(),
581 n_eta * block
582 ));
583 }
584 if covariance.nrows() != coeffs.len() || covariance.ncols() != coeffs.len() {
585 return Err(format!(
586 "head_feature_significance: covariance must be {0}×{0}, got {1}×{2}",
587 coeffs.len(),
588 covariance.nrows(),
589 covariance.ncols()
590 ));
591 }
592 let beta = coeffs.to_owned();
593 let mut statistic = Vec::with_capacity(latent_dim);
594 let mut p_value = Vec::with_capacity(latent_dim);
595 let mut log_e_values = Vec::with_capacity(latent_dim);
596 for axis in 0..latent_dim {
597 // Gather the coefficient indices for this axis across all channels:
598 // for channel c the loading is at `c·block + 1 + axis`. wood_smooth_test
599 // tests a contiguous range, so for the common Binomial (n_eta = 1) case
600 // the range is a single coefficient; for multinomial we test each
601 // channel's loading and combine the minimum p-value with Bonferroni.
602 let mut best_p = 1.0_f64;
603 let mut best_stat = 0.0_f64;
604 let mut tested_channels = 0_usize;
605 for c in 0..n_eta {
606 let idx = c * block + 1 + axis;
607 let input = SmoothTestInput {
608 beta: beta.view(),
609 covariance,
610 influence_matrix: None,
611 coeff_range: idx..idx + 1,
612 edf: 1.0,
613 nullspace_dim: 1,
614 residual_df,
615 scale: SmoothTestScale::Estimated,
616 };
617 if let Some(res) = wood_smooth_test(input) {
618 tested_channels += 1;
619 if res.p_value < best_p {
620 best_p = res.p_value;
621 best_stat = res.statistic;
622 }
623 }
624 }
625 let axis_p = if tested_channels > 0 {
626 (best_p * tested_channels as f64).min(1.0)
627 } else {
628 1.0
629 };
630 statistic.push(best_stat);
631 p_value.push(axis_p);
632 // Numerical tail routines can underflow a mathematically positive
633 // p-value to exactly zero; calibrate at the smallest positive f64 while
634 // still reporting the Bonferroni-adjusted p-value above.
635 let calibration_p = axis_p.max(f64::MIN_POSITIVE);
636 log_e_values.push(log_e_from_p_calibrator(calibration_p).map_err(|err| {
637 format!("head_feature_significance: invalid calibrated axis p-value: {err}")
638 })?);
639 }
640 let fdr_rejected = e_benjamini_hochberg(&log_e_values, alpha);
641 Ok(HeadFeatureSignificance {
642 statistic,
643 p_value,
644 fdr_rejected,
645 alpha,
646 })
647}
648
649/// Reduce a per-row score-influence Jacobian `(n × d)` to a rank-truncated
650/// orthonormal basis of its latent-code column span. Convenience used by the
651/// absorber install site when the caller already has the `(n × d)` Jacobian
652/// for a single η-channel (the common Binomial case). Returns `Q ∈ ℝ^{d × r}`,
653/// `r` = numerical rank of the column span (rank-deficient inputs yield a
654/// low-rank `Q`, so a vacuous label channel orthogonalizes against nothing).
655pub fn orthonormal_span(jacobian: ArrayView2<'_, f64>) -> Result<Array2<f64>, String> {
656 let (n, d) = jacobian.dim();
657 if d == 0 || n == 0 {
658 return Ok(Array2::<f64>::zeros((d, 0)));
659 }
660 // The right singular vectors of `J` span its row space = the column span
661 // of `Jᵀ` in ℝ^d. Equivalently the eigenvectors of the d × d Gram `JᵀJ`;
662 // route through the SVD bridge for a stable, rank-truncated basis.
663 let gram = jacobian.t().dot(&jacobian);
664 let (u_opt, sv, _vt) = gram
665 .svd(true, false)
666 .map_err(|e| format!("orthonormal_span: SVD failed: {e}"))?;
667 let u = u_opt.ok_or_else(|| "orthonormal_span: SVD did not return U".to_string())?;
668 let max_sv = sv.iter().cloned().fold(0.0_f64, f64::max);
669 let tol = max_sv * (d as f64) * f64::EPSILON;
670 let rank = sv.iter().filter(|&&s| s > tol).count();
671 let mut q = Array2::<f64>::zeros((d, rank));
672 for col in 0..rank {
673 for r in 0..d {
674 q[[r, col]] = u[[r, col]];
675 }
676 }
677 Ok(q)
678}