Module drill_down

Expand description

Per-pair drill-down: surfaces which specific turn in the paired trace set drove each aggregate axis regression.

The nine-axis report is informative in aggregate but hides where the regression happened. A reviewer looking at a PR with 50 trace pairs sees trajectory: delta +0.42, severe but has to hand-audit every pair to find which ones actually regressed. This module computes per-pair, per-axis deltas and returns the top-K most-regressive pairs ranked by a normalised aggregate score.

Design choices:

No bootstrap per pair. Bootstrap CIs are an aggregate-level stat; per-pair we need only the raw deltas. This keeps drill-down cheap (O(N) extraction per axis, no resampling).
Self-contained extractors. Rather than refactor the nine axis modules to expose their per-pair internals, we re-implement the (small) extractors here. Each is ≤ 20 lines; duplicating them buys independence — drill-down’s value function can evolve without touching the statistical axis implementations.
Normalised ranking. Raw deltas have wildly different scales (0-1 for semantic, 0-10000 for latency_ms). Each axis has a per-axis scale used to normalise deltas into [0, ~1] so they can be summed into a single regression score. Scales are calibrated against the axis’s Severity::Severe threshold so a per-pair normalised delta of 1.0 corresponds roughly to one severity-severe-sized movement.
Text similarity via character-shingle Jaccard. The aggregate semantic axis uses corpus-level TF-IDF, which is meaningless per singleton-pair. Character-shingle Jaccard (what alignment.rs uses) is the same proxy first-divergence detection uses, so drill-down results are internally consistent with the first-divergence row.
Judge is skipped. The Rust core never populates the Judge axis (it’s Python-side). Including it here would produce spurious zeros for every pair.

Structs§

PairAxisScore: One axis’s contribution to a single pair’s drill-down row.
PairDrilldown: One pair’s per-axis breakdown plus an aggregate regression score.

Constants§

DEFAULT_K: Default number of pairs to surface in a drill-down list. Matches alignment::DEFAULT_K so downstream renderers can share the same “show 3 inline, collapse the rest” heuristic.

Functions§

compute: Compute drill-down rows for every pair, return the top-top_k sorted by regression_score descending.