Expand description
Per-pair drill-down: surfaces which specific turn in the paired trace set drove each aggregate axis regression.
The nine-axis report is informative in aggregate but hides where
the regression happened. A reviewer looking at a PR with 50 trace
pairs sees trajectory: delta +0.42, severe but has to hand-audit
every pair to find which ones actually regressed. This module
computes per-pair, per-axis deltas and returns the top-K
most-regressive pairs ranked by a normalised aggregate score.
Design choices:
-
No bootstrap per pair. Bootstrap CIs are an aggregate-level stat; per-pair we need only the raw deltas. This keeps drill-down cheap (O(N) extraction per axis, no resampling).
-
Self-contained extractors. Rather than refactor the nine axis modules to expose their per-pair internals, we re-implement the (small) extractors here. Each is ≤ 20 lines; duplicating them buys independence — drill-down’s value function can evolve without touching the statistical axis implementations.
-
Normalised ranking. Raw deltas have wildly different scales (0-1 for semantic, 0-10000 for latency_ms). Each axis has a per-axis
scaleused to normalise deltas into [0, ~1] so they can be summed into a single regression score. Scales are calibrated against the axis’s Severity::Severe threshold so a per-pair normalised delta of 1.0 corresponds roughly to one severity-severe-sized movement. -
Text similarity via character-shingle Jaccard. The aggregate semantic axis uses corpus-level TF-IDF, which is meaningless per singleton-pair. Character-shingle Jaccard (what alignment.rs uses) is the same proxy first-divergence detection uses, so drill-down results are internally consistent with the first-divergence row.
-
Judge is skipped. The Rust core never populates the Judge axis (it’s Python-side). Including it here would produce spurious zeros for every pair.
Structs§
- Pair
Axis Score - One axis’s contribution to a single pair’s drill-down row.
- Pair
Drilldown - One pair’s per-axis breakdown plus an aggregate regression score.
Constants§
- DEFAULT_
K - Default number of pairs to surface in a drill-down list. Matches
alignment::DEFAULT_Kso downstream renderers can share the same “show 3 inline, collapse the rest” heuristic.
Functions§
- compute
- Compute drill-down rows for every pair, return the top-
top_ksorted byregression_scoredescending.