Expand description
First-divergence detection over paired chat responses.
Given two traces, this module identifies the first turn at which the candidate meaningfully diverged from the baseline and classifies the divergence as one of three kinds:
- Structural — the tool-call sequence differs (missing, extra, or
reordered calls). Surfaces as a gap in the alignment, OR a same-
position pair with different
tool_names. - Decision — the tool sequence matches but the decision changed:
same tool, different arg values; final-answer semantic cosine < 0.8;
stop_reasonflipped; refusal where there wasn’t one. - Style — cosmetic wording differences only; semantic cosine ≥ 0.9, identical tool shape, identical stop_reason.
§Algorithm
A Needleman-Wunsch global alignment with Gotoh affine gap penalties
pairs baseline and candidate chat_response records. The cost for
aligning pair (a, b) is:
cost(a, b) = w_struct * (1 - jaccard(tool_shape_a, tool_shape_b))
+ w_sem * (1 - text_similarity(a, b))
+ w_stop * stop_reason_mismatch(a, b)After alignment, we walk the alignment path left-to-right and emit the first cell whose per-cell divergence exceeds the noise floor.
§Why NW and not position-match
Position-match fails when one side inserts or drops a turn (common when one config retries where the other doesn’t). NW pays a controlled gap cost instead of mis-pairing every subsequent turn. Cost is O(n·m) in DP cells; traces rarely exceed ~100 turns, so runtime is trivial in practice.
Structs§
- First
Divergence - First meaningful divergence between two traces.
Enums§
- Divergence
Kind - Classification of the first divergence between two traces.
Constants§
- DEFAULT_
K - Default number of top-ranked divergences returned by
detect_top_k. The markdown / terminal renderers show the top 3; the full list goes to the JSON output. Users can override via the explicitkparameter.
Functions§
- detect
- Detect the first meaningful divergence between two traces.
- detect_
top_ k - Detect up to
kmeaningful divergences between two traces, sorted by importance (kind severity × confidence, descending).