Nine-axis behavioral differ, bootstrap CI, and report renderers.
See README.md “The nine axes” for the list and SPEC §Replay for what “diverges” means in this context.
Usage:
let pricing = diff::cost::Pricing::new();
let report = diff::compute_report(&baseline, &candidate, &pricing, Some(42));
println!("{}", report.to_terminal());
Re-exports§
pub use alignment::DivergenceKind;
pub use alignment::FirstDivergence;
pub use axes::Axis;
pub use axes::AxisStat;
pub use axes::Severity;
pub use bootstrap::paired_ci;
pub use bootstrap::CiResult;
pub use drill_down::PairAxisScore;
pub use drill_down::PairDrilldown;
pub use recommendations::ActionKind;
pub use recommendations::Recommendation;
pub use recommendations::RecommendationSeverity;
pub use report::DiffReport;
Modules§
- alignment
- First-divergence detection over paired chat responses.
- axes
- Shared types for the nine-axis behavioral diff.
- bootstrap
- Bootstrap resampling for paired statistics.
- conformance
- Axis 9: schema / format conformance rate.
- cost
- Axis 6: cost (input+output tokens × per-model pricing).
- drill_down
- Per-pair drill-down: surfaces which specific turn in the paired trace set drove each aggregate axis regression.
- embedder
- Pluggable embedding backend for the semantic axis.
- judge
- Axis 8: LLM-judge (user-supplied rubric).
- latency
- Axis 5: end-to-end latency (SPEC §4.2 chat_response.latency_ms).
- reasoning
- Axis 7: reasoning depth — thinking tokens + self-correction markers.
- recommendations
- Prescriptive fix recommendations derived from a DiffReport.
- report
- Rendering of DiffReport to markdown and terminal.
- safety
- Axis 3: safety — the rate at which the model abstained from completing the user’s request.
- semantic
- Axis 1: final-output semantic similarity.
- trajectory
- Axis 2: tool-call trajectory divergence.
- verbosity
- Axis 4: verbosity (output-token count) from chat_response.usage.output_tokens.
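The bootstrap module is described above only as "bootstrap resampling for paired statistics", so the following is a rough sketch of what a seeded, paired percentile bootstrap could look like, not the crate's actual `paired_ci`. The names `paired_mean_ci` and `SplitMix64`, the 95% level, and the choice of the mean delta as the statistic are all assumptions made for illustration.

```rust
/// Tiny SplitMix64 generator so the sketch needs no external crates.
/// (Assumption: the real crate may use a different RNG entirely.)
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform-ish index in 0..n (modulo bias is ignored in this sketch).
    fn index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: returns (mean_delta, ci_low, ci_high) for
/// candidate - baseline, using a percentile bootstrap over the paired deltas.
/// Returns None for empty input, mismatched lengths, or zero iterations.
fn paired_mean_ci(
    baseline: &[f64],
    candidate: &[f64],
    iters: usize,
    seed: u64,
) -> Option<(f64, f64, f64)> {
    let n = baseline.len();
    if n == 0 || n != candidate.len() || iters == 0 {
        return None;
    }

    // Pairing is preserved: each delta comes from one (baseline, candidate) pair.
    let deltas: Vec<f64> = baseline.iter().zip(candidate).map(|(b, c)| c - b).collect();
    let mean = deltas.iter().sum::<f64>() / n as f64;

    // Resample the deltas with replacement and record each resample's mean.
    let mut rng = SplitMix64(seed);
    let mut means = Vec::with_capacity(iters);
    for _ in 0..iters {
        let mut sum = 0.0;
        for _ in 0..n {
            sum += deltas[rng.index(n)];
        }
        means.push(sum / n as f64);
    }
    means.sort_by(|a, b| a.total_cmp(b));

    // 2.5th and 97.5th percentiles of the resampled means.
    let lo = means[(iters as f64 * 0.025) as usize];
    let hi = means[((iters as f64 * 0.975) as usize).min(iters - 1)];
    Some((mean, lo, hi))
}
```

Fixing the seed makes the interval reproducible across runs, which is presumably the role of the `Some(42)` argument in the usage snippet; that reading is an inference from the signature, not something this page states.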
Functions§
- compute_report
- Compute a DiffReport from a baseline and candidate trace.
- extract_response_pairs
- Extract (baseline_response, candidate_response) pairs by pairing the i-th chat_response in baseline with the i-th in candidate.
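The index-based pairing described for the pair-extraction function can be illustrated with a small sketch. The `Record` enum and `pair_responses` function below are hypothetical stand-ins, since the crate's real trace types and signature are not shown on this page. Truncating to the shorter trace when the response counts differ is also an assumption.

```rust
/// Hypothetical, simplified trace record; the real record type is not shown here.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    ChatRequest(String),
    ChatResponse(String),
}

/// Yields only the chat responses of a trace, in order.
fn responses<'t>(trace: &'t [Record]) -> impl Iterator<Item = &'t str> + 't {
    trace.iter().filter_map(|r| match r {
        Record::ChatResponse(text) => Some(text.as_str()),
        _ => None,
    })
}

/// Pairs the i-th response in `baseline` with the i-th response in `candidate`.
/// `zip` stops at the shorter side, so unmatched trailing responses are dropped
/// (an assumption of this sketch, not documented behavior).
fn pair_responses<'a>(
    baseline: &'a [Record],
    candidate: &'a [Record],
) -> Vec<(&'a str, &'a str)> {
    responses(baseline).zip(responses(candidate)).collect()
}
```

Non-response records are skipped before pairing, so the index `i` counts responses only, not raw records.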