agent-sdk-eval
agent-sdk-eval contains optional evaluation framework primitives for Agent SDK consumers.
The crate is layered over agent-sdk-core traces, entity refs, privacy classes, and provider usage records. It does not run agents, append journals, publish events, choose evaluator models, store dashboards, or define product-specific scoring rubrics.
Deterministic Trace Metrics
TraceMetrics derives local metrics from caller-supplied TurnTrace, RunTrace, and SessionTimeline records:
- trace started_at_millis, ended_at_millis, and elapsed_ms;
- provider call counts and provider-reported token totals;
- tool call counts, terminal status counts, and per-tool start/end/elapsed timing;
- session comparison deltas through TraceMetricsComparison.
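The derivation listed above can be pictured with a minimal sketch. Every type and function here (SketchTrace, SketchProviderCall, SketchToolCall, SketchMetrics, derive_sketch_metrics) is a hypothetical stand-in invented for illustration, not the crate's TraceMetrics API or the agent-sdk-core trace types. It only shows that counts, token totals, and timings can be folded out of already-recorded data with no provider call.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one recorded provider call.
struct SketchProviderCall {
    input_tokens: u64,
    output_tokens: u64,
}

/// Hypothetical stand-in for one recorded tool call.
struct SketchToolCall {
    name: &'static str,
    started_at_millis: u64,
    ended_at_millis: u64,
}

/// Hypothetical stand-in for a caller-supplied trace record.
struct SketchTrace {
    started_at_millis: u64,
    ended_at_millis: u64,
    provider_calls: Vec<SketchProviderCall>,
    tool_calls: Vec<SketchToolCall>,
}

/// Metrics derived purely from the recorded trace.
#[derive(Debug, Default, PartialEq)]
struct SketchMetrics {
    elapsed_ms: u64,
    provider_call_count: usize,
    total_tokens: u64,
    tool_call_count: usize,
    /// Summed elapsed time per tool name.
    tool_elapsed_ms: BTreeMap<&'static str, u64>,
}

fn derive_sketch_metrics(trace: &SketchTrace) -> SketchMetrics {
    let mut tool_elapsed_ms = BTreeMap::new();
    for call in &trace.tool_calls {
        // saturating_sub avoids underflow if a record has end < start.
        let elapsed = call.ended_at_millis.saturating_sub(call.started_at_millis);
        *tool_elapsed_ms.entry(call.name).or_insert(0) += elapsed;
    }
    SketchMetrics {
        elapsed_ms: trace.ended_at_millis.saturating_sub(trace.started_at_millis),
        provider_call_count: trace.provider_calls.len(),
        total_tokens: trace
            .provider_calls
            .iter()
            .map(|c| c.input_tokens + c.output_tokens)
            .sum(),
        tool_call_count: trace.tool_calls.len(),
        tool_elapsed_ms,
    }
}
```

Because the function reads only its input, running it twice on the same trace yields identical metrics, which is the property the word "deterministic" refers to here.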
These metrics are computed without provider calls. EvaluationRequest::metric_deltas carries deterministic deltas into evaluators so an optional AI evaluator can explain or judge them, but the model is not the authority for counts, timing, or token totals.
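As a rough illustration of what "deterministic deltas" means, the sketch below subtracts a baseline from a candidate and keeps only the metrics that changed. SketchTotals, SketchDelta, and sketch_metric_deltas are hypothetical names chosen for this example; the actual shape of EvaluationRequest::metric_deltas may differ.

```rust
/// Hypothetical flattened metric totals for one scope (not the crate's type).
struct SketchTotals {
    elapsed_ms: u64,
    provider_calls: u64,
    total_tokens: u64,
}

/// One changed metric, with the raw values kept alongside the signed change.
#[derive(Debug, PartialEq)]
struct SketchDelta {
    metric: &'static str,
    baseline: u64,
    candidate: u64,
    change: i64,
}

/// Computes candidate-minus-baseline for each metric and drops unchanged ones.
fn sketch_metric_deltas(baseline: &SketchTotals, candidate: &SketchTotals) -> Vec<SketchDelta> {
    let pairs = [
        ("elapsed_ms", baseline.elapsed_ms, candidate.elapsed_ms),
        ("provider_calls", baseline.provider_calls, candidate.provider_calls),
        ("total_tokens", baseline.total_tokens, candidate.total_tokens),
    ];
    let mut deltas = Vec::new();
    for (metric, b, c) in pairs {
        if b != c {
            deltas.push(SketchDelta {
                metric,
                baseline: b,
                candidate: c,
                // Assumes values fit in i64, which holds for realistic counts.
                change: c as i64 - b as i64,
            });
        }
    }
    deltas
}
```

An evaluator that receives such a list can comment on why token usage rose or latency fell, while the numbers themselves stay fixed by the arithmetic rather than by the model's reading of the trace.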
ComparisonDesign::PairedScopes supports comparing two sessions or other durable scopes without inventing a core session entity ref. Measured confidence still requires comparison evidence plus at least one deterministic metric delta.
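The confidence rule in the last sentence can be read as a simple conjunction. The sketch below encodes it with hypothetical names (SketchConfidence, sketch_confidence); it is an assumption about how such a gate might look, not the crate's actual types or logic.

```rust
/// Hypothetical confidence levels for a comparison result.
#[derive(Debug, PartialEq)]
enum SketchConfidence {
    Measured,
    Unmeasured,
}

/// Grants `Measured` only when both conditions from the text hold:
/// comparison evidence is present and at least one deterministic delta exists.
fn sketch_confidence(
    has_comparison_evidence: bool,
    deterministic_delta_count: usize,
) -> SketchConfidence {
    if has_comparison_evidence && deterministic_delta_count >= 1 {
        SketchConfidence::Measured
    } else {
        SketchConfidence::Unmeasured
    }
}
```

Under this reading, a paired-scope design alone does not raise confidence: a comparison with evidence but no metric delta, or with deltas but no evidence, stays unmeasured.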