agent-sdk-eval 0.1.0-alpha.3

Optional evaluation framework primitives layered over agent-sdk-core traces and evidence.
# agent-sdk-eval

`agent-sdk-eval` contains optional evaluation framework primitives for Agent SDK consumers.

The crate is layered over `agent-sdk-core` traces, entity refs, privacy classes, and provider usage records. It does not run agents, append journals, publish events, choose evaluator models, store dashboards, or define product-specific scoring rubrics.

## Deterministic Trace Metrics

`TraceMetrics` derives local metrics from caller-supplied `TurnTrace`, `RunTrace`, and `SessionTimeline` records:

- trace `started_at_millis`, `ended_at_millis`, and `elapsed_ms`;
- provider call counts and provider-reported token totals;
- tool call counts, terminal status counts, and per-tool start/end/elapsed timing;
- session comparison deltas through `TraceMetricsComparison`.
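
As a rough illustration of the kind of derivation listed above, the following sketch computes elapsed time, provider call counts, token totals, and per-tool timing from an in-memory record. Every type and function name here (`SketchTrace`, `SketchToolCall`, `SketchMetrics`, `derive_metrics`, `sample_trace`) is a hypothetical stand-in, not the crate's `TraceMetrics` API or the `agent-sdk-core` trace types; only the field names `started_at_millis`, `ended_at_millis`, and `elapsed_ms` are taken from the list.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; these are NOT the real
// agent-sdk-core trace types or the agent-sdk-eval `TraceMetrics` API.
#[derive(Debug, Clone, PartialEq)]
struct SketchToolCall {
    name: String,
    started_at_millis: u64,
    ended_at_millis: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct SketchTrace {
    started_at_millis: u64,
    ended_at_millis: u64,
    /// Provider-reported token total for each provider call.
    provider_call_tokens: Vec<u64>,
    tool_calls: Vec<SketchToolCall>,
}

#[derive(Debug, Clone, PartialEq)]
struct SketchMetrics {
    elapsed_ms: u64,
    provider_calls: usize,
    total_tokens: u64,
    tool_calls: usize,
    /// Summed elapsed time per tool name, in milliseconds.
    per_tool_elapsed_ms: BTreeMap<String, u64>,
}

/// Derives metrics purely from the supplied record; no provider is called.
fn derive_metrics(trace: &SketchTrace) -> SketchMetrics {
    let mut per_tool_elapsed_ms: BTreeMap<String, u64> = BTreeMap::new();
    for call in &trace.tool_calls {
        // saturating_sub guards against an end timestamp before the start.
        let elapsed = call.ended_at_millis.saturating_sub(call.started_at_millis);
        *per_tool_elapsed_ms.entry(call.name.clone()).or_insert(0) += elapsed;
    }
    SketchMetrics {
        elapsed_ms: trace.ended_at_millis.saturating_sub(trace.started_at_millis),
        provider_calls: trace.provider_call_tokens.len(),
        total_tokens: trace.provider_call_tokens.iter().sum(),
        tool_calls: trace.tool_calls.len(),
        per_tool_elapsed_ms,
    }
}

/// Small made-up trace used to exercise `derive_metrics`.
fn sample_trace() -> SketchTrace {
    let call = |name: &str, start: u64, end: u64| SketchToolCall {
        name: name.to_string(),
        started_at_millis: start,
        ended_at_millis: end,
    };
    SketchTrace {
        started_at_millis: 1_000,
        ended_at_millis: 1_400,
        provider_call_tokens: vec![120, 80],
        tool_calls: vec![
            call("search", 1_010, 1_060),
            call("search", 1_100, 1_130),
            call("fetch", 1_200, 1_300),
        ],
    }
}
```

Calling `derive_metrics(&sample_trace())` reads only the supplied record, so the same input always yields the same metrics.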

These metrics are computed without provider calls. `EvaluationRequest::metric_deltas` carries deterministic deltas into evaluators so an optional AI evaluator can explain or judge them, but the model is not the authority for counts, timing, or token totals.
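
The division of responsibility described here can be sketched as follows: deltas are computed arithmetically first, then rendered as plain text that an evaluator could be asked to explain. `SketchDelta` and `render_for_evaluator` are hypothetical names chosen for this example; the actual shape of `EvaluationRequest::metric_deltas` is not shown in this README and may differ.

```rust
// Hypothetical delta record for illustration; this is NOT the crate's
// `TraceMetricsComparison` type or the `metric_deltas` field type.
#[derive(Debug, Clone, PartialEq)]
struct SketchDelta {
    metric: &'static str,
    baseline: i64,
    candidate: i64,
}

impl SketchDelta {
    /// Signed change from baseline to candidate, computed locally.
    fn delta(&self) -> i64 {
        self.candidate.saturating_sub(self.baseline)
    }
}

/// Renders already-computed deltas as text an evaluator could be given.
/// The evaluator receives the numbers; it does not produce them.
fn render_for_evaluator(deltas: &[SketchDelta]) -> String {
    deltas
        .iter()
        .map(|d| {
            // `{:+}` prints an explicit sign for the delta.
            format!("{}: {} -> {} ({:+})", d.metric, d.baseline, d.candidate, d.delta())
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the numbers are fixed before any model sees them, an evaluator's commentary can be checked against the rendered values rather than trusted for them.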

`ComparisonDesign::PairedScopes` supports comparing two sessions or other durable scopes without inventing a core session entity ref. Measured confidence still requires comparison evidence plus at least one deterministic metric delta.
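
The confidence rule in the last sentence amounts to a two-part check, sketched below. `SketchConfidence`, its `Unmeasured` variant, and `classify_confidence` are assumptions made for this example; the README does not name the crate's confidence levels other than "measured", nor the function that assigns them.

```rust
// Hypothetical confidence levels for illustration. Only "measured" is named
// in the README; `Unmeasured` is an assumed label for everything else.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchConfidence {
    Measured,
    Unmeasured,
}

/// Measured confidence needs BOTH comparison evidence and at least one
/// deterministic metric delta; either one alone is not enough.
fn classify_confidence(has_comparison_evidence: bool, metric_delta_count: usize) -> SketchConfidence {
    if has_comparison_evidence && metric_delta_count >= 1 {
        SketchConfidence::Measured
    } else {
        SketchConfidence::Unmeasured
    }
}
```

Under this sketch, a paired-scope comparison with evidence but no metric deltas falls short of measured confidence, as does one with deltas but no comparison evidence.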