Optional evaluation framework primitives for Agent SDK runs.
This crate owns post-hoc evaluation contracts over core traces and evidence. It does not run agents, append journals, publish events, choose evaluator models, or define product-specific success rubrics.
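The sketch below illustrates the post-hoc boundary. Every type in it (`TraceEvent`, `Evidence`, `Verdict`, `Report`, the `PostHocEvaluator` trait and `ToolFailureEvaluator`) is a hypothetical stand-in defined locally. The crate's real `Evaluator` trait and report types may have different shapes. What the sketch shows is the direction of data flow. The evaluator only borrows a trace that was already recorded and returns a report. It never starts an agent run or writes anything back. The pass/fail rule is supplied by the caller, not by the framework.

```rust
/// One event from a finished agent run (hypothetical stand-in for a core trace record).
#[derive(Debug, Clone, PartialEq)]
pub enum TraceEvent {
    ToolCall { name: String, ok: bool },
    Message(String),
}

/// Read-only view over an already-recorded trace (hypothetical).
pub struct Evidence<'a> {
    pub trace: &'a [TraceEvent],
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub verdict: Verdict,
    pub rationale: String,
}

/// Post-hoc contract (hypothetical): borrow evidence immutably and return a report.
/// Nothing in this signature can run an agent, append a journal, or publish events.
pub trait PostHocEvaluator {
    fn evaluate(&self, evidence: &Evidence<'_>) -> Report;
}

/// A caller-defined rubric, not part of the framework: fail if any tool call failed.
pub struct ToolFailureEvaluator;

impl PostHocEvaluator for ToolFailureEvaluator {
    fn evaluate(&self, evidence: &Evidence<'_>) -> Report {
        // Collect the names of failed tool calls from the recorded trace.
        let failed: Vec<&str> = evidence
            .trace
            .iter()
            .filter_map(|event| match event {
                TraceEvent::ToolCall { name, ok: false } => Some(name.as_str()),
                _ => None,
            })
            .collect();
        if failed.is_empty() {
            Report { verdict: Verdict::Pass, rationale: "all tool calls succeeded".to_string() }
        } else {
            Report { verdict: Verdict::Fail, rationale: format!("failed tool calls: {}", failed.join(", ")) }
        }
    }
}

fn main() {
    let trace = vec![
        TraceEvent::Message("plan".to_string()),
        TraceEvent::ToolCall { name: "search".to_string(), ok: true },
        TraceEvent::ToolCall { name: "write".to_string(), ok: false },
    ];
    let report = ToolFailureEvaluator.evaluate(&Evidence { trace: &trace });
    println!("{:?}: {}", report.verdict, report.rationale);
}
```

Because the trait takes `&Evidence` and returns a value, the same recorded trace can be re-evaluated by different rubrics without re-running the agent.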
Re-exports
pub use comparison::ComparisonDesign;
pub use evaluator::Evaluator;
pub use evidence::EvidenceBundle;
pub use evidence::EvidenceItem;
pub use evidence::EvidenceRole;
pub use evidence::SupportRefValidation;
pub use identity::EvaluationId;
pub use metrics::ToolTraceMetric;
pub use metrics::TraceMetrics;
pub use metrics::TraceMetricsComparison;
pub use report::EvaluationConfidence;
pub use report::EvaluationMetricDelta;
pub use report::EvaluationReport;
pub use report::EvaluationVerdict;
pub use report::EvaluatorJudgment;
pub use request::EvaluationBudget;
pub use request::EvaluationRequest;
pub use request::EvaluationUsage;
pub use scope::EvaluationCriterion;
pub use scope::EvaluationScope;
pub use scope::EvaluationSubject;
pub use scope::EvaluationSubjectRole;
pub use scope::ExpectedOutcome;
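Because these are `pub use` re-exports, each type can be named at the crate root or through its module path, and both paths refer to the same type. The self-contained illustration below uses an inline module. The `Pass`/`Fail` variants are hypothetical, and the real `EvaluationVerdict` may be shaped differently.

```rust
use std::any::TypeId;

mod report {
    /// Hypothetical variants for illustration only.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum EvaluationVerdict {
        Pass,
        Fail,
    }
}

// Re-export at the root, mirroring the crate's `pub use report::EvaluationVerdict;`.
pub use report::EvaluationVerdict;

fn main() {
    let short_path: EvaluationVerdict = EvaluationVerdict::Pass;
    let module_path: report::EvaluationVerdict = report::EvaluationVerdict::Pass;
    // Same type under two paths: values compare equal and the TypeIds match.
    assert_eq!(short_path, module_path);
    assert_eq!(TypeId::of::<EvaluationVerdict>(), TypeId::of::<report::EvaluationVerdict>());
    println!("{short_path:?}");
}
```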
Modules
- comparison: Comparison designs for evaluation reports.
- evaluator: Evaluator trait for post-hoc evaluation implementations.
- evidence: Evidence bundles derived from core traces.
- identity: Stable identifiers for evaluation framework records.
- metrics: Deterministic metrics derived from core traces and journal records.
- report: Evaluation report records and confidence validation.
- request: Evaluation request, budget, and usage records.
- scope: Evaluation scopes, subjects, and expected outcomes.
- testing: Deterministic evaluator fakes for SDK consumers.
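Several of these modules rely on determinism. The sketch below shows what that property provides, under stated assumptions. `ToolCall`, `Metrics`, `Judgment`, `FixedEvaluator` and `validate_confidence` are hypothetical local definitions, not the crate's actual signatures. The rule that a confidence value must lie in [0, 1] is an assumed reading of "confidence validation."

Metrics here are a pure function of the recorded tool calls, so computing them twice yields identical values. A fake evaluator that always returns a fixed judgment lets consumer tests exercise report handling without calling an evaluator model.

```rust
use std::collections::BTreeSet;

/// A recorded tool call (hypothetical stand-in for a core trace record).
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub succeeded: bool,
    pub duration_ms: u64,
}

/// Deterministic metrics: computed only from the trace, with no clock, randomness, or model call.
#[derive(Debug, Clone, PartialEq)]
pub struct Metrics {
    pub tool_calls: usize,
    pub failed_calls: usize,
    pub distinct_tools: usize,
    pub total_duration_ms: u64,
}

impl Metrics {
    pub fn from_trace(calls: &[ToolCall]) -> Self {
        Metrics {
            tool_calls: calls.len(),
            failed_calls: calls.iter().filter(|c| !c.succeeded).count(),
            // BTreeSet gives an order-independent count of distinct tool names.
            distinct_tools: calls.iter().map(|c| c.tool.as_str()).collect::<BTreeSet<_>>().len(),
            total_duration_ms: calls.iter().map(|c| c.duration_ms).sum(),
        }
    }
}

/// Assumed rule: confidence must lie in [0.0, 1.0]. NaN fails because NaN comparisons are false.
pub fn validate_confidence(value: f64) -> Result<f64, String> {
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("confidence {value} is outside [0, 1]"))
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Judgment {
    pub passed: bool,
    pub confidence: f64,
}

/// Deterministic fake: ignores its input and always returns the configured judgment.
pub struct FixedEvaluator {
    pub judgment: Judgment,
}

impl FixedEvaluator {
    pub fn judge(&self, _metrics: &Metrics) -> Judgment {
        self.judgment.clone()
    }
}

fn main() {
    let trace = vec![
        ToolCall { tool: "search".into(), succeeded: true, duration_ms: 120 },
        ToolCall { tool: "write".into(), succeeded: false, duration_ms: 30 },
    ];
    let metrics = Metrics::from_trace(&trace);
    assert_eq!(metrics, Metrics::from_trace(&trace)); // same trace, same metrics
    println!("{metrics:?}");

    let fake = FixedEvaluator { judgment: Judgment { passed: false, confidence: 0.9 } };
    let judgment = fake.judge(&metrics);
    println!("{judgment:?} valid={}", validate_confidence(judgment.confidence).is_ok());
}
```

Keeping metrics separate from judgments means a test can pin metric values exactly while stubbing out the non-deterministic part, the evaluator.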