Optional evaluation framework primitives for Agent SDK runs.
This crate owns post-hoc evaluation contracts over core traces and evidence. It does not run agents, append journals, publish events, choose evaluator models, or define product-specific success rubrics.
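To illustrate what a post-hoc evaluation contract over a trace can look like, here is a minimal sketch. Every name in it (`TraceStep`, `Verdict`, `PostHocEvaluator`, `AllToolCallsSucceeded`) is a hypothetical stand-in defined locally for illustration. None of it mirrors the real signatures of this crate's `Evaluator`, `EvaluationVerdict`, or trace types, which are not shown on this page.

```rust
// Hypothetical stand-in types for illustration only.
// They are NOT the crate's actual API.

/// One recorded step of an already-finished run (stand-in for a core trace record).
#[derive(Debug, Clone)]
pub struct TraceStep {
    pub tool: String,
    pub succeeded: bool,
}

/// Outcome of a post-hoc evaluation (stand-in for a verdict type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Inconclusive,
}

/// A post-hoc evaluator only reads a completed trace.
/// It takes `&self` and a shared slice, so it cannot run agents,
/// append to the trace, or publish anything as a side effect of its signature.
pub trait PostHocEvaluator {
    fn evaluate(&self, trace: &[TraceStep]) -> Verdict;
}

/// Example rule (assumed, not product guidance): pass when every
/// recorded tool call succeeded; an empty trace gives no evidence either way.
pub struct AllToolCallsSucceeded;

impl PostHocEvaluator for AllToolCallsSucceeded {
    fn evaluate(&self, trace: &[TraceStep]) -> Verdict {
        if trace.is_empty() {
            return Verdict::Inconclusive;
        }
        if trace.iter().all(|step| step.succeeded) {
            Verdict::Pass
        } else {
            Verdict::Fail
        }
    }
}
```

The sketch keeps the evaluator free of any success rubric beyond the one rule it implements, which matches the stated boundary that product-specific rubrics live outside this crate.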
Re-exports
- pub use comparison::ComparisonDesign;
- pub use cost::CostPolicy;
- pub use cost::CostReport;
- pub use cost::StaticRateTable;
- pub use evaluator::Evaluator;
- pub use evidence::EvidenceBundle;
- pub use evidence::EvidenceItem;
- pub use evidence::EvidenceRole;
- pub use evidence::SupportRefValidation;
- pub use identity::EvaluationId;
- pub use metrics::ToolTraceMetric;
- pub use metrics::TraceMetrics;
- pub use metrics::TraceMetricsComparison;
- pub use report::EvaluationConfidence;
- pub use report::EvaluationMetricDelta;
- pub use report::EvaluationReport;
- pub use report::EvaluationVerdict;
- pub use report::EvaluatorJudgment;
- pub use request::EvaluationBudget;
- pub use request::EvaluationRequest;
- pub use request::EvaluationUsage;
- pub use run_report::RunReport;
- pub use run_report::RunReportLimitations;
- pub use scope::EvaluationCriterion;
- pub use scope::EvaluationScope;
- pub use scope::EvaluationSubject;
- pub use scope::EvaluationSubjectRole;
- pub use scope::ExpectedOutcome;
- pub use usage::UsageReport;
Modules
- comparison
- Comparison designs for evaluation reports.
- cost
- Cost report helpers over deterministic usage reports.
- evaluator
- Evaluator trait for post-hoc evaluation implementations.
- evidence
- Evidence bundles derived from core traces.
- identity
- Stable identifiers for evaluation framework records.
- metrics
- Deterministic metrics derived from core traces and journal records.
- report
- Evaluation report records and confidence validation.
- request
- Evaluation request, budget, and usage records.
- run_report
- Run-level report helpers.
- scope
- Evaluation scopes, subjects, and expected outcomes.
- testing
- Deterministic evaluator fakes for SDK consumers.
- usage
- Deterministic usage reports derived from trace metrics.