Crate agent_sdk_eval

Optional evaluation framework primitives for Agent SDK runs.

This crate owns post-hoc evaluation contracts over core traces and evidence. It does not run agents, append journals, publish events, choose evaluator models, or define product-specific success rubrics.
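To illustrate what "post-hoc evaluation over a trace" can look like, here is a minimal sketch. Every type and name in it (`ToolCall`, `Verdict`, `Report`, `PostHocEvaluator`, `MaxFailures`, `sample_trace`) is a hypothetical stand-in defined locally; none of them is the crate's actual API, and the real `Evaluator` trait and `EvaluationReport` record may be shaped quite differently. The sketch only shows the division of labor described above: the evaluator reads an already-recorded trace and returns a report, without running an agent or writing anything back.

```rust
// Hypothetical stand-in types. These are NOT the real agent_sdk_eval items.

/// One recorded tool invocation from a finished run (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub tool: String,
    pub succeeded: bool,
}

/// Outcome of an evaluation (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Result record produced by an evaluator (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub verdict: Verdict,
    pub tool_calls: usize,
    pub failed_calls: usize,
}

/// A post-hoc evaluator only reads a recorded trace; it never runs the agent.
pub trait PostHocEvaluator {
    fn evaluate(&self, trace: &[ToolCall]) -> Report;
}

/// Deterministic rule: pass if at most `allowed` tool calls failed.
pub struct MaxFailures {
    pub allowed: usize,
}

impl PostHocEvaluator for MaxFailures {
    fn evaluate(&self, trace: &[ToolCall]) -> Report {
        let failed_calls = trace.iter().filter(|call| !call.succeeded).count();
        let verdict = if failed_calls <= self.allowed {
            Verdict::Pass
        } else {
            Verdict::Fail
        };
        Report {
            verdict,
            tool_calls: trace.len(),
            failed_calls,
        }
    }
}

/// A small recorded trace with three calls, one of which failed.
pub fn sample_trace() -> Vec<ToolCall> {
    vec![
        ToolCall { tool: "search".to_string(), succeeded: true },
        ToolCall { tool: "fetch".to_string(), succeeded: false },
        ToolCall { tool: "summarize".to_string(), succeeded: true },
    ]
}
```

Calling `MaxFailures { allowed: 1 }.evaluate(&sample_trace())` yields a passing report with three tool calls and one failure, while `allowed: 0` fails the same trace. The success rule lives in the caller-supplied evaluator, which matches the crate's stated stance of not defining product-specific rubrics.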

Re-exports

pub use comparison::ComparisonDesign;
pub use cost::CostPolicy;
pub use cost::CostReport;
pub use cost::StaticRateTable;
pub use evaluator::Evaluator;
pub use evidence::EvidenceBundle;
pub use evidence::EvidenceItem;
pub use evidence::EvidenceRole;
pub use evidence::SupportRefValidation;
pub use identity::EvaluationId;
pub use metrics::ToolTraceMetric;
pub use metrics::TraceMetrics;
pub use metrics::TraceMetricsComparison;
pub use report::EvaluationConfidence;
pub use report::EvaluationMetricDelta;
pub use report::EvaluationReport;
pub use report::EvaluationVerdict;
pub use report::EvaluatorJudgment;
pub use request::EvaluationBudget;
pub use request::EvaluationRequest;
pub use request::EvaluationUsage;
pub use run_report::RunReport;
pub use run_report::RunReportLimitations;
pub use scope::EvaluationCriterion;
pub use scope::EvaluationScope;
pub use scope::EvaluationSubject;
pub use scope::EvaluationSubjectRole;
pub use scope::ExpectedOutcome;
pub use usage::UsageReport;

Modules

comparison: Comparison designs for evaluation reports.
cost: Cost report helpers over deterministic usage reports.
evaluator: Evaluator trait for post-hoc evaluation implementations.
evidence: Evidence bundles derived from core traces.
identity: Stable identifiers for evaluation framework records.
metrics: Deterministic metrics derived from core traces and journal records.
report: Evaluation report records and confidence validation.
request: Evaluation request, budget, and usage records.
run_report: Run-level report helpers.
scope: Evaluation scopes, subjects, and expected outcomes.
testing: Deterministic evaluator fakes for SDK consumers.
usage: Deterministic usage reports derived from trace metrics.
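
The cost and usage modules are described as deterministic helpers, with cost derived from usage. The sketch below shows one way such a calculation can stay deterministic: a fixed rate table and integer arithmetic, with no floating point and no network lookups. All names here (`Usage`, `Rate`, `RateTable`, `cost_micros`, the `"example-model"` key) and the micro-dollar unit are assumptions made for illustration; they are not the signatures of `StaticRateTable`, `CostPolicy`, `CostReport`, or `UsageReport`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in types. These are NOT the real agent_sdk_eval items.

/// Token counts attributed to one model (assumed shape).
pub struct Usage {
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Prices in micro-dollars per 1000 tokens (assumed unit).
pub struct Rate {
    pub input_per_1k: u64,
    pub output_per_1k: u64,
}

/// A fixed, in-memory rate table keyed by model name.
#[derive(Default)]
pub struct RateTable {
    rates: BTreeMap<String, Rate>,
}

impl RateTable {
    /// Builder-style helper that registers a rate for a model.
    pub fn with_rate(mut self, model: &str, rate: Rate) -> Self {
        self.rates.insert(model.to_string(), rate);
        self
    }

    /// Cost in micro-dollars, or `None` when the model has no known rate.
    /// Integer division truncates any fractional micro-dollar.
    pub fn cost_micros(&self, usage: &Usage) -> Option<u64> {
        let rate = self.rates.get(&usage.model)?;
        let total = usage.input_tokens * rate.input_per_1k
            + usage.output_tokens * rate.output_per_1k;
        Some(total / 1000)
    }
}
```

Returning `None` for an unknown model, rather than assuming a zero cost, keeps a missing rate visible to the caller. Whether the real crate reports that case as an error, a limitation, or something else is not stated in this page.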