pub struct EvalRunner { /* private fields */ }
Evaluation runner — executes a suite against an agent using async metrics.
Implementations§
impl EvalRunner
pub fn metric(self, metric: Box<dyn AsyncMetric>) -> Self
Add a metric to score agent outputs with.
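The `self -> Self` signature suggests a consuming builder, where each call takes ownership of the runner and hands it back so calls can be chained. The sketch below illustrates that pattern only. `RunnerSketch`, `SketchMetric`, `new`, and `metric_names` are hypothetical stand-ins, since the real constructor and the `AsyncMetric` trait are not shown on this page.

```rust
// Hypothetical stand-in for a metric; the real crate uses `Box<dyn AsyncMetric>`.
trait SketchMetric {
    fn name(&self) -> &str;
}

struct Exact;
impl SketchMetric for Exact {
    fn name(&self) -> &str {
        "exact"
    }
}

struct Length;
impl SketchMetric for Length {
    fn name(&self) -> &str {
        "length"
    }
}

// Hypothetical runner that only stores metrics, to show the builder shape.
struct RunnerSketch {
    metrics: Vec<Box<dyn SketchMetric>>,
}

impl RunnerSketch {
    // Assumed constructor; the real one is not documented in this section.
    fn new() -> Self {
        Self { metrics: Vec::new() }
    }

    // Consumes the runner, appends one metric, and returns the runner.
    fn metric(mut self, metric: Box<dyn SketchMetric>) -> Self {
        self.metrics.push(metric);
        self
    }

    // Helper for inspection in this sketch only.
    fn metric_names(&self) -> Vec<&str> {
        self.metrics.iter().map(|m| m.name()).collect()
    }
}

fn main() {
    let runner = RunnerSketch::new()
        .metric(Box::new(Exact))
        .metric(Box::new(Length));
    println!("{:?}", runner.metric_names());
}
```

Because each call moves the runner, a chain like this is usually written as one expression and bound once at the end.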
pub fn threshold(self, threshold: f64) -> Self
Set the minimum score threshold for a test case to pass.
A case passes if all metric scores are ≥ threshold.
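The pass rule above can be sketched as a single predicate over a case's metric scores. `case_passes` is a hypothetical helper written for illustration, not a function of the crate.

```rust
/// Hypothetical helper: true when every metric score is at least `threshold`.
fn case_passes(scores: &[f64], threshold: f64) -> bool {
    scores.iter().all(|&score| score >= threshold)
}

fn main() {
    // Every score meets the threshold, so the case passes.
    assert!(case_passes(&[0.9, 0.8], 0.8));
    // One score falls below the threshold, so the case fails.
    assert!(!case_passes(&[0.9, 0.79], 0.8));
    // A NaN score never satisfies `>=`, so the case fails.
    assert!(!case_passes(&[f64::NAN], 0.8));
    println!("threshold checks hold");
}
```

In this sketch an empty score list passes vacuously. Whether the real runner treats a case with no metrics the same way is not stated here.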
pub async fn run(
    &self,
    agent: &dyn EvalAgent,
    suite: &EvalSuite,
) -> Result<EvalReport>
Execute the evaluation suite against the agent.
For each test case, the runner calls agent.respond(input), scores the output with every
registered metric, marks the case passed or failed against the threshold, and aggregates
the results into an EvalReport.
§Errors
Returns an error if the agent fails on any test case.
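The flow described for `run` (respond, score, mark, aggregate, and stop on an agent error) can be sketched roughly as follows. This is a simplified, synchronous illustration under stated assumptions. `SketchAgent`, `SketchMetric`, `CaseResult`, `Report`, and `run_sketch` are hypothetical names, and the real method is async and works with `EvalAgent`, `AsyncMetric`, `EvalSuite`, and `EvalReport`, whose definitions are not shown on this page.

```rust
// Simplified, synchronous stand-ins; these are NOT the crate's real types.
trait SketchAgent {
    fn respond(&self, input: &str) -> Result<String, String>;
}

trait SketchMetric {
    fn score(&self, input: &str, output: &str) -> f64;
}

struct CaseResult {
    input: String,
    scores: Vec<f64>,
    passed: bool,
}

struct Report {
    results: Vec<CaseResult>,
}

fn run_sketch(
    agent: &dyn SketchAgent,
    metrics: &[Box<dyn SketchMetric>],
    inputs: &[&str],
    threshold: f64,
) -> Result<Report, String> {
    let mut results = Vec::with_capacity(inputs.len());
    for &input in inputs {
        // An agent failure on any case aborts the whole run via `?`.
        let output = agent.respond(input)?;
        // Score the output with every metric.
        let scores: Vec<f64> = metrics.iter().map(|m| m.score(input, &output)).collect();
        // A case passes only if all scores reach the threshold.
        let passed = scores.iter().all(|&s| s >= threshold);
        results.push(CaseResult { input: input.to_string(), scores, passed });
    }
    Ok(Report { results })
}

// Toy agent: upper-cases the input and fails on an empty one.
struct Echo;
impl SketchAgent for Echo {
    fn respond(&self, input: &str) -> Result<String, String> {
        if input.is_empty() {
            Err("empty input".to_string())
        } else {
            Ok(input.to_uppercase())
        }
    }
}

// Toy metric: 1.0 for any non-empty output.
struct NonEmpty;
impl SketchMetric for NonEmpty {
    fn score(&self, _input: &str, output: &str) -> f64 {
        if output.is_empty() { 0.0 } else { 1.0 }
    }
}

// Toy metric: 1.0 for outputs of at most five bytes, 0.5 otherwise.
struct ShortAnswer;
impl SketchMetric for ShortAnswer {
    fn score(&self, _input: &str, output: &str) -> f64 {
        if output.len() <= 5 { 1.0 } else { 0.5 }
    }
}

fn main() {
    let metrics: Vec<Box<dyn SketchMetric>> = vec![Box::new(NonEmpty), Box::new(ShortAnswer)];
    let report = run_sketch(&Echo, &metrics, &["hi", "a longer prompt"], 0.8).unwrap();
    for r in &report.results {
        println!("{:?} scores={:?} passed={}", r.input, r.scores, r.passed);
    }
    // The empty input makes the toy agent fail, so the run returns an error.
    assert!(run_sketch(&Echo, &metrics, &["hi", ""], 0.8).is_err());
}
```

Returning early with `?` mirrors the documented error behavior: a single agent failure yields an error instead of a partial report.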
Auto Trait Implementations§
impl Freeze for EvalRunner
impl !RefUnwindSafe for EvalRunner
impl Send for EvalRunner
impl Sync for EvalRunner
impl Unpin for EvalRunner
impl UnsafeUnpin for EvalRunner
impl !UnwindSafe for EvalRunner
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.