pub struct Aggregate {
    pub total: usize,
    pub mean_score: f64,
    pub median_score: f64,
    pub stddev: f64,
    pub exact_match: usize,
    pub error_count: usize,
    pub total_elapsed_ms: u64,
}
Aggregate statistics computed from all ScenarioResults in a BenchRun.
Recomputed after every scenario via BenchRun::recompute_aggregate and persisted
into results.json so partial runs still contain meaningful statistics.
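A minimal sketch of the recompute-after-every-scenario pattern. It assumes hypothetical, simplified stand-ins: `ScenarioResult` carries only `score` and `elapsed_ms`, and `BenchRun`, `push`, `recompute_aggregate`, and the trimmed-down `Aggregate` are local illustrations, not the zeph_bench API. Only `total`, `mean_score`, and `total_elapsed_ms` are computed here; returning 0.0 as the mean of an empty run is also an assumption.

```rust
// Hypothetical, simplified stand-ins for illustration; not the zeph_bench types.
#[derive(Debug, Clone)]
pub struct ScenarioResult {
    pub score: f64,
    pub elapsed_ms: u64,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Aggregate {
    pub total: usize,
    pub mean_score: f64,
    pub total_elapsed_ms: u64,
}

#[derive(Debug, Default)]
pub struct BenchRun {
    pub results: Vec<ScenarioResult>,
    pub aggregate: Aggregate,
}

impl BenchRun {
    /// Records one finished scenario and immediately refreshes the aggregate,
    /// so a run that stops after this call still carries current statistics.
    pub fn push(&mut self, result: ScenarioResult) {
        self.results.push(result);
        self.recompute_aggregate();
    }

    /// Rebuilds the aggregate from scratch over every recorded result.
    pub fn recompute_aggregate(&mut self) {
        let total = self.results.len();
        let sum: f64 = self.results.iter().map(|r| r.score).sum();
        self.aggregate = Aggregate {
            total,
            // Assumption: an empty run reports a mean of 0.0 instead of NaN.
            mean_score: if total == 0 { 0.0 } else { sum / total as f64 },
            total_elapsed_ms: self.results.iter().map(|r| r.elapsed_ms).sum(),
        };
    }
}
```

Recomputing from the full result list on each call is O(N) per scenario, which is cheap at benchmark scale and avoids drift from incremental updates; in a real run the refreshed aggregate would then be written to results.json.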
§Examples
use zeph_bench::Aggregate;

let agg = Aggregate {
    total: 100,
    mean_score: 0.72,
    median_score: 0.70,
    stddev: 0.15,
    exact_match: 55,
    error_count: 3,
    total_elapsed_ms: 240_000,
};
assert_eq!(agg.total, 100);
assert_eq!(agg.error_count, 3);
assert!((agg.median_score - 0.70).abs() < f64::EPSILON);
Fields§
total: usize
Number of scenarios included in the statistics.
mean_score: f64
Arithmetic mean of all per-scenario scores.
median_score: f64
Median per-scenario score.
For an even number of results, the median is the average of the two middle values.
Set to 0.0 when total == 0.
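The median rule above can be sketched as a standalone helper. `median` is a hypothetical name, not part of zeph_bench; it sorts a copy of the scores with `f64::total_cmp` (so NaN scores would simply sort to one end rather than being rejected) and returns 0.0 for an empty slice, matching the documented behavior.

```rust
/// Median of `scores`; 0.0 for an empty slice.
/// For an even count, averages the two middle values.
pub fn median(scores: &[f64]) -> f64 {
    if scores.is_empty() {
        return 0.0;
    }
    let mut sorted = scores.to_vec();
    // total_cmp defines a total order on f64, so this sort never panics.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}
```

For example, the scores 0.25, 1.0, 0.5, 0.75 sort to 0.25, 0.5, 0.75, 1.0, so the median is (0.5 + 0.75) / 2 = 0.625.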
stddev: f64
Population standard deviation of per-scenario scores (dividing by N rather than N - 1).
The scenario set is treated as the full population of interest, not as a sample.
Set to 0.0 when total <= 1.
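As a worked formula, the population standard deviation is sigma = sqrt((1/N) * sum_i (x_i - mu)^2), where mu is the mean score. A minimal sketch follows; `population_stddev` is a hypothetical helper name, not the zeph_bench implementation.

```rust
/// Population standard deviation (divides by N, not N - 1).
/// Returns 0.0 when there are fewer than two scores.
pub fn population_stddev(scores: &[f64]) -> f64 {
    let n = scores.len();
    if n <= 1 {
        return 0.0;
    }
    let mean = scores.iter().sum::<f64>() / n as f64;
    // Mean of squared deviations from the mean, i.e. the population variance.
    let variance = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n as f64;
    variance.sqrt()
}
```

With scores 0.0 and 1.0 the mean is 0.5, each squared deviation is 0.25, the variance is 0.25, and the standard deviation is 0.5. A sample estimate (dividing by N - 1) would give a larger value of about 0.707, which is why the choice of denominator matters for small scenario sets.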
exact_match: usize
Count of scenarios where score >= 1.0 (exact match).
error_count: usize
Count of scenarios where score == 0.0 and error is Some(_).
A non-zero value indicates that the agent failed to produce a response (e.g. because of a timeout or an LLM API error) rather than simply giving a wrong answer.
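The distinction between an exact match, a wrong answer, and an error can be sketched as two filters over the results. This assumes a hypothetical `ScenarioResult` with `score: f64` and `error: Option<String>`; the real type of `error` may differ, and `count_outcomes` is an illustrative name only.

```rust
// Hypothetical stand-in: the real ScenarioResult's `error` type may differ.
pub struct ScenarioResult {
    pub score: f64,
    pub error: Option<String>,
}

/// Returns (exact_match, error_count) following the field definitions above.
pub fn count_outcomes(results: &[ScenarioResult]) -> (usize, usize) {
    // Full credit counts as an exact match.
    let exact_match = results.iter().filter(|r| r.score >= 1.0).count();
    // Only zero-score results that also carry an error count as errors;
    // a plain wrong answer (score 0.0, error None) does not.
    let error_count = results
        .iter()
        .filter(|r| r.score == 0.0 && r.error.is_some())
        .count();
    (exact_match, error_count)
}
```

Under these rules a scenario with score 0.0 and no error is a wrong answer and is counted by neither field, while partial scores such as 0.4 also fall outside both counts.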
total_elapsed_ms: u64
Sum of ScenarioResult::elapsed_ms across all scenarios.
Trait Implementations§
impl<'de> Deserialize<'de> for Aggregate
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Auto Trait Implementations§
impl Freeze for Aggregate
impl RefUnwindSafe for Aggregate
impl Send for Aggregate
impl Sync for Aggregate
impl Unpin for Aggregate
impl UnsafeUnpin for Aggregate
impl UnwindSafe for Aggregate
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wraps the input message T in a tonic::Request.