pub struct ReliabilityReport {
pub metric: String,
pub threshold: f64,
pub k: usize,
pub n_queries: usize,
pub trials_per_query: usize,
pub mean_pass_rate: f64,
pub pass_at_k: f64,
pub pass_all_k: f64,
pub per_query: Vec<QueryReliability>,
}
Repeated-trial reliability report for a single metric.
This report turns a set of repeated MetricReports for the same metric
into reliability estimates. Scores are thresholded into pass/fail outcomes
first, then pass@k and pass^k are estimated per query and averaged.
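The thresholding and per-query estimation described above can be sketched as follows. This is a minimal illustration, not the crate's confirmed implementation: it assumes a score passes when `score >= threshold`, and it uses the common combinatorial estimators for drawing k trials without replacement from n observed trials with c passes. The helper names `binomial`, `count_passes`, `pass_at_k`, and `pass_all_k` are hypothetical.

```rust
/// Binomial coefficient C(n, k) as f64; returns 0.0 when k > n.
fn binomial(n: usize, k: usize) -> f64 {
    if k > n {
        return 0.0;
    }
    // Multiplicative form: prod_{i=0}^{k-1} (n - i) / (i + 1).
    (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
}

/// Threshold scores into pass/fail and count the passes.
/// Assumption: a trial passes when its score is >= threshold.
fn count_passes(scores: &[f64], threshold: f64) -> usize {
    scores.iter().filter(|&&s| s >= threshold).count()
}

/// Assumed pass@k estimator: chance that at least one of k trials,
/// drawn without replacement from n trials with c passes, is a pass.
/// Requires c <= n and 1 <= k <= n.
fn pass_at_k(n: usize, c: usize, k: usize) -> f64 {
    debug_assert!(c <= n && k >= 1 && k <= n);
    1.0 - binomial(n - c, k) / binomial(n, k)
}

/// Assumed pass^k estimator: chance that all k drawn trials pass.
/// Requires c <= n and 1 <= k <= n.
fn pass_all_k(n: usize, c: usize, k: usize) -> f64 {
    debug_assert!(c <= n && k >= 1 && k <= n);
    binomial(c, k) / binomial(n, k)
}
```

Under these assumptions, a query with scores `[0.0, 1.0]`, threshold `1.0`, and `k = 2` has one pass out of two trials, so `pass_at_k(2, 1, 2)` is 1.0 and `pass_all_k(2, 1, 2)` is 0.0. The report-level values would then be the means of these per-query numbers.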
use rig_retrieval_evals::{MetricReport, ReliabilityReport};

let trial_a = MetricReport::from_per_query(
    "recall@10".into(),
    vec![("q1".into(), 1.0), ("q2".into(), 0.0)],
);
let trial_b = MetricReport::from_per_query(
    "recall@10".into(),
    vec![("q1".into(), 1.0), ("q2".into(), 1.0)],
);

let reliability = ReliabilityReport::from_metric_reports(
    "recall@10",
    1.0,
    2,
    &[trial_a, trial_b],
)?;

assert_eq!(reliability.n_queries, 2);
assert_eq!(reliability.trials_per_query, 2);
Fields§
metric: String
Metric identifier shared by every trial report.
threshold: f64
Score threshold used to convert each trial into pass/fail.
k: usize
Number of attempts sampled in pass@k / pass^k estimates.
n_queries: usize
Number of queries included in the reliability estimate.
trials_per_query: usize
Number of trials observed for each query.
mean_pass_rate: f64
Mean per-query pass rate.
pass_at_k: f64
Mean per-query pass@k.
pass_all_k: f64
Mean per-query pass^k.
per_query: Vec<QueryReliability>
Per-query reliability rows, in the first trial report's query order.
Implementations§
impl ReliabilityReport
pub fn from_metric_reports(
    metric: impl Into<String>,
    threshold: f64,
    k: usize,
    reports: &[MetricReport],
) -> Result<Self>
Build a repeated-trial reliability report from multiple
MetricReports for the same metric.
Every report must contain the same query ids exactly once. k must be
in 1..=reports.len(). Scores must be finite.
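The three preconditions above can be illustrated with a small standalone check. This is only a sketch under stated assumptions: each trial is represented as a plain `Vec<(String, f64)>` of query id and score rather than a real `MetricReport`, and the `validate_trials` helper and its error messages are hypothetical, not part of the crate.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the documented preconditions.
/// Each trial is a list of (query id, score) pairs.
fn validate_trials(k: usize, trials: &[Vec<(String, f64)>]) -> Result<(), String> {
    // k must be in 1..=trials.len(); this also rules out an empty slice.
    if k == 0 || k > trials.len() {
        return Err(format!("k must be in 1..={}, got {}", trials.len(), k));
    }
    // The first trial defines the expected set of query ids.
    let expected: HashSet<&str> = trials[0].iter().map(|(id, _)| id.as_str()).collect();
    for (t, trial) in trials.iter().enumerate() {
        let mut seen: HashSet<&str> = HashSet::new();
        for (id, score) in trial {
            // Scores must be finite (no NaN or infinity).
            if !score.is_finite() {
                return Err(format!("trial {t}: non-finite score for query {id}"));
            }
            // Each query id may appear only once per trial.
            if !seen.insert(id.as_str()) {
                return Err(format!("trial {t}: duplicate query id {id}"));
            }
        }
        // Every trial must cover exactly the same query ids.
        if seen != expected {
            return Err(format!("trial {t}: query ids differ from the first trial"));
        }
    }
    Ok(())
}
```

With two trials that both score `q1` and `q2`, `validate_trials(2, ...)` succeeds, while `k = 0`, `k = 3`, a NaN score, or a missing query id each produce an error in this sketch.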
Trait Implementations§
impl Clone for ReliabilityReport
fn clone(&self) -> ReliabilityReport
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.