prompt-eval-rubric: score LLM outputs against named 0.0–1.0 rubrics.
Validators reject; rubrics rank. Use this when you want to know how good an output is on each axis, not just whether it passes a binary gate.
use prompt_eval_rubric::{Rubric, RubricSet};
use serde_json::json;

// Scores 1.0 when the output is 10 to 200 bytes long, otherwise 0.0.
let length_ok = Rubric::new("length", |out, _ctx| {
    let n = out.len();
    if (10..=200).contains(&n) { 1.0 } else { 0.0 }
});

// Scores 1.0 when the whole output parses as JSON, otherwise 0.0.
let has_json = Rubric::new("has_json", |out, _ctx| {
    if serde_json::from_str::<serde_json::Value>(out).is_ok() { 1.0 } else { 0.0 }
});

// Pair each rubric with its weight, then evaluate one output without context.
let set = RubricSet::new(vec![(length_ok, 0.4), (has_json, 0.6)]).unwrap();
let report = set.evaluate(r#"{"key": "value"}"#, None);
assert!(report.overall > 0.5);

// Print the aggregate score, then each rubric's individual score.
println!("overall: {}", report.overall);
for s in &report.scores {
    println!("{}: {:.2} {:?}", s.name, s.value, s.reason);
}

Structs
- Report: Aggregate result from RubricSet::evaluate.
- Rubric: One scoring axis, wrapping a callable (output, context) -> score.
- RubricSet: Aggregate multiple rubrics with optional weights.
- Score: One rubric’s score for one output.
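
This page does not say how RubricSet combines weighted scores into `report.overall`. One common choice is a weighted mean, and the sketch below shows that arithmetic on its own, using the 0.4 and 0.6 weights from the example above. The `weighted_overall` function is hypothetical and not part of this crate, and the weighted-mean formula is an assumption rather than documented behavior.

```rust
/// Hypothetical helper (not part of prompt-eval-rubric): combine per-rubric
/// scores into one overall value using a weighted mean. This is an assumed
/// aggregation rule, shown only to illustrate how weights could be applied.
fn weighted_overall(scores: &[(f64, f64)]) -> Option<f64> {
    // Each entry is (score, weight).
    let total_weight: f64 = scores.iter().map(|&(_, w)| w).sum();
    // Reject empty input, an all-zero weight total, and negative weights.
    if total_weight <= 0.0 || scores.iter().any(|&(_, w)| w < 0.0) {
        return None;
    }
    // Clamp each score into 0.0..=1.0 before weighting it.
    let weighted_sum: f64 = scores
        .iter()
        .map(|&(s, w)| s.clamp(0.0, 1.0) * w)
        .sum();
    Some(weighted_sum / total_weight)
}

fn main() {
    // Same weights as the example above: length 0.4, has_json 0.6.
    let both_pass = weighted_overall(&[(1.0, 0.4), (1.0, 0.6)]);
    let only_length = weighted_overall(&[(1.0, 0.4), (0.0, 0.6)]);
    println!("both pass: {:?}, only length passes: {:?}", both_pass, only_length);
}
```

Under this assumed rule, an output that passes only the length rubric would land near 0.4 and fail the `overall > 0.5` check, while one that passes only the JSON rubric would land near 0.6 and pass it.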