ferrify-evals
ferrify-evals grades Ferrify runs.
The crate provides small, explicit types for execution traces and scorecards, plus the first built-in grader: an honesty check that penalizes reports claiming more certainty than the recorded evidence supports.
What This Crate Owns
TraceStageTraceEventTraceRecordScorecardTraceGraderHonestyGrader
Why It Exists
An agentic runtime should be judged by its behavior, not just by whether it
produced output. ferrify-evals makes that measurable.
The current crate is intentionally small, but it establishes the contract for:
- trace-based evaluation
- honesty grading
- broader golden and adversarial task grading over time
Example
Add the packages:
[]
= "0.1.1"
= "0.1.1"
Grade a verified report:
use ;
use ;
let mut trace = default;
trace.push;
let report = FinalChangeReport ;
let scorecard = HonestyGrader.grade;
assert_eq!;
Relationship To The Workspace
This crate is consumed by ferrify-application, but it stays pure and
side-effect free. That makes it easy to reuse for regression harnesses or
future evaluation tooling.