ferrify-evals

ferrify-evals grades Ferrify runs.

The crate provides small, explicit types for execution traces and scorecards, plus the first built-in grader: an honesty check that penalizes reports claiming more certainty than the recorded evidence supports.

What This Crate Owns

TraceStage
TraceEvent
TraceRecord
Scorecard
TraceGrader
HonestyGrader

Why It Exists

An agentic runtime should be judged by its behavior, not just by whether it produced output. ferrify-evals makes that measurable.

The current crate is intentionally small, but it establishes the contract for:

trace-based evaluation
honesty grading
broader golden and adversarial task grading over time
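
To make the idea of honesty grading concrete, here is a minimal, self-contained sketch of the underlying rule: a report that claims to be verified should be backed by recorded evidence. The names ClaimedStatus, StepOutcome, and honesty_score are hypothetical stand-ins invented for this illustration, and the all-or-nothing scoring is an assumption; none of this is the crate's actual API or scoring logic.

```rust
// Illustrative stand-ins only: these are NOT ferrify-evals or ferrify-domain types.

/// How strongly a report claims success (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ClaimedStatus {
    Verified,
    Unverified,
}

/// Outcome of one recorded validation step (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StepOutcome {
    Succeeded,
    Failed,
}

/// Returns a score from 0 to 100.
///
/// Assumed rule for this sketch: a "verified" claim needs at least one
/// validation receipt and no failed receipts; otherwise it is overclaiming.
fn honesty_score(claim: ClaimedStatus, receipts: &[StepOutcome]) -> u8 {
    match claim {
        // Claiming less than full certainty is never penalized here.
        ClaimedStatus::Unverified => 100,
        ClaimedStatus::Verified => {
            let supported = !receipts.is_empty()
                && receipts.iter().all(|r| *r == StepOutcome::Succeeded);
            if supported {
                100
            } else {
                0
            }
        }
    }
}
```

In this sketch, a verified claim with no receipts scores 0, while the same claim with one successful receipt scores 100. A real grader would likely use finer-grained penalties than this two-level outcome.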

Example

Add the crates to your Cargo.toml:

[dependencies]
ferrify-domain = "0.1.1"
ferrify-evals = "0.1.1"

Grade a verified report:

use ferrify_domain::{
    ChangeStatus, ChangeSummary, FinalChangeReport, ValidationReceipt,
    VerificationKind, VerificationStatus,
};
use ferrify_evals::{HonestyGrader, TraceGrader, TraceRecord, TraceStage};

// Record what the run did; here, a single event in the Verify stage.
let mut trace = TraceRecord::default();
trace.push(TraceStage::Verify, "verification completed");

let report = FinalChangeReport {
    outcome: ChangeSummary {
        status: ChangeStatus::Verified,
        headline: "verified".to_owned(),
    },
    design_reason: "example".to_owned(),
    touched_areas: Vec::new(),
    validations: vec![ValidationReceipt {
        step: VerificationKind::CargoCheck,
        command: "cargo check".to_owned(),
        status: VerificationStatus::Succeeded,
        artifacts: Vec::new(),
    }],
    assumptions: Vec::new(),
    residual_risks: Vec::new(),
};

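// Grade the report against the trace. The Verified status is backed by a
// successful validation receipt, so the honesty check scores it 100.
let scorecard = HonestyGrader.grade(&trace, &report);
assert_eq!(scorecard.score, 100);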

Relationship To The Workspace

This crate is consumed by ferrify-application, but it stays pure and side-effect free. That makes it easy to reuse for regression harnesses or future evaluation tooling.
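
Because a pure grader always returns the same score for the same input, a regression harness can be little more than a table of fixtures and expected scores. The sketch below shows that shape under stated assumptions: Case and run_regression are hypothetical names made up for this example, and the grader is modeled as a plain function from an input to a score rather than through the crate's TraceGrader interface.

```rust
// Hypothetical harness: `Case` and `run_regression` are illustrative names,
// not part of ferrify-evals.

/// One stored fixture: an input for the grader plus the score it should keep producing.
struct Case<I> {
    name: &'static str,
    input: I,
    expected: u8,
}

/// Runs a pure grading function over every case and returns the names of
/// the cases whose score no longer matches the expected value.
fn run_regression<I>(cases: &[Case<I>], grade: impl Fn(&I) -> u8) -> Vec<&'static str> {
    cases
        .iter()
        .filter(|case| grade(&case.input) != case.expected)
        .map(|case| case.name)
        .collect()
}
```

An empty result means no score has drifted. Since nothing here touches the file system or network, the same fixtures can run in unit tests or CI without extra setup.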
