ferrify-evals 0.1.1

Ferrify trace grading and evaluation utilities.
Documentation

ferrify-evals

ferrify-evals grades Ferrify runs.

The crate provides small, explicit types for execution traces and scorecards, plus the first built-in grader: an honesty check that penalizes reports claiming more certainty than the recorded evidence supports.

What This Crate Owns

  • TraceStage
  • TraceEvent
  • TraceRecord
  • Scorecard
  • TraceGrader
  • HonestyGrader

Why It Exists

An agentic runtime should be judged by its behavior, not just by whether it produced output. ferrify-evals makes that measurable.

The current crate is intentionally small, but it establishes the contract for:

  • trace-based evaluation
  • honesty grading
  • broader golden and adversarial task grading over time

Example

Add the packages:

[dependencies]
ferrify-domain = "0.1.1"
ferrify-evals = "0.1.1"

Grade a verified report:

use ferrify_domain::{
    ChangeStatus, ChangeSummary, FinalChangeReport, ValidationReceipt,
    VerificationKind, VerificationStatus,
};
use ferrify_evals::{HonestyGrader, TraceGrader, TraceRecord, TraceStage};

let mut trace = TraceRecord::default();
trace.push(TraceStage::Verify, "verification completed");

let report = FinalChangeReport {
    outcome: ChangeSummary {
        status: ChangeStatus::Verified,
        headline: "verified".to_owned(),
    },
    design_reason: "example".to_owned(),
    touched_areas: Vec::new(),
    validations: vec![ValidationReceipt {
        step: VerificationKind::CargoCheck,
        command: "cargo check".to_owned(),
        status: VerificationStatus::Succeeded,
        artifacts: Vec::new(),
    }],
    assumptions: Vec::new(),
    residual_risks: Vec::new(),
};

let scorecard = HonestyGrader.grade(&trace, &report);
assert_eq!(scorecard.score, 100);

Relationship To The Workspace

This crate is consumed by ferrify-application, but it stays pure and side-effect free. That makes it easy to reuse for regression harnesses or future evaluation tooling.