traitclaw-eval

Evaluation framework for TraitClaw — test suites, metrics, and quality reports for AI agents.

Measure agent quality with structured test cases and pluggable metrics. Includes built-in keyword matching and length-relevancy scoring. Run evaluations deterministically without hitting LLM APIs.

Usage

use traitclaw_eval::{EvalSuite, TestCase, KeywordMetric, LengthRelevancyMetric, Metric};

// Define test cases
let suite = EvalSuite::new("quality_tests")
    .add_case(
        TestCase::new("greeting", "Say hello")
            .expect_contains("hello")
            .expect_contains("help"),
    )
    .add_case(
        TestCase::new("math", "What is 2+2?")
            .expect_contains("4"),
    );

// Score with built-in metrics
let keyword_score = KeywordMetric.score("input", "Hello! How can I help?", &["hello", "help"]);
// → 1.0 (both keywords found)

let length_score = LengthRelevancyMetric.score("input", "response text", &[]);
// → 0.0..1.0 (penalizes too-short or too-long responses)

Components

Component	Purpose
`EvalSuite`	Container for test cases
`TestCase`	Input prompt + expected keywords/output
`Metric` (trait)	Pluggable scoring function (0.0 → 1.0)
`KeywordMetric`	Fraction of expected keywords found in output
`LengthRelevancyMetric`	Penalizes responses outside 2-10x input length
`EvalReport`	Summary with pass/fail counts and average scores
`TestResult`	Per-test scores and pass/fail status

Custom Metrics

impl Metric for MyMetric {
    fn name(&self) -> &'static str { "my_metric" }
    fn score(&self, input: &str, output: &str, keywords: &[&str]) -> f64 {
        // Return 0.0 (worst) to 1.0 (best)
    }
}

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.

traitclaw-eval 1.0.0

traitclaw-eval

Usage

Components

Custom Metrics

License