traitclaw-eval 0.1.0

Evaluation framework for TraitClaw — test suites, metrics, and report generation
Documentation

traitclaw-eval

crates.io docs.rs

Evaluation framework for TraitClaw — test suites, metrics, and quality reports for AI agents.

Measure agent quality with structured test cases and pluggable metrics. Includes built-in keyword matching and length-relevancy scoring. Run evaluations deterministically without hitting LLM APIs.

Usage

use traitclaw_eval::{EvalSuite, TestCase, KeywordMetric, LengthRelevancyMetric, Metric};

// Define test cases
let suite = EvalSuite::new("quality_tests")
    .add_case(
        TestCase::new("greeting", "Say hello")
            .expect_contains("hello")
            .expect_contains("help"),
    )
    .add_case(
        TestCase::new("math", "What is 2+2?")
            .expect_contains("4"),
    );

// Score with built-in metrics
let keyword_score = KeywordMetric.score("input", "Hello! How can I help?", &["hello", "help"]);
// → 1.0 (both keywords found)

let length_score = LengthRelevancyMetric.score("input", "response text", &[]);
// → 0.0..1.0 (penalizes too-short or too-long responses)

Components

Component Purpose
EvalSuite Container for test cases
TestCase Input prompt + expected keywords/output
Metric (trait) Pluggable scoring function (0.0 → 1.0)
KeywordMetric Fraction of expected keywords found in output
LengthRelevancyMetric Penalizes responses outside 2-10x input length
EvalReport Summary with pass/fail counts and average scores
TestResult Per-test scores and pass/fail status

Custom Metrics

impl Metric for MyMetric {
    fn name(&self) -> &'static str { "my_metric" }
    fn score(&self, input: &str, output: &str, keywords: &[&str]) -> f64 {
        // Return 0.0 (worst) to 1.0 (best)
    }
}

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.