# traitclaw-eval
Evaluation framework for TraitClaw — test suites, metrics, and quality reports for AI agents.
Measure agent quality with structured test cases and pluggable metrics. Includes built-in keyword matching and length-relevancy scoring. Run evaluations deterministically without hitting LLM APIs.
## Usage
```rust
use traitclaw_eval::{EvalSuite, TestCase, KeywordMetric, LengthRelevancyMetric, Metric};

// Define test cases (prompts and expected keywords here are illustrative)
let suite = EvalSuite::new()
    .add_case(TestCase::new("What is ownership in Rust?", &["ownership", "borrow"]))
    .add_case(TestCase::new("Explain the borrow checker", &["lifetimes", "references"]));

// Score with built-in metrics (`case` and `agent_output` elided above)
let keyword_score = KeywordMetric.score(&case, &agent_output);
// → 1.0 (both keywords found)
let length_score = LengthRelevancyMetric.score(&case, &agent_output);
// → 0.0..1.0 (penalizes too-short or too-long responses)
```
## Components
| Component | Purpose |
|---|---|
| `EvalSuite` | Container for test cases |
| `TestCase` | Input prompt + expected keywords/output |
| `Metric` (trait) | Pluggable scoring function (0.0 → 1.0) |
| `KeywordMetric` | Fraction of expected keywords found in output |
| `LengthRelevancyMetric` | Penalizes responses outside 2–10× input length |
| `EvalReport` | Summary with pass/fail counts and average scores |
| `TestResult` | Per-test scores and pass/fail status |
## Custom Metrics
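A custom metric plugs in by implementing the `Metric` trait. The real trait lives in `traitclaw-eval` and its exact signature may differ; the local trait below is a hypothetical stand-in that mirrors the idea of a pluggable scoring function returning 0.0–1.0:

```rust
/// Stand-in for the crate's `Metric` trait (signature assumed, not verified).
trait Metric {
    fn score(&self, input: &str, output: &str) -> f64;
}

/// Example custom metric: 1.0 if the output ends with terminal
/// punctuation (a complete sentence), 0.5 otherwise.
struct CompleteSentenceMetric;

impl Metric for CompleteSentenceMetric {
    fn score(&self, _input: &str, output: &str) -> f64 {
        match output.trim_end().chars().last() {
            Some('.') | Some('!') | Some('?') => 1.0,
            _ => 0.5,
        }
    }
}

fn main() {
    let m = CompleteSentenceMetric;
    let s = m.score("Explain ownership.", "Ownership moves values.");
    println!("{:.1}", s); // prints "1.0" — output ends with '.'
}
```

Keeping metrics as trait objects lets a suite hold heterogeneous scorers (e.g. `Vec<Box<dyn Metric>>`) and average their results into a report.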
## License
Licensed under either the Apache License, Version 2.0 or the MIT License, at your option.