serdes-ai-evals

Evaluation framework for testing and benchmarking serdes-ai agents

This crate provides evaluation and testing capabilities for SerdesAI:

  • Test case definitions
  • Evaluation metrics (accuracy, latency, cost)
  • Benchmark harness
  • Regression testing
  • LLM-as-judge evaluators (see the sketch just below this list)
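
The sketch below roughly illustrates the LLM-as-judge idea from the last item. Only TestCase::new and .input come from this README; LlmJudge and with_evaluator are assumed, illustrative names rather than confirmed API, so check the crate documentation for the real types.

use serdes_ai_evals::TestCase;

// Hypothetical sketch: `LlmJudge` and `with_evaluator` are illustrative names
// for the LLM-as-judge feature, not confirmed serdes-ai-evals API.
let case = TestCase::new("summary-quality")
    .input("Summarize: Rust is a systems programming language.")
    .with_evaluator(LlmJudge::new("Is the summary faithful and concise?"));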

Installation

[dependencies]
serdes-ai-evals = "0.1"

Usage

use serdes_ai_evals::{EvalSuite, TestCase};

// Build a small suite of test cases, each with an input and an expected substring.
let suite = EvalSuite::new("my-agent-tests")
    .case(TestCase::new("greeting")
        .input("Hello!")
        .expected_contains("Hello"))
    .case(TestCase::new("math")
        .input("What is 2+2?")
        .expected_contains("4"));

// Run the suite against an agent built with the main serdes-ai crate.
// `suite.run` is async, so this must be called from an async context
// (e.g. inside a tokio runtime).
let results = suite.run(&agent).await?;
println!("Pass rate: {:.1}%", results.pass_rate() * 100.0);
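
Because the feature list above includes regression testing, one common pattern is to gate CI on the pass rate. The sketch below reuses only EvalSuite, TestCase, run, and pass_rate from the example above; the #[tokio::test] attribute and the build_test_agent helper are assumptions standing in for however your project constructs its agent with serdes-ai.

use serdes_ai_evals::{EvalSuite, TestCase};

// Sketch of a CI regression gate. `build_test_agent` is a hypothetical helper:
// build the agent with the main serdes-ai crate however your project does.
// The tokio test runtime is likewise an assumption, not required by this crate.
#[tokio::test]
async fn agent_does_not_regress() -> Result<(), Box<dyn std::error::Error>> {
    let agent = build_test_agent();
    let suite = EvalSuite::new("regression")
        .case(TestCase::new("math")
            .input("What is 2+2?")
            .expected_contains("4"));

    let results = suite.run(&agent).await?;
    // Fail the build if the pass rate drops below the agreed threshold.
    assert!(results.pass_rate() >= 0.9,
        "pass rate regressed: {:.1}%", results.pass_rate() * 100.0);
    Ok(())
}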

Part of SerdesAI

This crate is part of the SerdesAI workspace.

For most use cases, you should depend on the main serdes-ai crate, which re-exports these types.
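
If you depend on serdes-ai directly, the types would then be imported through it rather than from this crate; the evals module path below is an assumption, so check the serdes-ai documentation for the exact re-export location.

// Assumed re-export path; the `evals` module name is illustrative only.
use serdes_ai::evals::{EvalSuite, TestCase};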

License

MIT License - see LICENSE for details.