Module eval

RavenClaws

Provides a framework for defining, running, and scoring evaluation tasks against LLM agents. Captures full run traces for inspection and debugging.

§Architecture

EvalConfig (TOML file)
  └── Vec<EvalTask>
        ├── prompt + golden answer
        ├── assertions (contains, not_contains, regex, exact)
        └── scoring weights

EvalRunner
  ├── run_task() → EvalResult (with RunTrace)
  └── run_suite() → EvalReport (summary of all results)

RunTrace
  ├── steps: Vec<TraceStep>
  ├── llm_calls: Vec<LlmCallTrace>
  └── tool_calls: Vec<ToolCallTrace>
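As a rough illustration of how the pieces in the diagram relate, the sketch below models a trace, a per-task result, and a suite-level report. Only the type names and the three `RunTrace` field names come from this page. Every other field (`task_name`, `passed`, `total`), the `String` stand-ins for the trace element types, and the helper `summarize` are assumptions made for the example, not the crate's actual definitions.

```rust
// Hypothetical shapes: the real field names and types in the crate may differ.
#[derive(Debug, Default)]
struct RunTrace {
    steps: Vec<String>,      // stands in for Vec<TraceStep>
    llm_calls: Vec<String>,  // stands in for Vec<LlmCallTrace>
    tool_calls: Vec<String>, // stands in for Vec<ToolCallTrace>
}

// One result per task, carrying the trace captured during the run.
#[derive(Debug)]
struct EvalResult {
    task_name: String, // assumed field
    passed: bool,      // assumed field
    trace: RunTrace,
}

// Suite-level summary; the counters here are assumed fields.
#[derive(Debug, PartialEq)]
struct EvalReport {
    total: usize,
    passed: usize,
}

// Hypothetical helper: folds per-task results into a report,
// roughly what run_suite() is described as producing.
fn summarize(results: &[EvalResult]) -> EvalReport {
    EvalReport {
        total: results.len(),
        passed: results.iter().filter(|r| r.passed).count(),
    }
}

fn main() {
    let results = vec![
        EvalResult {
            task_name: "capital-of-france".to_string(),
            passed: true,
            trace: RunTrace {
                steps: vec!["llm_call".to_string()],
                llm_calls: vec!["What is the capital of France?".to_string()],
                tool_calls: Vec::new(),
            },
        },
        EvalResult {
            task_name: "arithmetic".to_string(),
            passed: false,
            trace: RunTrace::default(),
        },
    ];

    let report = summarize(&results);
    println!("{} of {} tasks passed", report.passed, report.total);

    // The trace stays attached to each result for later inspection.
    for r in &results {
        println!("{}: {} step(s) recorded", r.task_name, r.trace.steps.len());
    }
}
```

Keeping the trace inside each result, rather than in the report, means a failing task can be debugged from its own record without re-running the suite.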

Structs§

AssertionResult: Result of a single assertion check
EvalConfig: Configuration for an eval suite — loaded from a TOML file
EvalReport: Summary report of an entire eval suite run
EvalResult: Result of a single eval task
EvalRunner: Runs eval tasks against an LLM provider and captures traces
EvalTask: A single eval task with prompt, golden answer, and assertions
LlmCallTrace: Trace of a single LLM call
RunTrace: Full trace of a single agent run — captures every step for inspection
ToolCallTrace: Trace of a single tool call
TraceStep: A single step in the agent loop

Enums§

Assertion: Types of assertions that can be checked against a response
StepType: Type of a trace step
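To make the assertion kinds and scoring weights more concrete, here is a minimal sketch of checking a response and combining the outcomes into a weighted score. The variant names mirror the assertion kinds listed under Architecture, but their payloads, the functions `check` and `weighted_score`, and the normalisation by total weight are assumptions for illustration. The `regex` kind is left out so the example needs no external crate.

```rust
// Hypothetical enum: payload types are assumed, and the regex variant
// is omitted to keep the example free of external dependencies.
#[derive(Debug, Clone)]
enum Assertion {
    Contains(String),
    NotContains(String),
    Exact(String),
}

// Returns true when the response satisfies the assertion.
fn check(assertion: &Assertion, response: &str) -> bool {
    match assertion {
        Assertion::Contains(s) => response.contains(s.as_str()),
        Assertion::NotContains(s) => !response.contains(s.as_str()),
        // Assumption: exact matching ignores surrounding whitespace.
        Assertion::Exact(s) => response.trim() == s.as_str(),
    }
}

// Assumed scoring rule: weight of passing assertions divided by total weight.
// Returns 0.0 when there is nothing to score.
fn weighted_score(checks: &[(Assertion, f64)], response: &str) -> f64 {
    let total: f64 = checks.iter().map(|(_, w)| *w).sum();
    if total == 0.0 {
        return 0.0;
    }
    let earned: f64 = checks
        .iter()
        .filter(|(a, _)| check(a, response))
        .map(|(_, w)| *w)
        .sum();
    earned / total
}

fn main() {
    let checks = vec![
        (Assertion::Contains("Paris".to_string()), 2.0),
        (Assertion::NotContains("London".to_string()), 1.0),
        (Assertion::Exact("Paris".to_string()), 1.0),
    ];
    let score = weighted_score(&checks, "The capital is Paris.");
    println!("score = {score}");
}
```

In this sketch the first two assertions pass and the exact match fails, so three of the four weight units are earned.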
