Module eval

RavenClaws

Provides a framework for defining, running, and scoring evaluation tasks against LLM agents. Captures full run traces for inspection and debugging.

§Architecture

EvalConfig (TOML file)
  └── Vec<EvalTask>
        ├── prompt + golden answer
        ├── assertions (contains, not_contains, regex, exact)
        └── scoring weights

EvalRunner
  ├── run_task() → EvalResult (with RunTrace)
  └── run_suite() → EvalReport (summary of all results)

RunTrace
  ├── steps: Vec<TraceStep>
  ├── llm_calls: Vec<LlmCallTrace>
  └── tool_calls: Vec<ToolCallTrace>
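
To make the trace layout above concrete, here is a minimal sketch of how a `RunTrace` might accumulate steps during an agent run. Only the struct names and the three `RunTrace` fields come from the diagram. Every other field, the two `StepType` variants, and the `record_llm_call` / `record_tool_call` helpers are hypothetical and are not taken from the crate's actual definitions.

```rust
// Hypothetical sketch: only the type names and the three RunTrace fields come
// from the architecture diagram. All other fields, variants, and methods are
// assumptions made for illustration.

/// Assumed step kinds; the real `StepType` variants may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum StepType {
    LlmCall,
    ToolCall,
}

/// One step in the agent loop (fields are assumed).
#[derive(Debug, Clone)]
pub struct TraceStep {
    pub index: usize,
    pub step_type: StepType,
}

/// Record of one LLM call (fields are assumed).
#[derive(Debug, Clone)]
pub struct LlmCallTrace {
    pub prompt: String,
    pub response: String,
}

/// Record of one tool call (fields are assumed).
#[derive(Debug, Clone)]
pub struct ToolCallTrace {
    pub tool_name: String,
    pub output: String,
}

/// Full trace of one run, mirroring the three vectors in the diagram.
#[derive(Debug, Default)]
pub struct RunTrace {
    pub steps: Vec<TraceStep>,
    pub llm_calls: Vec<LlmCallTrace>,
    pub tool_calls: Vec<ToolCallTrace>,
}

impl RunTrace {
    /// Hypothetical helper: appends an ordered step plus the LLM call details.
    pub fn record_llm_call(&mut self, prompt: &str, response: &str) {
        let index = self.steps.len();
        self.steps.push(TraceStep { index, step_type: StepType::LlmCall });
        self.llm_calls.push(LlmCallTrace {
            prompt: prompt.to_string(),
            response: response.to_string(),
        });
    }

    /// Hypothetical helper: appends an ordered step plus the tool call details.
    pub fn record_tool_call(&mut self, tool_name: &str, output: &str) {
        let index = self.steps.len();
        self.steps.push(TraceStep { index, step_type: StepType::ToolCall });
        self.tool_calls.push(ToolCallTrace {
            tool_name: tool_name.to_string(),
            output: output.to_string(),
        });
    }
}
```

In this sketch `steps` preserves the overall order of events, while `llm_calls` and `tool_calls` hold the per-kind details, so a reader of the trace can either replay the run in sequence or inspect one kind of call in isolation.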

Structs§

AssertionResult
Result of a single assertion check
EvalConfig
Configuration for an eval suite — loaded from a TOML file
EvalReport
Summary report of an entire eval suite run
EvalResult
Result of a single eval task
EvalRunner
Runs eval tasks against an LLM provider and captures traces
EvalTask
A single eval task with prompt, golden answer, and assertions
LlmCallTrace
Trace of a single LLM call
RunTrace
Full trace of a single agent run — captures every step for inspection
ToolCallTrace
Trace of a single tool call
TraceStep
A single step in the agent loop

Enums§

Assertion
Types of assertions that can be checked against a response
StepType
Type of a trace step
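
As an illustration of how assertions could be checked against a response, the sketch below models three of the four assertion kinds named in the architecture (contains, not_contains, exact). The variant names, the `AssertionResult` fields, and the `check` and `pass_rate` functions are assumptions, not the crate's real API. The regex kind is left out because it would need an external crate, and the unweighted pass rate stands in for whatever scoring weights the crate actually applies.

```rust
// Hypothetical sketch: variant names, fields, and both functions are
// assumptions for illustration; the crate's real definitions may differ.

/// Assumed shape of three assertion kinds (regex omitted: needs an external crate).
#[derive(Debug, Clone)]
pub enum Assertion {
    Contains(String),
    NotContains(String),
    Exact(String),
}

/// Assumed shape of a single assertion outcome.
#[derive(Debug, Clone, PartialEq)]
pub struct AssertionResult {
    pub passed: bool,
    pub message: String,
}

/// Checks one assertion against the response text.
pub fn check(assertion: &Assertion, response: &str) -> AssertionResult {
    let (passed, message) = match assertion {
        Assertion::Contains(needle) => (
            response.contains(needle.as_str()),
            format!("contains {:?}", needle),
        ),
        Assertion::NotContains(needle) => (
            !response.contains(needle.as_str()),
            format!("does not contain {:?}", needle),
        ),
        Assertion::Exact(expected) => (
            response == expected.as_str(),
            format!("equals {:?}", expected),
        ),
    };
    AssertionResult { passed, message }
}

/// Fraction of assertions that pass; an empty list counts as fully passing.
/// This is an unweighted stand-in for the crate's scoring weights.
pub fn pass_rate(assertions: &[Assertion], response: &str) -> f64 {
    if assertions.is_empty() {
        return 1.0;
    }
    let passed = assertions
        .iter()
        .filter(|a| check(a, response).passed)
        .count();
    passed as f64 / assertions.len() as f64
}
```

Keeping `check` separate from `pass_rate` means each individual outcome can be reported on its own (in the spirit of `AssertionResult`) before any aggregate score is computed.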