RavenClaws
Provides a framework for defining, running, and scoring evaluation tasks against LLM agents. Captures full run traces for inspection and debugging.
§Architecture
EvalConfig (TOML file)
└── Vec<EvalTask>
    ├── prompt + golden answer
    ├── assertions (contains, not_contains, regex, exact)
    └── scoring weights

EvalRunner
├── run_task() → EvalResult (with RunTrace)
└── run_suite() → EvalReport (summary of all results)

RunTrace
├── steps: Vec<TraceStep>
├── llm_calls: Vec<LlmCallTrace>
└── tool_calls: Vec<ToolCallTrace>

Structs§
- AssertionResult - Result of a single assertion check
- EvalConfig - Configuration for an eval suite, loaded from a TOML file
- EvalReport - Summary report of an entire eval suite run
- EvalResult - Result of a single eval task
- EvalRunner - Runs eval tasks against an LLM provider and captures traces
- EvalTask - A single eval task with prompt, golden answer, and assertions
- LlmCallTrace - Trace of a single LLM call
- RunTrace - Full trace of a single agent run; captures every step for inspection
- ToolCallTrace - Trace of a single tool call
- TraceStep - A single step in the agent loop
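
The architecture above lists assertion kinds (contains, not_contains, regex, exact) and scoring weights, but this page does not show how they combine into a score. The following is a minimal sketch of one plausible scheme, not the crate's actual implementation. The `Assertion` enum, the `passed` and `weight` fields, and the `check` and `weighted_score` functions are hypothetical names introduced for illustration. The `regex` kind is omitted because it would need an external crate, and trimming whitespace before the exact comparison is an assumption.

```rust
// Hypothetical sketch: names, fields, and scoring rule are assumptions,
// not the crate's real API.

/// Assertion kinds named in the architecture diagram (regex omitted here).
#[derive(Debug, Clone)]
enum Assertion {
    Contains(String),
    NotContains(String),
    Exact(String),
}

/// Outcome of checking one assertion, carrying the weight used for scoring.
#[derive(Debug)]
struct AssertionResult {
    passed: bool,
    weight: f64,
}

/// Check a single assertion against the agent's output.
fn check(assertion: &Assertion, output: &str, weight: f64) -> AssertionResult {
    let passed = match assertion {
        Assertion::Contains(s) => output.contains(s.as_str()),
        Assertion::NotContains(s) => !output.contains(s.as_str()),
        // Assumption: surrounding whitespace is ignored for exact matches.
        Assertion::Exact(s) => output.trim() == s.as_str(),
    };
    AssertionResult { passed, weight }
}

/// Weighted fraction of passed assertions, in the range 0.0..=1.0.
/// Returns 0.0 when the total weight is zero to avoid dividing by zero.
fn weighted_score(results: &[AssertionResult]) -> f64 {
    let total: f64 = results.iter().map(|r| r.weight).sum();
    if total == 0.0 {
        return 0.0;
    }
    let earned: f64 = results
        .iter()
        .filter(|r| r.passed)
        .map(|r| r.weight)
        .sum();
    earned / total
}
```

Under this scheme, a task whose assertions carry weights 2, 1, and 1 scores 0.75 if only the last one fails. Whether the real `EvalResult` normalizes by total weight in this way is not stated on this page.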