Async LLM-as-judge evaluators for Nous.
These evaluators run asynchronously after agent runs complete. They use a separate model call to assess quality dimensions that require language understanding.
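The flow described above (finish the agent run, then make a separate judge call and turn its reply into scores) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the crate's actual API: `SketchJudge`, `CannedJudge`, `parse_scores_sketch`, and `evaluate_after_run` are hypothetical names, the `dimension: score` reply format is an assumption, and a plain background thread stands in for whatever async runtime the crate uses.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a judge backend: takes a prompt, returns raw model text.
trait SketchJudge: Send + 'static {
    fn complete(&self, prompt: &str) -> String;
}

/// Canned judge for tests; returns a fixed reply without any network call.
struct CannedJudge {
    reply: String,
}

impl SketchJudge for CannedJudge {
    fn complete(&self, _prompt: &str) -> String {
        self.reply.clone()
    }
}

/// Parse lines of the form `dimension: score` and skip anything else.
/// The reply format is an assumption made for this sketch.
fn parse_scores_sketch(reply: &str) -> HashMap<String, f64> {
    reply
        .lines()
        .filter_map(|line| {
            let (name, value) = line.split_once(':')?;
            let score = value.trim().parse::<f64>().ok()?;
            Some((name.trim().to_string(), score))
        })
        .collect()
}

/// Score a finished run's transcript on a background thread, so the caller
/// is not blocked while the judge call is in flight.
fn evaluate_after_run<J: SketchJudge>(
    judge: J,
    transcript: String,
) -> thread::JoinHandle<HashMap<String, f64>> {
    thread::spawn(move || {
        let prompt = format!("Rate this agent run.\n\n{transcript}");
        parse_scores_sketch(&judge.complete(&prompt))
    })
}

fn main() {
    let judge = CannedJudge {
        reply: "task_completion: 0.9\nplan_adherence: 0.5\nnot a score line".to_string(),
    };
    // The agent run is already complete here; evaluation happens afterwards.
    let handle = evaluate_after_run(judge, "agent transcript goes here".to_string());
    let scores = handle.join().expect("judge thread panicked");
    println!("{scores:?}");
}
```

Keeping the judge behind a trait is what makes a canned implementation possible, so evaluator logic can be tested without a real model call.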
Re-exports§
pub use anthropic_judge::AnthropicJudgeProvider;
pub use judge_provider::JudgeProvider;
pub use judge_provider::MockJudgeProvider;
pub use judge_provider::parse_judge_scores;
pub use plan_adherence::PlanAdherence;
pub use plan_quality::PlanQuality;
pub use task_completion::TaskCompletion;
Modules§
- anthropic_judge - Anthropic API-backed judge provider for real LLM-as-judge evaluation.
- judge_provider - LLM call wrapper for evaluation.
- plan_adherence - Plan adherence evaluator: did the agent follow its stated plan?
- plan_quality - Plan quality evaluator: LLM-as-judge for reasoning coherence.
- task_completion - Task completion evaluator: did the agent achieve its goal?