pub struct ApiJudgeProvider { /* private fields */ }Expand description
A judge-only Provider that scores evals and plays the simulated user with
a direct model API call (Anthropic or OpenAI), rather than running them
through a harness.
Why this exists: routing the judge through a full agentic harness pays an
agent-loop cold start on every short verdict. A direct API call is one HTTP
round trip — faster and cheaper on API-key auth — and still reuses the exact
same judge/user prompts and tolerant verdict parsing as
OneharnessProvider, so the two are directly comparable.
It does not run skills: respond returns an error. Compose it with a
skill-running provider via SplitProvider so the harness under test still
drives respond, while the judge runs on the API.
The request is sent with curl (Rust has no official vendor SDK). The API
key is read from an env var and passed through a private (0600) curl
config file, so it never appears in argv / ps.
Implementations§
Source§impl ApiJudgeProvider
impl ApiJudgeProvider
Sourcepub fn new(config: &ApiJudgeConfig) -> Self
pub fn new(config: &ApiJudgeConfig) -> Self
Build a provider from its configuration, resolving per-vendor defaults for the API-key env var and endpoint.
Trait Implementations§
Source§impl Provider for ApiJudgeProvider
impl Provider for ApiJudgeProvider
Source§fn respond(
&self,
_platform: &str,
_model: &str,
_skill: &SkillRef<'_>,
_messages: &[Message],
_session: Option<&str>,
) -> Result<AssistantTurn>
fn respond( &self, _platform: &str, _model: &str, _skill: &SkillRef<'_>, _messages: &[Message], _session: Option<&str>, ) -> Result<AssistantTurn>
session,
when Some, is a handle returned by a previous respond call on this
run that the provider may use to continue the same harness session
(e.g. via oneharness run --resume); providers that don’t support
continuation should ignore it. Read moreSource§fn simulate_user(
&self,
model: &str,
persona: &str,
messages: &[Message],
) -> Result<UserTurn>
fn simulate_user( &self, model: &str, persona: &str, messages: &[Message], ) -> Result<UserTurn>
Source§fn judge(
&self,
model: &str,
query: &JudgeQuery<'_>,
messages: &[Message],
) -> Result<JudgeVerdict>
fn judge( &self, model: &str, query: &JudgeQuery<'_>, messages: &[Message], ) -> Result<JudgeVerdict>
Source§fn supports_resume(&self, _platform: &str) -> bool
fn supports_resume(&self, _platform: &str) -> bool
respond on platform will faithfully continue a prior
session when given its session_id. The default is false; providers
that support resume override this so the runner knows to thread the
session id through.