Struct ApiJudgeProvider 

pub struct ApiJudgeProvider { /* private fields */ }

A judge-only Provider that scores evals and plays the simulated user with a direct model API call (Anthropic or OpenAI), rather than running them through a harness.

Why this exists: routing the judge through a full agentic harness pays an agent-loop cold start on every short verdict. A direct API call is one HTTP round trip — faster and cheaper on API-key auth — and still reuses the exact same judge/user prompts and tolerant verdict parsing as OneharnessProvider, so the two are directly comparable.

It does not run skills: respond returns an error. Compose it with a skill-running provider via SplitProvider so the harness under test still drives respond, while the judge runs on the API.
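The composition described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the `Provider` trait here is a cut-down, hypothetical stand-in with simplified signatures, and `Split`, `Harness`, and `JudgeOnly` are invented names that only mirror the roles of SplitProvider, a skill-running provider, and ApiJudgeProvider.

```rust
// Hypothetical, simplified stand-ins. The real Provider trait takes more
// arguments (platform, model, skill, messages, session) and returns richer types.
type Result<T> = std::result::Result<T, String>;

trait Provider {
    fn respond(&self, prompt: &str) -> Result<String>;
    fn judge(&self, criterion: &str, transcript: &str) -> Result<bool>;
}

/// Plays the role of a judge-only provider: it can score, but cannot run skills.
struct JudgeOnly;

impl Provider for JudgeOnly {
    fn respond(&self, _prompt: &str) -> Result<String> {
        Err("judge-only provider cannot run skills".to_string())
    }
    fn judge(&self, criterion: &str, transcript: &str) -> Result<bool> {
        // Stand-in for a direct model API call that returns a verdict.
        Ok(transcript.contains(criterion))
    }
}

/// Plays the role of the harness under test: it only answers turns here.
struct Harness;

impl Provider for Harness {
    fn respond(&self, prompt: &str) -> Result<String> {
        Ok(format!("echo: {prompt}"))
    }
    fn judge(&self, _criterion: &str, _transcript: &str) -> Result<bool> {
        Err("judging is delegated elsewhere in this sketch".to_string())
    }
}

/// Routes respond to one provider and judge to another.
struct Split<R, J> {
    responder: R,
    judge_side: J,
}

impl<R: Provider, J: Provider> Provider for Split<R, J> {
    fn respond(&self, prompt: &str) -> Result<String> {
        self.responder.respond(prompt)
    }
    fn judge(&self, criterion: &str, transcript: &str) -> Result<bool> {
        self.judge_side.judge(criterion, transcript)
    }
}
```

Building `Split { responder: Harness, judge_side: JudgeOnly }` gives a single provider whose `respond` goes through the harness and whose `judge` never touches it, which is the split this page describes.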

The request is sent with curl, because neither vendor ships an official Rust SDK. The API key is read from an environment variable and passed to curl through a private (mode 0600) config file, so it never appears in the process arguments (argv) or in ps output.
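As an illustration of the key-handling technique, here is a minimal Unix-only sketch using the standard library. The function names, the header name passed in, and the exact curl flags are assumptions for this example; the crate's private implementation may differ.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Write a curl config file that carries the API key as a request header.
/// The file is created with mode 0600, so only the owning user can read it.
/// (Hypothetical helper; assumes the key contains no quotes or backslashes.)
fn write_curl_config(path: &Path, header_name: &str, api_key: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // fail instead of reusing a file someone else created
        .mode(0o600)
        .open(path)?;
    // curl config syntax: a long option name without dashes, then its value.
    writeln!(file, "header = \"{header_name}: {api_key}\"")?;
    Ok(())
}

/// Build the curl argument list. Only the config path appears here,
/// never the key itself, so the key stays out of argv and ps.
fn curl_args(config: &Path, url: &str) -> Vec<String> {
    vec![
        "--silent".to_string(),
        "--show-error".to_string(),
        "--config".to_string(),
        config.display().to_string(),
        url.to_string(),
    ]
}
```

The arguments from `curl_args` would then be handed to `std::process::Command::new("curl")`, and the config file removed once the request completes.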

Implementations

impl ApiJudgeProvider

pub fn new(config: &ApiJudgeConfig) -> Self

Build a provider from its configuration, resolving per-vendor defaults for the API-key env var and endpoint.
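Per-vendor default resolution might look something like the sketch below. Everything in it is hypothetical: the fields of ApiJudgeConfig are not shown on this page, so `JudgeConfig`, `Vendor`, `Resolved`, and `resolve` are invented for illustration, and the default env var names and endpoints are the vendors' commonly documented ones, not values confirmed for this crate.

```rust
/// Hypothetical vendor selector; the real config type may model this differently.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Vendor {
    Anthropic,
    OpenAi,
}

/// Hypothetical config: optional overrides fall back to per-vendor defaults.
struct JudgeConfig {
    vendor: Vendor,
    api_key_env: Option<String>,
    endpoint: Option<String>,
}

/// The fully resolved settings a provider would keep after construction.
#[derive(Debug, PartialEq)]
struct Resolved {
    api_key_env: String,
    endpoint: String,
}

fn resolve(config: &JudgeConfig) -> Resolved {
    // Assumed defaults, chosen per vendor.
    let (default_env, default_endpoint) = match config.vendor {
        Vendor::Anthropic => ("ANTHROPIC_API_KEY", "https://api.anthropic.com/v1/messages"),
        Vendor::OpenAi => ("OPENAI_API_KEY", "https://api.openai.com/v1/chat/completions"),
    };
    Resolved {
        // An explicit override wins; otherwise use the vendor default.
        api_key_env: config
            .api_key_env
            .clone()
            .unwrap_or_else(|| default_env.to_string()),
        endpoint: config
            .endpoint
            .clone()
            .unwrap_or_else(|| default_endpoint.to_string()),
    }
}
```

Resolving defaults once, at construction time, means later calls such as judge and simulate_user do not need to re-inspect the configuration.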

Trait Implementations

impl Provider for ApiJudgeProvider

fn respond( &self, _platform: &str, _model: &str, _skill: &SkillRef<'_>, _messages: &[Message], _session: Option<&str>, ) -> Result<AssistantTurn>

Run one assistant/skill turn given the conversation so far. session, when Some, is a handle returned by a previous respond call on this run; the provider may use it to continue the same harness session (e.g. via oneharness run --resume). Providers that don’t support continuation should ignore it. ApiJudgeProvider does not run skills, so its implementation returns an error.

fn simulate_user( &self, model: &str, persona: &str, messages: &[Message], ) -> Result<UserTurn>

Produce one simulated-user turn.

fn judge( &self, model: &str, query: &JudgeQuery<'_>, messages: &[Message], ) -> Result<JudgeVerdict>

Score a criterion against the conversation.

fn supports_resume(&self, _platform: &str) -> bool

True iff respond on platform will faithfully continue a prior session when given its session_id. The default is false; providers that support resume override this so the runner knows to thread the session id through.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.