Struct ApiJudgeProvider

Source

pub struct ApiJudgeProvider { /* private fields */ }

Expand description

A judge-only Provider that scores evals and plays the simulated user with a direct model API call (Anthropic or OpenAI), rather than running them through a harness.

Why this exists: routing the judge through a full agentic harness pays an agent-loop cold start on every short verdict. A direct API call is one HTTP round trip — faster and cheaper on API-key auth — and still reuses the exact same judge/user prompts and tolerant verdict parsing as OneharnessProvider, so the two are directly comparable.

It does not run skills: respond returns an error. Compose it with a skill-running provider via SplitProvider so the harness under test still drives respond, while the judge runs on the API.

The request is sent with curl (Rust has no official vendor SDK). The API key is read from an env var and passed through a private (0600) curl config file, so it never appears in argv / ps.

ApiJudgeProvider

Struct ApiJudgeProvider Copy item path

Implementations§

impl ApiJudgeProvider

pub fn new(config: &ApiJudgeConfig) -> Self

Trait Implementations§

impl Provider for ApiJudgeProvider

fn respond( &self, _platform: &str, _model: &str, _skill: &SkillRef<'_>, _messages: &[Message], _session: Option<&str>, ) -> Result<AssistantTurn>

fn simulate_user( &self, model: &str, persona: &str, messages: &[Message], ) -> Result<UserTurn>

fn judge( &self, model: &str, query: &JudgeQuery<'_>, messages: &[Message], ) -> Result<JudgeVerdict>

fn respond_streaming( &self, platform: &str, model: &str, skill: &SkillRef<'_>, messages: &[Message], session: Option<&str>, on_event: &mut dyn FnMut(&ToolEvent) -> ControlFlow<()>, ) -> Result<AssistantTurn>

fn respond_with_mocks( &self, platform: &str, model: &str, skill: &SkillRef<'_>, messages: &[Message], session: Option<&str>, mocks: Option<&MockPlan<'_>>, ) -> Result<AssistantTurn>

fn respond_streaming_with_mocks( &self, platform: &str, model: &str, skill: &SkillRef<'_>, messages: &[Message], session: Option<&str>, mocks: Option<&MockPlan<'_>>, on_event: &mut dyn FnMut(&ToolEvent) -> ControlFlow<()>, ) -> Result<AssistantTurn>

fn supports_resume(&self, _platform: &str) -> bool

Auto Trait Implementations§

impl Freeze for ApiJudgeProvider

impl RefUnwindSafe for ApiJudgeProvider

impl Send for ApiJudgeProvider

impl Sync for ApiJudgeProvider

impl Unpin for ApiJudgeProvider

impl UnsafeUnpin for ApiJudgeProvider

impl UnwindSafe for ApiJudgeProvider

Blanket Implementations§

impl<T> Any for Twhere T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for Twhere T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for Twhere T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for Twhere U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Struct ApiJudgeProvider

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,