Struct LlmJudgeEvaluation

pub struct LlmJudgeEvaluation {
    pub judge_config: AgentLoopConfig,
    pub system_prompt: Option<String>,
}

Uses a separate LLM call to judge which branch response is best.

§Judge prompt construction

The judge sees only clean, relevant content — never raw tool calls or intermediate steps from inside a branch:

Prior conversation context (when present): the conversation history before the user query, formatted as a human-readable transcript. Only Content::Text survives — tool call arguments and images are stripped. Omitted when empty.
Original query: text extracted from user messages in prompts (agent_loop mode), or from the last Message::User in context.messages[..original_context_len] (agent_loop_continue mode).
Per-branch response: the final assistant text from the last Message::Assistant in outcome.new_messages. Tool calls, tool results, and all multi-turn exchanges within a branch are stripped. The judge evaluates outcomes, not the reasoning trace.
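The per-branch extraction described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Message` and `Content` enums here are simplified stand-ins (the real types likely have more variants and fields), and `text_only` and `final_response` are hypothetical helper names.

```rust
// Simplified stand-ins for the crate's message types (assumption:
// the real `Message` and `Content` enums are richer than this).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Content {
    Text(String),
    ToolCall { name: String, arguments: String },
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Message {
    User(Vec<Content>),
    Assistant(Vec<Content>),
    ToolResult(String),
}

/// Joins only the `Content::Text` parts of a message body,
/// dropping tool calls and anything else.
fn text_only(parts: &[Content]) -> String {
    parts
        .iter()
        .filter_map(|c| match c {
            Content::Text(t) => Some(t.as_str()),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Returns the text of the last assistant message in a branch, if any.
/// Earlier turns, tool calls, and tool results are ignored.
fn final_response(new_messages: &[Message]) -> Option<String> {
    new_messages.iter().rev().find_map(|m| match m {
        Message::Assistant(parts) => Some(text_only(parts)),
        _ => None,
    })
}
```

Searching from the end means intermediate assistant turns (for example, one that only issued a tool call) never reach the judge.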

§agent_loop_continue mode

When prompts is empty (continue mode), the judge locates the last Message::User in context.messages[..original_context_len] as the query. Everything before that message becomes the prior conversation context.
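The split between query and prior context in continue mode might look like the sketch below. It assumes a simplified `Message` enum that carries plain strings, and `split_query` is a hypothetical helper name; only the slicing by `original_context_len` and the "last user message" rule come from the description above.

```rust
// Simplified stand-in for the crate's message type (assumption).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
}

/// Looks only at `messages[..original_context_len]`, finds the last
/// user message there, and returns (prior context, query text).
/// Returns `None` when that range contains no user message.
fn split_query(messages: &[Message], original_context_len: usize) -> Option<(&[Message], &str)> {
    // Clamp so an oversized length cannot panic on slicing.
    let len = original_context_len.min(messages.len());
    let original = &messages[..len];
    let idx = original
        .iter()
        .rposition(|m| matches!(m, Message::User(_)))?;
    match &original[idx] {
        Message::User(text) => Some((&original[..idx], text.as_str())),
        _ => None,
    }
}
```

Messages at or beyond `original_context_len` belong to the branches and are never considered when locating the query.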

§Judge’s comprehension criteria

All N branch final responses (plus prior context) must fit in the judge model’s context window simultaneously for a fair comparison. The token budget is derived from judge_config.context_config.max_context_tokens (if set). When no context limit is configured, all content is passed through as-is.
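As a rough illustration of the budget check, the sketch below treats an unset limit as "always fits". How tokens are actually counted is not stated here, so the four-characters-per-token estimate is purely an assumption, and `estimate_tokens` and `fits_budget` are hypothetical names.

```rust
/// Rough token estimate. The 4-chars-per-token ratio is an assumption,
/// not something this documentation specifies.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Returns true when the prior context plus all branch responses fit in
/// the budget. With no limit configured, everything passes through.
fn fits_budget(max_context_tokens: Option<usize>, prior_context: &str, responses: &[&str]) -> bool {
    let budget = match max_context_tokens {
        Some(b) => b,
        None => return true,
    };
    let response_tokens: usize = responses.iter().map(|r| estimate_tokens(r)).sum();
    estimate_tokens(prior_context) + response_tokens <= budget
}
```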

§2-iteration compaction strategy

When combined content exceeds the budget, compaction is applied in two iterations:

Iteration 1: compact prior context only, outputs intact. The prior context is reduced through 3 progressive tiers while branch outputs are preserved verbatim:

Tier 1: keep only the last 80 lines.
Tier 2: keep first paragraph + last paragraph only.
Tier 3: hard char limit derived from remaining budget.

Iteration 2: compact both independently (if iteration 1 is insufficient). The context stays in its tier-3 form; branch outputs are now compacted independently through the same 3-tier pipeline.

An AgentEvent::ProgressMessage warning is emitted to tx if the budget cannot be satisfied after both iterations.

The judge’s decision applies to the original (uncompacted) branch responses. ParallelLoopResult::selected_messages always contains the uncompacted winner.
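The three tiers can be sketched as plain string functions. This is an approximation under several assumptions: the budget is measured in characters rather than tokens, paragraphs are separated by blank lines, each tier is applied to the output of the previous one, and tier 3 keeps the head of the text. None of these details are confirmed above, and all function names are hypothetical.

```rust
/// Tier 1: keep only the last `n` lines.
fn keep_last_lines(text: &str, n: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}

/// Tier 2: keep the first and last paragraph
/// (assumption: paragraphs are separated by blank lines).
fn first_and_last_paragraph(text: &str) -> String {
    let paras: Vec<&str> = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();
    match paras.as_slice() {
        [] => String::new(),
        [only] => only.to_string(),
        [first, .., last] => format!("{first}\n\n{last}"),
    }
}

/// Tier 3: hard character limit
/// (assumption: the head of the text is kept).
fn hard_limit(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Applies the tiers in order, stopping as soon as the text fits.
fn compact(text: &str, budget_chars: usize) -> String {
    let fits = |s: &str| s.chars().count() <= budget_chars;
    if fits(text) {
        return text.to_string();
    }
    let t1 = keep_last_lines(text, 80);
    if fits(&t1) {
        return t1;
    }
    let t2 = first_and_last_paragraph(&t1);
    if fits(&t2) {
        return t2;
    }
    hard_limit(&t2, budget_chars)
}
```

In iteration 1 a function like `compact` would be applied to the prior context only; in iteration 2 it would also be applied to each branch output separately, while the originals are kept aside for the final result.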

§Response parsing

The judge’s reply is scanned for the first numeric token (e.g., “1”, “2”, “Response 2”). If no number is found or parsing fails, the selection falls back to index 0.
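A minimal sketch of this scan, assuming a "numeric token" means the first run of ASCII digits. `parse_choice` is a hypothetical name, and how the parsed number maps onto a branch index (for example, whether "1" refers to the first branch) is not specified here, so the function returns the number unchanged.

```rust
/// Returns the first run of ASCII digits in `reply` parsed as a number,
/// or 0 when there is no digit or the value does not fit in `usize`.
fn parse_choice(reply: &str) -> usize {
    let digits: String = reply
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    // An empty string or an overflowing value fails to parse,
    // which triggers the fallback.
    digits.parse().unwrap_or(0)
}
```

Because only the first number counts, a reply such as "I pick 12, not 3" yields 12, so a judge prompt that asks for the number alone keeps parsing unambiguous.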

§Session traceability

The judge loop inherits the session_id from the branches so all events (including the judge’s AgentStart) are visible in the same session trace.

Fields§

§judge_config: AgentLoopConfig

Config for the judge LLM call. Set context_config.max_context_tokens to enable the comprehension-criteria compaction check.

§system_prompt: Option<String>

Optional system prompt override. When None, a built-in evaluation prompt is used.

Trait Implementations§

impl EvaluationStrategy for LlmJudgeEvaluation

Auto Trait Implementations§

impl Freeze for LlmJudgeEvaluation

impl !RefUnwindSafe for LlmJudgeEvaluation

impl Send for LlmJudgeEvaluation

impl Sync for LlmJudgeEvaluation

impl Unpin for LlmJudgeEvaluation

impl UnsafeUnpin for LlmJudgeEvaluation

impl !UnwindSafe for LlmJudgeEvaluation

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> PolicyExt for T where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P> where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P> where T: Policy<B, E>, P: Policy<B, E>,

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self> where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>