Struct EvalCase


pub struct EvalCase {
    pub id: String,
    pub name: String,
    pub description: Option<String>,
    pub system_prompt: String,
    pub user_messages: Vec<String>,
    pub expected_trajectory: Option<Vec<ExpectedToolCall>>,
    pub expected_response: Option<ResponseCriteria>,
    pub expected_assertion: Option<Assertion>,
    pub expected_interactions: Option<Vec<InteractionExpectation>>,
    pub few_shot_examples: Vec<FewShotExample>,
    pub budget: Option<BudgetConstraints>,
    pub evaluators: Vec<String>,
    pub metadata: Value,
    pub attachments: Vec<Attachment>,
    pub session_id: Option<Uuid>,
    pub expected_environment_state: Option<Vec<EnvironmentState>>,
    pub expected_tool_intent: Option<ToolIntent>,
    pub semantic_tool_selection: bool,
    pub state_capture: Option<StateCapture>,
}


A single evaluation scenario.

Defines the agent prompt, expected outcomes, and which evaluators to run.

Fields

id: String

Unique identifier for this case.

name: String

Human-readable name.

description: Option<String>

Optional description of what this case tests.

system_prompt: String

System prompt for the agent.

user_messages: Vec<String>

Initial user messages (the prompt).

expected_trajectory: Option<Vec<ExpectedToolCall>>

Expected tool call trajectory (golden path).
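To make "golden path" concrete, here is a minimal sketch of one possible matching rule, assuming the expected calls must appear in the actual trajectory in order, with unrelated calls allowed in between. The `ExpectedToolCall` below keeps only a hypothetical `tool_name` field, and `follows_golden_path` is a hypothetical helper; the crate's trajectory evaluator may instead require exact sequences, compare arguments, or award partial credit.

```rust
/// Hypothetical stand-in for `ExpectedToolCall`: only the tool name is modeled.
struct ExpectedToolCall {
    tool_name: String,
}

/// Assumed rule (not the crate's documented behavior): every expected call must
/// appear in `actual` in the same relative order; extra calls in between are tolerated.
fn follows_golden_path(expected: &[ExpectedToolCall], actual: &[&str]) -> bool {
    // A single shared iterator means each expected call is searched for only
    // after the position where the previous one matched.
    let mut remaining = actual.iter();
    expected
        .iter()
        .all(|exp| remaining.any(|call| *call == exp.tool_name))
}
```

Sharing one iterator across the `all` loop is what enforces ordering: once a call has been consumed, later expected calls cannot match it.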

expected_response: Option<ResponseCriteria>

Expected final response criteria.

expected_assertion: Option<Assertion>

Judge-evaluated assertion expected to hold after the run.

expected_interactions: Option<Vec<InteractionExpectation>>

Expected interactions or hand-offs within the run.

few_shot_examples: Vec<FewShotExample>

Prompt examples injected ahead of judge-backed evaluations.

budget: Option<BudgetConstraints>

Cost/budget governance constraints.

evaluators: Vec<String>

Names of the evaluators to run. An empty list runs every registered evaluator.
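A minimal sketch of the "empty means all" rule, for illustration only. `resolve_evaluators` is a hypothetical helper, registered evaluators are modeled as plain names, and rejecting an unknown name with an error is an assumption rather than documented behavior.

```rust
/// Hypothetical helper: decide which evaluators run for a case.
fn resolve_evaluators<'a>(
    requested: &[String],
    registered: &[&'a str],
) -> Result<Vec<&'a str>, String> {
    // Empty request: run every registered evaluator.
    if requested.is_empty() {
        return Ok(registered.to_vec());
    }
    // Otherwise map each requested name to a registered evaluator, in request order.
    // Assumption: an unknown name is an error rather than being silently skipped.
    requested
        .iter()
        .map(|name| {
            registered
                .iter()
                .copied()
                .find(|r| *r == name.as_str())
                .ok_or_else(|| format!("unknown evaluator: {name}"))
        })
        .collect()
}
```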

metadata: Value

Arbitrary metadata for user-defined extensions and filtering.

attachments: Vec<Attachment>

Multimodal data references consumed by multimodal evaluators.

session_id: Option<Uuid>

Stable case/session identifier. When absent, callers may derive one deterministically via Self::default_session_id.

expected_environment_state: Option<Vec<EnvironmentState>>

Expected environment-state snapshots keyed by name (FR-013).

Compared against the output of state_capture via full JSON equality. Duplicate names are rejected at case-load time (FR-015, SC-009).
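A sketch of the two checks described above, under stated assumptions: `EnvironmentState` is modeled with a hypothetical `name` plus a `BTreeMap<String, String>` standing in for a JSON value (map equality ignores key order, as JSON object equality does), and `states_match` assumes every expected snapshot must have an equal captured snapshot with the same name. How extra captured snapshots are treated is not documented here, so the sketch simply ignores them.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for `EnvironmentState`; `snapshot` stands in for a JSON value.
#[derive(Debug, Clone, PartialEq)]
struct EnvironmentState {
    name: String,
    snapshot: BTreeMap<String, String>,
}

/// Load-time validation: reject any snapshot name that appears more than once.
fn validate_unique_names(states: &[EnvironmentState]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for state in states {
        // `insert` returns false when the name was already present.
        if !seen.insert(state.name.as_str()) {
            return Err(format!("duplicate environment state name: {}", state.name));
        }
    }
    Ok(())
}

/// Each expected snapshot must match a captured one with the same name by full equality.
fn states_match(expected: &[EnvironmentState], actual: &[EnvironmentState]) -> bool {
    expected.iter().all(|exp| {
        actual
            .iter()
            .any(|act| act.name == exp.name && act.snapshot == exp.snapshot)
    })
}
```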

expected_tool_intent: Option<ToolIntent>

Expected semantic tool intent for the tool-parameter evaluator (FR-012).

semantic_tool_selection: bool

Enable semantic tool-selection scoring for this case (FR-011).

state_capture: Option<StateCapture>

Callback that produces the actual environment state after the agent completes. It can only be set programmatically, like ResponseCriteria::Custom.
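Because `state_capture` is a callback, it can only be supplied from code. The sketch below assumes it wraps a shared closure (`Arc<dyn Fn ...>`) that returns named snapshots; the `StateCapture` newtype, its `new` and `capture` methods, and the simplified string map are hypothetical, not the crate's actual definition. The hand-written `Debug` impl illustrates why such a field has no meaningful serialized form.

```rust
use std::collections::BTreeMap;
use std::fmt;
use std::sync::Arc;

/// Hypothetical stand-in for `StateCapture`: a cloneable, thread-safe closure
/// that reads the environment after the agent finishes.
#[derive(Clone)]
struct StateCapture(Arc<dyn Fn() -> BTreeMap<String, String> + Send + Sync>);

impl StateCapture {
    fn new(f: impl Fn() -> BTreeMap<String, String> + Send + Sync + 'static) -> Self {
        StateCapture(Arc::new(f))
    }

    /// Invoke the callback to obtain the actual environment state.
    fn capture(&self) -> BTreeMap<String, String> {
        (self.0)()
    }
}

// A closure has no inspectable contents, so only a placeholder is printed.
impl fmt::Debug for StateCapture {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("StateCapture(<closure>)")
    }
}
```

In use, the closure typically captures a handle to whatever the agent mutates (a shared store, a temp directory) and reads it only when `capture` is called, after the run.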

Implementations


impl EvalCase


pub fn content_fingerprint(&self) -> CaseFingerprint

Returns a canonical, serializable projection of this case, used when deriving deterministic IDs and cache keys.


pub fn default_session_id(&self) -> Uuid

Deterministically derive the default session ID for this case.

Programmatic-only closures, such as state_capture and ResponseCriteria::Custom bodies, are never serialized directly. Instead, this method hashes the stable canonical fingerprint from content_fingerprint, which records whether custom criteria are present without depending on closure pointer addresses, since those addresses differ between runs.
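The derivation can be sketched as follows, assuming a simplified hypothetical `CaseFingerprint` and using 64-bit FNV-1a as a stand-in stable hash; the crate's actual hash and UUID scheme are not documented here. Closure-backed fields contribute only a presence flag, so the result never depends on pointer addresses. `effective_session_id` (hypothetical) shows the fallback described for `session_id`: an explicit value wins, otherwise the derived one is used. `format_uuid` only reproduces the 8-4-4-4-12 layout and does not set RFC version bits.

```rust
/// Hypothetical, simplified fingerprint: plain data plus presence flags for closures.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CaseFingerprint {
    id: String,
    system_prompt: String,
    user_messages: Vec<String>,
    has_custom_response_criteria: bool,
    has_state_capture: bool,
}

/// Append a length-prefixed field so field boundaries stay unambiguous.
fn push_field(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
    out.extend_from_slice(bytes);
}

impl CaseFingerprint {
    /// Canonical encoding: fixed field order, every field length-prefixed.
    fn canonical_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        push_field(&mut out, self.id.as_bytes());
        push_field(&mut out, self.system_prompt.as_bytes());
        push_field(&mut out, &(self.user_messages.len() as u64).to_le_bytes());
        for msg in &self.user_messages {
            push_field(&mut out, msg.as_bytes());
        }
        // Closures are represented only by whether they are present.
        push_field(
            &mut out,
            &[u8::from(self.has_custom_response_criteria), u8::from(self.has_state_capture)],
        );
        out
    }
}

/// 64-bit FNV-1a: stable across runs and Rust releases, with no dependencies.
fn fnv1a64(seed: u64, bytes: &[u8]) -> u64 {
    let mut hash = seed;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Derive a 128-bit ID by hashing twice, seeding the second pass with the first result.
fn default_session_id(fp: &CaseFingerprint) -> u128 {
    let bytes = fp.canonical_bytes();
    let hi = fnv1a64(0xcbf2_9ce4_8422_2325, &bytes);
    let lo = fnv1a64(hi, &bytes);
    (u128::from(hi) << 64) | u128::from(lo)
}

/// An explicit session ID wins; otherwise fall back to the derived one.
fn effective_session_id(explicit: Option<u128>, fp: &CaseFingerprint) -> u128 {
    explicit.unwrap_or_else(|| default_session_id(fp))
}

/// Render a 128-bit value in the 8-4-4-4-12 hex layout (no version bits set).
fn format_uuid(v: u128) -> String {
    let h = format!("{v:032x}");
    format!("{}-{}-{}-{}-{}", &h[0..8], &h[8..12], &h[12..16], &h[16..20], &h[20..32])
}
```

A hand-rolled hash is used here because the algorithm behind `std::collections::hash_map::DefaultHasher` is explicitly unspecified and may change between Rust releases, which would silently change every derived ID.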
