pub struct EvalCase {
    pub id: String,
    pub name: String,
    pub description: Option<String>,
    pub system_prompt: String,
    pub user_messages: Vec<String>,
    pub expected_trajectory: Option<Vec<ExpectedToolCall>>,
    pub expected_response: Option<ResponseCriteria>,
    pub expected_assertion: Option<Assertion>,
    pub expected_interactions: Option<Vec<InteractionExpectation>>,
    pub few_shot_examples: Vec<FewShotExample>,
    pub budget: Option<BudgetConstraints>,
    pub evaluators: Vec<String>,
    pub metadata: Value,
    pub attachments: Vec<Attachment>,
    pub session_id: Option<Uuid>,
    pub expected_environment_state: Option<Vec<EnvironmentState>>,
    pub expected_tool_intent: Option<ToolIntent>,
    pub semantic_tool_selection: bool,
    pub state_capture: Option<StateCapture>,
}
A single evaluation scenario.
Defines the agent prompt, expected outcomes, and which evaluators to run.
Fields
id: String
    Unique identifier for this case.
name: String
    Human-readable name.
description: Option<String>
    Optional description of what this case tests.
system_prompt: String
    System prompt for the agent.
user_messages: Vec<String>
    Initial user messages (the prompt).
expected_trajectory: Option<Vec<ExpectedToolCall>>
    Expected tool call trajectory (golden path).
expected_response: Option<ResponseCriteria>
    Expected final response criteria.
expected_assertion: Option<Assertion>
    Judge-evaluated assertion expected to hold after the run.
expected_interactions: Option<Vec<InteractionExpectation>>
    Expected interactions or hand-offs within the run.
few_shot_examples: Vec<FewShotExample>
    Prompt examples injected ahead of judge-backed evaluations.
budget: Option<BudgetConstraints>
    Cost/budget governance constraints.
evaluators: Vec<String>
    Names of evaluators to run. Empty means all registered evaluators.
metadata: Value
    Arbitrary metadata for user-defined extensions and filtering.
attachments: Vec<Attachment>
    Multimodal data references consumed by multimodal evaluators.
session_id: Option<Uuid>
    Stable case/session identifier. When absent, callers may derive one
    deterministically via Self::default_session_id.
expected_environment_state: Option<Vec<EnvironmentState>>
    Expected environment-state snapshots keyed by name (FR-013).
    Compared against the output of state_capture via full JSON equality.
    Duplicate names are rejected at case-load time (FR-015, SC-009).
expected_tool_intent: Option<ToolIntent>
    Expected semantic tool intent for the tool-parameter evaluator (FR-012).
semantic_tool_selection: bool
    Enable semantic tool-selection scoring for this case (FR-011).
state_capture: Option<StateCapture>
    Callback that produces the actual environment state after the agent
    completes. Programmatic only; mirrors ResponseCriteria::Custom.
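Two of the field rules above lend themselves to a short illustration: an empty evaluators list selects every registered evaluator, and duplicate snapshot names in expected_environment_state are rejected when the case is loaded. The following is a minimal sketch of how such checks might look, not the crate's actual implementation. NamedState, resolve_evaluators, and first_duplicate_name are hypothetical names, and the sketch assumes each snapshot exposes a name string.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's `EnvironmentState`.
/// Only the `name` key matters for this sketch (assumption).
pub struct NamedState {
    pub name: String,
}

/// Pick the evaluators to run: an empty request means "all registered".
/// (Hypothetical helper; how the crate treats unknown names is not covered here.)
pub fn resolve_evaluators<'a>(
    requested: &'a [String],
    registered: &'a [String],
) -> Vec<&'a str> {
    let chosen = if requested.is_empty() { registered } else { requested };
    chosen.iter().map(String::as_str).collect()
}

/// Return the first snapshot name that appears more than once, if any.
/// A loader could turn `Some(name)` into a case-load error.
pub fn first_duplicate_name(states: &[NamedState]) -> Option<&str> {
    let mut seen = HashSet::new();
    states
        .iter()
        .map(|s| s.name.as_str())
        // `insert` returns false when the name was already present.
        .find(|name| !seen.insert(*name))
}
```

Returning the offending name, rather than a bare boolean, lets a loader report which snapshot is duplicated.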
Implementations
impl EvalCase
pub fn content_fingerprint(&self) -> CaseFingerprint
Canonical serializable projection used by deterministic ID and cache-key derivation.
pub fn default_session_id(&self) -> Uuid
Deterministically derive the default session ID for this case.
Programmatic-only closures such as state_capture and
ResponseCriteria::Custom bodies are never serialized directly.
Instead, this hashes a stable canonical fingerprint that preserves the
presence of custom criteria while avoiding pointer-address instability.
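The idea of hashing a canonical fingerprint, rather than the closures themselves, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: FingerprintSketch is a hypothetical, much-reduced stand-in for CaseFingerprint, stable_id returns a u64 instead of a Uuid, and FNV-1a is swapped in as a simple hash whose output does not depend on process state. Closures contribute only a presence flag, so their addresses never reach the hash.

```rust
/// Hypothetical, simplified stand-in for `CaseFingerprint`.
/// Closures are represented by presence flags, never by their addresses.
pub struct FingerprintSketch<'a> {
    pub id: &'a str,
    pub system_prompt: &'a str,
    pub user_messages: &'a [String],
    pub has_custom_criteria: bool,
    pub has_state_capture: bool,
}

// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into `hash` using FNV-1a (xor, then multiply).
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
fn feed(hash: u64, field: &[u8]) -> u64 {
    let hash = fnv1a(hash, &(field.len() as u64).to_le_bytes());
    fnv1a(hash, field)
}

/// Derive a deterministic identifier from the fingerprint's contents.
/// (Hypothetical helper; the real method returns a `Uuid`.)
pub fn stable_id(fp: &FingerprintSketch<'_>) -> u64 {
    let mut h = FNV_OFFSET;
    h = feed(h, fp.id.as_bytes());
    h = feed(h, fp.system_prompt.as_bytes());
    h = feed(h, &(fp.user_messages.len() as u64).to_le_bytes());
    for message in fp.user_messages {
        h = feed(h, message.as_bytes());
    }
    // Only the presence of programmatic callbacks is hashed.
    feed(h, &[fp.has_custom_criteria as u8, fp.has_state_capture as u8])
}
```

Two fingerprints with identical contents yield the same identifier across runs, while toggling a presence flag changes it, which is the property the description above relies on.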