Module benchmark

Benchmark event protocol for Smith agent optimization

This module defines the complete event schema for collecting agent performance data, which is used to optimize Smith’s results on coding benchmarks such as SWE-bench.

Structs

BenchmarkEvent: Core benchmark event with required tracking fields
ConfigSuggestion: Optimizer configuration suggestions
ErrorState: Final error state classification
EvidenceFootprint: Evidence of a tool’s impact on the codebase
FailureAnalysis: Structured failure analysis for learning
PruningDecision: Context pruning decisions for optimization
RecoveryAttempt: Recovery attempt during failure handling
RetryPolicy: Retry policy configuration
RunConfig: Configuration for a benchmark run
RunResult: Results from a completed benchmark run
SandboxLimits: Sandbox resource limits
StepData: Individual reasoning/action step data
TaskFeatures: Task features for contextual optimization
ToolPerformance: Tool performance metrics for optimization

Enums

BenchmarkEventType: All benchmark event types for agent optimization
ExitKind
FailureRoot
SegmentType
StepType
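
Example (illustrative sketch)

The listing above gives item names only, so the following is a minimal sketch of how an event schema of this general shape could be modeled and queried. Every type, field, and variant here (`SketchEvent`, `SketchEventType`, `SketchStepType`, `run_id`, `seq`, `count_action_steps`, `sample_events`) is a hypothetical stand-in invented for illustration; none of them is taken from the actual `benchmark` module, whose real definitions may differ.

```rust
// Hypothetical stand-ins only: these are NOT the real `benchmark` types.
// All fields and variants below are assumptions made for illustration.

/// Assumed step categories, loosely mirroring "reasoning/action step data".
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchStepType {
    Reasoning,
    Action,
}

/// Assumed event kinds for a single benchmark run.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SketchEventType {
    RunStarted,
    Step { kind: SketchStepType },
    RunFinished { success: bool },
}

/// Assumed event envelope carrying a few tracking fields.
#[derive(Debug, Clone, PartialEq)]
struct SketchEvent {
    run_id: String, // assumed identifier of the benchmark run
    seq: u64,       // assumed per-run sequence number
    event_type: SketchEventType,
}

/// Counts the action steps recorded for one run.
fn count_action_steps(events: &[SketchEvent], run_id: &str) -> usize {
    events
        .iter()
        .filter(|e| e.run_id == run_id)
        .filter(|e| {
            matches!(
                e.event_type,
                SketchEventType::Step { kind: SketchStepType::Action }
            )
        })
        .count()
}

/// Builds a small, made-up event stream covering two runs.
fn sample_events() -> Vec<SketchEvent> {
    let ev = |run_id: &str, seq: u64, event_type: SketchEventType| SketchEvent {
        run_id: run_id.to_string(),
        seq,
        event_type,
    };
    vec![
        ev("run-1", 0, SketchEventType::RunStarted),
        ev("run-1", 1, SketchEventType::Step { kind: SketchStepType::Reasoning }),
        ev("run-1", 2, SketchEventType::Step { kind: SketchStepType::Action }),
        ev("run-2", 0, SketchEventType::Step { kind: SketchStepType::Action }),
        ev("run-1", 3, SketchEventType::Step { kind: SketchStepType::Action }),
        ev("run-1", 4, SketchEventType::RunFinished { success: true }),
    ]
}

fn main() {
    let events = sample_events();
    println!("events recorded: {}", events.len());
    println!("last seq for run-1: {:?}", events.last().map(|e| e.seq));
    println!("action steps in run-1: {}", count_action_steps(&events, "run-1"));
}
```

Keeping the event kind in an enum and the tracking fields in a surrounding struct is one common way to let downstream analysis filter by run and by step type without parsing free-form payloads; whether the real module follows this layout is not stated in the listing.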
