Module experiments

Structs

BenchmarkCase: A single benchmark case.
BenchmarkSet: A set of benchmark cases loaded from a TOML file.
CaseScore: Score for a single benchmark case.
ConfigSnapshot: Snapshot of all tunable parameters for a single experiment arm.
EvalReport: Aggregate evaluation report returned by Evaluator::evaluate.
Evaluator: Evaluates a subject model against a benchmark dataset using an LLM judge.
ExperimentEngine: Autonomous parameter-tuning engine.
ExperimentResult
ExperimentSessionReport: Final report produced by ExperimentEngine::run.
GenerationOverrides: Partial LLM generation parameter overrides for experiment variation injection.
GridStep: Systematic grid sweep: iterate each parameter through its discrete steps, skip visited.
JudgeOutput: Structured output returned by the judge LLM.
Neighborhood: Perturbation strategy around the current baseline.
ParameterRange: A continuous or discrete range for a single tunable parameter.
Random: Uniform random sampling within parameter bounds.
SearchSpace: The set of parameter ranges that define the experiment search space.
Variation

Traits

VariationGenerator: A strategy for generating parameter variations one at a time.
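
The GridStep entry describes a sweep that walks each parameter through its discrete steps and skips values that were already visited. The sketch below illustrates that idea only; it is not this module's implementation. Every name in it (`DiscreteRange`, `Proposal`, `Generator`, `GridSweep`, `mark_visited`) is a hypothetical stand-in, and the real `ParameterRange`, `Variation`, and `VariationGenerator` items may have different fields and signatures.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one tunable parameter with discrete steps.
#[derive(Debug, Clone)]
pub struct DiscreteRange {
    pub name: &'static str,
    pub min: f64,
    pub max: f64,
    pub step: f64,
}

/// Hypothetical stand-in for a single proposed parameter change.
#[derive(Debug, Clone, PartialEq)]
pub struct Proposal {
    pub parameter: &'static str,
    pub value: f64,
}

/// Hypothetical strategy interface: yields one proposal per call,
/// or `None` once the strategy is exhausted.
pub trait Generator {
    fn next(&mut self) -> Option<Proposal>;
}

/// Grid sweep: visits each range in order, stepping from `min` to `max`.
pub struct GridSweep {
    ranges: Vec<DiscreteRange>,
    range_idx: usize,
    step_idx: usize,
    visited: HashSet<(&'static str, i64)>,
}

impl GridSweep {
    pub fn new(ranges: Vec<DiscreteRange>) -> Self {
        Self { ranges, range_idx: 0, step_idx: 0, visited: HashSet::new() }
    }

    /// Quantize the value so floating-point noise does not defeat the visited check.
    fn key(name: &'static str, value: f64) -> (&'static str, i64) {
        (name, (value * 1e6).round() as i64)
    }

    /// Record a value that was tried earlier so the sweep will skip it.
    pub fn mark_visited(&mut self, name: &'static str, value: f64) {
        self.visited.insert(Self::key(name, value));
    }
}

impl Generator for GridSweep {
    fn next(&mut self) -> Option<Proposal> {
        while let Some(range) = self.ranges.get(self.range_idx) {
            let value = range.min + self.step_idx as f64 * range.step;
            // Move to the next parameter when this one is exhausted.
            // A non-positive step would never terminate, so such ranges are skipped.
            if range.step <= 0.0 || value > range.max + 1e-9 {
                self.range_idx += 1;
                self.step_idx = 0;
                continue;
            }
            self.step_idx += 1;
            let name = range.name;
            // `insert` returns false when the key was already present.
            if self.visited.insert(Self::key(name, value)) {
                return Some(Proposal { parameter: name, value });
            }
        }
        None
    }
}
```

With a single range `temperature` from 0.0 to 1.0 in steps of 0.5, and 0.5 marked as visited beforehand, successive calls to `next` yield 0.0, then 1.0, then `None`. Yielding one proposal per call keeps the caller in control of when each candidate is evaluated, which matches the "one at a time" wording of the VariationGenerator entry.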