Evolution engine: the glue that turns primitives into a working A/B loop.
Called from cmd_record (after each session is inserted) and from
cmd_roll (manual challenger generation).
Functions
- collect_scores_for_config - Load all signals tied to sessions that ran under config_id, group by session, and collapse each session’s signals into a single 0..=1 score.
- evaluate_promotion - Evaluate the running experiment (if any) against the promotion threshold. Returns the experiment + decision, or None if no experiment is running.
- generate_challenger - Convenience wrapper around generate_challenger_with_picker that uses the default LLM-aware picker. Callers that may not have an LLM should call generate_challenger_with_picker with picker_for_environment(false).
- generate_challenger_with_picker - Generate a challenger from the current champion using one mutator, persist it as an AgentConfig row with role=Challenger, start a new Experiment with traffic_share=1.0 (v0.2.0 deploys the challenger full-time and compares against the historical champion’s session population), and apply the challenger config to disk via the adapter.
- picker_for_environment - Build the mutator picker, omitting LLM-dependent mutators if no LLM is reachable. Without this, the default 50%-LLM-rewrite weight means roughly half of all challenger generations would silently fail to mutate anything when the user has no Anthropic key and no local Ollama.
- promote_challenger - Promote the challenger: mark experiment as Promoted, swap project’s champion pointer, and re-apply the new champion to disk via the adapter.
- resolve_active_deployment - Figure out which variant + config_id a new session should be tagged with. If an experiment is running: challenger variant on the challenger config. Otherwise: champion variant on the project’s champion config.
- should_evolve - Default scheduler: trigger challenger generation when enough sessions have accumulated since the last champion change. Skips if an experiment is already running.
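The grouping step described for collect_scores_for_config can be illustrated with a small standalone sketch. It is not the crate's implementation: the Signal struct, its fields, and the choice of a clamped mean as the collapse rule are all assumptions made for illustration, since the page only states that each session's signals are reduced to a single 0..=1 score.

```rust
use std::collections::BTreeMap;

/// Hypothetical signal row; the crate's real type and fields may differ.
struct Signal {
    session_id: u32,
    value: f64,
}

/// Group signals by session, then collapse each group into one score.
/// Assumption: the collapse rule is a plain mean, clamped to 0..=1.
fn collapse_scores(signals: &[Signal]) -> BTreeMap<u32, f64> {
    let mut grouped: BTreeMap<u32, Vec<f64>> = BTreeMap::new();
    for s in signals {
        grouped.entry(s.session_id).or_default().push(s.value);
    }
    grouped
        .into_iter()
        .map(|(session_id, values)| {
            // Each group has at least one value, so the division is safe.
            let mean = values.iter().sum::<f64>() / values.len() as f64;
            (session_id, mean.clamp(0.0, 1.0))
        })
        .collect()
}
```

With this rule, a session with one positive and one negative signal lands at 0.5, and out-of-range values are pulled back into the 0..=1 interval.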
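The reason picker_for_environment exists can be seen in a minimal weighted-picker sketch. Only the 50% weight for the LLM rewrite comes from the description above; the MutatorKind variants HeuristicA and HeuristicB, the WeightedMutator struct, and the pick function are hypothetical stand-ins, and the real picker may be structured quite differently.

```rust
/// Hypothetical mutator kinds; only the LLM rewrite is named on this page.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutatorKind {
    LlmRewrite,
    HeuristicA,
    HeuristicB,
}

/// Hypothetical weighted entry in the picker.
struct WeightedMutator {
    kind: MutatorKind,
    weight: f64,
    needs_llm: bool,
}

/// Build the candidate list, dropping LLM-dependent mutators when no LLM is reachable.
fn picker_weights(llm_available: bool) -> Vec<WeightedMutator> {
    let all = vec![
        WeightedMutator { kind: MutatorKind::LlmRewrite, weight: 0.5, needs_llm: true },
        WeightedMutator { kind: MutatorKind::HeuristicA, weight: 0.25, needs_llm: false },
        WeightedMutator { kind: MutatorKind::HeuristicB, weight: 0.25, needs_llm: false },
    ];
    all.into_iter()
        .filter(|m| llm_available || !m.needs_llm)
        .collect()
}

/// Pick a mutator from a roll in 0..=1, proportionally to the remaining weights.
fn pick(mutators: &[WeightedMutator], roll: f64) -> Option<MutatorKind> {
    let total: f64 = mutators.iter().map(|m| m.weight).sum();
    if total <= 0.0 {
        return None;
    }
    let mut remaining = roll.clamp(0.0, 1.0) * total;
    for m in mutators {
        if remaining < m.weight {
            return Some(m.kind);
        }
        remaining -= m.weight;
    }
    // A roll of exactly 1.0 falls through the loop; use the last entry.
    mutators.last().map(|m| m.kind)
}
```

Because the filtered list is re-normalized by its own total weight, every roll still selects a usable mutator when the LLM entry is removed, rather than half of the rolls landing on a mutator that cannot run.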
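The decision rules stated for resolve_active_deployment and should_evolve are simple enough to sketch together. Everything below is an assumption layered on those two descriptions: ProjectState, Experiment, their fields, and the min_sessions parameter are hypothetical, and the real functions presumably read this state from storage rather than from an in-memory struct.

```rust
#[derive(Debug, PartialEq)]
enum Variant {
    Champion,
    Challenger,
}

/// Hypothetical view of a running experiment.
struct Experiment {
    challenger_config_id: u32,
}

/// Hypothetical snapshot of the project state the engine would consult.
struct ProjectState {
    champion_config_id: u32,
    running_experiment: Option<Experiment>,
    sessions_since_champion_change: u32,
}

/// Which variant + config_id should a new session be tagged with?
fn resolve_deployment(state: &ProjectState) -> (Variant, u32) {
    match &state.running_experiment {
        // An experiment is running: the challenger is deployed full-time.
        Some(exp) => (Variant::Challenger, exp.challenger_config_id),
        // No experiment: sessions run under the champion config.
        None => (Variant::Champion, state.champion_config_id),
    }
}

/// Trigger challenger generation once enough sessions have accumulated,
/// unless an experiment is already running.
/// Assumption: "enough" is a simple threshold passed in as min_sessions.
fn should_evolve_sketch(state: &ProjectState, min_sessions: u32) -> bool {
    state.running_experiment.is_none()
        && state.sessions_since_champion_change >= min_sessions
}
```

The two checks are complementary: while an experiment is running, new sessions are tagged as challenger sessions and the scheduler stays idle, so at most one experiment is in flight at a time.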