gepa
A production-grade Rust implementation of GEPA (Genetic-Pareto Prompt Optimization), the algorithm described in "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" (ICLR 2026 Oral). GEPA evolves prompt candidates through LLM-guided reflective mutation and Pareto-front selection, consistently outperforming GRPO with far fewer evaluations.
Key results
- +6% average improvement over GRPO across standard benchmarks.
- Up to 35x fewer rollouts than conventional RL-based prompt tuning.
- Multi-objective Pareto tracking preserves solution diversity while focusing budget on hard examples.
Quick start
Add the dependency:
[dependencies]
gepa = "0.1"
tokio = { version = "1", features = ["full"] }
async-trait = "0.1"
Implement GEPAAdapter for your task, then call optimize:
use std::sync::Arc;
use async_trait::async_trait;
// gepa imports, including VecLoader
// --- Data types -----------------------------------------------------------
// --- Adapter --------------------------------------------------------------
// a struct that implements GEPAAdapter
// --- Main -----------------------------------------------------------------
// an async main that calls optimize
A complete, runnable example is provided in examples/quickstart.rs.
Features
- Per-instance Pareto frontier: tracks which candidate performs best on each individual validation example, mirroring Algorithm 2 from the paper.
- Reflective mutation: uses an LLM to analyse failure cases and propose improved instruction text (Appendix C prompt template included).
- System-aware merge: periodically merges complementary Pareto candidates by combining high-performing per-instance components (Algorithm 4).
- Provider-agnostic: any server that speaks the OpenAI /v1/chat/completions protocol is supported out of the box via reqwest.
- Pluggable strategies: swap candidate selectors (Pareto, CurrentBest, EpsilonGreedy), component selectors (RoundRobin, All), and frontier types (Instance, Objective, Hybrid, Cartesian) without touching the engine.
- Observable: structured tracing logs and a typed callback system (GEPACallback) for custom metrics, checkpointing, and early-stopping hooks.
- Serialisable results: GEPAResult round-trips through JSON; schema versioning guards against incompatible future formats.
- No unsafe code: unsafe_code = "forbid" is enforced at the crate level.
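The per-instance frontier from the first feature can be pictured with a small std-only sketch. This is not the crate's implementation: the function names and the `scores[candidate][instance]` layout are assumptions chosen for illustration.

```rust
use std::collections::BTreeSet;

/// For each validation instance, returns the set of candidate indices that
/// reach the best score on that instance (ties are all kept).
/// `scores[c][i]` is the score of candidate `c` on instance `i`
/// (hypothetical layout, not the crate's internal state).
fn per_instance_frontier(scores: &[Vec<f64>]) -> Vec<BTreeSet<usize>> {
    let n_instances = scores.first().map_or(0, |row| row.len());
    (0..n_instances)
        .map(|i| {
            // Best score any candidate achieves on instance `i`.
            let best = scores
                .iter()
                .map(|row| row[i])
                .fold(f64::NEG_INFINITY, f64::max);
            // Every candidate that matches that best score.
            scores
                .iter()
                .enumerate()
                .filter(|(_, row)| row[i] == best)
                .map(|(c, _)| c)
                .collect()
        })
        .collect()
}

/// Candidates that are best on at least one instance: the pool a
/// Pareto-style selector would draw from.
fn frontier_candidates(frontier: &[BTreeSet<usize>]) -> BTreeSet<usize> {
    frontier.iter().flatten().copied().collect()
}
```

A candidate with a middling average can still stay in the pool if it wins on a single hard example, which is how the frontier preserves diversity.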
Architecture
gepa
├── api.rs optimize() entry point, OptimizeConfig, LMConfig
├── lm.rs LanguageModel trait, OpenAICompatibleLM
├── error.rs GEPAError, Result
├── core/
│ ├── adapter.rs GEPAAdapter trait, Candidate, EvaluationBatch
│ ├── engine.rs GEPAEngine — the main optimisation loop
│ ├── result.rs GEPAResult — immutable run snapshot
│ ├── state.rs GEPAState, FrontierType, Pareto bookkeeping
│ ├── data_loader.rs DataLoader, VecLoader
│ └── callbacks.rs GEPACallback, event structs
├── proposer/
│ ├── reflective_mutation.rs LLM-guided mutation (Algorithm 3)
│ └── merge.rs system-aware merge (Algorithm 4)
├── strategies/
│ ├── candidate_selector.rs Pareto / CurrentBest / EpsilonGreedy
│ ├── component_selector.rs RoundRobin / All
│ ├── batch_sampler.rs EpochShuffledSampler
│ └── eval_policy.rs FullEvalPolicy
└── utils/
├── stop_condition.rs MaxMetricCallsStopper, TimeoutStopper, …
└── pareto.rs Pareto utilities
The central abstraction is GEPAAdapter:
Your code ──[GEPAAdapter]── GEPAEngine ──[LanguageModel]── LLM API
The engine is responsible for all Pareto bookkeeping, candidate selection, budget tracking, and mutation orchestration. Your adapter handles only two things: evaluating a batch of examples and, optionally, building a structured reflective dataset for the mutation LM.
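To make the split of responsibilities concrete, here is a simplified, synchronous stand-in for an adapter. It is only a sketch: the real GEPAAdapter is async and generic over more types, and its method names and signatures may differ. `SketchAdapter`, `evaluate`, `reflective_dataset`, and `KeywordAdapter` are all hypothetical names.

```rust
use std::collections::HashMap;

/// Component name -> component text (assumed shape of a candidate).
type Candidate = HashMap<String, String>;

/// Hypothetical stand-in for the adapter's two responsibilities.
trait SketchAdapter<Item> {
    /// Scores every example in `batch` under `candidate`.
    fn evaluate(&self, batch: &[Item], candidate: &Candidate) -> Vec<f64>;

    /// Optionally turns failures into feedback strings for the reflection LM.
    fn reflective_dataset(&self, batch: &[Item], scores: &[f64]) -> Vec<String> {
        let _ = (batch, scores);
        Vec::new() // default: no structured feedback
    }
}

/// Toy task: an example counts as solved if the prompt mentions its keyword.
struct KeywordAdapter;

impl SketchAdapter<String> for KeywordAdapter {
    fn evaluate(&self, batch: &[String], candidate: &Candidate) -> Vec<f64> {
        let prompt = candidate
            .get("instructions")
            .map(String::as_str)
            .unwrap_or("");
        batch
            .iter()
            .map(|kw| if prompt.contains(kw.as_str()) { 1.0 } else { 0.0 })
            .collect()
    }

    fn reflective_dataset(&self, batch: &[String], scores: &[f64]) -> Vec<String> {
        // Report only the examples that were not solved.
        batch
            .iter()
            .zip(scores)
            .filter(|(_, s)| **s < 1.0)
            .map(|(kw, _)| format!("prompt never mentions '{}'", kw))
            .collect()
    }
}
```

Everything else (which candidate to mutate, when to stop, what stays on the frontier) would sit on the engine side of the boundary.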
Configuration
OptimizeConfig::new accepts five required arguments and exposes every other
option as a public field with sensible defaults.
Required
| Field | Type | Description |
|---|---|---|
| seed_candidate | Candidate | Starting component text by component name |
| trainset | Arc<dyn DataLoader<Id, Item>> | Training split |
| valset | Arc<dyn DataLoader<Id, Item>> | Validation split (Pareto tracking) |
| adapter | Arc<dyn GEPAAdapter<Item, T, RO>> | Your evaluation logic |
| lm_config | LMConfig | Reflection LM settings |
Stop condition (StopConditionConfig)
| Field | Default | Description |
|---|---|---|
| max_metric_calls | Some(500) | Budget in per-example metric evaluations; cached examples do not consume it |
| max_iterations | None | Hard iteration cap |
| timeout | None | Wall-clock limit (std::time::Duration) |
All active conditions are combined with OR — the first to fire stops the run.
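The OR semantics can be sketched as follows. The struct and field names mirror the table above but are hypothetical; the crate's own stopper types (MaxMetricCallsStopper, TimeoutStopper, and so on) may be organised differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical mirror of the three stop-condition fields.
struct StopSketch {
    max_metric_calls: Option<usize>,
    max_iterations: Option<usize>,
    timeout: Option<Duration>,
}

/// Progress counters an engine would maintain (names are illustrative).
struct Progress {
    metric_calls: usize,
    iterations: usize,
    started: Instant,
}

impl StopSketch {
    /// OR-combination: a `None` condition is inactive; any active
    /// condition that has been reached stops the run.
    fn should_stop(&self, p: &Progress) -> bool {
        let calls = self.max_metric_calls.map_or(false, |m| p.metric_calls >= m);
        let iters = self.max_iterations.map_or(false, |m| p.iterations >= m);
        let time = self.timeout.map_or(false, |t| p.started.elapsed() >= t);
        calls || iters || time
    }
}
```

With the defaults from the table, only the metric-call budget is active, so a run ends once 500 uncached evaluations have been spent.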
Strategy knobs
| Field | Default | Description |
|---|---|---|
| candidate_selector | CandidateSelectorKind::Pareto | How to pick a base candidate |
| component_selector | ComponentSelectorKind::RoundRobin | Which prompt components to mutate |
| minibatch_size | 3 | Training examples per iteration |
| frontier_type | FrontierType::Instance | Pareto tracking strategy |
| use_merge | false | Enable system-aware merge |
| max_merge_invocations | 5 | Merge budget across the run |
| component_metadata | {} | Optional text/code/config metadata for component-aware reflection prompts |
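As an illustration of the component_selector knob, the two variants could behave roughly like this. It is a sketch under the assumption that RoundRobin mutates one component per iteration in a fixed cycle and All mutates every component; `RoundRobinSketch` and `select_all` are hypothetical names, not the crate's types.

```rust
/// Hypothetical round-robin selector: each call returns the next component
/// name, wrapping around at the end of the list.
struct RoundRobinSketch {
    next: usize,
}

impl RoundRobinSketch {
    fn new() -> Self {
        Self { next: 0 }
    }

    /// Picks one component to mutate this iteration; `None` if there are none.
    fn select<'a>(&mut self, components: &'a [String]) -> Option<&'a str> {
        if components.is_empty() {
            return None;
        }
        let picked = &components[self.next % components.len()];
        self.next = (self.next + 1) % components.len();
        Some(picked.as_str())
    }
}

/// The `All` alternative returns every component each iteration.
fn select_all(components: &[String]) -> Vec<&str> {
    components.iter().map(String::as_str).collect()
}
```

Round-robin keeps each reflection prompt focused on a single component, while selecting all components changes more text per iteration.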
LM settings (LMConfig)
| Field | Default | Description |
|---|---|---|
| model | "gpt-4o-mini" | Model identifier |
| api_key | "" | Bearer token ("" for local / unauthenticated servers) |
| base_url | "https://api.openai.com" | API base URL (no trailing slash) |
| temperature | Some(1.0) | Sampling temperature |
| max_tokens | Some(4096) | Max tokens for reflection outputs |
| max_retries | 3 | HTTP retries with exponential back-off |
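The max_retries row can be read as the following retry loop. This is a minimal sketch: the 500 ms base delay and 8 s cap are assumptions (the table only says the back-off is exponential), and `backoff_delay` and `with_retries` are hypothetical helpers rather than crate functions.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Calls `op` until it succeeds or `max_retries` retries have been spent.
/// `sleep` is injected so callers (and tests) decide how waiting happens.
fn with_retries<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of retries: surface the last error.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                // Assumed base delay and cap; not taken from the crate.
                sleep(backoff_delay(
                    attempt,
                    Duration::from_millis(500),
                    Duration::from_secs(8),
                ));
                attempt += 1;
            }
        }
    }
}
```

With max_retries set to 3, a request is attempted at most four times: the initial call plus three retries.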
Supported LLM providers
OpenAICompatibleLM calls the standard /v1/chat/completions endpoint.
Point base_url at any compatible server:
| Provider | base_url |
|---|---|
| OpenAI | https://api.openai.com |
| Anthropic (OpenAI shim) | https://api.anthropic.com |
| Ollama | http://localhost:11434 |
| LMStudio | http://localhost:1234 |
| vLLM | http://localhost:8000 |
| Any OpenAI-compatible | your endpoint |
Pass api_key: "" for unauthenticated local servers.
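For reference, the request target and auth header implied by these settings can be derived as below. This is a sketch of the convention, not the crate's code: `chat_completions_url` and `auth_header` are hypothetical helpers, and both trimming a stray trailing slash and omitting the header for an empty key are assumed behaviours.

```rust
/// Joins a base URL with the chat-completions path. The settings ask for
/// no trailing slash; trimming one anyway keeps a stray slash harmless
/// (an assumption of this sketch).
fn chat_completions_url(base_url: &str) -> String {
    format!("{}/v1/chat/completions", base_url.trim_end_matches('/'))
}

/// Builds the Authorization header value, or `None` for an empty key
/// (assumed behaviour for unauthenticated local servers).
fn auth_header(api_key: &str) -> Option<String> {
    if api_key.is_empty() {
        None
    } else {
        Some(format!("Bearer {}", api_key))
    }
}
```

For the Ollama row, `chat_completions_url("http://localhost:11434")` yields `http://localhost:11434/v1/chat/completions`, and `auth_header("")` yields `None`.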
Examples
# Quickstart — sentiment classification with a mock scorer
# Custom adapter — multi-component prompt with merge enabled
# Live API (requires OPENAI_API_KEY)
OPENAI_API_KEY=sk-...
Testing
The optional hermetic e2e test exercises the public optimize() API, the
OpenAI-compatible HTTP LM path, mutation acceptance, callbacks, cache-backed
state, and run-directory persistence:
References
- Paper: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (ICLR 2026 Oral)
- Python reference implementation: gepa-py
- API documentation: docs.rs/gepa
- Original GEPA repo: https://github.com/gepa-ai/gepa
License
MIT — see LICENSE.
Citation