# rlevo
Survival of the fittest, implemented in Rust.
Gradient descent is powerful, but it is a local optimizer. If an agent finds a mediocre solution that is "good enough," it often gets trapped in a local optimum — a mathematical rut that no amount of hyperparameter tuning can escape.
rlevo takes a different path. Built on Burn, this library implements Deep Reinforcement Learning with Evolutionary Optimization: a population-based approach that uses crossover, mutation, and natural selection to optimize neural networks across complex, non-convex search spaces.
## Why Evolutionary Optimization with Deep Reinforcement Learning?
| Feature | Standard RL (Gradient-Based) | Evolutionary RL (ERL) |
|---|---|---|
| Optimization | Gradient descent | Black-box / genetic operators |
| Agent focus | Individual policy refinement | Population-wide evolution |
| Learning signal | Step-level rewards (TD-learning) | Episodic fitness (total reward) |
| Search space | Susceptible to local optima | Robust to noise & non-convexity |
| Scaling | Complex distributed synchronization | Embarrassingly parallel |
| Sample efficiency | High | Low (offset by parallelism) |
Because evaluating individuals is independent, ERL maps naturally onto Rust's fearless concurrency and Burn's backend-agnostic tensor operations — turning the sample-efficiency trade-off into a raw-throughput advantage.
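A sketch of that parallelism, assuming rayon for the thread pool (`Genome`, `run_episode`, and the use of rayon itself are illustrative stand-ins, not rlevo APIs):

```rust
use rayon::prelude::*;

// A genome is just a flat vector of network weights.
struct Genome {
    weights: Vec<f32>,
}

// Stand-in for a full environment rollout returning total episodic reward.
fn run_episode(genome: &Genome) -> f32 {
    genome.weights.iter().sum()
}

// Each individual's fitness is independent, so a whole generation
// can be evaluated in parallel with no synchronization.
fn evaluate_population(population: &[Genome]) -> Vec<f32> {
    population.par_iter().map(run_episode).collect()
}
```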
## Why rlevo?
Most ERL implementations are Python research prototypes built around flat vector observations and fixed-dimension action spaces. rlevo is designed differently from the ground up:
- **Const-generic dimensional safety.** `State<D>`, `Observation<D>`, and `Action<AD>` carry their dimensionality as const generic parameters. Dimension mismatches are compile-time errors, not runtime panics — a guarantee no existing Rust RL crate provides (see the sketch after this list).
- **Unified evolutionary and gradient-based RL.** Both kinds of agents share the same core trait abstractions, so they run against identical environments and compose naturally in a single training loop.
- **Backend-agnostic tensors via Burn.** Neural network weights, population tensors, and replay buffers are all Burn tensors. Hardware backends (CPU, WGPU, CUDA) swap without touching algorithm code.
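A standalone sketch of that const-generic pattern (the shapes here are illustrative; rlevo's actual traits may differ):

```rust
// Dimensionality lives in the type, so a 4-dimensional observation
// can never be fed to a policy expecting 3 dimensions.
struct Observation<const D: usize>([f32; D]);
struct Action<const AD: usize>([f32; AD]);

fn policy<const D: usize, const AD: usize>(_obs: &Observation<D>) -> Action<AD> {
    Action([0.0; AD]) // placeholder: a real policy maps D inputs to AD outputs
}

fn main() {
    let obs = Observation::<4>([0.0; 4]);          // CartPole-sized observation
    let _act: Action<1> = policy(&obs);            // compiles: dimensions agree
    // let _bad: Action<1> = policy::<3, 1>(&obs); // rejected at compile time: 4 != 3
}
```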
## What's Included

### Environments

#### Classic Control

- `CartPole` — balance a pole on a moving cart
- `MountainCar` / `MountainCarContinuous` — escape a valley with sparse rewards
- `Pendulum` — swing-up and stabilization
- `Acrobot` — underactuated double pendulum
- `TenArmedBandit` — multi-armed bandit testbed
#### Box2D Physics

- `BipedalWalker` — bipedal locomotion over varied terrain
- `LunarLander` / `LunarLanderContinuous` — fuel-efficient touchdown
- `CarRacing` — top-down racing with visual observations
#### MuJoCo-style Locomotion

- `InvertedPendulum` / `InvertedDoublePendulum` — balance tasks
- `Reacher` — goal-reaching with a two-link arm
- `Swimmer` — fluid locomotion with drag dynamics
#### Grid Worlds
- Configurable grid environments with optional memory, keyed doors, and partial observability
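A configuration for such an environment might look like the following; every builder method shown is hypothetical, included only to illustrate the kinds of knobs described above:

```rust
// Hypothetical builder -- names are illustrative, not rlevo's API.
let env = GridWorld::builder()
    .size(8, 8)                // 8x8 grid
    .keyed_doors(true)         // doors open only after finding a key
    .partial_observability(3)  // agent sees a 3x3 window, not the full grid
    .build();
```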
### Deep RL Algorithms

#### Value-Based
- DQN — Deep Q-Network with experience replay and target network
- C51 — Categorical DQN (distributional RL over 51 atoms)
- QR-DQN — Quantile Regression DQN
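DQN's update regresses toward a one-step bootstrapped target (C51 and QR-DQN generalize the same idea to distributions); here is that target as a plain-Rust sketch, independent of rlevo's types:

```rust
// One-step TD target: y = r + gamma * max_a Q_target(s', a),
// with the bootstrap term dropped on terminal transitions.
fn td_target(reward: f32, gamma: f32, next_q: &[f32], done: bool) -> f32 {
    let max_next = next_q.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if done { reward } else { reward + gamma * max_next }
}
```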
#### Policy Gradient
- PPO — Proximal Policy Optimization with clipped surrogate objective (categorical and Gaussian policies)
- PPG — Phasic Policy Gradient with auxiliary phase and distillation
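For reference, PPO's clipped surrogate for a single sample is only a few lines; a batched tensor implementation differs in shape, not logic:

```rust
// L = min(r * A, clip(r, 1 - eps, 1 + eps) * A), where
// r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.
fn clipped_surrogate(ratio: f32, advantage: f32, eps: f32) -> f32 {
    let clipped = ratio.clamp(1.0 - eps, 1.0 + eps);
    (ratio * advantage).min(clipped * advantage)
}
```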
#### Actor-Critic (Continuous Control)
- DDPG — Deep Deterministic Policy Gradient with Ornstein-Uhlenbeck exploration
- TD3 — Twin Delayed DDPG with target policy smoothing
- SAC — Soft Actor-Critic with automatic entropy tuning
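TD3's target policy smoothing, for example, is just two clamps around the target action (scalar sketch, independent of rlevo's tensor types):

```rust
// a' = clamp(mu_target(s') + clamp(eps, -c, c), -a_max, a_max),
// where the noise eps ~ N(0, sigma) is sampled by the caller.
fn smoothed_target_action(target_action: f32, noise: f32, noise_clip: f32, a_max: f32) -> f32 {
    let eps = noise.clamp(-noise_clip, noise_clip);
    (target_action + eps).clamp(-a_max, a_max)
}
```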
### Evolutionary & Swarm Algorithms

#### Classical Algorithms
- Genetic Algorithm (GA) with crossover and mutation operators
- Evolution Strategies (ES), Evolutionary Programming (EP)
- Differential Evolution (DE), Cartesian Genetic Programming (CGP)
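As a concrete picture of the genetic operators involved, here is a sketch of uniform crossover and pointwise mutation over flat weight vectors (illustrative operators, not necessarily the ones rlevo ships):

```rust
use rand::Rng;

// Uniform crossover: each weight is inherited from either parent with
// equal probability.
fn uniform_crossover<R: Rng>(rng: &mut R, a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter()
        .zip(b)
        .map(|(&wa, &wb)| if rng.random_bool(0.5) { wa } else { wb })
        .collect()
}

// Pointwise mutation: perturb each weight with probability `rate`.
fn mutate<R: Rng>(rng: &mut R, genome: &mut [f32], rate: f64, scale: f32) {
    for w in genome {
        if rng.random_bool(rate) {
            *w += (rng.random::<f32>() - 0.5) * 2.0 * scale; // uniform in [-scale, scale)
        }
    }
}
```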
#### Swarm Intelligence
- Particle Swarm Optimization (PSO)
- Ant Colony Optimization (ACO)
- Firefly, Cuckoo Search, Bat Algorithm
- Grey Wolf Optimizer (GWO), Artificial Bee Colony (ABC)
- Whale Optimization Algorithm (WOA), Salp Swarm
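Most of these methods are variations on a velocity/position update; the canonical per-dimension PSO step, as a plain sketch:

```rust
// v' = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x' = x + v'
// r1, r2 ~ U(0, 1) are drawn fresh each step by the caller.
fn pso_step(
    x: f32, v: f32,          // current position and velocity
    pbest: f32, gbest: f32,  // personal and global best positions
    w: f32, c1: f32, c2: f32, r1: f32, r2: f32,
) -> (f32, f32) {
    let v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x);
    (x + v_new, v_new)
}
```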
### Hybrid RL + Evolution
Hybrid training strategies that combine gradient-based RL with evolutionary search are in active design. See the roadmap for details.
## Quick Start
Add `rlevo` to your `Cargo.toml`:

```toml
[dependencies]
rlevo = "0.1"
```
Import an environment (the module paths below are assumptions for illustration; check the crate docs for the actual layout):

```rust
// NOTE: module paths are illustrative, not confirmed.
use rlevo::envs::CartPole;
use rlevo::prelude::*;
```
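A first episode might then look like the sketch below; `new`, `reset`, `step`, and `sample_action` are assumed names for illustration, not the crate's confirmed API:

```rust
// Hypothetical sketch: assumes an environment with reset/step and a
// random-action helper; the real rlevo trait surface may differ.
fn main() {
    let mut env = CartPole::new();
    let mut _obs = env.reset();
    let mut episode_return = 0.0;
    loop {
        let action = env.sample_action(); // random placeholder policy
        let (obs, reward, done) = env.step(&action);
        _obs = obs;
        episode_return += reward;
        if done {
            break;
        }
    }
    println!("episode return: {episode_return}");
}
```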
To hack on the repository itself:

```sh
# Build the workspace
cargo build --workspace

# Run tests
cargo test --workspace

# Generate documentation
cargo doc --workspace --no-deps
```
## Development Status
rlevo is alpha software. The core trait API is largely settled; algorithm implementations and environments are under active development. Breaking changes may occur before 1.0.
| Area | Status |
|---|---|
| Core trait API | Stable |
| Environments (13+) | Active |
| Deep RL algorithms (8) | Active |
| Evolutionary & swarm algorithms | Active |
| Benchmarking harness | Active |
| Hybrid RL + evolution | Early design |
## Dependencies
- Burn 0.19 — backend-agnostic tensor operations with `wgpu`, `ndarray`, training loop, and TUI metrics
- rand 0.9 — randomness with deterministic seeding via `splitmix64` (see the sketch after this list)
- serde 1.0 — serialization for checkpoints and configs
- tracing 0.1 — structured logging
- rapier2d / rapier3d — physics simulation with enhanced determinism
- criterion — benchmarking
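On the rand note: `SeedableRng::seed_from_u64` expands a single `u64` into the RNG's full seed via splitmix64, which is what makes runs reproducible. A minimal rand 0.9 example:

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

fn main() {
    // The same seed reproduces the same stream on every run.
    let mut rng = StdRng::seed_from_u64(42);
    let sample: f32 = rng.random();
    println!("first draw: {sample}");
}
```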
## Contributing
See CONTRIBUTING.md for guidelines, scope, and how to open a PR.
## Ethics and Security
rlevo is training infrastructure — the objectives you encode and the policies you deploy carry real consequences. See ETHICS_AND_AI.md for our commitments around reward function transparency, emergent behavior, and responsible distribution of trained policies.
To report a security vulnerability privately, see SECURITY.md.
## Development
This crate was developed with the assistance of AI coding tools (Claude by Anthropic).
## License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.