rlevo

Survival of the fittest, implemented in Rust.

Gradient descent is powerful, but it is a local optimizer. If an agent finds a mediocre solution that is "good enough," it often gets trapped in a local optimum — a mathematical rut that no amount of hyperparameter tuning can escape.

rlevo takes a different path. Built on Burn, this library implements Deep Reinforcement Learning with Evolutionary Optimization: a population-based approach that uses crossover, mutation, and natural selection to optimize neural networks across complex, non-convex search spaces.

Why Evolutionary Optimization with Deep Reinforcement Learning?

Feature           | Standard RL (Gradient-Based)        | Evolutionary RL (ERL)
------------------|-------------------------------------|---------------------------------
Optimization      | Gradient descent                    | Black-box / genetic operators
Agent focus       | Individual policy refinement        | Population-wide evolution
Learning signal   | Step-level rewards (TD-learning)    | Episodic fitness (total reward)
Search space      | Susceptible to local optima         | Robust to noise & non-convexity
Scaling           | Complex distributed synchronization | Embarrassingly parallel
Sample efficiency | High                                | Low (offset by parallelism)

Because evaluating individuals is independent, ERL maps naturally onto Rust's fearless concurrency and Burn's backend-agnostic tensor operations — turning the sample-efficiency trade-off into a raw-throughput advantage.
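
As a shape illustration only (a rayon-based sketch with a hypothetical Genome type and evaluate_fitness function, not rlevo's API), population evaluation parallelizes in a few lines:

use rayon::prelude::*;

// Hypothetical genome: a flat parameter vector encoding one policy network.
struct Genome {
    params: Vec<f32>,
}

// Hypothetical fitness: roll out one full episode and return the total reward.
fn evaluate_fitness(genome: &Genome) -> f32 {
    genome.params.iter().sum() // placeholder for an episode rollout
}

// Each individual is evaluated independently, so the map is embarrassingly
// parallel across all available cores.
fn evaluate_population(population: &[Genome]) -> Vec<f32> {
    population.par_iter().map(evaluate_fitness).collect()
}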

Why rlevo?

Most ERL implementations are Python research prototypes built around flat vector observations and fixed-dimension action spaces. rlevo is designed differently from the ground up:

Const-generic dimensional safety. State<D>, Observation<D>, and Action<AD> carry their dimensionality as const generic parameters. Dimension mismatches are compile-time errors, not runtime panics — a guarantee no existing Rust RL crate provides.
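
A minimal sketch of the idea (a simplified illustration; the crate's actual types carry more machinery):

// Simplified stand-ins for the crate's const-generic wrappers.
struct Observation<const D: usize>([f32; D]);
struct Action<const AD: usize>([f32; AD]);

// A policy's input and output dimensions are part of its type.
fn act<const D: usize, const AD: usize>(_obs: &Observation<D>) -> Action<AD> {
    Action([0.0; AD]) // placeholder policy
}

fn main() {
    let obs: Observation<4> = Observation([0.0; 4]); // CartPole-sized observation
    let _a: Action<1> = act(&obs);
    // let _bad: Action<1> = act::<3, 1>(&obs); // dimension mismatch: compile error
}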

Unified evolutionary and gradient-based RL. Evolutionary and gradient-based agents share the same core trait abstractions, so they run against identical environments and compose naturally in a single training loop.
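
The shape of that abstraction, sketched with hypothetical names (the crate's real traits differ in detail):

// Hypothetical sketch: one trait served by both training styles.
trait Agent {
    fn select_action(&self, obs: &[f32]) -> usize;
}

struct GradientAgent; // refined by gradient updates (e.g. PPO)
struct EvolvedAgent;  // refined by selection and mutation

impl Agent for GradientAgent {
    fn select_action(&self, _obs: &[f32]) -> usize { 0 }
}
impl Agent for EvolvedAgent {
    fn select_action(&self, _obs: &[f32]) -> usize { 0 }
}

// One evaluation loop serves either kind of agent.
fn first_action(agent: &dyn Agent) -> usize {
    agent.select_action(&[0.0; 4])
}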

Backend-agnostic tensors via Burn. Neural network weights, population tensors, and replay buffers are all Burn tensors. Hardware backends (CPU, WGPU, CUDA) swap without touching algorithm code.
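
For example (a generic sketch against Burn's public API, assuming the ndarray feature is enabled; not rlevo-specific code):

use burn::prelude::*;

// Algorithm code is written once, generic over the backend.
fn center<B: Backend>(rewards: Tensor<B, 1>) -> Tensor<B, 1> {
    let mean = rewards.clone().mean();
    rewards - mean // broadcasts the scalar mean over the batch
}

fn main() {
    // Swap this alias (e.g. to burn::backend::Wgpu) without touching center().
    type B = burn::backend::NdArray;
    let device = Default::default();
    let rewards = Tensor::<B, 1>::from_floats([1.0, 2.0, 3.0], &device);
    println!("{}", center(rewards));
}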

What's Included

Environments

Classic Control

  • CartPole — balance a pole on a moving cart
  • MountainCar / MountainCarContinuous — escape a valley with sparse rewards
  • Pendulum — swing-up and stabilization
  • Acrobot — underactuated double pendulum
  • TenArmedBandit — multi-armed bandit testbed

Box2D Physics

  • BipedalWalker — bipedal locomotion over varied terrain
  • LunarLander / LunarLanderContinuous — fuel-efficient touchdown
  • CarRacing — top-down racing with visual observations

MuJoCo-style Locomotion

  • InvertedPendulum / InvertedDoublePendulum — balance tasks
  • Reacher — goal-reaching with a two-link arm
  • Swimmer — fluid locomotion with drag dynamics

Grid Worlds

  • Configurable grid environments with optional memory, keyed doors, and partial observability

Deep RL Algorithms

Value-Based

  • DQN — Deep Q-Network with experience replay and target network (target computation sketched after this list)
  • C51 — Categorical DQN (distributional RL over 51 atoms)
  • QR-DQN — Quantile Regression DQN
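
For reference, the one-step TD target DQN regresses toward, in plain Rust (a generic sketch, not rlevo internals):

// y = r + gamma * max_a' Q_target(s', a'), with the bootstrap term
// dropped on terminal transitions.
fn dqn_targets(rewards: &[f32], max_next_q: &[f32], terminal: &[bool], gamma: f32) -> Vec<f32> {
    rewards
        .iter()
        .zip(max_next_q)
        .zip(terminal)
        .map(|((&r, &q), &done)| if done { r } else { r + gamma * q })
        .collect()
}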

Policy Gradient

  • PPO — Proximal Policy Optimization with clipped surrogate objective (categorical and Gaussian policies; the objective is sketched after this list)
  • PPG — Phasic Policy Gradient with auxiliary phase and distillation
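
The clipped surrogate at the heart of PPO, for a single sample (an illustrative sketch, not the crate's implementation):

// ratio = pi_new(a|s) / pi_old(a|s); the advantage typically comes from GAE.
fn ppo_clipped_objective(ratio: f32, advantage: f32, epsilon: f32) -> f32 {
    let unclipped = ratio * advantage;
    let clipped = ratio.clamp(1.0 - epsilon, 1.0 + epsilon) * advantage;
    // Take the pessimistic minimum; maximizing this keeps policy updates bounded.
    unclipped.min(clipped)
}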

Actor-Critic (Continuous Control)

  • DDPG — Deep Deterministic Policy Gradient with Ornstein-Uhlenbeck exploration
  • TD3 — Twin Delayed DDPG with target policy smoothing
  • SAC — Soft Actor-Critic with automatic entropy tuning

Evolutionary & Swarm Algorithms

Classical Algorithms

  • Genetic Algorithm (GA) with crossover and mutation operators (both sketched after this list)
  • Evolution Strategies (ES), Evolutionary Programming (EP)
  • Differential Evolution (DE), Cartesian Genetic Programming (CGP)
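
Toy versions of the two GA operators over flat parameter vectors (generic sketches using rand 0.9, not rlevo's operators):

use rand::Rng;

// Uniform crossover: each gene is inherited from either parent with equal probability.
fn crossover(a: &[f32], b: &[f32], rng: &mut impl Rng) -> Vec<f32> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| if rng.random_bool(0.5) { x } else { y })
        .collect()
}

// Mutation: perturb each gene with probability `rate` by a uniform step in
// [-sigma/2, sigma/2].
fn mutate(genome: &mut [f32], rate: f64, sigma: f32, rng: &mut impl Rng) {
    for gene in genome.iter_mut() {
        if rng.random_bool(rate) {
            *gene += sigma * (rng.random::<f32>() - 0.5);
        }
    }
}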

Swarm Intelligence

  • Particle Swarm Optimization (PSO); the velocity update is sketched after this list
  • Ant Colony Optimization (ACO)
  • Firefly, Cuckoo Search, Bat Algorithm
  • Grey Wolf Optimizer (GWO), Artificial Bee Colony (ABC)
  • Whale Optimization Algorithm (WOA), Salp Swarm
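
The canonical PSO update for one particle (the textbook form, not rlevo's code):

use rand::Rng;

// v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x), then x <- x + v.
fn pso_step(
    x: &mut [f32],
    v: &mut [f32],
    pbest: &[f32],
    gbest: &[f32],
    w: f32,
    c1: f32,
    c2: f32,
    rng: &mut impl Rng,
) {
    for i in 0..x.len() {
        let (r1, r2): (f32, f32) = (rng.random(), rng.random());
        v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (gbest[i] - x[i]);
        x[i] += v[i];
    }
}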

Hybrid RL + Evolution

Hybrid training strategies that combine gradient-based RL with evolutionary search are in active design. See the roadmap for details.

Quick Start

Add rlevo to your Cargo.toml:

[dependencies]
rlevo = "0.1"

Then run a short random-action episode:

use rlevo::envs::classic::CartPole;
use rlevo::core::{Environment, EpisodeStatus};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut env = CartPole::new(false);
    let snapshot = env.reset()?;
    println!("Initial observation: {:?}", snapshot.observation());

    loop {
        // Replace with your policy; here we sample a random action
        let action = env.sample_action();
        let snapshot = env.step(action)?;

        if matches!(snapshot.status(), EpisodeStatus::Terminated | EpisodeStatus::Truncated) {
            break;
        }
    }
    Ok(())
}

To work on the crate itself:

# Build the workspace
cargo build

# Run tests
cargo test

# Generate documentation
cargo doc --workspace --no-deps --open

Development Status

rlevo is alpha software. The core trait API is largely settled; algorithm implementations and environments are under active development. Breaking changes may occur before 1.0.

Area                            | Status
--------------------------------|-------------
Core trait API                  | Stable
Environments (13+)              | Active
Deep RL algorithms (8)          | Active
Evolutionary & swarm algorithms | Active
Benchmarking harness            | Active
Hybrid RL + evolution           | Early design

Dependencies

  • Burn 0.19 — backend-agnostic tensor operations with wgpu, ndarray, training loop, and TUI metrics
  • rand 0.9 — randomness with deterministic seeding via splitmix64 (see the example after this list)
  • serde 1.0 — serialization for checkpoints and configs
  • tracing 0.1 — structured logging
  • rapier2d / rapier3d — physics simulation with enhanced determinism
  • criterion — benchmarking
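
For instance, reproducible runs need only a fixed u64 seed; rand's seed_from_u64 expands it through SplitMix64 (standard rand usage, not an rlevo-specific API):

use rand::{rngs::StdRng, Rng, SeedableRng};

fn main() {
    // Any u64 seed yields a well-distributed, reproducible stream.
    let mut rng = StdRng::seed_from_u64(42);
    let first: f32 = rng.random();
    println!("{first}"); // identical on every run with the same seed
}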

Contributing

See CONTRIBUTING.md for guidelines, scope, and how to open a PR.

Ethics and Security

rlevo is training infrastructure — the objectives you encode and the policies you deploy carry real consequences. See ETHICS_AND_AI.md for our commitments around reward function transparency, emergent behavior, and responsible distribution of trained policies.

To report a security vulnerability privately, see SECURITY.md.

Development

This crate was developed with the assistance of AI coding tools (Claude by Anthropic).

License

Licensed under either the Apache License, Version 2.0 or the MIT License, at your option.