rlevo-environments
Standard benchmark environments and landscapes for the rlevo workspace.
This crate provides a collection of reinforcement learning environments, from classic tabular problems to continuous-control physics simulations. All environments implement the rlevo-core Environment trait, a common reset / step interface that makes them drop-in compatible with every algorithm in rlevo-deeprl.
Environments
Classic Control
Ports of the canonical Gymnasium control tasks implemented in pure Rust.
| Environment | Module | Observation | Action | Notes |
|---|---|---|---|---|
| CartPole | classic::CartPole | 4-D continuous | Discrete(2) | Physics match Gymnasium CartPole-v1 exactly |
| Acrobot | classic::Acrobot | 6-D continuous | Discrete(3) | Sutton two-link dynamics |
| MountainCar | classic::MountainCar | 2-D continuous | Discrete(3) | Sparse reward; needs exploration |
| MountainCarContinuous | classic::MountainCarContinuous | 2-D continuous | Continuous(1) | Dense reward variant |
| Pendulum | classic::Pendulum | 3-D continuous | Continuous(1) | Underactuated swing-up |
| TenArmedBandit | classic::TenArmedBandit | none | Discrete(10) | Classic 10-armed bandit; Sutton & Barto |
Toy Text
Tabular MDPs for baseline algorithm validation. Every state is fully observable.
| Environment | Module | State space | Action |
|---|---|---|---|
| FrozenLake | toy_text::FrozenLake | 16 or 64 discrete | Discrete(4) |
| CliffWalking | toy_text::CliffWalking | 48 discrete | Discrete(4) |
| Taxi | toy_text::Taxi | 500 discrete | Discrete(6) |
| Blackjack | toy_text::Blackjack | 32×11×2 discrete | Discrete(2) |
Gridworlds
Twelve partially observable grid environments inspired by Farama Minigrid. All share a common egocentric 7×7×3 observation and a 7-action discrete space. Physics are implemented once in grids::core and reused across every variant.
| Environment | Grid size | Key mechanic |
|---|---|---|
| EmptyEnv | 6×6 | Reach the goal |
| DoorKeyEnv | 8×8 | Pick up key, unlock door, reach goal |
| LavaGapEnv | 7×7 | Navigate a gap in a lava wall |
| FourRoomsEnv | 19×19 | Long-horizon exploration |
| UnlockEnv | 6×6 | Single lock/key |
| UnlockPickupEnv | 7×7 | Unlock then pick up object |
| MemoryEnv | 5×… | Remember seen target |
| MultiRoomEnv | variable | Chain of rooms |
| CrossingEnv | 11×11 | Navigate obstacles |
| DistShiftEnv | 9×… | Adaptation to distribution shift |
| DynamicObstaclesEnv | 6×6 | Moving obstacles |
| GoToDoorEnv | 5×5 | Reach specified door |
Box2D-style Physics
Continuous-control environments powered by Rapier2D. Enabled by the box2d feature (default on).
| Environment | Module | Observation | Action |
|---|---|---|---|
| BipedalWalker | box2d::BipedalWalker | 24-D continuous | Continuous(4) |
| LunarLander | box2d::LunarLander | 8-D continuous | Discrete(4) or Continuous(2) |
| CarRacing | box2d::CarRacing | 96×96 pixel | Continuous(3) |
Locomotion
MuJoCo-style locomotion environments in pure Rust via Rapier3D. Enabled by the locomotion feature (default on).
| Environment | Module | Notes |
|---|---|---|
| InvertedPendulum | locomotion::InvertedPendulum | 1-D balance |
| InvertedDoublePendulum | locomotion::InvertedDoublePendulum | Harder balance; sparser failure |
| Reacher | locomotion::Reacher | 2-DOF arm reaching |
| Swimmer | locomotion::Swimmer | 3-link swimmer |
Games (planned for v0.2)
Chess and ConnectFour are planned for a future release. Stub modules exist
in-source (src/games/chess/ and src/games/connect_four.rs) but do not yet
implement the Environment trait and are hidden from the public API docs.
Optimization Landscapes
Continuous single-objective fitness functions for evaluating evolutionary algorithms.
| Function | Module | Notes |
|---|---|---|
| Sphere | landscapes::sphere | Convex, unimodal |
| Ackley | landscapes::ackley | Multimodal; exponential traps |
| Rastrigin | landscapes::rastrigin | Highly multimodal |
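For orientation, the three functions can be sketched from their textbook definitions. This is an illustrative sketch only: the free-function signatures below are assumptions and are not taken from the landscapes module, and the Ackley constants (a = 20, b = 0.2, c = 2π) are the commonly used defaults rather than values confirmed by this crate. All three have a global minimum of 0 at the origin.

```rust
use std::f64::consts::{E, PI};

/// Sphere: f(x) = sum(x_i^2). Convex, with a single minimum at the origin.
pub fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

/// Rastrigin: f(x) = 10 n + sum(x_i^2 - 10 cos(2 pi x_i)).
/// The cosine term adds a regular lattice of local minima.
pub fn rastrigin(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    10.0 * n + x.iter().map(|v| v * v - 10.0 * (2.0 * PI * v).cos()).sum::<f64>()
}

/// Ackley with the common constants a = 20, b = 0.2, c = 2 pi.
/// Assumes a non-empty slice (the means divide by the length).
pub fn ackley(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mean_sq = x.iter().map(|v| v * v).sum::<f64>() / n;
    let mean_cos = x.iter().map(|v| (2.0 * PI * v).cos()).sum::<f64>() / n;
    -20.0 * (-0.2 * mean_sq.sqrt()).exp() - mean_cos.exp() + 20.0 + E
}
```

An evolutionary algorithm would typically minimize these values, so a candidate closer to the origin scores better on all three.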
Quick Start
use ;
use ;
use TimeLimit;
let env = with_config;
let mut env = new;
let mut snap = env.reset.expect;
while !snap.is_done
Gridworld environments use the shared GridAction type:
use ;
use ;
use GridAction;
let mut env = with_config;
let mut snap = env.reset.expect;
while !snap.is_done
Cargo Features
| Feature | Default | Description |
|---|---|---|
| box2d | yes | Box2D-style physics environments via rapier2d |
| locomotion | yes | Locomotion environments via rapier3d + nalgebra |
Disable physics environments to shrink compile time:
[]
= { = "…", = false }
Running Examples
# Classic control
# Gridworlds
# Bandits
# Box2D (requires box2d feature, enabled by default)
# Locomotion (requires locomotion feature, enabled by default)
See examples/README.md for patterns, conventions, and how to write new examples.
Design
Configuration Builder
Every environment ships with a XyzConfig struct and a XyzConfigBuilder with a fluent API. Defaults match the reference Gymnasium implementation where one exists.
use ;
let env = with_config;
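The general shape of the config-plus-builder pattern can be sketched as follows. ExampleConfig, ExampleConfigBuilder, and the max_steps and gravity fields are hypothetical names chosen for illustration (only seed is mentioned elsewhere in this README); the real XyzConfig types may expose different fields and methods.

```rust
/// Hypothetical configuration; field names and defaults are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleConfig {
    pub seed: u64,
    pub max_steps: usize,
    pub gravity: f64,
}

impl Default for ExampleConfig {
    fn default() -> Self {
        Self { seed: 0, max_steps: 500, gravity: 9.8 }
    }
}

/// Fluent builder: each setter consumes and returns the builder.
#[derive(Debug, Default)]
pub struct ExampleConfigBuilder {
    config: ExampleConfig,
}

impl ExampleConfigBuilder {
    /// Start from the default configuration.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn seed(mut self, seed: u64) -> Self {
        self.config.seed = seed;
        self
    }

    pub fn max_steps(mut self, max_steps: usize) -> Self {
        self.config.max_steps = max_steps;
        self
    }

    /// Finish the chain; fields that were never set keep their defaults.
    pub fn build(self) -> ExampleConfig {
        self.config
    }
}
```

With this sketch, `ExampleConfigBuilder::new().seed(42).build()` overrides only the seed and leaves the other fields at their defaults.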
Reproducibility
reset() re-seeds the environment's internal RNG from config.seed so the same seed always produces the same episode trajectory. Pass different seeds across parallel workers to get independent rollouts.
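The re-seeding behaviour can be illustrated with a toy environment. This is a sketch under stated assumptions: NoisyEnv and rollout are hypothetical, and SplitMix64 is swapped in as a small dependency-free generator; the crate's actual RNG is not specified here.

```rust
/// SplitMix64: a tiny deterministic generator standing in for the
/// environment's internal RNG (an assumption, not the crate's choice).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical environment whose only source of randomness is `rng`.
pub struct NoisyEnv {
    seed: u64,
    rng: SplitMix64,
}

impl NoisyEnv {
    pub fn new(seed: u64) -> Self {
        Self { seed, rng: SplitMix64(seed) }
    }

    /// Re-seed from the stored seed so every episode replays the same stream.
    pub fn reset(&mut self) {
        self.rng = SplitMix64(self.seed);
    }

    /// One step; the raw draw stands in for an observation.
    pub fn step(&mut self) -> u64 {
        self.rng.next_u64()
    }
}

/// Reset the environment, then collect `n` steps.
pub fn rollout(env: &mut NoisyEnv, n: usize) -> Vec<u64> {
    env.reset();
    (0..n).map(|_| env.step()).collect()
}
```

Two rollouts from the same NoisyEnv are identical, while workers constructed with different seeds produce different trajectories.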
Wrappers
wrappers::TimeLimit<E> wraps any Environment and injects a truncation signal after a configurable number of steps. The underlying environment's Terminated status is preserved separately from the wrapper's Truncated — algorithms that distinguish the two (PPO, SAC) can act on both signals correctly.
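The wrapper logic described above can be sketched with a simplified interface. The Env trait, the Status enum, Countdown, and run_episode below are hypothetical stand-ins written for this example; they are not the rlevo-core Environment trait or the actual wrappers::TimeLimit implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Running,
    Terminated,
    Truncated,
}

/// Hypothetical minimal environment interface (actions and rewards omitted).
pub trait Env {
    fn reset(&mut self) -> Status;
    fn step(&mut self) -> Status;
}

/// Step-count limit around any `Env`.
pub struct TimeLimit<E: Env> {
    inner: E,
    max_steps: usize,
    elapsed: usize,
}

impl<E: Env> TimeLimit<E> {
    pub fn new(inner: E, max_steps: usize) -> Self {
        Self { inner, max_steps, elapsed: 0 }
    }
}

impl<E: Env> Env for TimeLimit<E> {
    fn reset(&mut self) -> Status {
        self.elapsed = 0;
        self.inner.reset()
    }

    fn step(&mut self) -> Status {
        self.elapsed += 1;
        match self.inner.step() {
            // Only a still-running episode is truncated; a natural end wins.
            Status::Running if self.elapsed >= self.max_steps => Status::Truncated,
            other => other,
        }
    }
}

/// Toy environment that terminates naturally after `goal_at` steps.
pub struct Countdown {
    pub goal_at: usize,
    pub t: usize,
}

impl Env for Countdown {
    fn reset(&mut self) -> Status {
        self.t = 0;
        Status::Running
    }

    fn step(&mut self) -> Status {
        self.t += 1;
        if self.t >= self.goal_at { Status::Terminated } else { Status::Running }
    }
}

/// Run one episode; returns the number of steps taken and the final status.
pub fn run_episode<E: Env>(env: &mut E) -> (usize, Status) {
    let mut status = env.reset();
    let mut steps = 0;
    while status == Status::Running {
        status = env.step();
        steps += 1;
    }
    (steps, status)
}
```

In this sketch, an inner episode that ends on the same step as the limit is reported as Terminated, which keeps the natural end distinguishable from the wrapper's cutoff.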
Episode Status
Snapshot::status() returns one of three variants:
- Running: episode continues
- Terminated: natural episode end (goal reached, fell over, game over)
- Truncated: step limit exceeded (TimeLimit wrapper)
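One common reason to keep the last two variants apart is value bootstrapping: a truncated episode was cut off, so the next state still has value, whereas a terminated one has none. The sketch below shows a one-step TD target under that convention. The Status enum is redeclared locally and td_target is a hypothetical helper, not part of this crate or of rlevo-deeprl.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Running,
    Terminated,
    Truncated,
}

/// One-step TD target: r + gamma * V(s'), dropping the bootstrap term
/// only when the episode ended naturally.
pub fn td_target(reward: f64, gamma: f64, next_value: f64, status: Status) -> f64 {
    match status {
        // Natural end: there is no future return to bootstrap from.
        Status::Terminated => reward,
        // Still running, or cut off by a step limit: keep bootstrapping.
        Status::Running | Status::Truncated => reward + gamma * next_value,
    }
}
```

Treating Truncated like Terminated here would bias value estimates toward zero for states near the step limit.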
Const Generics
State and observation dimensions are encoded as const generics (State<D>, Observation<D>). Mismatched dimensions produce a compile-time error rather than a runtime panic.
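A minimal sketch of the idea, assuming simple array-backed newtypes. The State and Observation definitions, the observe method, and the distance function below are hypothetical illustrations rather than the types shipped by rlevo-core.

```rust
/// Hypothetical fixed-size state vector; `D` is part of the type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct State<const D: usize>(pub [f64; D]);

/// Hypothetical fixed-size observation vector.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Observation<const D: usize>(pub [f64; D]);

impl<const D: usize> State<D> {
    /// Fully observable case: the observation has the same dimension.
    pub fn observe(&self) -> Observation<D> {
        Observation(self.0)
    }
}

/// Euclidean distance; both arguments must share the same `D`.
pub fn distance<const D: usize>(a: &Observation<D>, b: &Observation<D>) -> f64 {
    a.0.iter()
        .zip(b.0.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

// The following call is rejected by the compiler (mismatched types),
// because Observation<4> and Observation<2> are different types:
//
//     distance(&Observation([0.0; 4]), &Observation([0.0; 2]));
```

Because the dimension lives in the type, pairing a 4-D CartPole-style observation with a 2-D one fails at build time instead of surfacing as an index error during training.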
Testing
# Unit tests (all environments)
# Gridworld solvability integration tests
# (scripted optimal policies verify physics correctness)
References
- G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, Jun. 2016. https://arxiv.org/abs/1606.01540
- M. Chevalier-Boisvert, B. Dai, M. Towers, R. de Lazcano, L. Willems, S. Lahlou, S. Pal, P. S. Castro, and J. Terry, "Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks," arXiv preprint arXiv:2306.13831, Jun. 2023. https://arxiv.org/abs/2306.13831
- A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834–846, Sep./Oct. 1983. doi: 10.1109/TSMC.1983.6313077.
- D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," arXiv preprint arXiv:1712.01815, Dec. 2017. https://arxiv.org/abs/1712.01815
- J. Leike, M. Martic, V. Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg, "AI safety gridworlds," arXiv preprint arXiv:1711.09883, Nov. 2017. https://arxiv.org/abs/1711.09883
- Rapier physics engine — https://rapier.rs
- Burn framework — https://burn.dev
License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.