rlevo-environments 0.1.0

RL benchmark environments and landscapes for rlevo (internal crate — use `rlevo` for the full API)

rlevo-environments

Standard benchmark environments and landscapes for the rlevo workspace.

This crate provides a collection of reinforcement learning environments ranging from classic tabular problems through continuous-control physics simulations. All environments implement the rlevo-core Environment trait — a common reset / step interface that makes them drop-in compatible with every algorithm in rlevo-deeprl.


Environments

Classic Control

Ports of the canonical Gymnasium control tasks implemented in pure Rust.

Environment            Module                          Observation     Action         Notes
CartPole               classic::CartPole               4-D continuous  Discrete(2)    Physics matches Gymnasium CartPole-v1 exactly
Acrobot                classic::Acrobot                6-D continuous  Discrete(3)    Sutton two-link dynamics
MountainCar            classic::MountainCar            2-D continuous  Discrete(3)    Sparse reward; needs exploration
MountainCarContinuous  classic::MountainCarContinuous  2-D continuous  Continuous(1)  Dense reward variant
Pendulum               classic::Pendulum               3-D continuous  Continuous(1)  Underactuated swing-up
TenArmedBandit         classic::TenArmedBandit         -               Discrete(10)   Classic 10-armed bandit; Sutton & Barto

Toy Text

Tabular MDPs for baseline algorithm validation. Every state is fully observable.

Environment   Module                  State space        Action
FrozenLake    toy_text::FrozenLake    16 or 64 discrete  Discrete(4)
CliffWalking  toy_text::CliffWalking  48 discrete        Discrete(4)
Taxi          toy_text::Taxi          500 discrete       Discrete(6)
Blackjack     toy_text::Blackjack     32×11×2 discrete   Discrete(2)
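
The state counts above follow from how each task factors its state. As one worked example, Gymnasium's Taxi packs the taxi row and column (5 each), the passenger location (5 values), and the destination (4 values) into a single index, which gives 5 × 5 × 5 × 4 = 500 states. The sketch below reproduces that encoding; whether toy_text::Taxi uses the same layout is an assumption, and the function names are illustrative rather than part of the crate.

```rust
/// Encode a Taxi state as Gymnasium's Taxi-v3 does:
/// ((row * 5 + col) * 5 + passenger) * 4 + destination.
/// Assumption: hypothetical helper, not an rlevo function.
pub fn encode_taxi(row: usize, col: usize, passenger: usize, destination: usize) -> usize {
    debug_assert!(row < 5 && col < 5 && passenger < 5 && destination < 4);
    ((row * 5 + col) * 5 + passenger) * 4 + destination
}

/// Invert `encode_taxi`, returning (row, col, passenger, destination).
pub fn decode_taxi(index: usize) -> (usize, usize, usize, usize) {
    let destination = index % 4;
    let rest = index / 4;
    let passenger = rest % 5;
    let rest = rest / 5;
    let col = rest % 5;
    let row = rest / 5;
    (row, col, passenger, destination)
}
```

The largest valid tuple, (4, 4, 4, 3), maps to index 499, so the indices cover exactly the 500 states listed in the table.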

Gridworlds

Twelve partially observable grid environments inspired by Farama Minigrid. All share a common egocentric 7×7×3 observation and a 7-action discrete space. Physics are implemented once in grids::core and reused across every variant.

Environment          Grid size  Key mechanic
EmptyEnv             6×6        Reach the goal
DoorKeyEnv           8×8        Pick up key, unlock door, reach goal
LavaGapEnv           7×7        Navigate a gap in a lava wall
FourRoomsEnv         19×19      Long-horizon exploration
UnlockEnv            6×6        Single lock/key
UnlockPickupEnv      7×7        Unlock then pick up object
MemoryEnv            5×…        Remember seen target
MultiRoomEnv         variable   Chain of rooms
CrossingEnv          11×11      Navigate obstacles
DistShiftEnv         9×…        Adaptation to distribution shift
DynamicObstaclesEnv  6×6        Moving obstacles
GoToDoorEnv          5×5        Reach specified door
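
To make the shared 7-action space concrete, here is a minimal sketch of a movement rule in the Minigrid style, where the agent has a heading and the actions are turn left, turn right, forward, pickup, drop, toggle, and done. Every type below is a hypothetical stand-in written for illustration; it is not the contents of grids::core, and only GridAction::Forward is named in this README.

```rust
/// Hypothetical seven-action set following the Minigrid convention.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Left,
    Right,
    Forward,
    Pickup,
    Drop,
    Toggle,
    Done,
}

/// Agent heading; y grows downward, as in image coordinates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dir {
    East,
    South,
    West,
    North,
}

impl Dir {
    fn turn_right(self) -> Dir {
        match self {
            Dir::East => Dir::South,
            Dir::South => Dir::West,
            Dir::West => Dir::North,
            Dir::North => Dir::East,
        }
    }

    fn turn_left(self) -> Dir {
        match self {
            Dir::East => Dir::North,
            Dir::North => Dir::West,
            Dir::West => Dir::South,
            Dir::South => Dir::East,
        }
    }

    /// Unit step (dx, dy) for this heading.
    fn delta(self) -> (i32, i32) {
        match self {
            Dir::East => (1, 0),
            Dir::South => (0, 1),
            Dir::West => (-1, 0),
            Dir::North => (0, -1),
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Agent {
    pub x: i32,
    pub y: i32,
    pub dir: Dir,
}

/// Apply one action. `walls[y][x] == true` marks a blocked cell.
pub fn apply(agent: &mut Agent, action: Action, walls: &[Vec<bool>]) {
    match action {
        Action::Left => agent.dir = agent.dir.turn_left(),
        Action::Right => agent.dir = agent.dir.turn_right(),
        Action::Forward => {
            let (dx, dy) = agent.dir.delta();
            let (nx, ny) = (agent.x + dx, agent.y + dy);
            // Cells outside the grid are treated as blocked.
            let in_bounds = nx >= 0
                && ny >= 0
                && (ny as usize) < walls.len()
                && (nx as usize) < walls[ny as usize].len();
            if in_bounds && !walls[ny as usize][nx as usize] {
                agent.x = nx;
                agent.y = ny;
            }
        }
        // Object interactions are outside the scope of this sketch.
        Action::Pickup | Action::Drop | Action::Toggle | Action::Done => {}
    }
}
```

Keeping one rule like this in a single module is what lets every variant differ only in layout and objects, not in how movement works.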

Box2D-style Physics

Continuous-control environments powered by Rapier2D. Enabled by the box2d feature (default on).

Environment    Module                Observation      Action
BipedalWalker  box2d::BipedalWalker  24-D continuous  Continuous(4)
LunarLander    box2d::LunarLander    8-D continuous   Discrete(4) or Continuous(2)
CarRacing      box2d::CarRacing      96×96 pixel      Continuous(3)

Locomotion

MuJoCo-style locomotion environments in pure Rust via Rapier3D. Enabled by the locomotion feature (default on).

Environment             Module                              Notes
InvertedPendulum        locomotion::InvertedPendulum        1-D balance
InvertedDoublePendulum  locomotion::InvertedDoublePendulum  Harder balance; sparser failure
Reacher                 locomotion::Reacher                 2-DOF arm reaching
Swimmer                 locomotion::Swimmer                 3-link swimmer

Games (planned for v0.2)

Chess and ConnectFour are planned for a future release. Stub modules exist in-source (src/games/chess/ and src/games/connect_four.rs) but do not yet implement the Environment trait and are hidden from the public API docs.


Optimization Landscapes

Continuous single-objective fitness functions for evaluating evolutionary algorithms.

Function   Module                 Notes
Sphere     landscapes::sphere     Convex, unimodal
Ackley     landscapes::ackley     Multimodal; exponential traps
Rastrigin  landscapes::rastrigin  Highly multimodal
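
For reference, the textbook definitions of these three functions can be sketched as free functions over a slice. This is a standalone illustration, not the crate's implementation: the function signatures are hypothetical, and the Ackley constants (a = 20, b = 0.2, c = 2π) are the commonly used defaults, assumed here rather than taken from landscapes::ackley.

```rust
use std::f64::consts::{E, PI};

/// Sphere: f(x) = sum(x_i^2). Global minimum 0 at the origin.
pub fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

/// Rastrigin: f(x) = 10 n + sum(x_i^2 - 10 cos(2 pi x_i)).
/// Global minimum 0 at the origin, with many regularly spaced local minima.
pub fn rastrigin(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    10.0 * n
        + x.iter()
            .map(|v| v * v - 10.0 * (2.0 * PI * v).cos())
            .sum::<f64>()
}

/// Ackley with the usual constants a = 20, b = 0.2, c = 2 pi (assumed).
/// Global minimum 0 at the origin. `x` must be non-empty.
pub fn ackley(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    let sum_sq: f64 = x.iter().map(|v| v * v).sum();
    let sum_cos: f64 = x.iter().map(|v| (2.0 * PI * v).cos()).sum();
    -20.0 * (-0.2 * (sum_sq / n).sqrt()).exp() - (sum_cos / n).exp() + 20.0 + E
}
```

All three have their global minimum at the origin, which makes them convenient for checking that an evolutionary algorithm converges to a known optimum.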

Quick Start

use rlevo_core::environment::{Environment, Snapshot};
use rlevo_environments::classic::{CartPole, CartPoleAction, CartPoleConfig};
use rlevo_environments::wrappers::TimeLimit;

// Build CartPole with default settings and cap episodes at 500 steps.
let env = CartPole::with_config(CartPoleConfig::default());
let mut env = TimeLimit::new(env, 500);

// Push right on every step until the episode is done.
let mut snap = env.reset().expect("reset");
while !snap.is_done() {
    snap = env.step(CartPoleAction::Right).expect("step");
}

Gridworld environments use the shared GridAction type:

use rlevo_core::environment::{Environment, Snapshot};
use rlevo_environments::grids::{EmptyEnv, EmptyConfig};
use rlevo_environments::grids::core::GridAction;

let mut env = EmptyEnv::with_config(EmptyConfig::default(), false);

// Move forward on every step until the episode is done.
let mut snap = env.reset().expect("reset");
while !snap.is_done() {
    snap = env.step(GridAction::Forward).expect("step");
}

Cargo Features

Feature     Default  Description
box2d       yes      Box2D-style physics environments via rapier2d
locomotion  yes      Locomotion environments via rapier3d + nalgebra

Disable physics environments to shrink compile time:

[dependencies]
rlevo-environments = { path = "", default-features = false }

Running Examples

# Classic control
cargo run -p rlevo-environments --example cartpole_random
cargo run -p rlevo-environments --example cartpole_timelimit
cargo run -p rlevo-environments --example pendulum_random
cargo run -p rlevo-environments --example mountain_car_random
cargo run -p rlevo-environments --example mountain_car_continuous_random
cargo run -p rlevo-environments --example acrobot_random

# Gridworlds
cargo run -p rlevo-environments --example grid_empty_random
cargo run -p rlevo-environments --example grid_door_key_scripted
cargo run -p rlevo-environments --example grid_memory_random

# Bandits
cargo run -p rlevo-environments --example ten_armed_bandit_training

# Box2D (requires box2d feature, enabled by default)
cargo run -p rlevo-environments --example bipedal_walker_random
cargo run -p rlevo-environments --example lunar_lander_discrete_random
cargo run -p rlevo-environments --example lunar_lander_continuous_random
cargo run -p rlevo-environments --example car_racing_random

# Locomotion (requires locomotion feature, enabled by default)
cargo run -p rlevo-environments --example reacher_random
cargo run -p rlevo-environments --example inverted_double_pendulum_random
cargo run -p rlevo-environments --example swimmer_random

See examples/README.md for patterns, conventions, and how to write new examples.


Design

Configuration Builder

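Every environment ships with an XyzConfig struct and an XyzConfigBuilder with a fluent API. Defaults match the reference Gymnasium implementation where one exists. A config can also be written directly with struct-update syntax, overriding only the fields that differ from the defaults: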

use rlevo_environments::classic::{CartPole, CartPoleConfig};

let env = CartPole::with_config(
    CartPoleConfig {
        seed: 42,
        ..CartPoleConfig::default()
    }
);
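
The snippet above uses struct-update syntax. For the fluent builder style, a minimal sketch of the pattern is shown below. DemoConfig, DemoConfigBuilder, and the gravity and max_steps fields are hypothetical stand-ins invented for this example; only the seed field appears in this README, and the real builders may expose different methods.

```rust
/// Hypothetical config; only `seed` is taken from the README.
#[derive(Clone, Debug, PartialEq)]
pub struct DemoConfig {
    pub seed: u64,
    pub gravity: f64,
    pub max_steps: usize,
}

impl Default for DemoConfig {
    fn default() -> Self {
        Self { seed: 0, gravity: 9.8, max_steps: 500 }
    }
}

/// Fluent builder: each setter consumes and returns the builder,
/// so calls can be chained and unset fields keep their defaults.
#[derive(Default)]
pub struct DemoConfigBuilder {
    config: DemoConfig,
}

impl DemoConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn seed(mut self, seed: u64) -> Self {
        self.config.seed = seed;
        self
    }

    pub fn gravity(mut self, gravity: f64) -> Self {
        self.config.gravity = gravity;
        self
    }

    pub fn max_steps(mut self, max_steps: usize) -> Self {
        self.config.max_steps = max_steps;
        self
    }

    pub fn build(self) -> DemoConfig {
        self.config
    }
}
```

With this shape, DemoConfigBuilder::new().seed(42).build() yields a config whose seed is 42 and whose other fields keep their defaults, which is the same result the struct-update form produces.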

Reproducibility

reset() re-seeds the environment's internal RNG from config.seed, so the same seed and the same action sequence always produce the same episode trajectory. Pass a different seed to each parallel worker to get independent rollouts.
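
The re-seeding behaviour can be illustrated with a toy environment that depends only on the standard library. NoisyWalk and rollout are hypothetical names made up for this sketch, and SplitMix64 is used only because it is small and deterministic; the generator the crate actually uses is not stated here.

```rust
/// SplitMix64: a tiny deterministic generator, used here for illustration.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical toy environment: a 1-D walk with noise in {-1, 0, 1}.
pub struct NoisyWalk {
    seed: u64,
    rng: SplitMix64,
    position: i64,
}

impl NoisyWalk {
    pub fn new(seed: u64) -> Self {
        Self { seed, rng: SplitMix64::new(seed), position: 0 }
    }

    /// Re-seed from the stored seed so every episode replays the same noise.
    pub fn reset(&mut self) -> i64 {
        self.rng = SplitMix64::new(self.seed);
        self.position = 0;
        self.position
    }

    pub fn step(&mut self, action: i64) -> i64 {
        let noise = (self.rng.next_u64() % 3) as i64 - 1;
        self.position += action + noise;
        self.position
    }
}

/// Run one episode with a fixed action sequence and record the positions.
pub fn rollout(env: &mut NoisyWalk, actions: &[i64]) -> Vec<i64> {
    env.reset();
    actions.iter().map(|&a| env.step(a)).collect()
}
```

Calling rollout twice on the same environment with the same actions returns identical trajectories, because reset() restores the generator to its seeded state each time.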

Wrappers

wrappers::TimeLimit<E> wraps any Environment and injects a truncation signal after a configurable number of steps. The underlying environment's Terminated status is preserved separately from the wrapper's Truncated — algorithms that distinguish the two (PPO, SAC) can act on both signals correctly.
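
A step-limit wrapper of this kind can be sketched in a few lines. The Env trait, Status enum, and toy environments below are simplified, hypothetical stand-ins so the example compiles on its own; they are not the rlevo-core Environment trait or the crate's TimeLimit, whose signatures may differ.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    Terminated,
    Truncated,
}

/// Simplified stand-in for an environment trait (assumption).
pub trait Env {
    fn reset(&mut self) -> Status;
    fn step(&mut self, action: u8) -> Status;
}

/// Wraps any `Env` and reports `Truncated` once `max_steps` is reached.
pub struct TimeLimit<E: Env> {
    inner: E,
    max_steps: usize,
    elapsed: usize,
}

impl<E: Env> TimeLimit<E> {
    pub fn new(inner: E, max_steps: usize) -> Self {
        Self { inner, max_steps, elapsed: 0 }
    }
}

impl<E: Env> Env for TimeLimit<E> {
    fn reset(&mut self) -> Status {
        self.elapsed = 0;
        self.inner.reset()
    }

    fn step(&mut self, action: u8) -> Status {
        let status = self.inner.step(action);
        self.elapsed += 1;
        match status {
            // Only a still-running episode can be truncated; a natural
            // termination on the final step is passed through unchanged.
            Status::Running if self.elapsed >= self.max_steps => Status::Truncated,
            other => other,
        }
    }
}

/// Toy environment that never ends on its own.
pub struct Endless;

impl Env for Endless {
    fn reset(&mut self) -> Status {
        Status::Running
    }
    fn step(&mut self, _action: u8) -> Status {
        Status::Running
    }
}

/// Toy environment that terminates after `start` steps.
pub struct Countdown {
    pub start: u32,
    pub remaining: u32,
}

impl Env for Countdown {
    fn reset(&mut self) -> Status {
        self.remaining = self.start;
        Status::Running
    }
    fn step(&mut self, _action: u8) -> Status {
        self.remaining = self.remaining.saturating_sub(1);
        if self.remaining == 0 { Status::Terminated } else { Status::Running }
    }
}
```

The match guard is the design point: if the inner environment terminates on the same step that hits the limit, the wrapper reports Terminated rather than Truncated, so the two signals stay distinct.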

Episode Status

Snapshot::status() returns one of three variants:

  • Running: episode continues
  • Terminated: natural episode end (goal reached, fell over, game over)
  • Truncated: step limit exceeded (TimeLimit wrapper)
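
The distinction matters most when computing value targets: after a truncation the task itself has not ended, so a one-step target normally still bootstraps from the next state's value, while after a termination it does not. The sketch below shows that rule with a local Status enum and a hypothetical td_target helper; neither is part of the crate's API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    Terminated,
    Truncated,
}

/// One-step TD target. Hypothetical helper for illustration.
/// `next_value` is the estimated value of the state reached by this step.
pub fn td_target(reward: f64, gamma: f64, next_value: f64, status: Status) -> f64 {
    match status {
        // The episode truly ended: there is no future return to add.
        Status::Terminated => reward,
        // Still running, or cut off by a step limit: keep bootstrapping.
        Status::Running | Status::Truncated => reward + gamma * next_value,
    }
}
```

Treating Truncated like Terminated here would bias value estimates downward for states near the step limit.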

Const Generics

State and observation dimensions are encoded as const generics (State<D>, Observation<D>). Mismatched dimensions produce a compile-time error rather than a runtime panic.
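
As a minimal sketch of how a const-generic dimension turns a shape mismatch into a type error, consider the stand-in types below. The tuple-struct layout of Observation and the LinearPolicy type are assumptions made for this example; the crate's actual State<D> and Observation<D> definitions may look different.

```rust
/// Stand-in observation type carrying its dimension in the type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Observation<const D: usize>(pub [f32; D]);

/// Hypothetical consumer that only accepts observations of the same `D`.
pub struct LinearPolicy<const D: usize> {
    pub weights: [f32; D],
}

impl<const D: usize> LinearPolicy<D> {
    /// Dot product of the weights with the observation.
    pub fn score(&self, obs: &Observation<D>) -> f32 {
        self.weights
            .iter()
            .zip(obs.0.iter())
            .map(|(w, x)| w * x)
            .sum()
    }
}

// A mismatch is rejected by the compiler, not at runtime:
//
//     let policy = LinearPolicy::<4> { weights: [0.0; 4] };
//     policy.score(&Observation([0.0; 3])); // error: mismatched types
```

Because D is part of the type, a 4-D policy simply has no score method that accepts a 3-D observation.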


Testing

# Unit tests (all environments)
cargo test -p rlevo-environments

# Gridworld solvability integration tests
# (scripted optimal policies verify physics correctness)
cargo test -p rlevo-environments --test grids_solvable

References

  • G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, Jun. 2016. https://arxiv.org/abs/1606.01540
  • M. Chevalier-Boisvert, B. Dai, M. Towers, R. de Lazcano, L. Willems, S. Lahlou, S. Pal, P. S. Castro, and J. Terry, "Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks," arXiv preprint arXiv:2306.13831, Jun. 2023. https://arxiv.org/abs/2306.13831
  • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834–846, Sep./Oct. 1983. doi: 10.1109/TSMC.1983.6313077.
  • D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," arXiv preprint arXiv:1712.01815, Dec. 2017. https://arxiv.org/abs/1712.01815
  • J. Leike, M. Martic, V. Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg, "AI safety gridworlds," arXiv preprint arXiv:1711.09883, Nov. 2017. https://arxiv.org/abs/1711.09883
  • Rapier physics engine — https://rapier.rs
  • Burn framework — https://burn.dev

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.