# ember-rl
Algorithm implementations for the Rust RL ecosystem, powered by Burn.
ember-rl provides ready-to-use RL algorithms that work with any environment
implementing `rl-traits`. It handles the
neural networks, replay buffers, and training loops — you bring the environment.
## Ecosystem
| Crate | Role |
|---|---|
| `rl-traits` | Shared traits and types |
| `ember-rl` | Algorithm implementations (DQN, PPO, SAC) using Burn (this crate) |
| `bevy-gym` (planned) | Bevy ECS plugin for visualising and parallelising environments |
## Algorithms
| Algorithm | Status |
|---|---|
| DQN | Stable |
| PPO | Planned |
| SAC | Planned |
## Usage
Add to `Cargo.toml`:
```toml
[dependencies]
ember-rl = "*"
burn = { version = "0.20.1", features = ["ndarray", "autodiff"] }
```
### DQN on a custom environment
```rust
use ember_rl::{DqnAgent, DqnConfig, Runner};

// Burn backend: ndarray with autodiff enabled for training.
type B = burn::backend::Autodiff<burn::backend::NdArray>;

let config = DqnConfig::default();
let agent = DqnAgent::<B>::new(&config);
let mut runner = Runner::new(agent, env, config);

for step in runner.steps().take(n) {
    // inspect `step`, log rewards, break early when solved, ...
}
```
The runner exposes training as an infinite iterator. Use `.take(n)` to cap steps,
or `break` on a solved condition — no callbacks, no inversion of control.
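The iterator pattern above can be sketched without the crate itself. Below, a mock step stream stands in for the runner, and the `StepInfo` struct with its `episode_reward` field is an illustrative assumption, not ember-rl's actual type:

```rust
// Mock stand-in for the runner's infinite step stream; in real code
// you would iterate runner.steps() instead.
#[derive(Debug)]
struct StepInfo {
    episode_reward: f32,
}

/// Consume at most `cap` steps, stopping early once a step reports a
/// reward of 500 or more. Returns the index where training stopped,
/// or None if the cap was hit first — no callbacks needed.
fn train_until_solved(steps: impl Iterator<Item = StepInfo>, cap: usize) -> Option<usize> {
    for (i, step) in steps.take(cap).enumerate() {
        if step.episode_reward >= 500.0 {
            return Some(i); // solved: just break out of the loop
        }
    }
    None
}
```

Because training is an ordinary iterator, early stopping, logging, and checkpointing are all plain control flow at the call site.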
### Implementing ObservationEncoder
`ember-rl` bridges the generic `rl-traits` world to Burn tensors through two
traits you implement for your observation and action types:
```rust
use ember_rl::{ObservationEncoder, ActionMapper};

// Encode a Vec<f32> observation into a 1-D Burn tensor
impl ObservationEncoder<B> for MyEncoder { /* ... */ }

// Map between usize action indices and your Action type
impl ActionMapper for MyMapper { /* ... */ }
```
The built-in `VecEncoder` and `UsizeActionMapper` cover the common `Vec<f32>` /
`usize` case without any boilerplate.
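The action-mapping half of the bridge amounts to an index ↔ action round trip. The sketch below shows that round trip for a hypothetical two-move action type; `Move`, `action_to_index`, and `index_to_action` are illustrative names, not ember-rl's actual trait surface:

```rust
// Hypothetical discrete action type for a simple environment.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Move {
    Left,
    Right,
}

// DQN's network outputs one Q-value per index, so the mapper must
// assign each action a stable index...
fn action_to_index(a: Move) -> usize {
    match a {
        Move::Left => 0,
        Move::Right => 1,
    }
}

// ...and turn the argmax index chosen during acting back into a
// domain action the environment understands.
fn index_to_action(i: usize) -> Move {
    if i == 0 { Move::Left } else { Move::Right }
}
```

A custom mapper for your own action enum is typically just these two total functions; the built-in `UsizeActionMapper` is the identity version for environments that already use `usize` actions.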
## Reference environments
Enable with `--features envs`:
```toml
ember-rl = { version = "*", features = ["envs"] }
```
| Environment | Description |
|---|---|
| `CartPole-v1` | Classic balance task matching the Gymnasium spec |
### Running the CartPole example
```sh
cargo run --example cartpole --features envs --release
```
Expected output: the agent reaches the maximum episode reward of 500 (solved) within a few hundred episodes.
## DQN notes
- **Two separate RNGs.** The runner drives ε-greedy exploration; the agent drives buffer sampling. These are intentionally decoupled — sharing a single RNG causes subtle learning instability.
- `DqnConfig::default()` uses conservative, general-purpose hyperparameters. Domain-specific examples override them explicitly.
- Epsilon decay is linear from `epsilon_start` to `epsilon_end` over `epsilon_decay_steps`, then flat.
- The target network is a hard copy of the online network, updated every `target_update_freq` steps.
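The epsilon schedule described above is simple enough to spell out. This is a sketch of the linear-then-flat decay under the stated semantics; the function name and signature are illustrative, not ember-rl's internal API:

```rust
/// Linear epsilon schedule: decays from `start` to `end` over
/// `decay_steps` environment steps, then stays flat at `end`.
fn epsilon_at(step: u64, start: f32, end: f32, decay_steps: u64) -> f32 {
    if step >= decay_steps {
        end // past the decay window: hold at epsilon_end
    } else {
        // Linear interpolation between start and end.
        let frac = step as f32 / decay_steps as f32;
        start + (end - start) * frac
    }
}
```

For example, with `epsilon_start = 1.0`, `epsilon_end = 0.05`, and `epsilon_decay_steps = 10_000`, exploration sits at 0.525 halfway through the window and at 0.05 forever after step 10 000.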
## Development
This crate was developed with the assistance of AI coding tools (Claude by Anthropic).
## License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.