# ember-rl

Algorithm implementations for the Rust RL ecosystem, powered by Burn.

ember-rl provides ready-to-use RL algorithms that work with any environment
implementing the `rl-traits` interfaces. It handles the neural networks,
replay buffers, and training loops — you bring the environment.
## Ecosystem

| Crate | Role |
|---|---|
| `rl-traits` | Shared traits and types |
| `ember-rl` | Algorithm implementations (DQN, PPO, SAC) using Burn (this crate) |
| `bevy-gym` | Bevy ECS plugin for parallelised environment simulation |
## Algorithms

| Algorithm | Status |
|---|---|
| DQN | Stable |
| PPO | Planned |
| SAC | Planned |
## Usage

Add to `Cargo.toml`:

```toml
[dependencies]
ember-rl = "0.3"
burn = { version = "0.20.1", features = ["ndarray", "autodiff"] }
```
### Training with `DqnTrainer`

The simplest entry point — create an agent, wrap it in a trainer, iterate:

```rust
use ember_rl::{DqnAgent, DqnConfig, DqnTrainer, TrainingRun};
use burn::backend::{Autodiff, NdArray};

type B = Autodiff<NdArray>;

let config = DqnConfig::default();
let agent = DqnAgent::<B>::new(&config);

// Attach a named run for automatic checkpointing and JSONL logging
let run = TrainingRun::create("cartpole", "v1")?;
run.write_config(&config)?;

let mut trainer = DqnTrainer::new(agent, env) // `env` is any rl-traits environment
    .with_run(run)
    .with_checkpoint_freq(10_000)
    .with_keep_checkpoints(3);

// Iterator-style — full control over the loop
for step in trainer.steps().take(100_000) {
    // inspect per-step metrics here if needed
}

// Eval at end — saves best.mpk automatically
let report = trainer.eval();
report.print();
```
### `TrainingSession` — loop-agnostic coordinator

`TrainingSession` is the composable core behind `DqnTrainer`. Use it directly
when your training loop is owned externally — for example, in a Bevy ECS system:

```rust
use ember_rl::TrainingSession;
use ember_rl::ActMode;

// Any LearningAgent implementation works here
let mut session = TrainingSession::new(agent)
    .with_run(run)
    .with_checkpoint_freq(10_000)
    .with_keep_checkpoints(3);

// Each environment step:
let action = session.act(&obs, ActMode::Explore);
session.observe(transition); // auto-checkpoints at milestones

// Each episode end:
session.on_episode();
// → logs to JSONL, merges agent + env extras, saves best checkpoint if improved

if session.is_done() {
    // stop stepping — training budget reached
}
```
### Evaluation

```rust
// Eval at the end of training — returns an EvalReport
let report = trainer.eval();
report.print();

// Or load a saved checkpoint for inference (no autodiff overhead)
use burn::backend::NdArray;
use ember_rl::DqnPolicy;

let policy = DqnPolicy::<NdArray>::new(&config)
    .load(checkpoint_path)?; // e.g. a saved best.mpk
let action = policy.act(&obs);
```
### Convert a trained agent to an inference policy

```rust
// into_policy() strips training state and downcasts to a plain Backend
let policy = trainer.into_agent().into_policy();
```
### Resuming training

```rust
let run = TrainingRun::resume("cartpole", "v1")?; // picks latest timestamp
println!("{run:?}");
```
### Custom replay buffers

```rust
// Swap in any ReplayBuffer implementation (e.g. PER)
let agent = DqnAgent::<B>::new_with_buffer(&config, buffer);
```
### Training run directory layout

`TrainingRun` manages a versioned on-disk structure:

```text
runs/<name>/<version>/<YYYYMMDD_HHMMSS>/
    metadata.json          ← name, version, step counts, timestamps
    config.json            ← serialized hyperparams, encoder, action mapper
    checkpoints/
        step_<N>.mpk       ← periodic checkpoints (pruned to keep_last n)
        latest.mpk         ← most recent checkpoint
        best.mpk           ← best eval-reward checkpoint
    train_episodes.jsonl   ← one EpisodeRecord per line (reward, length, extras)
    eval_episodes.jsonl    ← eval episodes tagged with total_steps_at_eval
```
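Pruning periodic checkpoints to `keep_last n` amounts to sorting the recorded step numbers and deleting all but the newest few. A minimal sketch, using a hypothetical helper that is not part of ember-rl's API:

```rust
/// Given the step numbers of existing `step_<N>.mpk` checkpoints, return the
/// steps whose files should be deleted so that only the `keep_last` most
/// recent remain. Illustrative helper — not ember-rl's actual implementation.
fn checkpoints_to_prune(mut steps: Vec<u64>, keep_last: usize) -> Vec<u64> {
    steps.sort_unstable();
    let keep_from = steps.len().saturating_sub(keep_last);
    // Everything before the cut-off is stale and can be removed.
    steps.drain(..keep_from).collect()
}

fn main() {
    // Checkpoints exist at five steps; keep the 2 most recent.
    let prune = checkpoints_to_prune(vec![30_000, 10_000, 50_000, 20_000, 40_000], 2);
    println!("{prune:?}"); // [10000, 20000, 30000]
}
```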
## Stats

The `stats` module provides composable, algorithm-independent statistics tracking.
Both algorithms and environments can register the stats they want to collect:

```rust
use ember_rl::stats::{Aggregator, StatsTracker};

// Default tracker: episode_reward (mean) and episode_length (mean)
let mut tracker = StatsTracker::new()
    .with("epsilon", Aggregator::Last)
    .with("loss", Aggregator::RollingMean(100))
    .with_custom("q_max", Aggregator::Max); // stat names here are illustrative

tracker.update("loss", 0.37);
let summary = tracker.summary(); // HashMap<String, f64>
```

Available aggregators: `Mean`, `Max`, `Min`, `Last`, `RollingMean`, `Std`.

Per-episode dynamics (e.g. training loss) are captured by the agent via its own
internal aggregators and exposed through `LearningAgent::episode_extras()`.
These are merged with environment extras (`Environment::episode_extras()` from
`rl-traits`) into each `EpisodeRecord` automatically by `TrainingSession`.
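To illustrate how one of the aggregators listed above might behave, here is a minimal rolling-mean sketch; the struct and method names are assumptions, not ember-rl's actual implementation:

```rust
use std::collections::VecDeque;

/// Sketch of a rolling-mean aggregator: averages the last `window` samples.
/// Illustrative only — not ember-rl's internal type.
struct RollingMean {
    window: usize,
    values: VecDeque<f64>,
}

impl RollingMean {
    fn new(window: usize) -> Self {
        Self { window, values: VecDeque::new() }
    }

    /// Push a new sample, evicting the oldest once the window is full.
    fn update(&mut self, v: f64) {
        if self.values.len() == self.window {
            self.values.pop_front();
        }
        self.values.push_back(v);
    }

    /// Mean over the samples currently in the window (0.0 if empty).
    fn value(&self) -> f64 {
        if self.values.is_empty() {
            return 0.0;
        }
        self.values.iter().sum::<f64>() / self.values.len() as f64
    }
}

fn main() {
    let mut m = RollingMean::new(3);
    for v in [1.0, 2.0, 3.0, 4.0] {
        m.update(v);
    }
    println!("{}", m.value()); // mean of the last 3 samples: 3
}
```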
## Implementing `ObservationEncoder`

ember-rl bridges the generic rl-traits world to Burn tensors through two
traits you implement for your observation and action types:

```rust
use ember_rl::{ActionMapper, ObservationEncoder};

impl ObservationEncoder<B> for MyEncoder { /* observation -> input tensor */ }
impl ActionMapper for MyMapper { /* action index <-> action type */ }
```

Built-in `VecEncoder` and `UsizeActionMapper` cover the common `Vec<f32>` /
`usize` case without any boilerplate.
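To show the shape of such an encoder, here is a sketch against a simplified stand-in trait that emits a flat `Vec<f32>` instead of a Burn tensor; the trait, types, and method names are all illustrative assumptions:

```rust
/// Simplified stand-in for an observation-encoder trait: the real trait
/// produces Burn tensors, but a flat Vec<f32> shows the same idea.
trait FlatEncoder<Obs> {
    fn size(&self) -> usize;
    fn encode(&self, obs: &Obs) -> Vec<f32>;
}

/// A hypothetical grid-world observation: agent position plus a key flag.
struct GridObs {
    x: u32,
    y: u32,
    has_key: bool,
}

struct GridEncoder {
    width: u32,
    height: u32,
}

impl FlatEncoder<GridObs> for GridEncoder {
    fn size(&self) -> usize {
        3 // x, y (normalised) + key flag
    }

    fn encode(&self, obs: &GridObs) -> Vec<f32> {
        vec![
            obs.x as f32 / self.width as f32, // normalise into [0, 1]
            obs.y as f32 / self.height as f32,
            if obs.has_key { 1.0 } else { 0.0 },
        ]
    }
}

fn main() {
    let enc = GridEncoder { width: 10, height: 10 };
    let v = enc.encode(&GridObs { x: 5, y: 2, has_key: true });
    println!("{v:?}"); // [0.5, 0.2, 1.0]
}
```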
## Reference environments

Enable with `--features envs`:

```toml
ember-rl = { version = "0.3", features = ["envs"] }
```

| Environment | Description |
|---|---|
| CartPole-v1 | Classic balance task matching the Gymnasium spec |
### Running the CartPole example

```sh
# Train (saves checkpoints to runs/cartpole/v1/<timestamp>/)
cargo run --example cartpole --features envs --release

# Eval from the latest saved run
cargo run --example cartpole --features envs --release -- --eval runs/cartpole/v1
```
## DQN notes

- Two separate RNGs. The agent uses independent RNGs for ε-greedy exploration and replay buffer sampling. Sharing a single RNG causes subtle learning instability.
- `DqnConfig::default()` uses conservative general-purpose hyperparameters. Domain-specific examples override them explicitly.
- Epsilon decay is linear from `epsilon_start` to `epsilon_end` over `epsilon_decay_steps`, then flat.
- The target network is a hard copy of the online network, updated every `target_update_freq` steps.
- Checkpoints use Burn's `CompactRecorder` (MessagePack format, `.mpk`). Only network weights are saved — sufficient for inference. Resume training by calling `agent.load(path)` followed by `agent.set_total_steps(n)`.
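The linear epsilon schedule described above can be sketched as a pure function (the function and parameter names are illustrative, not the crate's):

```rust
/// Linear epsilon schedule: decays from `start` to `end` over `decay_steps`,
/// then stays flat. Illustrative — ember-rl's actual names may differ.
fn epsilon(step: u64, start: f64, end: f64, decay_steps: u64) -> f64 {
    if step >= decay_steps {
        return end; // flat after the decay window
    }
    let frac = step as f64 / decay_steps as f64;
    start + (end - start) * frac
}

fn main() {
    // e.g. decay 1.0 -> 0.05 over 10_000 steps
    println!("{}", epsilon(0, 1.0, 0.05, 10_000));      // start: 1
    println!("{}", epsilon(5_000, 1.0, 0.05, 10_000));  // halfway between start and end
    println!("{}", epsilon(20_000, 1.0, 0.05, 10_000)); // flat: 0.05
}
```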
## Development

This crate was developed with the assistance of AI coding tools (Claude by Anthropic).

## License

Licensed under either of the Apache License, Version 2.0 or the MIT License, at your option.