rl_traits/lib.rs
//! Core traits for reinforcement learning environments, policies, and agents.
//!
//! `rl-traits` defines the shared vocabulary used across the ecosystem:
//!
//! - [`ember-rl`]: algorithm implementations using Burn for neural networks
//! - [`bevy-gym`]: Bevy ECS plugin for visualising and parallelising environments
//!
//! # Design goals
//!
//! - **Type-safe by default**: observation and action spaces are associated types,
//!   not runtime objects. The compiler catches mismatches.
//!
//! - **Correct `Terminated` vs `Truncated` distinction**: algorithms that bootstrap
//!   value estimates (PPO, DQN, SAC) need this distinction. It is baked into
//!   [`EpisodeStatus`] from day one.
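//!
//!   As a sketch of why bootstrapping code needs the distinction (`Status` and
//!   `td_target` below are illustrative stand-ins, not items exported by this
//!   crate):
//!
//!   ```rust
//!   #[derive(Clone, Copy, PartialEq)]
//!   enum Status { Continuing, Terminated, Truncated }
//!
//!   /// One-step TD target: bootstrap from the next state's value estimate
//!   /// unless the episode truly terminated.
//!   fn td_target(reward: f32, next_value: f32, gamma: f32, status: Status) -> f32 {
//!       match status {
//!           // The MDP really ended: there is no future return.
//!           Status::Terminated => reward,
//!           // Still running, or merely cut off by a time limit: the future
//!           // return exists and must still be estimated.
//!           Status::Continuing | Status::Truncated => reward + gamma * next_value,
//!       }
//!   }
//!
//!   assert_eq!(td_target(1.0, 10.0, 0.5, Status::Terminated), 1.0);
//!   assert_eq!(td_target(1.0, 10.0, 0.5, Status::Truncated), 6.0);
//!   ```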
//!
//! - **Rendering-free**: this crate has no concept of visualisation. That belongs
//!   in `bevy-gym`.
//!
//! - **Bevy-compatible**: `Send + Sync + 'static` bounds on associated types mean
//!   any [`Environment`] implementation can be a Bevy `Component`, enabling
//!   free ECS parallelisation via `Query::par_iter_mut()`.
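//!
//!   Those bounds can be verified at compile time with a bound-only helper
//!   function (a sketch; `assert_ecs_compatible` and `GridWorld` are
//!   hypothetical, not part of this crate):
//!
//!   ```rust
//!   fn assert_ecs_compatible<T: Send + Sync + 'static>() {}
//!
//!   struct GridWorld { agent_pos: (u32, u32) }
//!
//!   // Compiles only because `GridWorld` meets the bounds; an `Rc` or a
//!   // borrowed field inside would turn this call into a compile error.
//!   assert_ecs_compatible::<GridWorld>();
//!   ```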
//!
//! - **Minimal dependencies**: only `rand` for RNG abstractions.
//!
//! # Quick start
//!
//! ```rust
//! use rl_traits::{Environment, StepResult, EpisodeStatus};
//! use rand::Rng;
//!
//! struct MyEnv;
//!
//! impl Environment for MyEnv {
//!     type Observation = f32;
//!     type Action = usize;
//!     type Info = ();
//!
//!     fn step(&mut self, _action: usize) -> StepResult<f32, ()> {
//!         StepResult::new(0.0, 1.0, EpisodeStatus::Continuing, ())
//!     }
//!
//!     fn reset(&mut self, _seed: Option<u64>) -> (f32, ()) {
//!         (0.0, ())
//!     }
//!
//!     fn sample_action(&self, rng: &mut impl Rng) -> usize {
//!         rng.gen_range(0..4)
//!     }
//! }
//! ```

pub mod agent;
pub mod buffer;
pub mod environment;
pub mod episode;
pub mod experience;
pub mod multi_agent;
pub mod policy;
pub mod wrappers;

pub use agent::Agent;
pub use buffer::ReplayBuffer;
pub use environment::Environment;
pub use episode::{EpisodeStatus, StepResult};
pub use experience::Experience;
pub use multi_agent::{AecEnvironment, ParallelEnvironment};
pub use policy::{Policy, StochasticPolicy};
pub use wrappers::{TimeLimit, Wrapper};