Crate border_core
Core components for reinforcement learning.
Observation and action
The Obs and Act traits are abstractions of observations and actions in environments.
Both traits can hold two or more samples at once, which makes it possible to implement vectorized environments.
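As an illustration, here is a minimal sketch of user-defined observation and action types. The trait shapes below are simplified stand-ins based on the description above, not the crate's exact definitions; in particular, the single len() method is an assumption.

```rust
/// Simplified stand-in for the Obs trait: a (possibly batched)
/// observation. The real trait in border_core has more methods.
pub trait Obs: Clone {
    /// Number of samples held, so that vectorized environments can
    /// carry observations from several instances at once.
    fn len(&self) -> usize;
}

/// Simplified stand-in for the Act trait.
pub trait Act: Clone {
    fn len(&self) -> usize;
}

/// A batch of scalar positions from one or more environment instances.
#[derive(Clone)]
pub struct PosObs(pub Vec<f32>);

impl Obs for PosObs {
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// A batch of discrete moves (e.g. -1 = left, 1 = right).
#[derive(Clone)]
pub struct MoveAct(pub Vec<i8>);

impl Act for MoveAct {
    fn len(&self) -> usize {
        self.0.len()
    }
}
```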
Environment
The Env trait is an abstraction of environments. It has four associated types:
Config, Obs, Act, and Info. Obs and Act are the concrete types of
observations and actions of the environment.
These must implement the Obs and Act traits, respectively.
An environment implementing Env generates a Step<E: Env> object
at every interaction step via the Env::step() method.
Info stores additional information at each step of the interaction between an agent and
the environment; it may be empty (a zero-sized struct). Config represents
the configuration of the environment and is used to build it.
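The following sketch shows the shape of such an environment. The Env trait and Step struct here are simplified stand-ins inferred from the description above (including a hypothetical reset() method); the crate's actual definitions carry more fields and methods.

```rust
/// Stand-in for Step<E: Env>: the result of one call to step().
pub struct Step<O, A, I> {
    pub act: A,       // a_t
    pub obs: O,       // o_{t+1}
    pub reward: f32,  // r_t
    pub is_done: bool,
    pub info: I,
}

/// Simplified stand-in for the Env trait.
pub trait Env {
    type Config;
    type Obs;
    type Act;
    type Info;

    /// Builds the environment from its configuration.
    fn build(config: &Self::Config) -> Self
    where
        Self: Sized;

    /// Resets the environment and returns the initial observation
    /// (hypothetical helper; the real API may differ).
    fn reset(&mut self) -> Self::Obs;

    /// Applies an action and returns the resulting transition.
    fn step(&mut self, act: Self::Act) -> Step<Self::Obs, Self::Act, Self::Info>;
}
```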
Policy
Policy<E: Env> represents a policy from which actions are sampled for
environment E. Policy::sample() takes an E::Obs and emits an E::Act.
A policy may be probabilistic or deterministic.
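A minimal sketch, reusing the stand-in Env trait and toy types from the previous examples; the real Policy trait may differ in detail:

```rust
/// Simplified stand-in for the Policy trait.
pub trait Policy<E: Env> {
    /// Samples an action for the given observation; may be
    /// probabilistic (exploration) or deterministic.
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

/// A trivial deterministic policy for the toy types above:
/// always move right, one action per sample in the batch.
pub struct AlwaysRight;

impl<E> Policy<E> for AlwaysRight
where
    E: Env<Obs = PosObs, Act = MoveAct>,
{
    fn sample(&mut self, obs: &E::Obs) -> E::Act {
        MoveAct(vec![1; obs.len()])
    }
}
```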
Agent
In this crate, Agent<E: Env, R: ReplayBufferBase> is defined as a trainable
Policy<E: Env>. An agent is in either training or evaluation mode. In training mode,
the agent’s policy might be probabilistic for exploration, while in evaluation mode,
the policy might be deterministic.
The Agent::opt() method performs a single optimization step, whose definition
depends on the agent; it might comprise multiple stochastic gradient steps.
Samples for training are taken from R: ReplayBufferBase.
This trait also has methods for saving and loading the trained policy in a given directory.
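In the same simplified style, the agent interface might be sketched as below. The method names (train, eval, opt, save, load) follow the description above, but their exact signatures here are assumptions, not the crate's API.

```rust
/// Placeholder for the replay-buffer interface; a fuller stand-in
/// appears in the next section's sketch.
pub trait ReplayBufferBase {}

/// Simplified stand-in for the Agent trait: a trainable Policy.
pub trait Agent<E: Env, R: ReplayBufferBase>: Policy<E> {
    /// Switches to training mode (the policy may explore).
    fn train(&mut self);

    /// Switches to evaluation mode (the policy may act deterministically).
    fn eval(&mut self);

    /// Returns true while in training mode.
    fn is_train(&self) -> bool;

    /// Performs a single optimization step, which may internally run
    /// several stochastic gradient steps on batches sampled from `buffer`.
    fn opt(&mut self, buffer: &mut R);

    /// Saves/loads the trained policy in the given directory.
    fn save(&self, dir: &std::path::Path) -> std::io::Result<()>;
    fn load(&mut self, dir: &std::path::Path) -> std::io::Result<()>;
}
```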
Replay buffer
The ReplayBufferBase trait is an abstraction of replay buffers. For handling samples,
it has two associated types: PushedItem and Batch. PushedItem is the type of
samples pushed into the buffer. These samples might be generated from
Step<E: Env>; the StepProcessorBase<E: Env> trait provides the interface
for converting Step<E: Env> into PushedItem.
Batch is the type of samples taken from the buffer for training Agents.
The user implements the Agent::opt() method so that it consumes Batch objects
to perform an optimization step.
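A sketch of these two interfaces, again with simplified signatures assumed from the description above (the real traits include more methods, e.g. for capacity and length):

```rust
/// Stand-in for ReplayBufferBase with its two associated types.
pub trait ReplayBufferBase {
    /// Type of items pushed into the buffer.
    type PushedItem;
    /// Type of batches sampled from the buffer for training.
    type Batch;

    fn push(&mut self, item: Self::PushedItem);
    fn batch(&mut self, size: usize) -> Self::Batch;
}

/// Stand-in for StepProcessorBase<E>: converts a Step produced by
/// the environment into an item that can be pushed to the buffer.
pub trait StepProcessorBase<E: Env> {
    type Output;
    fn process(&mut self, step: Step<E::Obs, E::Act, E::Info>) -> Self::Output;
}
```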
A reference implementation
SimpleReplayBuffer<O, A> implements ReplayBufferBase.
This type has two parameters, O and A, which are the representations of
observations and actions in the replay buffer. O and A must implement
SubBatch, which provides the functionality of storing samples, like Vec<T>,
for observations and actions. The associated types PushedItem and Batch
are the same type, StdBatch, representing a set of tuples (o_t, r_t, a_t, o_t+1).
SimpleStepProcessor<E, O, A> might be used with SimpleReplayBuffer<O, A>.
It converts E::Obs and E::Act into SubBatches of the respective types
and generates a StdBatch. The conversion process relies on the trait bounds
O: From<E::Obs> and A: From<E::Act>.
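These From bounds can be illustrated with toy types (GridObs and BufferObs are hypothetical names made up for this example):

```rust
/// Observation as produced by a hypothetical environment.
pub struct GridObs {
    pub x: i32,
    pub y: i32,
}

/// The same observation as stored in the replay buffer,
/// flattened to floats so it can be stacked into batches.
pub struct BufferObs(pub Vec<f32>);

impl From<GridObs> for BufferObs {
    fn from(o: GridObs) -> Self {
        BufferObs(vec![o.x as f32, o.y as f32])
    }
}
```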
Trainer
Trainer manages the training loop and related objects. A Trainer object is
built with configurations of Env, ReplayBufferBase, StepProcessorBase,
and some training parameters. The Trainer::train method then runs the training loop with
a given Agent and Recorder.
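Conceptually, the loop driven by Trainer::train looks roughly like the following, written against the stand-in traits from the sketches above. This is not the crate's actual implementation: the real Trainer also records metrics, evaluates the agent, and saves models, and its control flow differs.

```rust
/// A rough conceptual outline of a training loop.
fn train_loop<E, P, R, A>(
    env: &mut E,
    processor: &mut P,
    buffer: &mut R,
    agent: &mut A,
    opt_interval: usize,
    max_steps: usize,
) where
    E: Env,
    E::Obs: Clone,
    P: StepProcessorBase<E, Output = R::PushedItem>,
    R: ReplayBufferBase,
    A: Agent<E, R>,
{
    agent.train(); // training mode: the policy may explore
    let mut obs = env.reset();

    for t in 0..max_steps {
        // Sample an action, apply it, and observe the transition.
        let act = agent.sample(&obs);
        let step = env.step(act);

        // Continue from o_{t+1}, or reset on episode end.
        obs = if step.is_done { env.reset() } else { step.obs.clone() };

        // Convert the Step into a PushedItem and store it.
        buffer.push(processor.process(step));

        // Periodically run an optimization step on sampled batches.
        if (t + 1) % opt_interval == 0 {
            agent.opt(buffer);
        }
    }
}
```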
Modules
- Errors in the library.
- Types for recording various values obtained during training and evaluation.
- A generic implementation of replay buffer.
- Utilities for interaction of agents and environments.
Structs
- A default Evaluator.
- Represents an action, observation, and reward tuple (a_t, o_t+1, r_t) with some additional information.
- Manages the training loop and related objects.
- Configuration of Trainer.
Traits
- A set of actions of the environment.
- Represents a trainable policy on an environment.
- Represents an environment, typically an MDP.
- Evaluates a Policy.
- Interface of buffers of experiences from environments.
- Additional information to Obs and Act.
- A set of observations of an environment.
- A policy on an environment.
- Interface of replay buffers.
- A batch of transitions for training agents.
- Processes a Step and outputs an item Self::Output.