Crate border_core
source ·Expand description
Core components for reinforcement learning.
Observation and action
Obs
and Act
traits are abstractions of observation and action in environments.
These traits can handle two or more samples for implementing vectorized environments.
Environment
Env
trait is an abstraction of environments. It has four associated types:
Config
, Obs
, Act
and Info
. Obs
and Act
are concrete types of
observation and action of the environment.
These must implement Obs
and Act
traits, respectively.
The environment that implements Env
generates Step<E: Env>
object
at every environment interaction step with Env::step()
method.
Info
stores some information at every step of interactions of an agent and
the environment. It could be empty (zero-sized struct). Config
represents
configurations of the environment and is used to build.
Policy
Policy<E: Env>
represents a policy, from which actions are sampled for
environment E
. Policy::sample()
takes E::Obs
and emits E::Act
.
It could be probabilistic or deterministic.
Agent
In this crate, Agent<E: Env, R: ReplayBufferBase>
is defined as trainable
Policy<E: Env>
. It is in either training or evaluation mode. In training mode,
the agent’s policy might be probabilistic for exploration, while in evaluation mode,
the policy might be deterministic.
Agent::opt()
method does a single optimization step. The definition of an
optimization step depends on each agent. It might be multiple stochastic gradient
steps in an optimization step. Samples for training are taken from
R: ReplayBufferBase
.
This trait also has methods for saving/loading the trained policy in the given directory.
Replay buffer
ReplayBufferBase
trait is an abstraction of replay buffers. For handling samples,
there are two associated types: PushedItem
and Batch
. PushedItem
is a type
representing samples pushed to the buffer. These samples might be generated from
Step<E: Env>
. StepProcessorBase<E: Env>
trait provides the interface
for converting Step<E: Env>
into PushedItem
.
Batch
is a type of samples taken from the buffer for training Agent
s.
The user implements Agent::opt()
method such that it handles Batch
objects
for doing an optimization step.
A reference implementation
SimpleReplayBuffer<O, A>
implementats ReplayBufferBase
.
This type has two parameters O
and A
, which are representation of
observation and action in the replay buffer. O
and A
must implement
SubBatch
, which has the functionality of storing samples, like Vec<T>
,
for observation and action. The associated types PushedItem
and Batch
are the same type, StdBatch
, representing sets of (o_t, r_t, a_t, o_t+1)
.
SimpleStepProcessor<E, O, A>
might be used with SimpleReplayBuffer<O, A>
.
It converts E::Obs
and E::Act
into SubBatch
s of respective types
and generates StdBatch
. The conversion process relies on trait bounds,
O: From<E::Obs>
and A: From<E::Act>
.
Trainer
Trainer
manages training loop and related objects. The Trainer
object is
built with configurations of Env
, ReplayBufferBase
, StepProcessorBase
and some training parameters. Then, Trainer::train
method starts training loop with
given Agent
and Recorder
.
Modules
- Errors in the library.
- Types for recording various values obtained during training and evaluation.
- A generic implementation of replay buffer.
- Utilities for interaction of agents and environments.
Structs
- A default
Evaluator
. - Represents an action, observation and reward tuple
(a_t, o_t+1, r_t)
with some additional information. - Manages training loop and related objects.
- Configuration of
Trainer
.
Traits
- A set of actions of the environment.
- Represents a trainable policy on an environment.
- Represents an environment, typically an MDP.
- Evaluate
Policy
. - Interface of buffers of experiences from environments.
- Additional information to
Obs
andAct
. - A set of observations of an environment.
- A policy on an environment.
- Interface of replay buffers.
- A batch of transitions for training agents.
- Process
Step
and output an itemSelf::Output
.