Crate border_core
Core components for reinforcement learning.
Observation and action
The Obs and Act traits are abstractions of observations and actions in environments.
Both traits can hold two or more samples at once, which makes it possible to implement vectorized environments.
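As an illustration, here is a minimal sketch of user-defined observation and action types. The trait shapes below are simplified stand-ins based on the description above, not the crate's exact definitions; in particular, the single len() method is an assumption.

```rust
/// Simplified stand-in for the Obs trait: a (possibly batched)
/// observation. The real trait in border_core has more methods.
pub trait Obs: Clone {
    /// Number of samples held, so that vectorized environments can
    /// carry observations from several instances at once.
    fn len(&self) -> usize;
}

/// Simplified stand-in for the Act trait.
pub trait Act: Clone {
    fn len(&self) -> usize;
}

/// A batch of scalar positions from one or more environment instances.
#[derive(Clone)]
pub struct PosObs(pub Vec<f32>);

impl Obs for PosObs {
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// A batch of discrete moves (e.g. -1 = left, 1 = right).
#[derive(Clone)]
pub struct MoveAct(pub Vec<i8>);

impl Act for MoveAct {
    fn len(&self) -> usize {
        self.0.len()
    }
}
```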
Environment
The Env trait is an abstraction of environments. It has four associated types:
Config, Obs, Act, and Info. Obs and Act are the concrete types of
observations and actions of the environment.
These must implement the Obs and Act traits, respectively.
An environment implementing Env generates a Step<E: Env> object
at every interaction step via the Env::step() method.
Info stores additional information at each step of the interaction between an agent and
the environment; it may be empty (a zero-sized struct). Config represents
the configuration of the environment and is used to build it.
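The following sketch shows the shape of such an environment. The Env trait and Step struct here are simplified stand-ins inferred from the description above (including a hypothetical reset() method); the crate's actual definitions carry more fields and methods.

```rust
/// Stand-in for Step<E: Env>: the result of one call to step().
pub struct Step<O, A, I> {
    pub act: A,       // a_t
    pub obs: O,       // o_{t+1}
    pub reward: f32,  // r_t
    pub is_done: bool,
    pub info: I,
}

/// Simplified stand-in for the Env trait.
pub trait Env {
    type Config;
    type Obs;
    type Act;
    type Info;

    /// Builds the environment from its configuration.
    fn build(config: &Self::Config) -> Self
    where
        Self: Sized;

    /// Resets the environment and returns the initial observation
    /// (hypothetical helper; the real API may differ).
    fn reset(&mut self) -> Self::Obs;

    /// Applies an action and returns the resulting transition.
    fn step(&mut self, act: Self::Act) -> Step<Self::Obs, Self::Act, Self::Info>;
}
```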
Policy
Policy<E: Env> represents a policy from which actions are sampled for
environment E. Policy::sample() takes an E::Obs and emits an E::Act.
A policy may be probabilistic or deterministic.
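A minimal sketch, reusing the stand-in Env trait and toy types from the previous examples; the real Policy trait may differ in detail:

```rust
/// Simplified stand-in for the Policy trait.
pub trait Policy<E: Env> {
    /// Samples an action for the given observation; may be
    /// probabilistic (exploration) or deterministic.
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

/// A trivial deterministic policy for the toy types above:
/// always move right, one action per sample in the batch.
pub struct AlwaysRight;

impl<E> Policy<E> for AlwaysRight
where
    E: Env<Obs = PosObs, Act = MoveAct>,
{
    fn sample(&mut self, obs: &E::Obs) -> E::Act {
        MoveAct(vec![1; obs.len()])
    }
}
```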
Agent
In this crate, Agent<E: Env, R: ReplayBufferBase> is defined as a trainable
Policy<E: Env>. An agent is in either training or evaluation mode. In training mode,
the agent’s policy might be probabilistic for exploration, while in evaluation mode,
the policy might be deterministic.
The Agent::opt() method performs a single optimization step, whose definition
depends on the agent; it might comprise multiple stochastic gradient steps.
Samples for training are taken from R: ReplayBufferBase.
This trait also has methods for saving and loading the trained policy in a given directory.
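In the same simplified style, the agent interface might be sketched as below. The method names (train, eval, opt, save, load) follow the description above, but their exact signatures here are assumptions, not the crate's API.

```rust
/// Placeholder for the replay-buffer interface; a fuller stand-in
/// appears in the next section's sketch.
pub trait ReplayBufferBase {}

/// Simplified stand-in for the Agent trait: a trainable Policy.
pub trait Agent<E: Env, R: ReplayBufferBase>: Policy<E> {
    /// Switches to training mode (the policy may explore).
    fn train(&mut self);

    /// Switches to evaluation mode (the policy may act deterministically).
    fn eval(&mut self);

    /// Returns true while in training mode.
    fn is_train(&self) -> bool;

    /// Performs a single optimization step, which may internally run
    /// several stochastic gradient steps on batches sampled from `buffer`.
    fn opt(&mut self, buffer: &mut R);

    /// Saves/loads the trained policy in the given directory.
    fn save(&self, dir: &std::path::Path) -> std::io::Result<()>;
    fn load(&mut self, dir: &std::path::Path) -> std::io::Result<()>;
}
```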
Replay buffer
The ReplayBufferBase trait is an abstraction of replay buffers. For handling samples,
it has two associated types: PushedItem and Batch. PushedItem is the type of
samples pushed into the buffer. These samples might be generated from
Step<E: Env>; the StepProcessorBase<E: Env> trait provides the interface
for converting Step<E: Env> into PushedItem.
Batch is the type of samples taken from the buffer for training Agents.
The user implements the Agent::opt() method so that it consumes Batch objects
to perform an optimization step.
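A sketch of these two interfaces, again with simplified signatures assumed from the description above (the real traits include more methods, e.g. for capacity and length):

```rust
/// Stand-in for ReplayBufferBase with its two associated types.
pub trait ReplayBufferBase {
    /// Type of items pushed into the buffer.
    type PushedItem;
    /// Type of batches sampled from the buffer for training.
    type Batch;

    fn push(&mut self, item: Self::PushedItem);
    fn batch(&mut self, size: usize) -> Self::Batch;
}

/// Stand-in for StepProcessorBase<E>: converts a Step produced by
/// the environment into an item that can be pushed to the buffer.
pub trait StepProcessorBase<E: Env> {
    type Output;
    fn process(&mut self, step: Step<E::Obs, E::Act, E::Info>) -> Self::Output;
}
```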
A reference implementation
SimpleReplayBuffer<O, A> implements ReplayBufferBase.
This type has two parameters, O and A, which are the representations of
observations and actions in the replay buffer. O and A must implement
SubBatch, which provides the functionality of storing samples, like Vec<T>,
for observations and actions. The associated types PushedItem and Batch
are the same type, StdBatch, representing a set of tuples (o_t, r_t, a_t, o_t+1).
SimpleStepProcessor<E, O, A> might be used with SimpleReplayBuffer<O, A>.
It converts E::Obs and E::Act into SubBatches of the respective types
and generates a StdBatch. The conversion process relies on the trait bounds
O: From<E::Obs> and A: From<E::Act>.
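These From bounds can be illustrated with toy types (GridObs and BufferObs are hypothetical names made up for this example):

```rust
/// Observation as produced by a hypothetical environment.
pub struct GridObs {
    pub x: i32,
    pub y: i32,
}

/// The same observation as stored in the replay buffer,
/// flattened to floats so it can be stacked into batches.
pub struct BufferObs(pub Vec<f32>);

impl From<GridObs> for BufferObs {
    fn from(o: GridObs) -> Self {
        BufferObs(vec![o.x as f32, o.y as f32])
    }
}
```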
Trainer
Trainer manages the training loop and related objects. A Trainer object is
built with configurations of Env, ReplayBufferBase, StepProcessorBase,
and some training parameters. The Trainer::train method then runs the training loop with
a given Agent and Recorder.
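Conceptually, the loop driven by Trainer::train looks roughly like the following, written against the stand-in traits from the sketches above. This is not the crate's actual implementation: the real Trainer also records metrics, evaluates the agent, and saves models, and its control flow differs.

```rust
/// A rough conceptual outline of a training loop.
fn train_loop<E, P, R, A>(
    env: &mut E,
    processor: &mut P,
    buffer: &mut R,
    agent: &mut A,
    opt_interval: usize,
    max_steps: usize,
) where
    E: Env,
    E::Obs: Clone,
    P: StepProcessorBase<E, Output = R::PushedItem>,
    R: ReplayBufferBase,
    A: Agent<E, R>,
{
    agent.train(); // training mode: the policy may explore
    let mut obs = env.reset();

    for t in 0..max_steps {
        // Sample an action, apply it, and observe the transition.
        let act = agent.sample(&obs);
        let step = env.step(act);

        // Continue from o_{t+1}, or reset on episode end.
        obs = if step.is_done { env.reset() } else { step.obs.clone() };

        // Convert the Step into a PushedItem and store it.
        buffer.push(processor.process(step));

        // Periodically run an optimization step on sampled batches.
        if (t + 1) % opt_interval == 0 {
            agent.opt(buffer);
        }
    }
}
```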
Modules
- Errors in the library.
- Types for recording various values obtained during training and evaluation.
- A generic implementation of replay buffer.
- Utilities for interaction of agents and environments.
Structs
- A default Evaluator.
- Represents an action, observation, and reward tuple (a_t, o_t+1, r_t) with some additional information.
- Manages the training loop and related objects.
- Configuration of Trainer.
Traits
- A set of actions of the environment.
- Represents a trainable policy on an environment.
- Represents an environment, typically an MDP.
- Evaluates a Policy.
- Interface of buffers of experiences from environments.
- Additional information to Obs and Act.
- A set of observations of an environment.
- A policy on an environment.
- Interface of replay buffers.
- A batch of transitions for training agents.
- Processes a Step and outputs an item Self::Output.