Core components for reinforcement learning.
§Observation and action
The Obs and Act traits are abstractions of observations and actions in environments. These traits can handle two or more samples, which leaves room for vectorized environments, although no vectorized environment is currently implemented.
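As a rough illustration, the stand-in types below (the names MyObs and MyAct are made up for this sketch, and the required methods of the actual Obs/Act traits are not reproduced) show how a single observation or action value can carry one or more samples:

```rust
// Stand-in types for illustration only; the real `Obs`/`Act` traits have
// required methods of their own that are not shown here.
#[derive(Clone, Debug)]
pub struct MyObs(pub Vec<f32>); // one scalar observation per sample

#[derive(Clone, Debug)]
pub struct MyAct(pub Vec<i64>); // one discrete action per sample

impl MyObs {
    /// Number of samples held by this observation value.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```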
§Environment
The Env trait is an abstraction of environments. It has four associated types: Config, Obs, Act, and Info. Obs and Act are the concrete observation and action types of the environment and must implement the Obs and Act traits, respectively. An environment implementing Env generates a Step&lt;E: Env&gt; object at every interaction step via its Env::step() method. Info stores additional information from each interaction between the agent and the environment; it may be empty (a zero-sized struct). Config represents the configuration of the environment and is used to build it.
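A minimal stand-in environment following this description might look as follows. This is only a sketch reusing the MyObs/MyAct types above; the actual Env trait has different and richer signatures (reset, termination handling, and so on):

```rust
/// Stand-in per-step information; as noted above it may be an empty struct.
pub struct MyInfo;

/// Stand-in configuration used to build the environment.
pub struct MyConfig {
    pub init_state: f32,
}

/// A toy single-sample environment: the state drifts by the chosen action.
pub struct MyEnv {
    state: f32,
}

impl MyEnv {
    pub fn build(config: &MyConfig) -> Self {
        Self { state: config.init_state }
    }

    /// One interaction step: apply the action and return the next
    /// observation, a reward, and extra info (a simplified analogue of
    /// `Env::step()`).
    pub fn step(&mut self, act: &MyAct) -> (MyObs, f32, MyInfo) {
        self.state += act.0[0] as f32;
        let reward = -self.state.abs(); // reward peaks when the state is at zero
        (MyObs(vec![self.state]), reward, MyInfo)
    }
}
```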
§Policy
Policy&lt;E: Env&gt; represents a policy. Policy::sample() takes an E::Obs and generates an E::Act. The policy may be probabilistic or deterministic.
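Continuing the toy sketch above, a deterministic stand-in policy could be written as follows (again, an analogue of the idea rather than an implementation of the crate's Policy trait):

```rust
/// Stand-in policy: maps an observation to an action, deterministically here.
pub struct MyPolicy;

impl MyPolicy {
    /// Simplified analogue of `Policy::sample()`.
    pub fn sample(&mut self, obs: &MyObs) -> MyAct {
        // Push the state back toward zero.
        let a = if obs.0[0] > 0.0 { -1 } else { 1 };
        MyAct(vec![a])
    }
}
```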
§Agent
In this crate, Agent&lt;E: Env, R: ReplayBufferBase&gt; is defined as a trainable Policy&lt;E: Env&gt;. An agent is in either training or evaluation mode. In training mode, the agent's policy might be probabilistic for exploration, while in evaluation mode it might be deterministic.

The Agent::opt() method performs a single optimization step. What constitutes an optimization step varies from agent to agent; it might, for example, consist of multiple stochastic gradient steps. Samples for training are taken from R: ReplayBufferBase.

This trait also has methods for saving and loading the parameters of the trained policy to and from a directory.
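Conceptually, an agent wraps a policy with a mode flag and an optimization step. The sketch below continues the toy example and is only an analogue of that idea, not the Agent trait itself:

```rust
/// Stand-in agent: a policy plus a training/evaluation mode and an
/// optimization step (a simplified analogue of the `Agent` trait).
pub struct MyAgent {
    pub policy: MyPolicy,
    pub train_mode: bool,
}

impl MyAgent {
    pub fn train(&mut self) { self.train_mode = true; }  // exploration allowed
    pub fn eval(&mut self) { self.train_mode = false; }  // act deterministically

    /// A single optimization step. A real agent would draw a batch of
    /// transitions from the replay buffer here and run one or more
    /// stochastic gradient steps; this toy agent has nothing to update.
    pub fn opt(&mut self) {}
}
```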
§Batch
TransitionBatch is a trait for a batch of transitions (o_t, r_t, a_t, o_t+1). This trait is used to train Agents using an RL algorithm.
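In the toy sketch, such a batch could simply be a struct holding the four elements column-wise (a simplified analogue of what a TransitionBatch implementation carries):

```rust
/// Stand-in transition batch holding (o_t, r_t, a_t, o_t+1) column-wise.
pub struct MyBatch {
    pub obs: Vec<MyObs>,      // o_t
    pub reward: Vec<f32>,     // r_t
    pub act: Vec<MyAct>,      // a_t
    pub next_obs: Vec<MyObs>, // o_t+1
}
```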
§Replay buffer and experience buffer
The ReplayBufferBase trait is an abstraction of replay buffers. One of its associated types, ReplayBufferBase::Batch, represents the samples taken from the buffer for training Agents. Agents must implement the Agent::opt() method, in which ReplayBufferBase::Batch has an appropriate type or trait bound(s) for training the agent.

As explained above, the ReplayBufferBase trait is able to generate batches of samples with which agents are trained. The ExperienceBufferBase trait, on the other hand, is able to store samples. ExperienceBufferBase::push() pushes samples of type ExperienceBufferBase::Item, which might be obtained via interaction steps with an environment.
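Continuing the toy sketch, a single buffer playing both roles (storing pushed items and producing training batches) could look like this. The names and sampling strategy below are made up for illustration; a real replay buffer would sample randomly and bound its capacity:

```rust
/// One stored transition (the "item" pushed after each environment step).
pub struct MyItem {
    pub obs: MyObs,
    pub act: MyAct,
    pub reward: f32,
    pub next_obs: MyObs,
}

/// Stand-in buffer combining the two roles described above: it stores pushed
/// items (experience buffer role) and produces batches (replay buffer role).
pub struct MyBuffer {
    items: Vec<MyItem>,
}

impl MyBuffer {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Analogue of `ExperienceBufferBase::push()`.
    pub fn push(&mut self, item: MyItem) {
        self.items.push(item);
    }

    /// Analogue of drawing a training batch; this toy version just takes the
    /// most recent transitions instead of sampling randomly.
    pub fn batch(&self, size: usize) -> MyBatch {
        let start = self.items.len().saturating_sub(size);
        let slice = &self.items[start..];
        MyBatch {
            obs: slice.iter().map(|t| t.obs.clone()).collect(),
            reward: slice.iter().map(|t| t.reward).collect(),
            act: slice.iter().map(|t| t.act.clone()).collect(),
            next_obs: slice.iter().map(|t| t.next_obs.clone()).collect(),
        }
    }
}
```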
§A reference implementation
SimpleReplayBuffer&lt;O, A&gt; implements both ReplayBufferBase and ExperienceBufferBase. This type has two type parameters, O and A, which are the representations of observations and actions in the replay buffer. O and A must implement BatchBase, which provides the functionality of storing samples, like Vec&lt;T&gt;, for observations and actions. The associated types Item and Batch are the same type, GenericTransitionBatch, representing sets of (o_t, r_t, a_t, o_t+1).

SimpleStepProcessor&lt;E, O, A&gt; might be used with SimpleReplayBuffer&lt;O, A&gt;. It converts E::Obs and E::Act into BatchBases of the respective types and generates a GenericTransitionBatch. The conversion process relies on the trait bounds O: From&lt;E::Obs&gt; and A: From&lt;E::Act&gt;.
§Trainer
Trainer manages the training loop and related objects. The Trainer object is built with a configuration of training parameters such as the maximum number of optimization steps, the model directory where the agent's parameters are saved during training, etc. The Trainer::train method executes online training of an agent on an environment. In this method's training loop, the agent interacts with the environment to collect samples and perform optimization steps, and some metrics are recorded at the same time.
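Conceptually, the loop driven by Trainer is roughly the toy version below, which strings together the sketches above: interact, store the transition, and optimize at a fixed interval. The real Trainer additionally handles configuration, recording, evaluation, and model saving:

```rust
fn toy_training_loop() {
    let config = MyConfig { init_state: 5.0 };
    let mut env = MyEnv::build(&config);
    let mut agent = MyAgent { policy: MyPolicy, train_mode: true };
    let mut buffer = MyBuffer::new();

    let mut obs = MyObs(vec![config.init_state]);
    for step in 0..1_000 {
        // Interact with the environment using the current policy.
        let act = agent.policy.sample(&obs);
        let (next_obs, reward, _info) = env.step(&act);

        // Store the transition in the buffer.
        buffer.push(MyItem {
            obs: obs.clone(),
            act: act.clone(),
            reward,
            next_obs: next_obs.clone(),
        });
        obs = next_obs;

        // Draw a batch and run an optimization step at a fixed interval.
        if step > 0 && step % 10 == 0 {
            let _batch = buffer.batch(32);
            agent.opt();
        }
    }
}
```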
§Evaluator
Evaluator&lt;E, P&gt; is used to evaluate the performance of a policy (P) in an environment (E). An object of this type is given to the Trainer object to evaluate the policy during training. DefaultEvaluator&lt;E, P&gt; is a default implementation of Evaluator&lt;E, P&gt;. This evaluator runs the policy in the environment for a certain number of episodes. At the start of each episode, the environment is reset using Env::reset_with_index() to control specific conditions for evaluation.
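Conceptually, such an evaluation runs the policy in evaluation mode for a fixed number of episodes and aggregates the return. The toy version below continues the earlier sketches; it is not the DefaultEvaluator implementation, and it stands in for Env::reset_with_index() by using the episode index to pick the initial state:

```rust
/// Toy evaluation: run the policy for `n_episodes` fixed-length episodes and
/// return the mean episode return.
fn toy_evaluate(agent: &mut MyAgent, n_episodes: usize) -> f32 {
    agent.eval(); // switch to evaluation mode (deterministic policy)
    let mut total = 0.0;

    for episode in 0..n_episodes {
        // A real evaluator resets the environment with an episode index to
        // control the evaluation conditions; here the index just sets the
        // initial state.
        let mut env = MyEnv::build(&MyConfig { init_state: episode as f32 });
        let mut obs = MyObs(vec![episode as f32]);

        for _ in 0..100 {
            let act = agent.policy.sample(&obs);
            let (next_obs, reward, _info) = env.step(&act);
            total += reward;
            obs = next_obs;
        }
    }

    total / n_episodes as f32
}
```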
Modules§
- error - Errors in the library.
- generic_replay_buffer - A generic implementation of replay buffer.
- record - Types for recording various values obtained during training and evaluation.
- test - Agent and Env for testing.
Structs§
- DefaultEvaluator - A default Evaluator.
- Sampler - Encapsulates sampling steps. Specifically, it does the following steps:
- Step - Represents an action, observation, and reward tuple (a_t, o_t+1, r_t) with some additional information.
- Trainer - Manages the training loop and related objects.
- TrainerConfig - Configuration of Trainer.
Traits§
- Act - A set of actions of the environment.
- Agent - Represents a trainable policy on an environment.
- Configurable - A configurable object, having a type parameter.
- Env - Represents an environment, typically an MDP.
- Evaluator - Evaluates a Policy.
- ExperienceBufferBase - Interface of buffers of experiences from environments.
- Info - Additional information to Obs and Act.
- Obs - A set of observations of an environment.
- Policy - A policy on an environment.
- ReplayBufferBase - Interface of replay buffers.
- StepProcessor - Processes Step and outputs an item of type Self::Output.
- TransitionBatch - A batch of transitions for training agents.