Crate border_core

Core components for reinforcement learning.

§Observation and Action

The Obs and Act traits provide abstractions for observations and actions in environments.

§Environment

The Env trait serves as the fundamental abstraction for environments. It defines four associated types: Config, Obs, Act, and Info. The Obs and Act types represent concrete implementations of environment observations and actions, respectively. These types must implement the Obs and Act traits. Environments implementing Env generate Step<E: Env> objects at each interaction step through the Env::step() method. The Info type stores additional information from each agent-environment interaction, which may be empty (implemented as a zero-sized struct). The Config type represents environment configurations and is used during environment construction.
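
The shape described above can be illustrated with a small, self-contained sketch. It does not depend on border_core: the trait `SketchEnv`, the struct `SketchStep`, and the `CounterEnv` environment are hypothetical stand-ins, and the method signatures of the real Env trait may differ.

```rust
/// Hypothetical stand-in for a step record; not border_core's `Step`.
pub struct SketchStep<E: SketchEnv> {
    pub act: E::Act,
    pub obs: E::Obs,
    pub reward: f32,
    pub is_terminated: bool,
    pub info: E::Info,
}

/// Hypothetical, simplified environment trait with four associated types.
pub trait SketchEnv: Sized {
    type Config;
    type Obs;
    type Act;
    type Info;

    /// Builds the environment from its configuration.
    fn build(config: &Self::Config) -> Self;
    /// Starts a new episode and returns the initial observation.
    fn reset(&mut self) -> Self::Obs;
    /// Applies an action and reports what happened.
    fn step(&mut self, act: &Self::Act) -> SketchStep<Self>;
}

/// Zero-sized info type: this environment reports nothing extra.
pub struct NoInfo;

pub struct CounterConfig {
    pub target: i32,
}

/// Toy environment: each action is added to a counter until it equals `target`.
pub struct CounterEnv {
    state: i32,
    target: i32,
}

impl SketchEnv for CounterEnv {
    type Config = CounterConfig;
    type Obs = i32;
    type Act = i32;
    type Info = NoInfo;

    fn build(config: &CounterConfig) -> Self {
        CounterEnv { state: 0, target: config.target }
    }

    fn reset(&mut self) -> i32 {
        self.state = 0;
        self.state
    }

    fn step(&mut self, act: &i32) -> SketchStep<Self> {
        self.state += *act;
        let done = self.state == self.target;
        SketchStep {
            act: *act,
            obs: self.state,
            reward: if done { 1.0 } else { 0.0 },
            is_terminated: done,
            info: NoInfo,
        }
    }
}
```

In this sketch the configuration is consumed only at construction time, and `NoInfo` occupies zero bytes, which mirrors the "empty Info" case mentioned above.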

§Policy

The Policy<E: Env> trait represents a decision-making policy. The Policy::sample() method takes an E::Obs and generates an E::Act. Policies can be either probabilistic or deterministic, depending on the implementation.
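
As a rough illustration of the deterministic and probabilistic cases, the sketch below defines a hypothetical `SketchPolicy<O, A>` trait with a `sample()` method. It is parameterized by raw observation and action types for brevity, whereas the crate's Policy is parameterized by an environment; the trait, both policies, and the small pseudo-random generator are assumptions made for this example.

```rust
/// Hypothetical, simplified policy trait; not border_core's `Policy<E: Env>`.
pub trait SketchPolicy<O, A> {
    fn sample(&mut self, obs: &O) -> A;
}

/// Deterministic policy: steps down when the observation is positive, up otherwise.
pub struct TowardZero;

impl SketchPolicy<i32, i32> for TowardZero {
    fn sample(&mut self, obs: &i32) -> i32 {
        if *obs > 0 { -1 } else { 1 }
    }
}

/// Probabilistic policy: picks -1 or +1 using a small pseudo-random generator.
pub struct RandomWalk {
    seed: u64,
}

impl RandomWalk {
    pub fn new(seed: u64) -> Self {
        RandomWalk { seed }
    }

    /// One step of a linear congruential generator; returns the top bit.
    fn next_bit(&mut self) -> bool {
        self.seed = self
            .seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.seed >> 63) == 1
    }
}

impl SketchPolicy<i32, i32> for RandomWalk {
    fn sample(&mut self, _obs: &i32) -> i32 {
        if self.next_bit() { 1 } else { -1 }
    }
}
```

Taking `&mut self` in `sample()` lets a stochastic policy advance its internal random state; whether the real trait does the same is not stated here.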

§Agent

In this crate, an Agent<E: Env, R: ReplayBufferBase> is defined as a trainable Policy<E: Env>. Agents operate in either training or evaluation mode. During training, the agent’s policy may be probabilistic to facilitate exploration, while in evaluation mode, it typically becomes deterministic.

The Agent::opt() method executes a single optimization step. The specific implementation of an optimization step varies between agents and may include multiple stochastic gradient descent steps. Training samples are obtained from the ReplayBufferBase.

This trait also provides methods for saving and loading trained policy parameters to and from a directory.
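
The two modes and the optimization step can be sketched as follows. This is a toy, self-contained example, not the crate's Agent trait: `MeanAgent`, `TargetBuffer`, and every method name below are hypothetical, the exploration noise is a fixed alternating offset rather than a real random draw, and saving and loading are omitted.

```rust
/// Hypothetical stand-in for a replay buffer: holds a fixed set of targets.
pub struct TargetBuffer {
    pub targets: Vec<f32>,
}

/// Hypothetical agent with a single parameter and a training/evaluation flag.
pub struct MeanAgent {
    pub param: f32,
    pub lr: f32,
    training: bool,
    noise_sign: f32,
}

impl MeanAgent {
    pub fn new(lr: f32) -> Self {
        MeanAgent { param: 0.0, lr, training: true, noise_sign: 1.0 }
    }

    pub fn train(&mut self) {
        self.training = true;
    }

    pub fn eval(&mut self) {
        self.training = false;
    }

    pub fn is_train(&self) -> bool {
        self.training
    }

    /// Adds an alternating offset in training mode (a stand-in for exploration
    /// noise) and returns the bare parameter in evaluation mode.
    pub fn sample(&mut self) -> f32 {
        if self.training {
            self.noise_sign = -self.noise_sign;
            self.param + 0.1 * self.noise_sign
        } else {
            self.param
        }
    }

    /// One optimization step, made of one gradient step per stored target.
    pub fn opt(&mut self, buffer: &TargetBuffer) {
        for t in &buffer.targets {
            // The gradient of 0.5 * (param - t)^2 with respect to param is (param - t).
            self.param -= self.lr * (self.param - *t);
        }
    }
}
```

A single call to `opt()` here performs several gradient updates, which illustrates why one "optimization step" need not equal one stochastic gradient descent step.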

§Batch

The TransitionBatch trait represents a batch of transitions in the form (o_t, r_t, a_t, o_t+1). This trait is used for training Agents using reinforcement learning algorithms.

§Replay Buffer and Experience Buffer

The ReplayBufferBase trait provides an abstraction for replay buffers. Its associated type ReplayBufferBase::Batch represents the samples used to train Agents. Each agent implements Agent::opt(), so ReplayBufferBase::Batch must satisfy whatever type or trait bounds that agent's training code requires.

While ReplayBufferBase focuses on generating training batches, the ExperienceBufferBase trait handles sample storage. The ExperienceBufferBase::push() method stores samples of type ExperienceBufferBase::Item, typically obtained through environment interactions.
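
The split between storing samples and producing batches can be sketched with two small traits implemented by one type. The traits `SketchExperienceBuffer` and `SketchReplayBuffer`, the `Transition` struct, and `RingBuffer` are hypothetical and self-contained; the real traits' methods may differ, and a real buffer would usually sample at random rather than cycle through its contents as this one does.

```rust
/// Hypothetical storage-side trait (compare ExperienceBufferBase).
pub trait SketchExperienceBuffer {
    type Item;
    fn push(&mut self, item: Self::Item);
    fn len(&self) -> usize;
}

/// Hypothetical batch-side trait (compare ReplayBufferBase).
pub trait SketchReplayBuffer {
    type Batch;
    /// Returns `None` when `size` is zero or exceeds the number of stored samples.
    fn batch(&mut self, size: usize) -> Option<Self::Batch>;
}

/// One transition with scalar observations and actions.
#[derive(Clone, Debug, PartialEq)]
pub struct Transition {
    pub obs: f32,
    pub act: i32,
    pub reward: f32,
    pub next_obs: f32,
}

/// Fixed-capacity buffer that overwrites its oldest entry once full.
pub struct RingBuffer {
    capacity: usize,
    items: Vec<Transition>,
    write: usize,  // slot that the next push writes to once the buffer is full
    cursor: usize, // index at which the next batch starts
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        RingBuffer { capacity, items: Vec::with_capacity(capacity), write: 0, cursor: 0 }
    }
}

impl SketchExperienceBuffer for RingBuffer {
    type Item = Transition;

    fn push(&mut self, item: Transition) {
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            self.items[self.write] = item;
        }
        self.write = (self.write + 1) % self.capacity;
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

impl SketchReplayBuffer for RingBuffer {
    type Batch = Vec<Transition>;

    fn batch(&mut self, size: usize) -> Option<Vec<Transition>> {
        if size == 0 || self.items.len() < size {
            return None;
        }
        let n = self.items.len();
        // Deterministic cycling; a real buffer would typically sample randomly.
        let out = (0..size)
            .map(|i| self.items[(self.cursor + i) % n].clone())
            .collect();
        self.cursor = (self.cursor + size) % n;
        Some(out)
    }
}
```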

§Reference Implementation

SimpleReplayBuffer<O, A> implements both ReplayBufferBase and ExperienceBufferBase. This type takes two parameters, O and A, representing observation and action types in the replay buffer. Both O and A must implement BatchBase, which provides sample storage functionality similar to Vec<T>. The associated types Item and Batch are both GenericTransitionBatch, representing sets of (o_t, r_t, a_t, o_t+1) transitions.

§Step Processor

The StepProcessor trait transforms environment interactions into training samples. It converts Step<E: Env> objects, which contain the action taken, the resulting observation, the reward, and the episode status, into items suitable for the replay buffer.

The SimpleStepProcessor<E, O, A> is a concrete implementation that:

Maintains the previous observation to construct complete transitions
Converts environment-specific observations and actions (E::Obs and E::Act) into batch-compatible types (O and A) using the From trait
Generates GenericTransitionBatch objects containing the complete transition (o_t, a_t, o_t+1, r_t, is_terminated, is_truncated)
Handles episode termination by properly resetting the previous observation

This processor is essential for implementing temporal difference learning algorithms, as it ensures that transitions are properly formatted and stored in the replay buffer for training.

SimpleStepProcessor<E, O, A> can be used with SimpleReplayBuffer<O, A>. It converts E::Obs and E::Act into their respective BatchBase types and generates GenericTransitionBatch. This conversion relies on the trait bounds O: From<E::Obs> and A: From<E::Act>.
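
The idea of keeping the previous observation and converting types through From can be sketched as below. Everything in this block is hypothetical and self-contained (`RawStep`, `Transition`, `PrevObsProcessor`, and their methods are not the crate's types). It also assumes that the processor simply forgets the previous observation when an episode ends and requires an explicit `reset()` before the next one; the real SimpleStepProcessor may handle episode boundaries differently.

```rust
use std::marker::PhantomData;

/// Hypothetical environment-side step: action taken and resulting observation.
pub struct RawStep<EO, EA> {
    pub act: EA,
    pub obs: EO,
    pub reward: f32,
    pub is_terminated: bool,
    pub is_truncated: bool,
}

/// Hypothetical buffer-side transition using converted types `O` and `A`.
#[derive(Debug, PartialEq)]
pub struct Transition<O, A> {
    pub obs: O,
    pub act: A,
    pub next_obs: O,
    pub reward: f32,
    pub is_terminated: bool,
    pub is_truncated: bool,
}

/// Remembers the previous observation so each step becomes a full transition.
pub struct PrevObsProcessor<EO, O, A> {
    prev_obs: Option<EO>,
    _marker: PhantomData<(O, A)>,
}

impl<EO: Clone, O: From<EO>, A> PrevObsProcessor<EO, O, A> {
    pub fn new() -> Self {
        PrevObsProcessor { prev_obs: None, _marker: PhantomData }
    }

    /// Stores the initial observation of a new episode.
    pub fn reset(&mut self, init_obs: EO) {
        self.prev_obs = Some(init_obs);
    }

    pub fn has_prev_obs(&self) -> bool {
        self.prev_obs.is_some()
    }

    /// Pairs the stored observation with the new step and converts the types.
    pub fn process<EA>(&mut self, step: RawStep<EO, EA>) -> Transition<O, A>
    where
        A: From<EA>,
    {
        let prev = self.prev_obs.take().expect("call reset() before process()");
        let done = step.is_terminated || step.is_truncated;
        if !done {
            // Assumption: keep the new observation only while the episode continues.
            self.prev_obs = Some(step.obs.clone());
        }
        Transition {
            obs: prev.into(),
            act: step.act.into(),
            next_obs: step.obs.into(),
            reward: step.reward,
            is_terminated: step.is_terminated,
            is_truncated: step.is_truncated,
        }
    }
}
```

For example, with `u8` observations and actions on the environment side, `PrevObsProcessor<u8, f32, i32>` produces transitions holding `f32` observations and `i32` actions, because the standard library provides `From<u8>` for both.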

§Trainer

The Trainer manages the training loop and related objects. A Trainer instance is configured with training parameters such as the maximum number of optimization steps and the directory for saving agent parameters during training. The Trainer::train method executes online training of an agent in an environment. During the training loop, the agent interacts with the environment to collect samples and perform optimization steps, while simultaneously recording various metrics.
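
The general shape of such a loop (interact, store, optimize, record) can be sketched as a single function. This is a simplified illustration rather than the Trainer's actual logic: `LoopConfig`, its fields, `Record`, and `run_training_loop` are hypothetical names, the buffer is a plain `Vec`, and evaluation and parameter saving are left out.

```rust
/// Hypothetical training parameters; the field names are illustrative only.
pub struct LoopConfig {
    /// Stop after this many optimization steps.
    pub max_opts: usize,
    /// Run one optimization step every `opt_interval` environment steps.
    pub opt_interval: usize,
}

/// One recorded metric: (optimization step, value reported by the optimizer).
pub type Record = (usize, f32);

/// Runs a toy online training loop and returns the number of environment
/// steps taken together with the recorded metrics.
pub fn run_training_loop<S>(
    config: &LoopConfig,
    mut interact: impl FnMut() -> S,
    mut optimize: impl FnMut(&[S]) -> f32,
) -> (usize, Vec<Record>) {
    assert!(config.opt_interval > 0);
    let mut buffer: Vec<S> = Vec::new();
    let mut records = Vec::new();
    let mut env_steps = 0;
    let mut opt_steps = 0;

    while opt_steps < config.max_opts {
        // Collect one sample from the environment and store it.
        buffer.push(interact());
        env_steps += 1;

        // Periodically optimize on the stored samples and record a metric.
        if env_steps % config.opt_interval == 0 {
            let metric = optimize(&buffer);
            opt_steps += 1;
            records.push((opt_steps, metric));
        }
    }
    (env_steps, records)
}
```

Passing the interaction and optimization steps as closures keeps the sketch independent of any particular environment or agent type.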

§Evaluator

The Evaluator<E, P> trait evaluates the performance of a policy (P) in an environment (E). An instance of a type implementing this trait is provided to the Trainer for policy evaluation during training. DefaultEvaluator<E, P> serves as the default implementation of Evaluator<E, P>. This evaluator runs the policy in the environment for a specified number of episodes. At the start of each episode, the environment is reset using Env::reset_with_index(), passing an index that selects the evaluation conditions for that episode.
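
An episode-based evaluation with index-controlled resets can be sketched as follows. The block is self-contained and hypothetical: `IndexedEnv`, its methods, and the `evaluate` function are stand-ins, the policy is a plain closure, and the rule that episode `ix` lasts `ix + 1` steps is invented purely so that the index has a visible effect.

```rust
/// Hypothetical environment whose episode length is selected by an index.
pub struct IndexedEnv {
    remaining: u32,
}

impl IndexedEnv {
    pub fn new() -> Self {
        IndexedEnv { remaining: 0 }
    }

    /// Resets deterministically: episode `ix` lasts `ix + 1` steps.
    pub fn reset_with_index(&mut self, ix: usize) -> u32 {
        self.remaining = ix as u32 + 1;
        self.remaining
    }

    /// Returns (observation, reward, is_done); action 1 earns a reward of 1.
    pub fn step(&mut self, act: u32) -> (u32, f32, bool) {
        self.remaining -= 1;
        let reward = if act == 1 { 1.0 } else { 0.0 };
        (self.remaining, reward, self.remaining == 0)
    }
}

/// Runs `policy` for `n_episodes` episodes and returns the mean episode return.
pub fn evaluate(
    env: &mut IndexedEnv,
    mut policy: impl FnMut(u32) -> u32,
    n_episodes: usize,
) -> f32 {
    assert!(n_episodes > 0);
    let mut total = 0.0;
    for ix in 0..n_episodes {
        // The episode index fixes the evaluation condition for this episode.
        let mut obs = env.reset_with_index(ix);
        loop {
            let (next_obs, reward, done) = env.step(policy(obs));
            total += reward;
            if done {
                break;
            }
            obs = next_obs;
        }
    }
    total / n_episodes as f32
}
```

Because each reset depends only on the index, repeated evaluations of the same policy see identical conditions, which makes scores comparable across training checkpoints.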

Modules§

dummy: This module is used for tests.
error: Errors in the library.
generic_replay_buffer: Generic implementation of replay buffers for reinforcement learning.
record: Types and traits for recording and managing training metrics.
test: Agent and Env for testing.

Structs§

DefaultEvaluator: A default implementation of the Evaluator trait.
NullReplayBuffer: A dummy replay buffer that does nothing.
Sampler: Manages the sampling of experiences from the environment.
Step: Represents a single step in the environment, containing the action taken, the resulting observation, reward, and episode status.
Trainer: Manages the training loop and coordinates interactions between components.
TrainerConfig: Configuration parameters for the training process.

Traits§

Act: A trait representing actions that can be taken in an environment.
Agent: A trainable policy that can learn from environment interactions.
Configurable: A trait for objects that can be configured and built from configuration files.
Env: Environment interface for reinforcement learning.
Evaluator: Interface for evaluating reinforcement learning agents.
ExperienceBufferBase: Interface for buffers that store experiences from environments.
Info: Additional information that can be associated with environment steps.
Obs: A trait representing observations from an environment.
Policy: A policy that maps observations to actions in a reinforcement learning environment.
ReplayBufferBase: Interface for replay buffers that generate batches for training.
StepProcessor: Processes environment steps and produces items for a replay buffer.
TransitionBatch: A batch of transitions used for training reinforcement learning agents.