Core components for reinforcement learning.
§Observation and action
The Obs and Act traits are abstractions of observations and actions in environments. These traits can handle two or more samples, which leaves room for vectorized environments, although no vectorized environment is currently implemented.
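As a rough illustration, the stand-in types below (the names MyObs and MyAct are made up for this sketch, and the required methods of the actual Obs/Act traits are not reproduced) show how a single observation or action value can carry one or more samples:

```rust
// Stand-in types for illustration only; the real `Obs`/`Act` traits have
// required methods of their own that are not shown here.
#[derive(Clone, Debug)]
pub struct MyObs(pub Vec<f32>); // one scalar observation per sample

#[derive(Clone, Debug)]
pub struct MyAct(pub Vec<i64>); // one discrete action per sample

impl MyObs {
    /// Number of samples held by this observation value.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```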
§Environment
The Env trait is an abstraction of environments. It has four associated types: Config, Obs, Act, and Info. Obs and Act are the concrete observation and action types of the environment and must implement the Obs and Act traits, respectively. An environment implementing Env generates a Step&lt;E: Env&gt; object at every interaction step via its Env::step() method. Info stores additional information from each interaction between the agent and the environment; it may be empty (a zero-sized struct). Config represents the configuration of the environment and is used to build it.
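A minimal stand-in environment following this description might look as follows. This is only a sketch reusing the MyObs/MyAct types above; the actual Env trait has different and richer signatures (reset, termination handling, and so on):

```rust
/// Stand-in per-step information; as noted above it may be an empty struct.
pub struct MyInfo;

/// Stand-in configuration used to build the environment.
pub struct MyConfig {
    pub init_state: f32,
}

/// A toy single-sample environment: the state drifts by the chosen action.
pub struct MyEnv {
    state: f32,
}

impl MyEnv {
    pub fn build(config: &MyConfig) -> Self {
        Self { state: config.init_state }
    }

    /// One interaction step: apply the action and return the next
    /// observation, a reward, and extra info (a simplified analogue of
    /// `Env::step()`).
    pub fn step(&mut self, act: &MyAct) -> (MyObs, f32, MyInfo) {
        self.state += act.0[0] as f32;
        let reward = -self.state.abs(); // reward peaks when the state is at zero
        (MyObs(vec![self.state]), reward, MyInfo)
    }
}
```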
§Policy
Policy&lt;E: Env&gt; represents a policy. Policy::sample() takes an E::Obs and generates an E::Act. The policy may be probabilistic or deterministic.
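Continuing the toy sketch above, a deterministic stand-in policy could be written as follows (again, an analogue of the idea rather than an implementation of the crate's Policy trait):

```rust
/// Stand-in policy: maps an observation to an action, deterministically here.
pub struct MyPolicy;

impl MyPolicy {
    /// Simplified analogue of `Policy::sample()`.
    pub fn sample(&mut self, obs: &MyObs) -> MyAct {
        // Push the state back toward zero.
        let a = if obs.0[0] > 0.0 { -1 } else { 1 };
        MyAct(vec![a])
    }
}
```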
§Agent
In this crate, Agent&lt;E: Env, R: ReplayBufferBase&gt; is defined as a trainable Policy&lt;E: Env&gt;. An agent is in either training or evaluation mode. In training mode, the agent's policy might be probabilistic for exploration, while in evaluation mode it might be deterministic.

The Agent::opt() method performs a single optimization step. What constitutes an optimization step varies from agent to agent; it might, for example, consist of multiple stochastic gradient steps. Samples for training are taken from R: ReplayBufferBase.

This trait also has methods for saving and loading the parameters of the trained policy to and from a directory.
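Conceptually, an agent wraps a policy with a mode flag and an optimization step. The sketch below continues the toy example and is only an analogue of that idea, not the Agent trait itself:

```rust
/// Stand-in agent: a policy plus a training/evaluation mode and an
/// optimization step (a simplified analogue of the `Agent` trait).
pub struct MyAgent {
    pub policy: MyPolicy,
    pub train_mode: bool,
}

impl MyAgent {
    pub fn train(&mut self) { self.train_mode = true; }  // exploration allowed
    pub fn eval(&mut self) { self.train_mode = false; }  // act deterministically

    /// A single optimization step. A real agent would draw a batch of
    /// transitions from the replay buffer here and run one or more
    /// stochastic gradient steps; this toy agent has nothing to update.
    pub fn opt(&mut self) {}
}
```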
§Batch
TransitionBatch is a trait for a batch of transitions (o_t, r_t, a_t, o_t+1). This trait is used to train Agents using an RL algorithm.
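In the toy sketch, such a batch could simply be a struct holding the four elements column-wise (a simplified analogue of what a TransitionBatch implementation carries):

```rust
/// Stand-in transition batch holding (o_t, r_t, a_t, o_t+1) column-wise.
pub struct MyBatch {
    pub obs: Vec<MyObs>,      // o_t
    pub reward: Vec<f32>,     // r_t
    pub act: Vec<MyAct>,      // a_t
    pub next_obs: Vec<MyObs>, // o_t+1
}
```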
§Replay buffer and experience buffer
The ReplayBufferBase trait is an abstraction of replay buffers. One of its associated types, ReplayBufferBase::Batch, represents the samples taken from the buffer for training Agents. Agents must implement the Agent::opt() method, in which ReplayBufferBase::Batch has an appropriate type or trait bound(s) for training the agent.

As explained above, the ReplayBufferBase trait is able to generate batches of samples with which agents are trained. The ExperienceBufferBase trait, on the other hand, is able to store samples. ExperienceBufferBase::push() pushes samples of type ExperienceBufferBase::Item, which might be obtained via interaction steps with an environment.
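Continuing the toy sketch, a single buffer playing both roles (storing pushed items and producing training batches) could look like this. The names and sampling strategy below are made up for illustration; a real replay buffer would sample randomly and bound its capacity:

```rust
/// One stored transition (the "item" pushed after each environment step).
pub struct MyItem {
    pub obs: MyObs,
    pub act: MyAct,
    pub reward: f32,
    pub next_obs: MyObs,
}

/// Stand-in buffer combining the two roles described above: it stores pushed
/// items (experience buffer role) and produces batches (replay buffer role).
pub struct MyBuffer {
    items: Vec<MyItem>,
}

impl MyBuffer {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Analogue of `ExperienceBufferBase::push()`.
    pub fn push(&mut self, item: MyItem) {
        self.items.push(item);
    }

    /// Analogue of drawing a training batch; this toy version just takes the
    /// most recent transitions instead of sampling randomly.
    pub fn batch(&self, size: usize) -> MyBatch {
        let start = self.items.len().saturating_sub(size);
        let slice = &self.items[start..];
        MyBatch {
            obs: slice.iter().map(|t| t.obs.clone()).collect(),
            reward: slice.iter().map(|t| t.reward).collect(),
            act: slice.iter().map(|t| t.act.clone()).collect(),
            next_obs: slice.iter().map(|t| t.next_obs.clone()).collect(),
        }
    }
}
```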
§A reference implementation
SimpleReplayBuffer&lt;O, A&gt; implements both ReplayBufferBase and ExperienceBufferBase. This type has two type parameters, O and A, which are the representations of observations and actions in the replay buffer. O and A must implement BatchBase, which provides the functionality of storing samples, like Vec&lt;T&gt;, for observations and actions. The associated types Item and Batch are the same type, GenericTransitionBatch, representing sets of (o_t, r_t, a_t, o_t+1).

SimpleStepProcessor&lt;E, O, A&gt; might be used with SimpleReplayBuffer&lt;O, A&gt;. It converts E::Obs and E::Act into BatchBases of the respective types and generates a GenericTransitionBatch. The conversion process relies on the trait bounds O: From&lt;E::Obs&gt; and A: From&lt;E::Act&gt;.
§Trainer
Trainer manages the training loop and related objects. The Trainer object is built with a configuration of training parameters such as the maximum number of optimization steps, the model directory where the agent's parameters are saved during training, etc. The Trainer::train method executes online training of an agent on an environment. In this method's training loop, the agent interacts with the environment to collect samples and perform optimization steps, and some metrics are recorded at the same time.
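Conceptually, the loop driven by Trainer is roughly the toy version below, which strings together the sketches above: interact, store the transition, and optimize at a fixed interval. The real Trainer additionally handles configuration, recording, evaluation, and model saving:

```rust
fn toy_training_loop() {
    let config = MyConfig { init_state: 5.0 };
    let mut env = MyEnv::build(&config);
    let mut agent = MyAgent { policy: MyPolicy, train_mode: true };
    let mut buffer = MyBuffer::new();

    let mut obs = MyObs(vec![config.init_state]);
    for step in 0..1_000 {
        // Interact with the environment using the current policy.
        let act = agent.policy.sample(&obs);
        let (next_obs, reward, _info) = env.step(&act);

        // Store the transition in the buffer.
        buffer.push(MyItem {
            obs: obs.clone(),
            act: act.clone(),
            reward,
            next_obs: next_obs.clone(),
        });
        obs = next_obs;

        // Draw a batch and run an optimization step at a fixed interval.
        if step > 0 && step % 10 == 0 {
            let _batch = buffer.batch(32);
            agent.opt();
        }
    }
}
```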
§Evaluator
Evaluator&lt;E, P&gt; is used to evaluate the performance of a policy (P) in an environment (E). An object of this type is given to the Trainer object to evaluate the policy during training. DefaultEvaluator&lt;E, P&gt; is a default implementation of Evaluator&lt;E, P&gt;. This evaluator runs the policy in the environment for a certain number of episodes. At the start of each episode, the environment is reset using Env::reset_with_index() to control specific conditions for evaluation.
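Conceptually, such an evaluation runs the policy in evaluation mode for a fixed number of episodes and aggregates the return. The toy version below continues the earlier sketches; it is not the DefaultEvaluator implementation, and it stands in for Env::reset_with_index() by using the episode index to pick the initial state:

```rust
/// Toy evaluation: run the policy for `n_episodes` fixed-length episodes and
/// return the mean episode return.
fn toy_evaluate(agent: &mut MyAgent, n_episodes: usize) -> f32 {
    agent.eval(); // switch to evaluation mode (deterministic policy)
    let mut total = 0.0;

    for episode in 0..n_episodes {
        // A real evaluator resets the environment with an episode index to
        // control the evaluation conditions; here the index just sets the
        // initial state.
        let mut env = MyEnv::build(&MyConfig { init_state: episode as f32 });
        let mut obs = MyObs(vec![episode as f32]);

        for _ in 0..100 {
            let act = agent.policy.sample(&obs);
            let (next_obs, reward, _info) = env.step(&act);
            total += reward;
            obs = next_obs;
        }
    }

    total / n_episodes as f32
}
```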
Modules§
- error - Errors in the library.
- generic_replay_buffer - A generic implementation of replay buffer.
- record - Types for recording various values obtained during training and evaluation.
- test - Agent and Env for testing.
Structs§
- DefaultEvaluator - A default Evaluator.
- Sampler - Encapsulates sampling steps. Specifically, it does the following steps:
- Step - Represents an action, observation, and reward tuple (a_t, o_t+1, r_t) with some additional information.
- Trainer - Manages the training loop and related objects.
- TrainerConfig - Configuration of Trainer.
Traits§
- Act - A set of actions of the environment.
- Agent - Represents a trainable policy on an environment.
- Configurable - A configurable object, having a type parameter.
- Env - Represents an environment, typically an MDP.
- Evaluator - Evaluates a Policy.
- ExperienceBufferBase - Interface of buffers of experiences from environments.
- Info - Additional information to Obs and Act.
- Obs - A set of observations of an environment.
- Policy - A policy on an environment.
- ReplayBufferBase - Interface of replay buffers.
- StepProcessor - Processes Step and outputs an item of type Self::Output.
- TransitionBatch - A batch of transitions for training agents.