pub trait Environment {
    type State;
    type Observation;
    type Action;
    type Feedback;

    fn initial_state(&self, rng: &mut Prng) -> Self::State;
    fn observe(&self, state: &Self::State, rng: &mut Prng) -> Self::Observation;
    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        rng: &mut Prng,
        logger: &mut dyn StatsLogger
    ) -> (Successor<Self::State>, Self::Feedback);

    fn run<T, L>(
        self,
        actor: T,
        seed: SimSeed,
        logger: L,
    ) -> Steps<Self, T, Prng, L>
    where
        T: Actor<Self::Observation, Self::Action>,
        L: StatsLogger,
        Self: Sized,
    { ... }
}

A reinforcement learning environment.

Formally, this is a Partially Observable Markov Decision Process (POMDP), but with arbitrary feedback in place of scalar reward values, and with episodes. An episode is a sequence of environment steps starting with Environment::initial_state and ending when Environment::step returns a Successor that terminates or interrupts the episode rather than continuing to a new state.
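For illustration, the trait's methods compose into an episode roughly as follows. This is only a sketch: the Successor variant names (Continue, Terminate, Interrupt) are assumptions based on the description above, and choose_action stands in for an arbitrary policy.

fn run_one_episode<E: Environment>(
    env: &E,
    rng: &mut Prng,
    logger: &mut dyn StatsLogger,
    mut choose_action: impl FnMut(&E::Observation) -> E::Action,
) -> Vec<E::Feedback> {
    let mut feedbacks = Vec::new();
    // An episode starts by sampling an initial state.
    let mut state = env.initial_state(rng);
    loop {
        // The agent only ever sees an observation of the state.
        let observation = env.observe(&state, rng);
        let action = choose_action(&observation);
        let (successor, feedback) = env.step(state, &action, rng, logger);
        feedbacks.push(feedback);
        match successor {
            // Continue within the same episode from the next state.
            Successor::Continue(next_state) => state = next_state,
            // Any terminating or interrupting successor ends the episode.
            _ => break,
        }
    }
    feedbacks
}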

This trait encodes the dynamics of a reinforcement learning environment. The actual state is represented by the State associated type.

Design Discussion

State

The use of an explicit State associated type allows the type system to manage episode lifetimes; there is no possibility of an incomplete reset between episodes. However, it forces users of this trait to handle State when they might prefer it to be a hidden internal implementation detail.

Once Generic Associated Types are stable, an alternative Environment trait could have an Episode<'a> associated type where Episode provides a step method and internally manages state. However, the generic Episode<'a> approach would make it difficult to store an environment and an episode together. Something similar could be done without GAT using an Episode<'a, E: Environment>(&'a E, E::State) struct, with the same drawback.
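To make the alternative concrete, the non-GAT Episode struct mentioned above might look roughly like the following. This is purely hypothetical and not part of the trait; it also assumes a Successor::Continue variant carrying the next state.

/// Hypothetical episode handle: borrows the environment and hides its state.
struct Episode<'a, E: Environment> {
    env: &'a E,
    /// `None` once the episode has terminated or been interrupted.
    state: Option<E::State>,
}

impl<'a, E: Environment> Episode<'a, E> {
    fn new(env: &'a E, rng: &mut Prng) -> Self {
        Episode { env, state: Some(env.initial_state(rng)) }
    }

    /// Take one step; returns `None` once the episode is over.
    fn step(
        &mut self,
        action: &E::Action,
        rng: &mut Prng,
        logger: &mut dyn StatsLogger,
    ) -> Option<E::Feedback> {
        let state = self.state.take()?;
        let (successor, feedback) = self.env.step(state, action, rng, logger);
        if let Successor::Continue(next) = successor {
            self.state = Some(next);
        }
        Some(feedback)
    }
}

Because Episode borrows the environment, a value that owns both the environment and a live episode is awkward to express, which is the drawback noted above.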

Random State

The episode is not responsible for managing its own pseudo-random state. This avoids having to frequently re-initialize the random number generator on each episode and simplifies state definitions.
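For example, a single generator can serve any number of episodes; each call simply draws from it. A minimal sketch, assuming Prng implements rand::SeedableRng as most generators do:

use rand::SeedableRng;

// One PRNG, reused across episodes; no per-episode re-seeding is needed.
fn sample_initial_states<E: Environment>(env: &E, n: usize) -> Vec<E::State> {
    let mut rng = Prng::seed_from_u64(0);
    (0..n).map(|_| env.initial_state(&mut rng)).collect()
}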

Required Associated Types

type State

Environment state type. Not necessarily observable by the agent.

type Observation

Observation of the state provided to the agent.

type Action

Action selected by the agent.

type Feedback

Feedback provided to a learning agent as the result of each step. Reward, for example.

This is distinguished from an observation in that it is only used as part of the training or evaluation process. Unless an agent is explicitly updated within an episode, its actions cannot depend on the feedback of previous steps.

Required Methods

fn initial_state(&self, rng: &mut Prng) -> Self::State

Sample a state for the start of a new episode.

rng is a source of randomness for sampling the initial state. This includes seeding any pseudo-random number generators used by the environment, which must be stored within State.
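For instance, an environment with stochastic dynamics of its own might seed a generator held inside the state, as in this hypothetical sketch (the NoisyState type and its fields are invented for illustration, and Prng is assumed to implement the usual rand traits):

use rand::{Rng, SeedableRng};

// Hypothetical state carrying its own generator, seeded once per episode
// from the `rng` passed to `initial_state`.
struct NoisyState {
    position: f64,
    episode_rng: Prng,
}

fn make_initial_state(rng: &mut Prng) -> NoisyState {
    NoisyState {
        position: 0.0,
        episode_rng: Prng::seed_from_u64(rng.gen()),
    }
}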

fn observe(&self, state: &Self::State, rng: &mut Prng) -> Self::Observation

Generate an observation for a given state.

fn step(
    &self,
    state: Self::State,
    action: &Self::Action,
    rng: &mut Prng,
    logger: &mut dyn StatsLogger,
) -> (Successor<Self::State>, Self::Feedback)

Perform a state transition in response to an action.

Args
  • state - The initial state.
  • action - The action to take at this state.
  • rng - A source of randomness for sampling the state transition.
  • logger - Logger for any auxiliary information.
Returns
  • successor - The resulting state or episode outcome.
  • feedback - Feedback to the agent's learning process.
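Putting the three required methods together, a complete environment can be as small as the following sketch. The GuessTheCoin type is invented for illustration; Successor::Terminate is assumed from the episode description above, and Prng is assumed to implement rand::Rng.

use rand::Rng;

/// Hypothetical one-step environment: the agent guesses a hidden coin flip
/// and receives a reward of 1.0 for a correct guess, 0.0 otherwise.
struct GuessTheCoin;

impl Environment for GuessTheCoin {
    type State = bool;     // the hidden coin value
    type Observation = (); // nothing is revealed before acting
    type Action = bool;    // the agent's guess
    type Feedback = f64;   // scalar reward

    fn initial_state(&self, rng: &mut Prng) -> Self::State {
        rng.gen() // flip the hidden coin
    }

    fn observe(&self, _state: &Self::State, _rng: &mut Prng) -> Self::Observation {}

    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        _rng: &mut Prng,
        _logger: &mut dyn StatsLogger,
    ) -> (Successor<Self::State>, Self::Feedback) {
        let reward = if *action == state { 1.0 } else { 0.0 };
        // Every episode ends after a single step.
        (Successor::Terminate, reward)
    }
}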

Provided Methods

fn run<T, L>(self, actor: T, seed: SimSeed, logger: L) -> Steps<Self, T, Prng, L>

Run this environment with the given actor. The returned Steps value is an iterator yielding a PartialStep<Self::Observation, Self::Action, Self::Feedback> for each environment step.
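Since Steps is an iterator, a simulation can be consumed with ordinary iterator adapters. A sketch, generic over the environment; constructing a SimSeed or a concrete StatsLogger is not shown here.

// Collect the first 1000 steps of a simulation.
fn collect_steps<E, T, L>(
    env: E,
    actor: T,
    seed: SimSeed,
    logger: L,
) -> Vec<PartialStep<E::Observation, E::Action, E::Feedback>>
where
    E: Environment,
    T: Actor<E::Observation, E::Action>,
    L: StatsLogger,
{
    env.run(actor, seed, logger).take(1000).collect()
}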
