Trait relearn::envs::Environment
pub trait Environment {
type State;
type Observation;
type Action;
type Feedback;
fn initial_state(&self, rng: &mut Prng) -> Self::State;
fn observe(&self, state: &Self::State, rng: &mut Prng) -> Self::Observation;
fn step(
&self,
state: Self::State,
action: &Self::Action,
rng: &mut Prng,
logger: &mut dyn StatsLogger
) -> (Successor<Self::State>, Self::Feedback);
fn run<T, L>(
self,
actor: T,
seed: SimSeed,
logger: L
) -> Steps<Self, T, Prng, L>
where
T: Actor<Self::Observation, Self::Action>,
L: StatsLogger,
Self: Sized,
{ ... }
}
A reinforcement learning environment.
Formally, this is a Partially Observable Markov Decision Process (POMDP) but with arbitrary
feedback instead of just reward values, and with episodes.
An episode is a sequence of environment steps starting with Environment::initial_state and ending when Environment::step returns either
- Successor::Terminate, meaning all possible future rewards are zero; or
- Successor::Interrupt, meaning the POMDP would continue with possibly nonzero reward but has been prematurely interrupted.
This trait encodes the dynamics of a reinforcement learning environment. The actual state is represented by the State associated type.
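To make the shape of the trait concrete, here is a minimal sketch of an implementation for a one-step coin-flip environment. It assumes that Prng implements rand::Rng and that the module paths relearn::logging::StatsLogger and relearn::Prng match the crate layout; the CoinFlip type itself is purely illustrative.

```rust
use rand::Rng;
use relearn::envs::{Environment, Successor};
use relearn::logging::StatsLogger;
use relearn::Prng;

/// Hypothetical one-step environment: the agent guesses the outcome of a coin flip.
pub struct CoinFlip;

impl Environment for CoinFlip {
    type State = ();        // nothing to track between steps
    type Observation = ();  // nothing to observe before acting
    type Action = bool;     // guess heads (true) or tails (false)
    type Feedback = f64;    // reward: +1 for a correct guess, -1 otherwise

    fn initial_state(&self, _rng: &mut Prng) -> Self::State {}

    fn observe(&self, _state: &Self::State, _rng: &mut Prng) -> Self::Observation {}

    fn step(
        &self,
        _state: Self::State,
        action: &Self::Action,
        rng: &mut Prng,
        _logger: &mut dyn StatsLogger,
    ) -> (Successor<Self::State>, Self::Feedback) {
        let heads: bool = rng.gen();
        let reward = if *action == heads { 1.0 } else { -1.0 };
        // A single step per episode: all possible future rewards are zero, so terminate.
        (Successor::Terminate, reward)
    }
}
```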
Design Discussion
State
The use of an explicit State associated type allows the type system to manage episode lifetimes; there is no possibility of an incomplete reset between episodes. However, it forces the users of this trait to handle State when they might prefer it to be a hidden internal implementation detail.
Once Generic Associated Types are stable, an alternative Environment trait could have an Episode<'a> associated type where Episode provides a step method and internally manages state. However, using the generic Episode<'a> approach would make it difficult to store an environment and an episode together. Something similar could be done without GAT using an Episode<'a, E: Environment>(&'a E, E::State) struct with the same drawbacks.
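For illustration, a minimal sketch of that non-GAT Episode<'a, E> alternative might look as follows. It is not part of the crate, and it assumes Successor has a Continue(state) variant alongside Terminate and Interrupt.

```rust
use relearn::envs::{Environment, Successor};
use relearn::logging::StatsLogger;
use relearn::Prng;

/// Hypothetical episode handle: borrows the environment and owns the current state.
pub struct Episode<'a, E: Environment>(&'a E, Option<E::State>);

impl<'a, E: Environment> Episode<'a, E> {
    pub fn new(env: &'a E, rng: &mut Prng) -> Self {
        Episode(env, Some(env.initial_state(rng)))
    }

    /// Advance the episode by one step. Returns `None` once the episode has ended.
    pub fn step(
        &mut self,
        action: &E::Action,
        rng: &mut Prng,
        logger: &mut dyn StatsLogger,
    ) -> Option<E::Feedback> {
        let state = self.1.take()?;
        let (successor, feedback) = self.0.step(state, action, rng, logger);
        // Keep the state only while the episode continues; assumes a
        // `Successor::Continue(state)` variant (an assumption, not shown above).
        if let Successor::Continue(next) = successor {
            self.1 = Some(next);
        }
        Some(feedback)
    }
}
```

Because Episode borrows the environment, a struct cannot easily own both the environment and a live Episode into it, which is the drawback noted above.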
Random State
The episode is not responsible for managing its own pseudo-random state. This avoids having to frequently re-initialize the random number generator on each episode and simplifies state definitions.
Required Associated Types
type State
The environment state, which is not necessarily observable by the agent.
type Observation
Observation of the state provided to the agent.
type Action
Action taken by the agent in response to an observation.
type Feedback
Feedback provided to a learning agent as the result of each step. Reward, for example.
This is distinguished from observation in that it is only part of the training or evaluation process. Unless an agent is explicitly updated within an episode, its actions cannot depend on the feedback of previous steps.
Required Methods
fn initial_state(&self, rng: &mut Prng) -> Self::State
Sample a state for the start of a new episode.
rng is a source of randomness for sampling the initial state. This includes seeding any pseudo-random number generators used by the environment, which must be stored within State.
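As a sketch of that point, the state can carry an environment-owned generator seeded from the provided rng. This assumes Prng implements rand::SeedableRng and rand::Rng; the DriftState type and its fields are hypothetical.

```rust
use rand::{Rng, SeedableRng};
use relearn::Prng;

/// Hypothetical state that carries its own per-episode generator.
pub struct DriftState {
    position: f64,
    episode_rng: Prng,
}

fn initial_drift_state(rng: &mut Prng) -> DriftState {
    DriftState {
        position: 0.0,
        // Seed the state's internal generator from the provided source of
        // randomness so the whole episode is reproducible from one seed.
        episode_rng: Prng::seed_from_u64(rng.gen()),
    }
}
```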
fn observe(&self, state: &Self::State, rng: &mut Prng) -> Self::Observation
Generate an observation for a given state.
fn step(
&self,
state: Self::State,
action: &Self::Action,
rng: &mut Prng,
logger: &mut dyn StatsLogger
) -> (Successor<Self::State>, Self::Feedback)
Perform a state transition in reaction to the given action, returning the successor state (or end of episode) and the feedback for the step.
Provided Methods
fn run<T, L>(self, actor: T, seed: SimSeed, logger: L) -> Steps<Self, T, Prng, L>
where
    T: Actor<Self::Observation, Self::Action>,
    L: StatsLogger,
    Self: Sized,
Run this environment with the given actor. The returned Steps value is an Iterator yielding PartialStep<Self::Observation, Self::Action, Self::Feedback> items.
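As a usage sketch, the step iterator can drive any environment with a compatible actor while bounding the number of simulation steps. The module paths for Actor and SimSeed, and the use of () as a no-op StatsLogger, are assumptions about the rest of the crate rather than facts stated on this page.

```rust
use relearn::agents::Actor;
use relearn::envs::Environment;
use relearn::simulation::SimSeed;

/// Run `env` with `actor` and count up to `limit` simulation steps.
fn count_steps<E, T>(env: E, actor: T, seed: SimSeed, limit: usize) -> usize
where
    E: Environment,
    T: Actor<E::Observation, E::Action>,
{
    env.run(actor, seed, ()) // assumes `()` acts as a no-op StatsLogger
        .take(limit)
        .count()
}
```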