Trait relearn::envs::Mdp

pub trait Mdp {
    type State;
    type Action;

    fn initial_state(&self, rng: &mut StdRng) -> Self::State;

    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        rng: &mut StdRng,
    ) -> (Option<Self::State>, f64, bool);
}

A Markov decision process (MDP).

The concept of an episode is an abstraction on top of the MDP formalism. An episode ending means that every possible future trajectory from that point yields 0 reward on every step.

Associated Types

  • `State`: The state type of the MDP.
  • `Action`: The action type of the MDP.

Required methods

fn initial_state(&self, rng: &mut StdRng) -> Self::State

Sample a new initial state.

fn step(&self, state: Self::State, action: &Self::Action, rng: &mut StdRng) -> (Option<Self::State>, f64, bool)

Sample a state transition.

Returns
  • state: The resulting state. Is None if the resulting state is terminal. All trajectories from terminal states yield 0 reward on each step.
  • reward: The reward value for this transition.
  • episode_done: Whether this step ends the current episode.
    • If state is None then episode_done must be true.
    • An episode may be done for other reasons, like a step limit.

Implementors