Trait relearn::envs::Pomdp

pub trait Pomdp {
    type State;
    type Observation;
    type Action;

    fn initial_state(&self, rng: &mut StdRng) -> Self::State;

    fn observe(
        &self,
        state: &Self::State,
        rng: &mut StdRng
    ) -> Self::Observation;

    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        rng: &mut StdRng
    ) -> (Option<Self::State>, f64, bool);
}

A partially observable Markov decision process (POMDP).

The concept of an episode is an abstraction on top of the MDP formalism: an episode ending means that every possible future trajectory yields 0 reward on each step.

Associated Types

Required methods

initial_state: Sample a new initial state.

observe: Sample an observation for a state.

step: Sample a state transition.

Returns
  • state: The successor state. None if the resulting state is terminal; all trajectories from a terminal state yield 0 reward on every step.
  • reward: The reward value for this transition.
  • episode_done: Whether this step ends the episode.
    • If state is None then episode_done must be true.
    • An episode may also end for other reasons, such as a step limit.
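To make the contract concrete, the sketch below implements the trait for a hypothetical countdown environment whose state is a counter that decrements on every step, ending the episode (with reward 1.0) when it would reach zero. The `Countdown` type and `run_episode` helper are illustrative assumptions, not part of relearn, and a field-less `StdRng` stand-in replaces `rand::rngs::StdRng` so the example compiles with the standard library alone.

```rust
/// Stand-in for `rand::rngs::StdRng` so this sketch compiles with the
/// standard library alone; the real trait takes the `rand` crate's RNG.
pub struct StdRng;

pub trait Pomdp {
    type State;
    type Observation;
    type Action;

    fn initial_state(&self, rng: &mut StdRng) -> Self::State;
    fn observe(&self, state: &Self::State, rng: &mut StdRng) -> Self::Observation;
    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        rng: &mut StdRng,
    ) -> (Option<Self::State>, f64, bool);
}

/// Hypothetical toy environment: the state counts down on every step and the
/// episode ends, with reward 1.0, when the counter would reach zero.
struct Countdown {
    start: u32,
}

impl Pomdp for Countdown {
    type State = u32;
    type Observation = u32; // fully observed: the observation is the state itself
    type Action = ();

    fn initial_state(&self, _rng: &mut StdRng) -> u32 {
        self.start
    }

    fn observe(&self, state: &u32, _rng: &mut StdRng) -> u32 {
        *state
    }

    fn step(&self, state: u32, _action: &(), _rng: &mut StdRng) -> (Option<u32>, f64, bool) {
        if state <= 1 {
            // Terminal transition: return None for the state; episode_done
            // must be true whenever the successor state is None.
            (None, 1.0, true)
        } else {
            (Some(state - 1), 0.0, false)
        }
    }
}

/// Drive one episode to completion and return the total reward.
fn run_episode(env: &Countdown) -> f64 {
    let mut rng = StdRng;
    let mut state = env.initial_state(&mut rng);
    let mut total = 0.0;
    loop {
        let (next, reward, done) = env.step(state, &(), &mut rng);
        total += reward;
        if done {
            break;
        }
        state = next.expect("a non-terminal step must return Some(state)");
    }
    total
}

fn main() {
    let total = run_episode(&Countdown { start: 3 });
    println!("total reward = {}", total); // the single terminal reward of 1.0
}
```

Note how the driver loop only reads the successor state when `episode_done` is false, mirroring the invariant documented above: a `None` state always coincides with the end of the episode.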

Implementors