Trait relearn::envs::Pomdp
pub trait Pomdp {
    type State;
    type Observation;
    type Action;

    fn initial_state(&self, rng: &mut StdRng) -> Self::State;

    fn observe(
        &self,
        state: &Self::State,
        rng: &mut StdRng
    ) -> Self::Observation;

    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        rng: &mut StdRng
    ) -> (Option<Self::State>, f64, bool);
}
A partially observable Markov decision process (POMDP).
The concept of an episode is an abstraction on top of the MDP formalism. An episode ending means that all possible future trajectories have 0 reward on each step.
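
As a rough sketch of how this trait can be implemented (not part of the crate), the toy environment below hides a tiger behind one of two doors, emits a noisy observation of its location, and ends the episode as soon as a door is opened. The TigerDoors type, its listen_accuracy field, and the reward values are invented for this example; only the trait signature shown above is assumed.

use rand::rngs::StdRng;
use rand::Rng;
use relearn::envs::Pomdp;

/// Toy two-door POMDP: a tiger hides behind door 0 or 1; the agent
/// receives a noisy observation of its location and then opens a door.
pub struct TigerDoors {
    /// Probability that an observation reports the correct door.
    pub listen_accuracy: f64,
}

impl Pomdp for TigerDoors {
    type State = usize; // door hiding the tiger (0 or 1)
    type Observation = usize; // noisy report of that door
    type Action = usize; // door the agent opens

    fn initial_state(&self, rng: &mut StdRng) -> Self::State {
        // Place the tiger behind door 0 or 1 uniformly at random.
        rng.gen_range(0..2)
    }

    fn observe(&self, state: &Self::State, rng: &mut StdRng) -> Self::Observation {
        // Report the true door with probability `listen_accuracy`,
        // otherwise report the other door.
        if rng.gen_bool(self.listen_accuracy) {
            *state
        } else {
            1 - *state
        }
    }

    fn step(
        &self,
        state: Self::State,
        action: &Self::Action,
        _rng: &mut StdRng,
    ) -> (Option<Self::State>, f64, bool) {
        // Opening the tiger-free door is rewarded; either way the episode
        // ends, so the successor is terminal (None) and episode_done is true.
        let reward = if *action == state { -1.0 } else { 1.0 };
        (None, reward, true)
    }
}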
Associated Types
type State
The environment state. Not necessarily observable by the agent.
type Observation
An observation of the state, provided to the agent.
type Action
An action selected by the agent.
Required methods
fn initial_state(&self, rng: &mut StdRng) -> Self::State
Sample a new initial state.
fn observe(&self, state: &Self::State, rng: &mut StdRng) -> Self::Observation
Sample an observation for a state.
fn step(&self, state: Self::State, action: &Self::Action, rng: &mut StdRng) -> (Option<Self::State>, f64, bool)
Sample a state transition.
Returns
- state: The resulting state. Is None if the resulting state is terminal. All trajectories from terminal states yield 0 reward on each step.
- reward: The reward value for this transition.
- episode_done: Whether this step ends the episode.
  - If state is None then episode_done must be true.
  - An episode may be done for other reasons, like a step limit.
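
As an illustration of how a caller typically consumes these return values, the sketch below rolls out one episode; the run_episode helper and its policy closure are hypothetical and not part of the crate.

use rand::rngs::StdRng;
use relearn::envs::Pomdp;

/// Roll out one episode with a reactive policy and return the total reward.
fn run_episode<E, F>(env: &E, mut policy: F, rng: &mut StdRng) -> f64
where
    E: Pomdp,
    F: FnMut(&E::Observation) -> E::Action,
{
    let mut state = env.initial_state(rng);
    let mut total_reward = 0.0;
    loop {
        let observation = env.observe(&state, rng);
        let action = policy(&observation);
        let (successor, reward, episode_done) = env.step(state, &action, rng);
        total_reward += reward;
        match successor {
            // A None successor is terminal, so episode_done is necessarily true.
            None => break,
            // The episode may also end while the successor is non-terminal,
            // for example because of a step limit.
            Some(_) if episode_done => break,
            Some(next) => state = next,
        }
    }
    total_reward
}

Because step takes state by value, the environment may consume or reuse the state when constructing the successor instead of cloning it.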