Chain Environment
Consists of n states in a line with 2 actions.
- Action 0 moves back to the start for 2 reward.
- Action 1 moves forward for 0 reward in all states but the last. In the last state, taking action 1 is a self-transition with 10 reward.
- Every action has a 0.2 chance of “slipping” and taking the opposite action.
Described in “Bayesian Q-learning” by Dearden, Friedman and Russell (1998).
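The dynamics above can be sketched as a small transition function. This is a hypothetical standalone sketch, not the crate's actual implementation; the `slipped` flag stands in for the 0.2-probability slip that the real environment draws from its RNG:

```rust
/// Hypothetical sketch of the Chain dynamics described above.
/// States are 0..size-1; `forward` is action 1, `!forward` is action 0.
/// `slipped` models the 0.2-chance slip: when true, the opposite
/// action is taken instead of the chosen one.
fn chain_step(size: usize, state: usize, forward: bool, slipped: bool) -> (usize, f64) {
    let forward = forward != slipped; // a slip flips the effective action
    if forward {
        if state + 1 < size {
            (state + 1, 0.0) // move forward for 0 reward
        } else {
            (state, 10.0) // last state: self-transition with 10 reward
        }
    } else {
        (0, 2.0) // move back to the start for 2 reward
    }
}

fn main() {
    assert_eq!(chain_step(5, 2, true, false), (3, 0.0));  // forward
    assert_eq!(chain_step(5, 4, true, false), (4, 10.0)); // last state
    assert_eq!(chain_step(5, 3, false, false), (0, 2.0)); // back to start
    assert_eq!(chain_step(5, 3, true, true), (0, 2.0));   // slip flips forward to back
}
```

The slip probability is what makes the exploration problem interesting: a greedy agent that has only seen the early states prefers the immediate reward of action 0 and never discovers the 10-reward state at the end.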
Fields
size: usize
discount_factor: f64
Implementations
Trait Implementations
impl<'de> Deserialize<'de> for Chain

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.
impl EnvStructure for Chain

type ObservationSpace = IndexSpace
type ActionSpace = IndexedTypeSpace<Move>
type FeedbackSpace = IntervalSpace<Reward>

fn observation_space(&self) -> Self::ObservationSpace
Space containing all possible observations.

fn action_space(&self) -> Self::ActionSpace
The space of all possible actions.

fn feedback_space(&self) -> Self::FeedbackSpace
The space of all possible feedback.

fn discount_factor(&self) -> f64
A discount factor applied to future feedback.
impl Environment for Chain

type Observation = usize
Observation of the state provided to the agent.

type Action = Move
Action selected by the agent.

fn initial_state(&self, _: &mut Prng) -> Self::State
Sample a state for the start of a new episode.

fn observe(&self, state: &Self::State, _: &mut Prng) -> Self::Observation
Generate an observation for a given state.
fn step(
    &self,
    state: Self::State,
    action: &Self::Action,
    rng: &mut Prng,
    _: &mut dyn StatsLogger
) -> (Successor<Self::State>, Self::Feedback)
Perform a state transition in response to an action.
fn run<T, L>(self, actor: T, seed: SimSeed, logger: L) -> Steps<Self, T, Prng, L>
where
    T: Actor<Self::Observation, Self::Action>,
    L: StatsLogger,
    Self: Sized,
Run this environment with the given actor.

Returns a `Steps` iterator over `PartialStep<Self::Observation, Self::Action, Self::Feedback>` items.
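As a rough illustration of the step-iterator pattern that `run` abstracts, here is a self-contained sketch of driving the chain with an always-forward actor and accumulating feedback. It is plain Rust using none of the crate's `Steps`/`Actor` types, and slipping is omitted for determinism:

```rust
// Hypothetical sketch of the loop that `run` abstracts: repeatedly
// query an actor for an action, step the environment, and accumulate
// feedback. The "actor" here always chooses the forward action.
fn run_forward(size: usize, steps: usize) -> f64 {
    let mut state = 0usize;
    let mut total_reward = 0.0;
    for _ in 0..steps {
        let (next, reward) = if state + 1 < size {
            (state + 1, 0.0) // forward move, no reward
        } else {
            (state, 10.0) // self-transition in the last state
        };
        total_reward += reward;
        state = next;
    }
    total_reward
}

fn main() {
    // With size 5: four forward moves (0 reward each), then six
    // self-transitions in the last state at 10 reward each.
    assert_eq!(run_forward(5, 10), 60.0);
}
```

The real `run` additionally threads the `Prng` (seeded via `SimSeed`) and a `StatsLogger` through each step, and yields the steps lazily as an iterator rather than summing them.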
impl CloneBuild for Chain
impl Copy for Chain
impl StructuralPartialEq for Chain
Auto Trait Implementations
impl RefUnwindSafe for Chain
impl Send for Chain
impl Sync for Chain
impl Unpin for Chain
impl UnwindSafe for Chain
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.