Struct relearn::envs::UniformBernoulliBandits
pub struct UniformBernoulliBandits {
    pub num_arms: usize,
}
A distribution over Bernoulli bandit environments with uniformly sampled means.
The mean of each arm is sampled uniformly from [0, 1].
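The sampling scheme described above can be sketched in plain Rust. This is an illustrative sketch only, not the crate's implementation: `XorShift64`, `SampledBandit`, `sample_bandit`, and `pull` are hypothetical names introduced here, and the small hand-rolled generator exists only to avoid external dependencies (it yields values in [0, 1), which approximates the closed interval).

```rust
/// Minimal xorshift64* generator so the sketch needs no external crates.
/// (Hypothetical helper; real code would more likely use the `rand` crate.)
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero, so substitute a fixed constant.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Next uniform sample in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        // Keep the top 53 bits of the scrambled output and scale by 2^-53.
        let bits = x.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 11;
        bits as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical stand-in for one sampled bandit: a success probability per arm.
pub struct SampledBandit {
    pub means: Vec<f64>,
}

/// Draw each arm's mean independently and uniformly from the unit interval.
pub fn sample_bandit(num_arms: usize, rng: &mut XorShift64) -> SampledBandit {
    SampledBandit {
        means: (0..num_arms).map(|_| rng.next_f64()).collect(),
    }
}

impl SampledBandit {
    /// Pull an arm: reward 1.0 with probability `means[arm]`, otherwise 0.0.
    pub fn pull(&self, arm: usize, rng: &mut XorShift64) -> f64 {
        if rng.next_f64() < self.means[arm] { 1.0 } else { 0.0 }
    }
}
```

Under these assumptions, `sample_bandit(5, &mut XorShift64::new(42))` produces a five-armed bandit whose means are fixed for its lifetime, and each call to `pull` returns a Bernoulli reward for the chosen arm.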
Reference
This environment distribution is used in the paper “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning” by Duan et al.
Fields
num_arms: usize
Number of bandit arms.
Implementations
Trait Implementations
type ObservationSpace = SingletonSpace
type ActionSpace = IndexSpace
Space containing all possible observations.
The space of all possible actions.
A lower and upper bound on possible reward values.
A discount factor applied to future rewards.
This method tests for self and other values to be equal, and is used by ==.
This method tests for !=.
type Pomdp = BernoulliBandit
Sample a POMDP from the distribution.
Auto Trait Implementations
impl RefUnwindSafe for UniformBernoulliBandits
impl Send for UniformBernoulliBandits
impl Sync for UniformBernoulliBandits
impl Unpin for UniformBernoulliBandits
impl UnwindSafe for UniformBernoulliBandits
Blanket Implementations
Mutably borrows from an owned value.
type Environment = PomdpEnv<<T as PomdpDistribution>::Pomdp>
Sample an environment from the distribution.
Compare self to key and return true if they are equal.
pub fn vzip(self) -> V
Apply an update from the given source value.