pub struct MetaEnv<E> {
    pub env_distribution: E,
}

A meta reinforcement learning environment that treats RL itself as an environment.

An episode in this meta environment is called a “Trial” and consists of several episodes from the inner environment. A new inner environment with a different structure seed is sampled for each Trial. A meta episode ends when a fixed number of inner episodes have been completed.

The inner environment's step metadata (the previous action and feedback, and whether the inner episode is done) are embedded in the meta observations.
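As a rough sketch of the trial structure (illustrative only: the types and functions below are hypothetical stand-ins, not part of this crate's API), one meta episode samples a single inner environment and then runs a fixed number of inner episodes in it:

// Illustrative sketch only; hypothetical stand-ins, not this crate's API.
struct InnerEnv {
    structure_seed: u64, // seed controlling the sampled environment's structure
}

fn sample_inner_env(structure_seed: u64) -> InnerEnv {
    // A fresh inner environment is sampled once per trial.
    InnerEnv { structure_seed }
}

fn run_inner_episode(env: &InnerEnv) -> f64 {
    // Placeholder: would roll out one inner episode and return its total reward.
    let _ = env.structure_seed;
    0.0
}

fn run_trial(trial_seed: u64, episodes_per_trial: usize) -> f64 {
    // One meta episode ("trial"): the same sampled inner environment is used for a
    // fixed number of inner episodes, then the trial ends.
    let env = sample_inner_env(trial_seed);
    (0..episodes_per_trial)
        .map(|_| run_inner_episode(&env))
        .sum()
}

fn main() {
    let trial_return = run_trial(0, 10);
    println!("trial return: {trial_return}");
}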

Observations

A MetaObservation, consisting of the inner observation, the previous step's action and feedback, and whether the inner episode is done.
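A minimal sketch of what such an observation carries, using a hypothetical struct whose field names are illustrative rather than the crate's actual definition:

// Hypothetical mirror of the documented observation contents; not the crate's actual type.
struct MetaObservation<O, A, F> {
    /// Observation of the current inner environment state, if the episode is ongoing.
    inner_observation: Option<O>,
    /// Action and feedback from the previous inner step, if any.
    prev_step: Option<(A, F)>,
    /// Whether the current inner episode has ended.
    episode_done: bool,
}

fn main() {
    // Example: the observation seen at the start of a trial, before any inner step.
    let obs: MetaObservation<u32, u32, f64> = MetaObservation {
        inner_observation: Some(0),
        prev_step: None,
        episode_done: false,
    };
    assert!(!obs.episode_done);
    let _ = (obs.inner_observation, obs.prev_step);
}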

Actions

The action space is the same as the action space of the inner environments. Actions are forwarded to the inner environment except when the current state is the last state of the inner episode (episode_done == true). In that case, the provided action is ignored and the next state will be the start of a new inner episode.
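This rule can be summarized with a small stand-alone sketch; the enum and functions below are hypothetical stand-ins for the inner environment, not the crate's API:

// Hypothetical inner environment state for illustration.
#[derive(Clone, Copy)]
enum InnerState {
    Running(u32),
    Terminal,
}

fn inner_step(state: u32, action: u32) -> InnerState {
    // Hypothetical inner transition: the episode ends once the sum reaches 3.
    if state + action >= 3 {
        InnerState::Terminal
    } else {
        InnerState::Running(state + action)
    }
}

fn meta_step(state: InnerState, action: u32) -> InnerState {
    match state {
        // episode_done == true: the provided action is ignored and a new inner episode starts.
        InnerState::Terminal => InnerState::Running(0),
        // Otherwise the action is forwarded to the inner environment.
        InnerState::Running(s) => inner_step(s, action),
    }
}

fn main() {
    let restarted = meta_step(InnerState::Terminal, 7);
    assert!(matches!(restarted, InnerState::Running(0)));

    let forwarded = meta_step(InnerState::Running(0), 1);
    assert!(matches!(forwarded, InnerState::Running(1)));
}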

Feedback

The inner environment feedback must implement MetaFeedback (and MetaFeedbackSpace for the space). This feedback is decomposed into separate inner and outer feedback. In the case of Reward feedback, the same value is used as both the inner and outer feedback.
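For reward-valued feedback the decomposition is trivial, as the toy sketch below illustrates (the names and types are assumptions for illustration, not the crate's MetaFeedback API):

// Sketch of the inner/outer feedback split for plain reward feedback.
struct DecomposedFeedback {
    inner: f64, // feedback driving the inner episode
    outer: f64, // feedback credited to the meta episode
}

fn decompose_reward(reward: f64) -> DecomposedFeedback {
    // For Reward feedback, the same value serves as both parts.
    DecomposedFeedback { inner: reward, outer: reward }
}

fn main() {
    let f = decompose_reward(1.5);
    assert_eq!(f.inner, f.outer);
}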

States

The state (MetaState) consists of an inner environment instance, an inner environment state, an episode index within the trial, and details of the most recent inner step within the episode.
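A hypothetical layout mirroring this description (field names are illustrative, not the crate's actual MetaState definition):

// Hypothetical mirror of the documented state contents; not the crate's actual type.
struct MetaState<E, S, A, F> {
    /// The inner environment instance sampled for this trial.
    inner_env: E,
    /// Current state of the inner environment.
    inner_state: S,
    /// Index of the current inner episode within the trial.
    episode_index: usize,
    /// Action and feedback of the most recent inner step in this episode, if any.
    prev_step: Option<(A, F)>,
}

fn main() {
    // State at the very start of a trial, before any inner step has been taken.
    let state: MetaState<(), u32, u32, f64> = MetaState {
        inner_env: (),
        inner_state: 0,
        episode_index: 0,
        prev_step: None,
    };
    assert_eq!(state.episode_index, 0);
    let _ = (state.inner_env, state.inner_state, state.prev_step);
}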

Reference

This meta environment design is roughly consistent with the structure used in the paper “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning” by Duan et al.

Fields

env_distribution: E

Environment distribution from which each trial's inner environment is sampled.

Implementations

View the structure of the inner environment.

Trait Implementations

Environment observation type.

Environment action type.

Environment feedback type.

Environment observation space type.

Environment action space type.

Environment feedback space type.

Type of environment to build

Build an environment instance. Read more

Returns a copy of the value. Read more

Performs copy-assignment from source. Read more

Formats the value using the given formatter. Read more

Returns the “default value” for a type. Read more

Deserialize this value from the given Serde deserializer. Read more

Space containing all possible observations. Read more

The space of all possible actions. Read more

The space of all possible feedback. Read more

A discount factor applied to future feedback. Read more

Environment state type. Not necessarily observable by the agent.

Observation of the state provided to the agent.

Action selected by the agent.

Feedback provided to a learning agent as the result of each step. Reward, for example. Read more

Sample a state for the start of a new episode. Read more

Generate an observation for a given state.

Perform a state transition in response to an action. Read more

Run this environment with the given actor.
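The environment-related associated types and methods summarized above fit together in an episode loop. The following self-contained sketch shows the general shape; the trait, its signatures, the policy closure, and the Countdown example are simplified assumptions for illustration, not the crate's actual Environment trait (which, for instance, also involves random number generators and an actor-based run method):

// Simplified stand-in for an environment interface; signatures are assumptions.
trait SimpleEnvironment {
    type State;
    type Observation;
    type Action;
    type Feedback;

    /// Sample a state for the start of a new episode.
    fn initial_state(&self) -> Self::State;
    /// Generate an observation for a given state.
    fn observe(&self, state: &Self::State) -> Self::Observation;
    /// Perform a state transition in response to an action.
    /// Returns the next state (None if the episode ended) and the step feedback.
    fn step(&self, state: Self::State, action: &Self::Action)
        -> (Option<Self::State>, Self::Feedback);
}

/// Run one episode with a policy closure, collecting the per-step feedback.
fn run_episode<E, P>(env: &E, mut policy: P) -> Vec<E::Feedback>
where
    E: SimpleEnvironment,
    P: FnMut(&E::Observation) -> E::Action,
{
    let mut feedback = Vec::new();
    let mut state = env.initial_state();
    loop {
        let obs = env.observe(&state);
        let action = policy(&obs);
        let (next, f) = env.step(state, &action);
        feedback.push(f);
        match next {
            Some(s) => state = s,
            None => break,
        }
    }
    feedback
}

// Tiny concrete instance: count down from 3 to 0, then end the episode.
struct Countdown;

impl SimpleEnvironment for Countdown {
    type State = u32;
    type Observation = u32;
    type Action = ();
    type Feedback = f64;

    fn initial_state(&self) -> u32 { 3 }
    fn observe(&self, state: &u32) -> u32 { *state }
    fn step(&self, state: u32, _action: &()) -> (Option<u32>, f64) {
        if state == 0 { (None, 1.0) } else { (Some(state - 1), 0.0) }
    }
}

fn main() {
    let feedback = run_episode(&Countdown, |_obs| ());
    assert_eq!(feedback.len(), 4);
}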

Feeds this value into the given Hasher. Read more

Feeds a slice of this type into the given Hasher. Read more

This method tests for self and other values to be equal, and is used by ==. Read more

This method tests for !=.

Serialize this value into the given Serde serializer. Read more
