pub trait Env {
    type Config: Clone;
    type Obs: Obs;
    type Act: Act;
    type Info: Info;

    // Required methods
    fn build(config: &Self::Config, seed: i64) -> Result<Self>
       where Self: Sized;
    fn step(&mut self, a: &Self::Act) -> (Step<Self>, Record)
       where Self: Sized;
    fn reset(&mut self, is_done: Option<&Vec<i8>>) -> Result<Self::Obs>;
    fn reset_with_index(&mut self, ix: usize) -> Result<Self::Obs>;

    // Provided method
    fn step_with_reset(&mut self, a: &Self::Act) -> (Step<Self>, Record)
       where Self: Sized { ... }
}
Environment interface for reinforcement learning.
Required Associated Types
type Config: Clone
Configuration parameters for the environment.
This type should contain all necessary parameters to build and configure the environment, such as environment-specific settings, rendering options, or difficulty levels.
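As a rough illustration of how a configuration type might pair with `build`, the sketch below uses hypothetical names (`MazeConfig`, `MazeEnv`) and a plain `Result<_, String>` in place of the crate's own `Result` alias, so that it compiles with the standard library alone. It is not taken from the crate.

```rust
// Hypothetical configuration and environment types, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct MazeConfig {
    pub width: usize,
    pub height: usize,
    pub render: bool,
}

pub struct MazeEnv {
    pub config: MazeConfig,
    pub seed: i64,
}

impl MazeEnv {
    /// Mirrors the shape of `Env::build`: a borrowed config plus a seed.
    /// The error type is an assumption; the real trait uses its own `Result`.
    pub fn build(config: &MazeConfig, seed: i64) -> Result<Self, String> {
        if config.width == 0 || config.height == 0 {
            return Err("maze dimensions must be non-zero".to_string());
        }
        // The config is cloned, which is why the `Clone` bound is useful here.
        Ok(MazeEnv { config: config.clone(), seed })
    }
}
```

Because `build` only borrows the configuration, the same `MazeConfig` value can be reused to construct several environments, for example one for training and one for evaluation.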
type Obs: Obs
The type of observations returned by the environment.
Observations represent the state of the environment as perceived by the agent.
This type must implement the Obs trait.
Required Methods
fn step(&mut self, a: &Self::Act) -> (Step<Self>, Record)
where
    Self: Sized,
fn reset(&mut self, is_done: Option<&Vec<i8>>) -> Result<Self::Obs>
Resets the environment to its initial state.
This method resets the environment when:
- is_done is None (initial reset)
- is_done[0] == 1 (episode termination)
Arguments
is_done - Optional vector indicating which environments to reset
Note
While the interface supports vectorized environments through is_done, the current implementation only supports single environments. Therefore, is_done.len() is expected to be 1.
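The reset conditions described above can be sketched with a toy counter environment. Everything here is hypothetical (`CounterEnv`, `advance`, the `String` error type) and only imitates the `reset` signature. In particular, returning the current observation unchanged when `is_done[0]` is 0 is an assumption of this sketch, not documented behavior.

```rust
// Hypothetical, simplified stand-in; this is not the crate's `Env` trait.
pub struct CounterEnv {
    /// Current state; it doubles as the observation.
    pub state: i32,
    /// Number of times the state has actually been reset.
    pub resets: u32,
}

impl CounterEnv {
    pub fn new() -> Self {
        CounterEnv { state: 0, resets: 0 }
    }

    /// Advances the toy state by one.
    pub fn advance(&mut self) {
        self.state += 1;
    }

    /// Resets when `is_done` is `None` or `is_done[0] == 1`.
    /// Otherwise the state is kept (an assumption of this sketch).
    pub fn reset(&mut self, is_done: Option<&Vec<i8>>) -> Result<i32, String> {
        let should_reset = match is_done {
            None => true,
            Some(flags) => {
                // Only single environments are supported, so expect one flag.
                if flags.len() != 1 {
                    return Err(format!("expected is_done.len() == 1, got {}", flags.len()));
                }
                flags[0] == 1
            }
        };
        if should_reset {
            self.state = 0;
            self.resets += 1;
        }
        Ok(self.state)
    }
}
```

Checking the length up front turns a violation of the single-environment expectation into an error rather than silently ignoring the extra flags.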
fn reset_with_index(&mut self, ix: usize) -> Result<Self::Obs>
Resets the environment with a specific index.
This method is primarily used during evaluation to control the initial state of the environment. The index can be used in various ways, such as:
- As a random seed for deterministic initialization
- To select specific starting conditions
- To control the difficulty level
Arguments
ix - An index used to control the reset behavior
Note
This method is called by the Trainer during evaluation to ensure consistent testing conditions.
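One way to read the "select specific starting conditions" use of the index is sketched below. `GridEnv` and `EVAL_STARTS` are hypothetical names, the error type is simplified to `String`, and wrapping the index with a modulo is a choice made for this sketch rather than something the trait prescribes.

```rust
// Hypothetical toy environment; the names are illustrative, not part of the crate.
pub struct GridEnv {
    pub pos: (i32, i32),
}

/// Fixed starting positions for evaluation episodes (assumed for this sketch).
const EVAL_STARTS: [(i32, i32); 3] = [(0, 0), (2, 5), (7, 1)];

impl GridEnv {
    pub fn new() -> Self {
        GridEnv { pos: (0, 0) }
    }

    /// Picks a deterministic starting position from `ix`.
    /// The same index always yields the same initial observation.
    pub fn reset_with_index(&mut self, ix: usize) -> Result<(i32, i32), String> {
        // Wrap around so that any index maps to a valid starting position.
        self.pos = EVAL_STARTS[ix % EVAL_STARTS.len()];
        Ok(self.pos)
    }
}
```

With this scheme, an evaluation loop that calls `reset_with_index(0)`, `reset_with_index(1)`, and so on visits the same starting positions on every run, which keeps evaluation scores comparable across training checkpoints.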
Provided Methods
fn step_with_reset(&mut self, a: &Self::Act) -> (Step<Self>, Record)
where
    Self: Sized,
Performs a step and automatically resets the environment if the episode ends.
This is a convenience method that combines step and reset operations.
If the step results in episode termination, the environment is automatically
reset and the initial observation is included in the returned step.
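The step-then-reset combination described above might look roughly like the following. `ToyStep` and `CountdownEnv` are hypothetical stand-ins: the crate's real `Step` and `Record` types are not reproduced here, and the `init_obs` field name is an assumption used to show where the initial observation could be carried.

```rust
// Hypothetical, simplified stand-ins; the crate's `Step` and `Record` types differ.
#[derive(Debug)]
pub struct ToyStep {
    /// Observation after applying the action.
    pub obs: i32,
    /// Whether the episode ended on this step.
    pub is_done: bool,
    /// Initial observation of the next episode, set only when a reset happened.
    pub init_obs: Option<i32>,
}

pub struct CountdownEnv {
    pub remaining: i32,
    pub horizon: i32,
}

impl CountdownEnv {
    pub fn new(horizon: i32) -> Self {
        CountdownEnv { remaining: horizon, horizon }
    }

    /// Subtracts the action from the counter; the episode ends at zero or below.
    pub fn step(&mut self, a: &i32) -> ToyStep {
        self.remaining -= *a;
        ToyStep { obs: self.remaining, is_done: self.remaining <= 0, init_obs: None }
    }

    /// Restores the counter and returns the initial observation.
    pub fn reset(&mut self) -> i32 {
        self.remaining = self.horizon;
        self.remaining
    }

    /// Steps once and, if the episode ended, resets and attaches the new
    /// episode's initial observation to the returned step.
    pub fn step_with_reset(&mut self, a: &i32) -> ToyStep {
        let mut step = self.step(a);
        if step.is_done {
            step.init_obs = Some(self.reset());
        }
        step
    }
}
```

Keeping the terminal observation in `obs` and the fresh one in a separate field lets a caller store the final transition of one episode and still start the next episode from the correct observation.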
Arguments
a - The action to apply to the environment
Returns
A tuple containing: