pub trait Policy<E: Env> {
// Required method
fn sample(&mut self, obs: &E::Obs) -> E::Act;
}
A policy that maps observations to actions in a reinforcement learning environment.
This trait defines the interface for policies, which are the core decision-making components in reinforcement learning. A policy can be:
- Deterministic: Always returns the same action for a given observation
- Stochastic: Returns actions sampled from a probability distribution
§Type Parameters
- E - The environment type that this policy operates on
§Examples
A simple deterministic policy might look like:
struct SimplePolicy;

impl<E: Env> Policy<E> for SimplePolicy
where
    E::Act: Default,
{
    fn sample(&mut self, _obs: &E::Obs) -> E::Act {
        // Always return the same (default) action, regardless of the observation
        E::Act::default()
    }
}
A stochastic policy might look like:
struct StochasticPolicy;

impl<E: Env> Policy<E> for StochasticPolicy {
    fn sample(&mut self, obs: &E::Obs) -> E::Act {
        // Sample an action from a probability distribution conditioned on
        // the observation, e.g. using a random number generator held by the
        // policy. `E::Act::random()` stands in for that sampling step and
        // is not a real method on the action type.
        E::Act::random()
    }
}
§Required Methods
fn sample(&mut self, obs: &E::Obs) -> E::Act
Samples an action given an observation from the environment.
This method is the core of the policy interface, defining how the policy makes decisions based on the current state of the environment.
§Arguments
- obs - The current observation from the environment
§Returns
An action to be taken in the environment
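The two kinds of policy described above can be sketched end-to-end against a concrete environment. This is a minimal, self-contained sketch: the `Env` trait is reduced to just the associated types this page relies on (the real trait in the crate defines more), and `LineEnv`, `TowardOrigin`, and `EpsilonRandom` are hypothetical names used only for illustration.

```rust
// Stand-in for the crate's `Env` trait, carrying only the associated
// types that `Policy` needs.
pub trait Env {
    type Obs;
    type Act;
}

pub trait Policy<E: Env> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

// Hypothetical environment: observe a position on a line, act by stepping.
struct LineEnv;
impl Env for LineEnv {
    type Obs = i32; // current position
    type Act = i32; // step: -1, 0, or +1
}

// Deterministic policy: always step toward the origin.
struct TowardOrigin;
impl Policy<LineEnv> for TowardOrigin {
    fn sample(&mut self, obs: &i32) -> i32 {
        if *obs > 0 { -1 } else if *obs < 0 { 1 } else { 0 }
    }
}

// Stochastic policy: ignores the observation and picks a step uniformly
// from {-1, 0, 1}. A small xorshift64 generator keeps the sketch free of
// external dependencies; a real policy would use a proper RNG crate.
struct EpsilonRandom {
    state: u64, // must be nonzero
}
impl Policy<LineEnv> for EpsilonRandom {
    fn sample(&mut self, _obs: &i32) -> i32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        (x % 3) as i32 - 1
    }
}

fn main() {
    let mut det = TowardOrigin;
    println!("deterministic action at obs=3: {}", det.sample(&3));

    let mut stoch = EpsilonRandom { state: 0x9E3779B97F4A7C15 };
    println!("stochastic action at obs=3: {}", stoch.sample(&3));
}
```

Note that both policies take `&mut self`: deterministic policies simply never use the mutability, while stochastic ones need it to advance their internal random state between calls.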