pub trait Policy<E: Env> {
// Required method
fn sample(&mut self, obs: &E::Obs) -> E::Act;
}
A policy that maps observations to actions in a reinforcement learning environment.
This trait defines the interface for policies, which are the core decision-making components in reinforcement learning. A policy can be:
- Deterministic: Always returns the same action for a given observation
- Stochastic: Returns actions sampled from a probability distribution
§Type Parameters
- E - The environment type that this policy operates on
§Examples
A simple deterministic policy might look like:
struct SimplePolicy;

impl<E: Env> Policy<E> for SimplePolicy
where
    E::Act: Default,
{
    fn sample(&mut self, _obs: &E::Obs) -> E::Act {
        // Always return the same (default) action, regardless of the observation
        E::Act::default()
    }
}
A stochastic policy might look like:
struct StochasticPolicy;

impl<E: Env> Policy<E> for StochasticPolicy {
    fn sample(&mut self, obs: &E::Obs) -> E::Act {
        // Sample an action from a probability distribution conditioned on
        // the observation, e.g. using a random number generator held by the
        // policy. `E::Act::random()` stands in for that sampling step and
        // is not a real method on the action type.
        E::Act::random()
    }
}
§Required Methods
fn sample(&mut self, obs: &E::Obs) -> E::Act
Samples an action given an observation from the environment.
This method is the core of the policy interface, defining how the policy makes decisions based on the current state of the environment.
§Arguments
- obs - The current observation from the environment
§Returns
An action to be taken in the environment
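The two kinds of policy described above can be sketched end-to-end against a concrete environment. This is a minimal, self-contained sketch: the `Env` trait is reduced to just the associated types this page relies on (the real trait in the crate defines more), and `LineEnv`, `TowardOrigin`, and `EpsilonRandom` are hypothetical names used only for illustration.

```rust
// Stand-in for the crate's `Env` trait, carrying only the associated
// types that `Policy` needs.
pub trait Env {
    type Obs;
    type Act;
}

pub trait Policy<E: Env> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

// Hypothetical environment: observe a position on a line, act by stepping.
struct LineEnv;
impl Env for LineEnv {
    type Obs = i32; // current position
    type Act = i32; // step: -1, 0, or +1
}

// Deterministic policy: always step toward the origin.
struct TowardOrigin;
impl Policy<LineEnv> for TowardOrigin {
    fn sample(&mut self, obs: &i32) -> i32 {
        if *obs > 0 { -1 } else if *obs < 0 { 1 } else { 0 }
    }
}

// Stochastic policy: ignores the observation and picks a step uniformly
// from {-1, 0, 1}. A small xorshift64 generator keeps the sketch free of
// external dependencies; a real policy would use a proper RNG crate.
struct EpsilonRandom {
    state: u64, // must be nonzero
}
impl Policy<LineEnv> for EpsilonRandom {
    fn sample(&mut self, _obs: &i32) -> i32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        (x % 3) as i32 - 1
    }
}

fn main() {
    let mut det = TowardOrigin;
    println!("deterministic action at obs=3: {}", det.sample(&3));

    let mut stoch = EpsilonRandom { state: 0x9E3779B97F4A7C15 };
    println!("stochastic action at obs=3: {}", stoch.sample(&3));
}
```

Note that both policies take `&mut self`: deterministic policies simply never use the mutability, while stochastic ones need it to advance their internal random state between calls.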