Function border_core::core::util::eval
pub fn eval<E: Env, P: Policy<E>>(
    env: &mut E,
    policy: &mut P,
    n_episodes: usize
) -> Vec<f32>
Run episodes with a policy and return cumulative rewards.
This function assumes that the environment is non-vectorized or n_proc = 1.
In this function, the main entities of the library, i.e., an environment (super::Env), an observation (super::Obs), an action (super::Act), and a policy (super::Policy), interact as illustrated in the following diagram:
graph LR
Env --> Obs
Obs --> Policy
Policy --> Act
Act --> Env
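The loop in the diagram above can be sketched in plain Rust. Note that the traits and types below (Env, Obs, Act, Policy, and the toy CountEnv/RandomPolicy) are simplified stand-ins written for illustration, not the actual border_core definitions; the real traits carry more associated types and configuration.

```rust
// Simplified stand-ins for super::Obs / super::Act.
struct Obs(f32);
struct Act(f32);

// Simplified stand-in for super::Env: stepping with an action yields an
// observation, a reward, and a done flag.
trait Env {
    fn reset(&mut self) -> Obs;
    fn step(&mut self, act: &Act) -> (Obs, f32, bool);
}

// Simplified stand-in for super::Policy: maps observations to actions.
trait Policy<E: Env> {
    fn sample(&mut self, obs: &Obs) -> Act;
}

// Sketch of eval: run `n_episodes` episodes, collecting cumulative rewards.
fn eval<E: Env, P: Policy<E>>(env: &mut E, policy: &mut P, n_episodes: usize) -> Vec<f32> {
    let mut returns = Vec::with_capacity(n_episodes);
    for _ in 0..n_episodes {
        let mut obs = env.reset();                     // Env --> Obs
        let mut cum_reward = 0.0;
        loop {
            let act = policy.sample(&obs);             // Obs --> Policy --> Act
            let (next_obs, reward, done) = env.step(&act); // Act --> Env
            cum_reward += reward;
            obs = next_obs;
            if done {
                break;
            }
        }
        returns.push(cum_reward);
    }
    returns
}

// Toy environment: every episode lasts 3 steps with reward 1.0 per step.
struct CountEnv { t: u32 }
impl Env for CountEnv {
    fn reset(&mut self) -> Obs { self.t = 0; Obs(0.0) }
    fn step(&mut self, _act: &Act) -> (Obs, f32, bool) {
        self.t += 1;
        (Obs(self.t as f32), 1.0, self.t >= 3)
    }
}

// Toy policy that just echoes the observation as an action.
struct RandomPolicy;
impl Policy<CountEnv> for RandomPolicy {
    fn sample(&mut self, obs: &Obs) -> Act { Act(obs.0) }
}

fn main() {
    let mut env = CountEnv { t: 0 };
    let mut policy = RandomPolicy;
    let returns = eval(&mut env, &mut policy, 2);
    println!("{:?}", returns); // [3.0, 3.0]
}
```

With two episodes of three unit rewards each, this sketch returns a vector of two cumulative rewards, both 3.0.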
Depending on the environment's definition, observations and actions can be modified.
The constructor of [crate::env::py_gym_env::PyGymEnv] accepts a [crate::env::py_gym_env::PyGymEnvObsFilter] and a [crate::env::py_gym_env::PyGymEnvActFilter] for this purpose.
In this case, the interaction of the entities is shown below (in reality, PyGymEnvAct represents either discrete or continuous actions):
graph LR
PyGymEnvObsFilter --> PyGymEnvObs
PyGymEnvObs --> Policy
Policy --> PyGymEnvAct
PyGymEnvAct --> PyGymEnvActFilter
subgraph PyGymEnv
PyGymEnvActFilter --> Py(Python runtime)
Py(Python runtime) --> PyGymEnvObsFilter
end
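To illustrate the filter idea, here is a minimal sketch of observation and action filters sitting at the environment boundary. The trait and type names below (ObsFilter, ActFilter, RawObs, RawAct, CastObsFilter, ClipActFilter) are hypothetical stand-ins for this example; the real PyGymEnvObsFilter/PyGymEnvActFilter convert between the library's types and Python objects.

```rust
// Raw values crossing the environment boundary (e.g. to/from a Python runtime).
struct RawObs(Vec<f64>);
struct RawAct(Vec<f64>);

// User-facing observation and action types.
struct Obs(Vec<f32>);
struct Act(Vec<f32>);

// An observation filter maps raw observations into user-facing ones
// (the PyGymEnvObsFilter role in the diagram above).
trait ObsFilter {
    fn filt(&mut self, raw: RawObs) -> Obs;
}

// An action filter maps user-facing actions into raw ones
// (the PyGymEnvActFilter role in the diagram above).
trait ActFilter {
    fn filt(&mut self, act: Act) -> RawAct;
}

// Example filter: cast f64 -> f32 on the way in.
struct CastObsFilter;
impl ObsFilter for CastObsFilter {
    fn filt(&mut self, raw: RawObs) -> Obs {
        Obs(raw.0.into_iter().map(|x| x as f32).collect())
    }
}

// Example filter: clip actions to a range on the way out.
struct ClipActFilter { lo: f32, hi: f32 }
impl ActFilter for ClipActFilter {
    fn filt(&mut self, act: Act) -> RawAct {
        RawAct(act.0.into_iter()
            .map(|x| x.clamp(self.lo, self.hi) as f64)
            .collect())
    }
}

fn main() {
    let mut of = CastObsFilter;
    let mut af = ClipActFilter { lo: -1.0, hi: 1.0 };
    let obs = of.filt(RawObs(vec![0.5, -2.0]));
    let raw = af.filt(Act(vec![3.0, -0.25]));
    println!("{:?} {:?}", obs.0, raw.0);
}
```

The point of the design, as the diagram shows, is that the policy only ever sees the filtered observation and action types, while the filters absorb whatever conversion the backend (here, the Python runtime) requires.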