Function border_core::core::util::eval
pub fn eval<E: Env, P: Policy<E>>(
    env: &mut E,
    policy: &mut P,
    n_episodes: usize
) -> Vec<f32>
Run episodes with a policy and return cumulative rewards.
This function assumes that the environment is non-vectorized or n_proc = 1.
In this function, the main entities of the library, i.e., an environment (super::Env), an observation (super::Obs), an action (super::Act), and a policy (super::Policy), interact as illustrated in the following diagram:
graph LR
Env --> Obs
Obs --> Policy
Policy --> Act
Act --> Env
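The loop in the diagram above can be sketched in plain Rust. Note that the traits and types below (Env, Obs, Act, Policy, and the toy CountEnv/RandomPolicy) are simplified stand-ins written for illustration, not the actual border_core definitions; the real traits carry more associated types and configuration.

```rust
// Simplified stand-ins for super::Obs / super::Act.
struct Obs(f32);
struct Act(f32);

// Simplified stand-in for super::Env: stepping with an action yields an
// observation, a reward, and a done flag.
trait Env {
    fn reset(&mut self) -> Obs;
    fn step(&mut self, act: &Act) -> (Obs, f32, bool);
}

// Simplified stand-in for super::Policy: maps observations to actions.
trait Policy<E: Env> {
    fn sample(&mut self, obs: &Obs) -> Act;
}

// Sketch of eval: run `n_episodes` episodes, collecting cumulative rewards.
fn eval<E: Env, P: Policy<E>>(env: &mut E, policy: &mut P, n_episodes: usize) -> Vec<f32> {
    let mut returns = Vec::with_capacity(n_episodes);
    for _ in 0..n_episodes {
        let mut obs = env.reset();                     // Env --> Obs
        let mut cum_reward = 0.0;
        loop {
            let act = policy.sample(&obs);             // Obs --> Policy --> Act
            let (next_obs, reward, done) = env.step(&act); // Act --> Env
            cum_reward += reward;
            obs = next_obs;
            if done {
                break;
            }
        }
        returns.push(cum_reward);
    }
    returns
}

// Toy environment: every episode lasts 3 steps with reward 1.0 per step.
struct CountEnv { t: u32 }
impl Env for CountEnv {
    fn reset(&mut self) -> Obs { self.t = 0; Obs(0.0) }
    fn step(&mut self, _act: &Act) -> (Obs, f32, bool) {
        self.t += 1;
        (Obs(self.t as f32), 1.0, self.t >= 3)
    }
}

// Toy policy that just echoes the observation as an action.
struct RandomPolicy;
impl Policy<CountEnv> for RandomPolicy {
    fn sample(&mut self, obs: &Obs) -> Act { Act(obs.0) }
}

fn main() {
    let mut env = CountEnv { t: 0 };
    let mut policy = RandomPolicy;
    let returns = eval(&mut env, &mut policy, 2);
    println!("{:?}", returns); // [3.0, 3.0]
}
```

With two episodes of three unit rewards each, this sketch returns a vector of two cumulative rewards, both 3.0.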
Depending on the environment's definition, observations and actions can be modified.
The constructor of [crate::env::py_gym_env::PyGymEnv] accepts a [crate::env::py_gym_env::PyGymEnvObsFilter] and a [crate::env::py_gym_env::PyGymEnvActFilter] for this purpose.
In this case, the interaction of the entities is shown below (in reality, PyGymEnvAct represents either discrete or continuous actions):
graph LR
PyGymEnvObsFilter --> PyGymEnvObs
PyGymEnvObs --> Policy
Policy --> PyGymEnvAct
PyGymEnvAct --> PyGymEnvActFilter
subgraph PyGymEnv
PyGymEnvActFilter --> Py(Python runtime)
Py(Python runtime) --> PyGymEnvObsFilter
end
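To illustrate the filter idea, here is a minimal sketch of observation and action filters sitting at the environment boundary. The trait and type names below (ObsFilter, ActFilter, RawObs, RawAct, CastObsFilter, ClipActFilter) are hypothetical stand-ins for this example; the real PyGymEnvObsFilter/PyGymEnvActFilter convert between the library's types and Python objects.

```rust
// Raw values crossing the environment boundary (e.g. to/from a Python runtime).
struct RawObs(Vec<f64>);
struct RawAct(Vec<f64>);

// User-facing observation and action types.
struct Obs(Vec<f32>);
struct Act(Vec<f32>);

// An observation filter maps raw observations into user-facing ones
// (the PyGymEnvObsFilter role in the diagram above).
trait ObsFilter {
    fn filt(&mut self, raw: RawObs) -> Obs;
}

// An action filter maps user-facing actions into raw ones
// (the PyGymEnvActFilter role in the diagram above).
trait ActFilter {
    fn filt(&mut self, act: Act) -> RawAct;
}

// Example filter: cast f64 -> f32 on the way in.
struct CastObsFilter;
impl ObsFilter for CastObsFilter {
    fn filt(&mut self, raw: RawObs) -> Obs {
        Obs(raw.0.into_iter().map(|x| x as f32).collect())
    }
}

// Example filter: clip actions to a range on the way out.
struct ClipActFilter { lo: f32, hi: f32 }
impl ActFilter for ClipActFilter {
    fn filt(&mut self, act: Act) -> RawAct {
        RawAct(act.0.into_iter()
            .map(|x| x.clamp(self.lo, self.hi) as f64)
            .collect())
    }
}

fn main() {
    let mut of = CastObsFilter;
    let mut af = ClipActFilter { lo: -1.0, hi: 1.0 };
    let obs = of.filt(RawObs(vec![0.5, -2.0]));
    let raw = af.filt(Act(vec![3.0, -0.25]));
    println!("{:?} {:?}", obs.0, raw.0);
}
```

The point of the design, as the diagram shows, is that the policy only ever sees the filtered observation and action types, while the filters absorb whatever conversion the backend (here, the Python runtime) requires.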