Struct border_core::core::trainer::Trainer
Manages the training process.
Training loop
For training an agent with standard RL algorithms in the library, the agent and environment interact as described in the following steps:
1. Call Env::reset to reset the environment and get an observation. An episode starts.
2. Call std::cell::RefCell::replace to place the observation in PrevObs.
3. Call Agent::push_obs to place the observation in PrevObs'.
4. Call Policy::sample to sample an action from the Policy.
5. Call Env::step to take the action, get a new observation, and create a Step object.
6. Call Agent::observe to update the replay buffer with the new and previous observations.
7. Call some methods in the agent to update the policy parameters.
8. Go back to step 1.
Note that Trainer is not responsible for step 6; the Agent does it.
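As a rough illustration, the following is a minimal, self-contained sketch of this loop in plain Rust. The ToyEnv and ToyAgent types, their method signatures, and the scalar observation/action types are hypothetical stand-ins invented for this example; they only mirror the roles of Env::reset, Env::step, Policy::sample, Agent::push_obs, and Agent::observe, and are not the crate's actual traits.

```rust
use std::cell::RefCell;

// Toy Step, environment, and agent types; hypothetical stand-ins used only
// to mirror the numbered steps above.
struct Step {
    obs: f64,
    reward: f64,
    is_done: bool,
}

struct ToyEnv {
    t: u32,
}

impl ToyEnv {
    fn reset(&mut self) -> f64 {
        // Step 1: reset the environment and return the first observation.
        self.t = 0;
        0.0
    }

    fn step(&mut self, act: f64) -> Step {
        // Step 5: take an action and return a new observation inside a Step.
        self.t += 1;
        Step { obs: self.t as f64 + act, reward: 1.0, is_done: self.t >= 5 }
    }
}

struct ToyAgent {
    prev_obs: Option<f64>,
    n_opts: u32,
}

impl ToyAgent {
    fn push_obs(&mut self, obs: f64) {
        // Step 3: remember the previous observation.
        self.prev_obs = Some(obs);
    }

    fn sample(&self, obs: f64) -> f64 {
        // Step 4: sample an action from the policy (here, a fixed rule).
        -0.1 * obs
    }

    fn observe(&mut self, step: Step) {
        // Steps 6 and 7: a real agent would push this transition into its
        // replay buffer and update the policy parameters here.
        let _transition = (self.prev_obs, step.obs, step.reward, step.is_done);
        self.n_opts += 1;
    }
}

fn main() {
    let mut env = ToyEnv { t: 0 };
    let mut agent = ToyAgent { prev_obs: None, n_opts: 0 };
    let mut cum_reward = 0.0;

    let prev_obs = RefCell::new(env.reset());       // steps 1 and 2
    agent.push_obs(*prev_obs.borrow());             // step 3
    loop {
        let act = agent.sample(*prev_obs.borrow()); // step 4
        let step = env.step(act);                   // step 5
        let done = step.is_done;
        let next_obs = step.obs;
        cum_reward += step.reward;
        agent.observe(step);                        // steps 6 and 7
        prev_obs.replace(next_obs);                 // step 2 for the next iteration
        agent.push_obs(*prev_obs.borrow());         // step 3 for the next iteration
        if done {
            break;                                  // step 8: the next episode would start here
        }
    }
    println!("episode reward: {cum_reward}, optimization steps: {}", agent.n_opts);
}
```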
Model evaluation and saving
Trainer::train() evaluates the agent being trained at intervals of optimization steps specified by TrainerBuilder::eval_interval(). If the evaluation reward exceeds the maximum seen so far during training, the agent is saved in the directory specified by TrainerBuilder::model_dir().
A trained agent often consists of a number of neural networks, such as an action-value network, its target network, and a policy network. Typically, the Agent saves all of these networks in a directory.
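The sketch below is a rough illustration of this evaluate-then-save rule: evaluate every eval_interval optimization steps and save only when the mean evaluation reward beats the best value seen so far. The helper names (maybe_eval_and_save, evaluate, save_model) and the f64 reward type are assumptions made for this example, not the crate's API; the literal values merely play the roles of the TrainerBuilder::eval_interval() and TrainerBuilder::model_dir() settings.

```rust
// Hypothetical helper mirroring the evaluate-then-save rule described above.
fn maybe_eval_and_save(
    n_opts: u64,
    eval_interval: u64,
    best_reward: &mut f64,
    evaluate: impl Fn() -> f64,
    save_model: impl Fn(&str),
    model_dir: &str,
) {
    // Evaluate only every `eval_interval` optimization steps.
    if n_opts % eval_interval == 0 {
        let reward = evaluate();
        // Save only when the evaluation reward beats the best seen so far.
        if reward > *best_reward {
            *best_reward = reward;
            save_model(model_dir);
        }
    }
}

fn main() {
    let mut best = f64::NEG_INFINITY;
    for n_opts in 1..=300u64 {
        maybe_eval_and_save(
            n_opts,
            100,                                // plays the role of eval_interval()
            &mut best,
            || 10.0 + n_opts as f64,            // dummy mean cumulative evaluation reward
            |dir: &str| println!("saving the agent's networks to {dir}"),
            "./model",                          // plays the role of model_dir()
        );
    }
}
```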
Implementations
Gets a reference to the environment used for evaluation.
Train the agent.
In the training loop, the following values are recorded in the recorder:
n_steps - The number of steps of interaction with the environment.
n_opts - The number of optimization steps.
datetime - Date and time.
mean_cum_eval_reward - Mean cumulative reward over evaluation runs.
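For illustration, a recorder can be thought of as a named collection of scalar values. The ScalarRecorder below is a hypothetical stand-in, not the crate's recorder type; only the key names come from the list above (datetime is omitted since it is not a scalar).

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a recorder: a named collection of scalar values.
#[derive(Default)]
struct ScalarRecorder {
    values: HashMap<String, Vec<f64>>,
}

impl ScalarRecorder {
    fn record(&mut self, key: &str, value: f64) {
        self.values.entry(key.to_string()).or_default().push(value);
    }
}

fn main() {
    let mut recorder = ScalarRecorder::default();
    // Values a training loop might record at one evaluation point.
    recorder.record("n_steps", 12_000.0);
    recorder.record("n_opts", 3_000.0);
    recorder.record("mean_cum_eval_reward", 195.5);
    for (key, values) in &recorder.values {
        println!("{key}: {values:?}");
    }
}
```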
Auto Trait Implementations
impl<E, A> !RefUnwindSafe for Trainer<E, A>
impl<E, A> UnwindSafe for Trainer<E, A> where
A: UnwindSafe,
E: UnwindSafe,
<E as Env>::Obs: UnwindSafe,