Struct border_core::core::trainer::Trainer
Manages the training process.
Training loop
For training an agent with the standard RL algorithms in the library, the agent and environment interact as follows:

1. Call Env::reset to reset the environment and get an observation. An episode starts.
2. Call std::cell::RefCell::replace to place the observation in PrevObs.
3. Call Agent::push_obs to place the observation in PrevObs'.
4. Call Policy::sample to sample an action from Policy.
5. Call Env::step to take the action, get a new observation, and create a Step object.
6. Call Agent::observe to update the replay buffer with the new and previous observations.
7. Call some methods in the agent to update the policy parameters.
8. Go back to step 1.
Actually, Trainer is not responsible for step 6; the Agent does it.
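The loop above can be sketched with toy types. The trait-like method names (reset, push_obs, sample, step, observe) mirror the ones in the list, but the structs and signatures here are simplified assumptions for illustration, not the actual border_core API:

```rust
/// Toy environment: the episode ends once the observation reaches 3.
struct CountEnv { state: i32 }

impl CountEnv {
    /// Step 1: reset the environment and return the first observation.
    fn reset(&mut self) -> i32 { self.state = 0; self.state }
    /// Step 5: apply an action, return (new observation, done flag).
    fn step(&mut self, action: i32) -> (i32, bool) {
        self.state += action;
        (self.state, self.state >= 3)
    }
}

/// Toy agent holding the previous observation and a replay buffer.
struct CountAgent { prev_obs: Option<i32>, replay: Vec<(i32, i32)> }

impl CountAgent {
    /// Steps 2-3: remember the current observation as the previous one.
    fn push_obs(&mut self, obs: i32) { self.prev_obs = Some(obs); }
    /// Step 4: sample an action from the (here: constant) policy.
    fn sample(&self, _obs: i32) -> i32 { 1 }
    /// Steps 6-7: store (prev_obs, new_obs) in the replay buffer; a real
    /// agent would also run its optimization step here.
    fn observe(&mut self, new_obs: i32) {
        if let Some(prev) = self.prev_obs.take() {
            self.replay.push((prev, new_obs));
        }
        self.prev_obs = Some(new_obs);
    }
}

/// One episode of the interaction loop described above.
fn run_episode(env: &mut CountEnv, agent: &mut CountAgent) -> usize {
    let mut obs = env.reset();                  // step 1
    let mut n_steps = 0;
    loop {
        agent.push_obs(obs);                    // steps 2-3
        let action = agent.sample(obs);         // step 4
        let (new_obs, done) = env.step(action); // step 5
        agent.observe(new_obs);                 // steps 6-7
        n_steps += 1;
        obs = new_obs;
        if done { break; }
    }
    n_steps
}

fn main() {
    let mut env = CountEnv { state: 0 };
    let mut agent = CountAgent { prev_obs: None, replay: Vec::new() };
    let n = run_episode(&mut env, &mut agent);
    println!("episode length: {}, transitions stored: {}", n, agent.replay.len());
}
```

Note that the replay-buffer update lives inside the agent's observe, matching the remark that the Trainer is not responsible for it.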
Model evaluation and saving
Trainer::train() evaluates the agent being trained at the interval of optimization steps specified by TrainerBuilder::eval_interval(). If the evaluation reward is greater than the maximum seen so far during training, the agent is saved in the directory specified by TrainerBuilder::model_dir().
A trained agent often consists of several neural networks, such as an action-value network, its target network, and a policy network. Typically, the Agent saves all of these networks in a directory.
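The "save on best evaluation reward" rule can be sketched as follows. The interval check and saving condition follow the description above; the struct and method names are illustrative assumptions, not the border_core implementation:

```rust
/// Tracks when the agent should be checkpointed during training.
struct Checkpointer {
    eval_interval: usize, // evaluate every this many optimization steps
    best_reward: f64,     // maximum evaluation reward seen so far
}

impl Checkpointer {
    fn new(eval_interval: usize) -> Self {
        Self { eval_interval, best_reward: f64::NEG_INFINITY }
    }

    /// Returns true if the agent should be saved after this optimization
    /// step, i.e. it is an evaluation step and the reward is a new maximum.
    fn should_save(&mut self, n_opts: usize, eval_reward: f64) -> bool {
        if n_opts % self.eval_interval != 0 {
            return false; // not an evaluation step
        }
        if eval_reward > self.best_reward {
            self.best_reward = eval_reward; // new best model
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut cp = Checkpointer::new(100);
    // Optimization step 100: first evaluation, so any reward is a new best.
    println!("save at step 100? {}", cp.should_save(100, 1.0));
}
```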
Implementations
impl<E: Env, A: Agent<E>> Trainer<E, A>
pub fn get_env_eval(&self) -> &E
Get the reference to the environment for evaluation.
pub fn train<T: Recorder>(&mut self, recorder: &mut T)
Train the agent.
In the training loop, the following values are recorded in the recorder:

- n_steps - The number of steps interacting with the environment.
- n_opts - The number of optimization steps.
- datetime - Date and time.
- mean_cum_eval_reward - Mean of the cumulative rewards over evaluation runs.
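A recorder of this kind can be pictured as a map from metric names to value histories. The real Recorder trait has its own interface; this HashMap-backed version is only an assumption for demonstration:

```rust
use std::collections::HashMap;

/// Toy recorder: stores a history of scalar values per metric name.
#[derive(Default)]
struct ToyRecorder {
    scalars: HashMap<String, Vec<f64>>,
}

impl ToyRecorder {
    /// Append one scalar value to the named metric's history.
    fn record(&mut self, name: &str, value: f64) {
        self.scalars.entry(name.to_string()).or_default().push(value);
    }
}

fn main() {
    let mut rec = ToyRecorder::default();
    // The kind of values Trainer::train() records at each evaluation:
    rec.record("n_steps", 1000.0);
    rec.record("n_opts", 250.0);
    rec.record("mean_cum_eval_reward", 195.5);
    println!("{:?}", rec.scalars.get("mean_cum_eval_reward"));
}
```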
Auto Trait Implementations
impl<E, A> !RefUnwindSafe for Trainer<E, A>
impl<E, A> Send for Trainer<E, A> where
A: Send,
E: Send,
<E as Env>::Obs: Send,
impl<E, A> !Sync for Trainer<E, A>
impl<E, A> Unpin for Trainer<E, A> where
A: Unpin,
E: Unpin,
<E as Env>::Obs: Unpin,
impl<E, A> UnwindSafe for Trainer<E, A> where
A: UnwindSafe,
E: UnwindSafe,
<E as Env>::Obs: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Pointable for T