Struct border_core::core::trainer::Trainer

pub struct Trainer<E: Env, A: Agent<E>> { /* fields omitted */ }

Manages the training process.

Training loop

For training an agent with standard RL algorithms in the library, the agent and environment interact as illustrated in the following diagram:

flowchart TB
    Trainer -. 0. Env::reset .-> Env
    Env --> Obs
    ObsPrev -- 3. Policy::sample --> Policy
    Policy --> Act
    Act -- 4. Env::step --> Env
    Obs --> Step
    Obs -- 1. RefCell::replace --> ObsPrev
    Act --> Step
    ObsPrev -- 2. Agent::push_obs --> ObsPrev'
    Step -- 5. Agent::observe --> Transition
    subgraph Agent
        ObsPrev' --> Transition
        ReplayBuffer -- 6. update policy parameters --- Policy
        Transition --> ReplayBuffer
    end
  0. Call Env::reset for resetting the environment and getting an observation. An episode starts.
  1. Call std::cell::RefCell::replace for placing the observation in PrevObs.
  2. Call Agent::push_obs for placing the observation in PrevObs'.
  3. Call Policy::sample for sampling an action from Policy.
  4. Call Env::step for taking the action, getting a new observation, and creating a Step object.
  5. Call Agent::observe for updating the replay buffer with the new and previous observations.
  6. Call some methods in the agent for updating the policy parameters.
  7. Back to 1.

Actually, Trainer is not responsible for step 6; the Agent does it.
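The following is a minimal sketch of this loop in Rust. The traits and signatures here are simplified, hypothetical stand-ins, not the actual border_core traits: Env::step returns a richer Step object, Policy is a separate trait, and Trainer::train() additionally handles evaluation, recording, and model saving.

```rust
use std::cell::RefCell;

// Simplified, hypothetical stand-ins for the library's Env, Policy and Agent traits.
struct Step<O, A> { obs: O, act: A, reward: f32, done: bool }

trait SimpleEnv {
    type Obs: Clone;
    type Act;
    fn reset(&mut self) -> Self::Obs;                                   // stands in for Env::reset
    fn step(&mut self, act: &Self::Act) -> Step<Self::Obs, Self::Act>;  // stands in for Env::step
}

trait SimpleAgent<E: SimpleEnv> {
    fn push_obs(&mut self, obs: &E::Obs);               // stands in for Agent::push_obs
    fn sample(&mut self, obs: &E::Obs) -> E::Act;       // stands in for Policy::sample
    fn observe(&mut self, step: Step<E::Obs, E::Act>);  // stands in for Agent::observe
}

fn training_loop_sketch<E, A>(env: &mut E, agent: &mut A, max_steps: usize)
where
    E: SimpleEnv,
    A: SimpleAgent<E>,
{
    // 0. Reset the environment; an episode starts.
    let prev_obs = RefCell::new(env.reset());
    for _ in 0..max_steps {
        // 2. Hand the previous observation to the agent (PrevObs -> PrevObs').
        agent.push_obs(&prev_obs.borrow());
        // 3. Sample an action from the policy.
        let act = agent.sample(&prev_obs.borrow());
        // 4. Apply the action and obtain a Step object.
        let step = env.step(&act);
        // 1. Place the new observation in PrevObs for the next iteration,
        //    resetting first if the episode has ended.
        let next_obs = if step.done { env.reset() } else { step.obs.clone() };
        prev_obs.replace(next_obs);
        // 5.-6. Store the transition; the agent updates its policy parameters internally.
        agent.observe(step);
    }
}
```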

Model evaluation and saving

Trainer::train() evaluates the agent being trained at the interval of optimization steps specified by TrainerBuilder::eval_interval(). If the evaluation reward exceeds the maximum observed so far during training, the agent is saved to the directory specified by TrainerBuilder::model_dir().

A trained agent often consists of a number of neural networks, such as an action-value network, its target network, and a policy network. Typically, Agent saves all of these neural networks in a directory.
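As a rough wiring sketch, the configuration above might be set up as follows. Only eval_interval(), model_dir(), and Trainer::train() are taken from this page; the import paths, the default()/build() calls, and their argument types are assumptions.

```rust
// Hypothetical usage sketch; paths and the build() signature are assumptions.
use border_core::core::{Agent, Env, TrainerBuilder};

fn build_and_train<E: Env, A: Agent<E>>(env: E, env_eval: E, agent: A) {
    let mut trainer = TrainerBuilder::default()
        .eval_interval(10_000)        // evaluate every 10,000 optimization steps
        .model_dir("./model")         // the best agent so far is saved here
        .build(env, env_eval, agent); // assumed: training env, evaluation env, agent
    trainer.train();                  // may additionally take a recorder argument (assumption)
}
```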

Implementations

Get the reference to the agent.

Get the reference to the environment.

Get the reference to the environment for evaluation.

Train the agent.

In the training loop, the following values are recorded in the recorder:

  • n_steps - The number of steps of interaction with the environment.
  • n_opts - The number of optimization steps.
  • datetime - Date and time.
  • mean_cum_eval_reward - Mean cumulative reward over evaluation runs.
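The names above are the keys under which the values are written. As an illustration only, a recorder receiving them could look like the sketch below; SimpleRecorder is a hypothetical stand-in, not the library's actual recorder interface.

```rust
// Hypothetical stand-in for a recorder; the real recorder interface differs.
trait SimpleRecorder {
    fn write_scalar(&mut self, name: &str, value: f64);
}

struct StdoutRecorder;

impl SimpleRecorder for StdoutRecorder {
    fn write_scalar(&mut self, name: &str, value: f64) {
        // e.g. "n_steps = 50000", "n_opts = 12500", "mean_cum_eval_reward = 195.3"
        println!("{} = {}", name, value);
    }
}
```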

Auto Trait Implementations

Blanket Implementations
