Struct Trainer

pub struct Trainer { /* private fields */ }

Manages the training loop and coordinates interactions between components.

The Trainer orchestrates the training process by managing:

  • Environment interactions and experience collection
  • Agent optimization and parameter updates
  • Performance evaluation and model saving
  • Training metrics recording

§Training Process

The training loop follows these steps:

  1. Initialize training components:

    • Reset environment step counter (env_steps = 0)
    • Reset optimization step counter (opt_steps = 0)
    • Initialize performance monitoring
  2. Environment Interaction:

    • Agent observes environment state
    • Agent selects and executes action
    • Environment transitions to new state
    • Experience is collected and stored
  3. Optimization:

    • At specified intervals (opt_interval):
      • Sample experiences from replay buffer
      • Update agent parameters
      • Track optimization performance
  4. Evaluation and Recording:

    • Periodically evaluate agent performance
    • Record training metrics
    • Save model checkpoints
    • Monitor optimization speed
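
The counter bookkeeping in the steps above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Intervals`, `Counters`, and `run_loop` are hypothetical names, the environment step is stubbed out, and the exact warmup rule (optimization starts once `env_steps` exceeds `warmup_period`) is an assumption.

```rust
/// Hypothetical stand-in for the interval settings; not the crate's `TrainerConfig`.
struct Intervals {
    opt_interval: usize,
    warmup_period: usize,
    max_opts: usize,
}

/// The two counters the training loop resets at the start of a run.
#[derive(Debug, Default, PartialEq)]
struct Counters {
    env_steps: usize,
    opt_steps: usize,
}

/// Runs a simplified loop: one (stubbed) environment step per iteration and,
/// once the warmup is over, one optimization step every `opt_interval`
/// environment steps, until `max_opts` optimization steps have been done.
fn run_loop(cfg: &Intervals) -> Counters {
    assert!(cfg.opt_interval > 0, "opt_interval must be positive");

    // Step 1: both counters start at zero.
    let mut counters = Counters::default();

    while counters.opt_steps < cfg.max_opts {
        // Step 2: environment interaction (stubbed; only the counter advances).
        counters.env_steps += 1;

        // Step 3: optimize at the configured interval, after the warmup.
        // Assumption: warmup ends once env_steps exceeds warmup_period.
        let warmed_up = counters.env_steps > cfg.warmup_period;
        if warmed_up && counters.env_steps % cfg.opt_interval == 0 {
            counters.opt_steps += 1;
        }
    }
    counters
}
```

With `opt_interval = 2`, `warmup_period = 5`, and `max_opts = 3`, this sketch optimizes at environment steps 6, 8, and 10 and then stops.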

§Model Selection

During training, the best-performing model is saved automatically, based on evaluation rewards:

  • At each evaluation interval (eval_interval), the agent’s performance is evaluated
  • The evaluation reward is obtained from Record::get_scalar_without_key()
  • If the current evaluation reward exceeds the previous maximum reward:
    • The model is saved as the “best” model
    • The maximum reward is updated
  • This ensures that the saved “best” model represents the agent’s peak performance
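
The selection rule above amounts to tracking a running maximum. The sketch below illustrates only that rule; `BestTracker` is a hypothetical helper, and the evaluation and the model saving themselves are left to the caller.

```rust
/// Hypothetical helper that remembers the highest evaluation reward seen so far.
struct BestTracker {
    max_eval_reward: f32,
}

impl BestTracker {
    /// Starts below every finite reward so the first evaluation always counts as "best".
    fn new() -> Self {
        Self { max_eval_reward: f32::NEG_INFINITY }
    }

    /// Returns `true` when `eval_reward` strictly exceeds the previous maximum,
    /// i.e. when the caller should save the current model as the "best" model.
    fn update(&mut self, eval_reward: f32) -> bool {
        if eval_reward > self.max_eval_reward {
            self.max_eval_reward = eval_reward;
            true
        } else {
            false
        }
    }
}
```

Because the comparison is strict, a later evaluation that only ties the maximum does not trigger another save in this sketch.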

§Configuration

Training behavior is controlled by various intervals and parameters:

  • opt_interval: Steps between optimization updates
  • eval_interval: Steps between performance evaluations
  • save_interval: Steps between model checkpoints
  • warmup_period: Initial steps before optimization begins
  • max_opts: Maximum number of optimization steps
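
To illustrate how interval settings like these gate periodic work, here is a small sketch. It makes two assumptions that the text above does not confirm: the intervals are counted in optimization steps, and an interval of 0 means "never". `IntervalSettings`, `Due`, `is_due`, and `due_at` are hypothetical names, not part of the crate's API.

```rust
/// Hypothetical mirror of the interval parameters listed above.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct IntervalSettings {
    opt_interval: usize,
    eval_interval: usize,
    save_interval: usize,
    warmup_period: usize,
    max_opts: usize,
}

/// Which periodic actions fall due at a given step.
#[derive(Debug, PartialEq)]
struct Due {
    evaluate: bool,
    save: bool,
}

/// True when `step` is a positive multiple of `interval`.
/// Assumption: an interval of 0 disables the action.
fn is_due(step: usize, interval: usize) -> bool {
    interval > 0 && step > 0 && step % interval == 0
}

/// Checks the evaluation and checkpoint intervals for the given step count.
/// Assumption: both intervals are measured in optimization steps.
fn due_at(cfg: &IntervalSettings, opt_steps: usize) -> Due {
    Due {
        evaluate: is_due(opt_steps, cfg.eval_interval),
        save: is_due(opt_steps, cfg.save_interval),
    }
}
```

With `eval_interval = 10` and `save_interval = 25`, step 10 triggers only an evaluation, while step 50 triggers both an evaluation and a checkpoint.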

Implementations§


impl Trainer


pub fn build(config: TrainerConfig) -> Self

Creates a new trainer with the specified configuration.

§Arguments
  • config - Configuration parameters for the trainer
§Returns

A new Trainer instance configured from config.


pub fn train_step<E, R>(
    &mut self,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
) -> Result<(Record, bool)>
where
    E: Env,
    R: ReplayBufferBase,

Performs a single training step.

This method:

  1. Performs an environment step
  2. Collects and stores the experience
  3. Optionally performs an optimization step
§Arguments
  • agent - The agent being trained
  • buffer - The replay buffer storing experiences
§Returns

A tuple containing:

  • A record of the training step
  • A boolean indicating if an optimization step was performed
§Errors

Returns an error if the optimization step fails.
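
A caller can use the returned flag to count optimization steps and the `?` operator to propagate a failed step. The sketch below shows that pattern with stand-ins only: the `Record` alias, the `String` error type, and the `drive` function are hypothetical, and the `step` closure takes the place of a call to train_step.

```rust
/// Hypothetical stand-in for the crate's `Record` type.
type Record = Vec<(String, f32)>;

/// Calls `step` until `max_opts` optimization steps have been reported.
/// Returns the number of (environment steps, optimization steps) performed,
/// or the first error returned by `step`.
fn drive<F>(mut step: F, max_opts: usize) -> Result<(usize, usize), String>
where
    F: FnMut() -> Result<(Record, bool), String>,
{
    let mut env_steps = 0;
    let mut opt_steps = 0;
    while opt_steps < max_opts {
        // `?` stops the loop and returns the error if the step fails.
        let (_record, optimized) = step()?;
        env_steps += 1;
        // The boolean tells the caller whether this step also optimized the agent.
        if optimized {
            opt_steps += 1;
        }
    }
    Ok((env_steps, opt_steps))
}
```

In real code the record would typically be handed to a recorder instead of being discarded.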


pub fn train<E, P, R, D>(
    &mut self,
    env: E,
    step_proc: P,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    P: StepProcessor<E>,
    R: ExperienceBufferBase<Item = P::Output> + ReplayBufferBase,
    D: Evaluator<E>,

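Trains the agent online: experience is collected by interacting with env and stored in buffer while the agent is optimized.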


pub fn train_offline<E, R, D>(
    &mut self,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    R: ReplayBufferBase,
    D: Evaluator<E>,

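Trains the agent offline. Unlike train, this method takes no environment or step processor, so optimization draws on the experiences already held in buffer.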

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V