pub struct Trainer { /* private fields */ }
Manages the training loop and coordinates interactions between components.
The Trainer orchestrates the training process by managing:
- Environment interactions and experience collection
- Agent optimization and parameter updates
- Performance evaluation and model saving
- Training metrics recording
§Training Process
The training loop follows these steps (sketched in code after the list):

1. Initialize training components:
   - Reset environment step counter (env_steps = 0)
   - Reset optimization step counter (opt_steps = 0)
   - Initialize performance monitoring

2. Environment Interaction:
   - Agent observes environment state
   - Agent selects and executes action
   - Environment transitions to new state
   - Experience is collected and stored

3. Optimization:
   - At specified intervals (opt_interval):
     - Sample experiences from replay buffer
     - Update agent parameters
     - Track optimization performance

4. Evaluation and Recording:
   - Periodically evaluate agent performance
   - Record training metrics
   - Save model checkpoints
   - Monitor optimization speed
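The sketch below mirrors this control flow in a self-contained form. It is illustrative only, not the crate's implementation: the closures are placeholders for the real components, the interval values are arbitrary, and counting the evaluation and save intervals in optimization steps is an assumption made for the sketch.

// Self-contained sketch of the control flow above; not the crate's code.
fn main() {
    let (opt_interval, eval_interval, save_interval) = (1usize, 5_000usize, 50_000usize);
    let (warmup_period, max_opts) = (10_000usize, 1_000_000usize);

    // 1. Counters reset at the start of training.
    let mut env_steps = 0usize;
    let mut opt_steps = 0usize;

    // Placeholder components (closures stand in for agent/env/buffer logic).
    let interact = || { /* observe state, select action, store the experience */ };
    let optimize = || { /* sample the replay buffer, update agent parameters */ };
    let evaluate = || { /* run evaluation episodes, possibly save the best model */ };
    let checkpoint = || { /* save a model snapshot */ };

    while opt_steps < max_opts {
        // 2. Environment interaction.
        interact();
        env_steps += 1;

        // 3. Optimization only after the warm-up period, at opt_interval.
        if env_steps >= warmup_period && env_steps % opt_interval == 0 {
            optimize();
            opt_steps += 1;

            // 4. Evaluation and checkpointing at their own intervals.
            if opt_steps % eval_interval == 0 {
                evaluate();
            }
            if opt_steps % save_interval == 0 {
                checkpoint();
            }
        }
    }
}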
§Model Selection
During training, the best performing model is automatically saved based on evaluation rewards:
- At each evaluation interval (eval_interval), the agent’s performance is evaluated
- The evaluation reward is obtained from Record::get_scalar_without_key()
- If the current evaluation reward exceeds the previous maximum reward (see the sketch below):
  - The model is saved as the “best” model
  - The maximum reward is updated
- This ensures that the saved “best” model represents the agent’s peak performance
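Reduced to code, the rule is a running-maximum comparison. The helper below is hypothetical (neither update_best nor save_best is an item of this crate) and only illustrates the described behavior.

// Hypothetical helper illustrating the “best model” rule described above.
fn update_best(eval_reward: f32, best_so_far: f32, save_best: impl FnOnce()) -> f32 {
    if eval_reward > best_so_far {
        save_best();   // overwrite the previously saved “best” model
        eval_reward    // the maximum reward is updated
    } else {
        best_so_far    // keep the previous best model and reward
    }
}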
§Configuration
Training behavior is controlled by various intervals and parameters:
- opt_interval: Steps between optimization updates
- eval_interval: Steps between performance evaluations
- save_interval: Steps between model checkpoints
- warmup_period: Initial steps before optimization begins
- max_opts: Maximum number of optimization steps
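A construction sketch, assuming TrainerConfig exposes builder-style setters named after the parameters above; the setter names and the values shown are assumptions to check against the TrainerConfig documentation.

// Hypothetical builder-style configuration; setter names are assumptions.
let config = TrainerConfig::default()
    .opt_interval(1)         // optimize after every environment step
    .eval_interval(5_000)    // evaluate every 5,000 optimization steps
    .save_interval(50_000)   // write a checkpoint every 50,000 steps
    .warmup_period(10_000)   // fill the replay buffer before optimizing
    .max_opts(1_000_000);    // stop after one million optimization steps
let mut trainer = Trainer::build(config);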
Implementations§
impl Trainer
pub fn build(config: TrainerConfig) -> Self
pub fn train_step<E, R>(
    &mut self,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
) -> Result<(Record, bool)>
where
    E: Env,
    R: ReplayBufferBase,
Performs a single training step.
This method:
- Performs an environment step
- Collects and stores the experience
- Optionally performs an optimization step
§Arguments
- agent - The agent being trained
- buffer - The replay buffer storing experiences
§Returns
A tuple containing:
- A record of the training step
- A boolean indicating if an optimization step was performed
§Errors
Returns an error if the optimization step fails.
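For callers that want to drive the loop directly instead of using train, the sketch below wires train_step into a manual loop. It is a sketch under assumptions: the use paths and the choice of anyhow::Result are guesses and should be checked against the crate root.

use anyhow::Result;                                        // assumed error type
use border_core::{Agent, Env, ReplayBufferBase, Trainer}; // paths assumed

// Drive training by hand with train_step (illustrative sketch).
fn run<E, R>(
    trainer: &mut Trainer,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    max_env_steps: usize,
) -> Result<()>
where
    E: Env,
    R: ReplayBufferBase,
{
    for _ in 0..max_env_steps {
        // One environment step; an optimization step runs only when the
        // configured interval (and warm-up period) allows it.
        let (record, did_optimize) = trainer.train_step(agent, buffer)?;
        if did_optimize {
            // `record` carries this step's metrics; log or aggregate it here.
        }
        let _ = record; // placeholder for real record handling
    }
    Ok(())
}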
pub fn train<E, P, R, D>(
    &mut self,
    env: E,
    step_proc: P,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    P: StepProcessor<E>,
    R: ExperienceBufferBase<Item = P::Output> + ReplayBufferBase,
    D: Evaluator<E>,
Train the agent online.
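A sketch of assembling the pieces and calling train, kept generic over the component types so it does not assume any particular environment, agent, or recorder. The use paths (and anyhow::Result) are assumptions; Recorder, for instance, may live in a submodule.

use anyhow::Result; // assumed error type
// Paths below are assumptions; adjust to the actual crate layout.
use border_core::{
    Agent, Env, Evaluator, ExperienceBufferBase, Recorder, ReplayBufferBase,
    StepProcessor, Trainer, TrainerConfig,
};

// Build a Trainer and run online training with caller-provided components.
fn train_online<E, P, R, D>(
    config: TrainerConfig,
    env: E,
    step_proc: P,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    P: StepProcessor<E>,
    R: ExperienceBufferBase<Item = P::Output> + ReplayBufferBase,
    D: Evaluator<E>,
{
    let mut trainer = Trainer::build(config);
    trainer.train(env, step_proc, agent, buffer, recorder, evaluator)
}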
Auto Trait Implementations§
impl Freeze for Trainer
impl RefUnwindSafe for Trainer
impl Send for Trainer
impl Sync for Trainer
impl Unpin for Trainer
impl UnwindSafe for Trainer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more