pub struct Trainer { /* private fields */ }
Manages the training loop and coordinates interactions between components.
The Trainer orchestrates the training process by managing:
- Environment interactions and experience collection
- Agent optimization and parameter updates
- Performance evaluation and model saving
- Training metrics recording
§Training Process
The training loop follows these steps (sketched in code after the list):

1. Initialize training components:
   - Reset environment step counter (env_steps = 0)
   - Reset optimization step counter (opt_steps = 0)
   - Initialize performance monitoring

2. Environment Interaction:
   - Agent observes environment state
   - Agent selects and executes action
   - Environment transitions to new state
   - Experience is collected and stored

3. Optimization:
   - At specified intervals (opt_interval):
     - Sample experiences from replay buffer
     - Update agent parameters
     - Track optimization performance

4. Evaluation and Recording:
   - Periodically evaluate agent performance
   - Record training metrics
   - Save model checkpoints
   - Monitor optimization speed
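The sketch below mirrors this control flow in a self-contained form. It is illustrative only, not the crate's implementation: the closures are placeholders for the real components, the interval values are arbitrary, and counting the evaluation and save intervals in optimization steps is an assumption made for the sketch.

// Self-contained sketch of the control flow above; not the crate's code.
fn main() {
    let (opt_interval, eval_interval, save_interval) = (1usize, 5_000usize, 50_000usize);
    let (warmup_period, max_opts) = (10_000usize, 1_000_000usize);

    // 1. Counters reset at the start of training.
    let mut env_steps = 0usize;
    let mut opt_steps = 0usize;

    // Placeholder components (closures stand in for agent/env/buffer logic).
    let interact = || { /* observe state, select action, store the experience */ };
    let optimize = || { /* sample the replay buffer, update agent parameters */ };
    let evaluate = || { /* run evaluation episodes, possibly save the best model */ };
    let checkpoint = || { /* save a model snapshot */ };

    while opt_steps < max_opts {
        // 2. Environment interaction.
        interact();
        env_steps += 1;

        // 3. Optimization only after the warm-up period, at opt_interval.
        if env_steps >= warmup_period && env_steps % opt_interval == 0 {
            optimize();
            opt_steps += 1;

            // 4. Evaluation and checkpointing at their own intervals.
            if opt_steps % eval_interval == 0 {
                evaluate();
            }
            if opt_steps % save_interval == 0 {
                checkpoint();
            }
        }
    }
}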
§Model Selection
During training, the best performing model is automatically saved based on evaluation rewards:
- At each evaluation interval (eval_interval), the agent’s performance is evaluated
- The evaluation reward is obtained from Record::get_scalar_without_key()
- If the current evaluation reward exceeds the previous maximum reward (see the sketch below):
  - The model is saved as the “best” model
  - The maximum reward is updated
- This ensures that the saved “best” model represents the agent’s peak performance
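Reduced to code, the rule is a running-maximum comparison. The helper below is hypothetical (neither update_best nor save_best is an item of this crate) and only illustrates the described behavior.

// Hypothetical helper illustrating the “best model” rule described above.
fn update_best(eval_reward: f32, best_so_far: f32, save_best: impl FnOnce()) -> f32 {
    if eval_reward > best_so_far {
        save_best();   // overwrite the previously saved “best” model
        eval_reward    // the maximum reward is updated
    } else {
        best_so_far    // keep the previous best model and reward
    }
}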
§Configuration
Training behavior is controlled by various intervals and parameters:
- opt_interval: Steps between optimization updates
- eval_interval: Steps between performance evaluations
- save_interval: Steps between model checkpoints
- warmup_period: Initial steps before optimization begins
- max_opts: Maximum number of optimization steps
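A construction sketch, assuming TrainerConfig exposes builder-style setters named after the parameters above; the setter names and the values shown are assumptions to check against the TrainerConfig documentation.

// Hypothetical builder-style configuration; setter names are assumptions.
let config = TrainerConfig::default()
    .opt_interval(1)         // optimize after every environment step
    .eval_interval(5_000)    // evaluate every 5,000 optimization steps
    .save_interval(50_000)   // write a checkpoint every 50,000 steps
    .warmup_period(10_000)   // fill the replay buffer before optimizing
    .max_opts(1_000_000);    // stop after one million optimization steps
let mut trainer = Trainer::build(config);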
Implementations§
impl Trainer
pub fn build(config: TrainerConfig) -> Self
pub fn train_step<E, R>(
    &mut self,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
) -> Result<(Record, bool)>
where
    E: Env,
    R: ReplayBufferBase,
Performs a single training step.
This method:
- Performs an environment step
- Collects and stores the experience
- Optionally performs an optimization step
§Arguments
- agent - The agent being trained
- buffer - The replay buffer storing experiences
§Returns
A tuple containing:
- A record of the training step
- A boolean indicating if an optimization step was performed
§Errors
Returns an error if the optimization step fails.
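For callers that want to drive the loop directly instead of using train, the sketch below wires train_step into a manual loop. It is a sketch under assumptions: the use paths and the choice of anyhow::Result are guesses and should be checked against the crate root.

use anyhow::Result;                                        // assumed error type
use border_core::{Agent, Env, ReplayBufferBase, Trainer}; // paths assumed

// Drive training by hand with train_step (illustrative sketch).
fn run<E, R>(
    trainer: &mut Trainer,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    max_env_steps: usize,
) -> Result<()>
where
    E: Env,
    R: ReplayBufferBase,
{
    for _ in 0..max_env_steps {
        // One environment step; an optimization step runs only when the
        // configured interval (and warm-up period) allows it.
        let (record, did_optimize) = trainer.train_step(agent, buffer)?;
        if did_optimize {
            // `record` carries this step's metrics; log or aggregate it here.
        }
        let _ = record; // placeholder for real record handling
    }
    Ok(())
}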
pub fn train<E, P, R, D>(
    &mut self,
    env: E,
    step_proc: P,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    P: StepProcessor<E>,
    R: ExperienceBufferBase<Item = P::Output> + ReplayBufferBase,
    D: Evaluator<E>,
Train the agent online.
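A sketch of assembling the pieces and calling train, kept generic over the component types so it does not assume any particular environment, agent, or recorder. The use paths (and anyhow::Result) are assumptions; Recorder, for instance, may live in a submodule.

use anyhow::Result; // assumed error type
// Paths below are assumptions; adjust to the actual crate layout.
use border_core::{
    Agent, Env, Evaluator, ExperienceBufferBase, Recorder, ReplayBufferBase,
    StepProcessor, Trainer, TrainerConfig,
};

// Build a Trainer and run online training with caller-provided components.
fn train_online<E, P, R, D>(
    config: TrainerConfig,
    env: E,
    step_proc: P,
    agent: &mut Box<dyn Agent<E, R>>,
    buffer: &mut R,
    recorder: &mut Box<dyn Recorder<E, R>>,
    evaluator: &mut D,
) -> Result<()>
where
    E: Env,
    P: StepProcessor<E>,
    R: ExperienceBufferBase<Item = P::Output> + ReplayBufferBase,
    D: Evaluator<E>,
{
    let mut trainer = Trainer::build(config);
    trainer.train(env, step_proc, agent, buffer, recorder, evaluator)
}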
Auto Trait Implementations§
impl Freeze for Trainer
impl RefUnwindSafe for Trainer
impl Send for Trainer
impl Sync for Trainer
impl Unpin for Trainer
impl UnwindSafe for Trainer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more