pub struct DqnTrainer<E, Enc, Act, B, Buf = CircularBuffer<<E as Environment>::Observation, <E as Environment>::Action>> { /* private fields */ }
The imperative training driver.
Drives the interaction between an environment and a DQN agent,
exposing the training loop both as an iterator that yields StepMetrics
after every step and through higher-level train() and eval() methods.
§Usage — iterator style (manual control)
let mut trainer = DqnTrainer::new(env, agent, seed);
for step in trainer.steps().take(50_000) {
    if step.episode_done {
        println!("Episode {} reward: {}", step.episode, step.episode_reward);
    }
}
§Usage — imperative style (with TrainingRun)
let run = TrainingRun::create("cartpole", "v1")?;
let mut trainer = DqnTrainer::new(env, agent, seed).with_run(run);
trainer.train(200_000);
let report = trainer.eval(20);
report.print();
Implementations§
impl<E, Enc, Act, B, Buf> DqnTrainer<E, Enc, Act, B, Buf> where
E: Environment,
E::Observation: Clone + Send + Sync + 'static,
E::Action: Clone + Send + Sync + 'static,
Enc: ObservationEncoder<E::Observation, B> + ObservationEncoder<E::Observation, B::InnerBackend>,
Act: DiscreteActionMapper<E::Action>,
B: AutodiffBackend,
Buf: ReplayBuffer<E::Observation, E::Action>,
pub fn new(env: E, agent: DqnAgent<E, Enc, Act, B, Buf>, seed: u64) -> Self
pub fn with_run(self, run: TrainingRun) -> Self
Attach a TrainingRun for checkpoint saving and stats persistence.
pub fn with_checkpoint_freq(self, freq: usize) -> Self
How often (in steps) to save a numbered checkpoint. Default: 10_000.
pub fn with_keep_checkpoints(self, keep: usize) -> Self
How many numbered checkpoints to keep on disk. Default: 5.
pub fn with_stats(self, stats: StatsTracker) -> Self
Replace the default stats tracker with a custom one.
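The builder methods above chain on the value returned by new(). A sketch combining them, assuming the same env, agent, seed, and run values as the earlier usage examples; the frequency and retention counts are arbitrary:

```rust
// Sketch: configure checkpointing via the builder methods.
// `env`, `agent`, `seed`, and `run` are assumed to exist as in the
// usage examples above; the numeric values here are arbitrary.
let mut trainer = DqnTrainer::new(env, agent, seed)
    .with_run(run)
    .with_checkpoint_freq(25_000) // numbered checkpoint every 25k steps
    .with_keep_checkpoints(3);    // keep only the 3 most recent on disk
```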
pub fn steps(&mut self) -> TrainIter<'_, E, Enc, Act, B, Buf>
Returns an iterator that yields StepMetrics after each environment step.
The iterator is infinite — stop it with .take(n) or break.
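Because the iterator never terminates on its own, stopping conditions other than a fixed step budget are expressed with break. A sketch using only the StepMetrics fields shown above; the reward threshold is a hypothetical value, not part of the API:

```rust
// Sketch: stop training once an episode reaches a target reward.
// `target_reward` is a hypothetical threshold chosen by the caller.
let target_reward = 475.0;
for step in trainer.steps() {
    if step.episode_done && step.episode_reward >= target_reward {
        println!("Reached target in episode {}", step.episode);
        break; // leave the loop; the trainer keeps its state
    }
}
```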
pub fn agent(&self) -> &DqnAgent<E, Enc, Act, B, Buf>
Access the agent for evaluation or inspection.
pub fn into_agent(self) -> DqnAgent<E, Enc, Act, B, Buf>
Consume the trainer and return the inner agent.
Useful for converting to a DqnPolicy after training:
let policy = trainer.into_agent().into_policy();
pub fn train(&mut self, n_steps: usize)
Run n_steps of training.
If a TrainingRun is attached, saves checkpoints at checkpoint_freq
intervals and writes episode records to train_episodes.jsonl.
pub fn eval(&mut self, n_episodes: usize) -> EvalReport
Run n_episodes of greedy evaluation and return an EvalReport.
Exploration is disabled (ε = 0). If a TrainingRun is attached,
each episode record is written to eval_episodes.jsonl.
If the mean reward improves, saves a best.mpk checkpoint.
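train() and eval() can be interleaved to monitor progress during a long run. A sketch, where the number of rounds and the per-round step and episode budgets are arbitrary:

```rust
// Sketch: alternate training and greedy evaluation. When a TrainingRun
// is attached, eval() also saves best.mpk whenever mean reward improves.
for round in 0..10 {
    trainer.train(20_000);         // 20k environment steps per round
    let report = trainer.eval(10); // 10 greedy episodes (ε = 0)
    println!("round {round}:");
    report.print();
}
```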
Auto Trait Implementations§
impl<E, Enc, Act, B, Buf> Freeze for DqnTrainer<E, Enc, Act, B, Buf> where
E: Freeze,
Buf: Freeze,
Enc: Freeze,
Act: Freeze,
<B as Backend>::Device: Freeze,
<E as Environment>::Observation: Freeze,
impl<E, Enc, Act, B, Buf = CircularBuffer<<E as Environment>::Observation, <E as Environment>::Action>> !RefUnwindSafe for DqnTrainer<E, Enc, Act, B, Buf>
impl<E, Enc, Act, B, Buf> Send for DqnTrainer<E, Enc, Act, B, Buf>
impl<E, Enc, Act, B, Buf = CircularBuffer<<E as Environment>::Observation, <E as Environment>::Action>> !Sync for DqnTrainer<E, Enc, Act, B, Buf>
impl<E, Enc, Act, B, Buf> Unpin for DqnTrainer<E, Enc, Act, B, Buf> where
E: Unpin,
Buf: Unpin,
Enc: Unpin,
Act: Unpin,
<B as Backend>::Device: Unpin,
<E as Environment>::Observation: Unpin,
<B as Backend>::FloatTensorPrimitive: Unpin,
<B as Backend>::QuantizedTensorPrimitive: Unpin,
<<B as AutodiffBackend>::InnerBackend as Backend>::FloatTensorPrimitive: Unpin,
<<B as AutodiffBackend>::InnerBackend as Backend>::QuantizedTensorPrimitive: Unpin,
impl<E, Enc, Act, B, Buf> UnsafeUnpin for DqnTrainer<E, Enc, Act, B, Buf> where
E: UnsafeUnpin,
Buf: UnsafeUnpin,
Enc: UnsafeUnpin,
Act: UnsafeUnpin,
<B as Backend>::Device: UnsafeUnpin,
<E as Environment>::Observation: UnsafeUnpin,
impl<E, Enc, Act, B, Buf = CircularBuffer<<E as Environment>::Observation, <E as Environment>::Action>> !UnwindSafe for DqnTrainer<E, Enc, Act, B, Buf>
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.