pub struct AsyncTrainer<A, E, R> where
A: Agent<E, R> + Configurable + SyncModel,
E: Env,
R: ExperienceBufferBase + ReplayBufferBase,
R::Item: Send + 'static,
{ /* private fields */ }
Manages an asynchronous training loop on a single machine.
It interacts with ActorManager as shown below:
flowchart LR
subgraph ActorManager
E[Actor]-->|ReplayBufferBase::PushedItem|H[ReplayBufferProxy]
F[Actor]-->H
G[Actor]-->H
end
K-->|SyncModel::ModelInfo|E
K-->|SyncModel::ModelInfo|F
K-->|SyncModel::ModelInfo|G
subgraph I[AsyncTrainer]
H-->|PushedItemMessage|J[ReplayBuffer]
J-->|ReplayBufferBase::Batch|K[Agent]
end
- The Agent in AsyncTrainer (left) is trained with batches of type ReplayBufferBase::Batch, which are taken from the replay buffer.
- The model parameters of the Agent in AsyncTrainer are wrapped in SyncModel::ModelInfo and periodically sent to the Agents in the Actors. Agent must implement SyncModel to synchronize the model parameters.
- In ActorManager (right), Actors sample transitions, which have type ReplayBufferBase::Item, and push the transitions into ReplayBufferProxy. ReplayBufferProxy has a type parameter of ReplayBufferBase and the proxy accepts ReplayBufferBase::Item.
- The proxy sends the transitions into the replay buffer in the AsyncTrainer.
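The data flow described above can be sketched with standard-library channels. This is a minimal illustration only, not the crate's implementation: `ModelInfo`, `Transition`, and `run_flow` are hypothetical names, `std::sync::mpsc` stands in for whatever channel type the library actually uses, and a plain `Vec` stands in for the replay buffer. Actor threads push transitions to the trainer, and the trainer broadcasts a new parameter snapshot every `sync_interval` received items.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for `SyncModel::ModelInfo`: a versioned parameter snapshot.
#[derive(Clone, Debug)]
pub struct ModelInfo {
    pub version: usize,
    pub params: Vec<f32>,
}

/// Hypothetical stand-in for a transition sampled by an actor.
#[derive(Debug)]
pub struct Transition {
    pub actor_id: usize,
    pub reward: f32,
}

/// Runs `n_actors` actor threads, each pushing `items_per_actor` transitions,
/// and a trainer loop on the calling thread.
/// Returns (number of transitions received, number of model broadcasts).
pub fn run_flow(n_actors: usize, items_per_actor: usize, sync_interval: usize) -> (usize, usize) {
    assert!(sync_interval > 0, "sync_interval must be positive");

    // Actors -> trainer: plays the role of the proxy feeding the replay buffer.
    let (item_tx, item_rx) = channel::<Transition>();
    // Trainer -> actors: one channel per actor for parameter snapshots.
    let mut model_txs: Vec<Sender<ModelInfo>> = Vec::new();
    let mut actors = Vec::new();

    for actor_id in 0..n_actors {
        let (model_tx, model_rx) = channel::<ModelInfo>();
        model_txs.push(model_tx);
        let item_tx = item_tx.clone();
        actors.push(thread::spawn(move || {
            let mut seen_version = 0;
            for step in 0..items_per_actor {
                // Pick up any parameter snapshots that have arrived so far.
                while let Ok(info) = model_rx.try_recv() {
                    seen_version = info.version;
                }
                item_tx
                    .send(Transition { actor_id, reward: step as f32 })
                    .expect("trainer hung up");
            }
            seen_version
        }));
    }
    // Drop the original sender so the receive loop ends once all actors finish.
    drop(item_tx);

    let mut replay_buffer: Vec<Transition> = Vec::new();
    let mut version = 0;
    for transition in item_rx {
        replay_buffer.push(transition);
        if replay_buffer.len() % sync_interval == 0 {
            version += 1;
            let info = ModelInfo { version, params: vec![0.0; 4] };
            for tx in &model_txs {
                // An actor may already have exited; ignore send errors.
                let _ = tx.send(info.clone());
            }
        }
    }

    for actor in actors {
        let seen_version = actor.join().expect("actor panicked");
        // An actor can never observe a version that was not broadcast.
        assert!(seen_version <= version);
    }
    (replay_buffer.len(), version)
}
```

For example, `run_flow(3, 10, 5)` collects 30 transitions and broadcasts the model 6 times. The real trainer would run optimization steps on sampled batches between broadcasts; that part is omitted here.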
Implementations§
impl<A, E, R> AsyncTrainer<A, E, R> where
A: Agent<E, R> + Configurable + SyncModel + 'static,
E: Env,
R: ExperienceBufferBase + ReplayBufferBase,
R::Item: Send + 'static,
pub fn build(
config: &AsyncTrainerConfig,
agent_config: &A::Config,
env_config: &E::Config,
replay_buffer_config: &R::Config,
r_bulk_pushed_item: Receiver<PushedItemMessage<R::Item>>,
model_info_sender: Sender<(usize, A::ModelInfo)>,
stop: Arc<Mutex<bool>>,
) -> Self
Creates an AsyncTrainer.
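Two of the `build` parameters are communication endpoints: `model_info_sender: Sender<(usize, A::ModelInfo)>` and `stop: Arc<Mutex<bool>>`. The sketch below shows one way such a pair can be used, under assumptions the page does not confirm: the `usize` is treated as an optimization-step counter, the sending side sets the flag when it is done, and `Vec<f32>` stands in for `A::ModelInfo`. `run_with_stop_flag` is a hypothetical helper, and `std::sync::mpsc` is used in place of the library's channel type.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Sends `(step, params)` every `interval` steps, then raises the stop flag.
/// Returns the step numbers the receiving thread observed.
pub fn run_with_stop_flag(max_steps: usize, interval: usize) -> Vec<usize> {
    assert!(interval > 0, "interval must be positive");

    let (model_info_sender, model_info_receiver) = channel::<(usize, Vec<f32>)>();
    let stop = Arc::new(Mutex::new(false));

    // Receiving side (assumed to play the role of the actors).
    let stop_for_receiver = Arc::clone(&stop);
    let receiver = thread::spawn(move || {
        let mut seen_steps = Vec::new();
        loop {
            match model_info_receiver.recv_timeout(Duration::from_millis(10)) {
                Ok((step, _params)) => seen_steps.push(step),
                Err(RecvTimeoutError::Timeout) => {
                    if *stop_for_receiver.lock().unwrap() {
                        // The flag is set after the last send, so draining
                        // here picks up any message that arrived late.
                        while let Ok((step, _params)) = model_info_receiver.try_recv() {
                            seen_steps.push(step);
                        }
                        break;
                    }
                }
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        seen_steps
    });

    // Sending side (assumed to play the role of the trainer).
    for step in 1..=max_steps {
        if step % interval == 0 {
            model_info_sender
                .send((step, vec![step as f32]))
                .expect("receiver hung up");
        }
    }
    // Signal shutdown only after every message has been sent.
    *stop.lock().unwrap() = true;

    receiver.join().expect("receiver panicked")
}
```

With `max_steps = 10` and `interval = 3`, the receiver observes steps 3, 6, and 9 before it sees the flag and exits.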
pub fn train<D>(
&mut self,
recorder: &mut Box<dyn Recorder<E, R>>,
evaluator: &mut D,
guard_init_env: Arc<Mutex<bool>>,
) -> AsyncTrainStat where
D: Evaluator<E>,
Runs the training loop.
In the training loop, the following values will be pushed into the given recorder:
- samples_total - Total number of samples pushed into the replay buffer. Here, a “sample” is an item of type ExperienceBufferBase::Item.
- opt_steps_per_sec - The number of optimization steps per second.
- samples_per_sec - The number of samples per second.
- samples_per_opt_steps - The number of samples per optimization step.
These values will typically be monitored with TensorBoard.
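As a rough illustration of how the recorded values relate to each other, the sketch below derives them from raw counters and an elapsed time. `ThroughputStats` and `throughput_stats` are hypothetical names, and the measurement window is an assumption: the page does not say whether the library computes rates over the whole run or over the interval since the last record.

```rust
use std::time::Duration;

/// Hypothetical container mirroring the recorded value names.
#[derive(Debug, Clone, PartialEq)]
pub struct ThroughputStats {
    pub samples_total: usize,
    pub opt_steps_per_sec: f32,
    pub samples_per_sec: f32,
    pub samples_per_opt_steps: f32,
}

/// Computes the rates for one measurement window.
///
/// * `samples_total` - samples pushed since training started.
/// * `samples` - samples pushed during the window.
/// * `opt_steps` - optimization steps performed during the window.
/// * `elapsed` - length of the window.
pub fn throughput_stats(
    samples_total: usize,
    samples: usize,
    opt_steps: usize,
    elapsed: Duration,
) -> ThroughputStats {
    let secs = elapsed.as_secs_f32();
    // Report 0.0 instead of dividing by zero for an empty window.
    let per_sec = |count: usize| if secs > 0.0 { count as f32 / secs } else { 0.0 };
    ThroughputStats {
        samples_total,
        opt_steps_per_sec: per_sec(opt_steps),
        samples_per_sec: per_sec(samples),
        samples_per_opt_steps: if opt_steps > 0 {
            samples as f32 / opt_steps as f32
        } else {
            0.0
        },
    }
}
```

For a 2-second window with 2,000 samples and 500 optimization steps, this gives 250 optimization steps per second, 1,000 samples per second, and 4 samples per optimization step. A low samples_per_opt_steps value suggests the actors are the bottleneck, while a high one suggests the trainer is.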
Auto Trait Implementations§
impl<A, E, R> Freeze for AsyncTrainer<A, E, R> where
<E as Env>::Config: Freeze,
<R as ReplayBufferBase>::Config: Freeze,
<A as Configurable>::Config: Freeze,
impl<A, E, R> RefUnwindSafe for AsyncTrainer<A, E, R> where
<E as Env>::Config: RefUnwindSafe,
<R as ReplayBufferBase>::Config: RefUnwindSafe,
<A as Configurable>::Config: RefUnwindSafe,
A: RefUnwindSafe,
E: RefUnwindSafe,
R: RefUnwindSafe,
impl<A, E, R> Send for AsyncTrainer<A, E, R>
impl<A, E, R> Sync for AsyncTrainer<A, E, R>
impl<A, E, R> Unpin for AsyncTrainer<A, E, R> where
<E as Env>::Config: Unpin,
<R as ReplayBufferBase>::Config: Unpin,
<A as Configurable>::Config: Unpin,
A: Unpin,
E: Unpin,
R: Unpin,
<R as ExperienceBufferBase>::Item: Unpin,
impl<A, E, R> UnwindSafe for AsyncTrainer<A, E, R> where
<E as Env>::Config: UnwindSafe,
<R as ReplayBufferBase>::Config: UnwindSafe,
<A as Configurable>::Config: UnwindSafe,
A: UnwindSafe,
E: UnwindSafe,
R: UnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.