Asynchronous trainer with parallel sampling processes.
The code might look like the following.
type Env = TestEnv;
type ObsBatch = TestObsBatch;
type ActBatch = TestActBatch;
type ReplayBuffer = SimpleReplayBuffer<ObsBatch, ActBatch>;
type StepProcessor = SimpleStepProcessor<Env, ObsBatch, ActBatch>;

// Create a new agent by wrapping the existing agent in order to implement SyncModel.
struct TestAgent2(TestAgent);

impl border_core::Configurable for TestAgent2 {
    type Config = TestAgentConfig;

    fn build(config: Self::Config) -> Self {
        Self(TestAgent::build(config))
    }
}
impl border_core::Agent<Env, ReplayBuffer> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    fn train(&mut self) {
        self.0.train();
    }

    // For other methods ...
}

impl border_core::Policy<Env> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    // ...
}
impl border_async_trainer::SyncModel for TestAgent2 {
    // Self::ModelInfo should include the model parameters.
    type ModelInfo = usize;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        // Extracts the model parameters and returns them as Self::ModelInfo.
        // The first element of the tuple is the number of optimization steps.
        (0, 0)
    }

    fn sync_model(&mut self, _model_info: &Self::ModelInfo) {
        // Implements synchronization of the model based on _model_info.
    }
}
let agent_configs: Vec<_> = vec![agent_config()];
let env_config_train = env_config();
let env_config_eval = env_config();
let replay_buffer_config = SimpleReplayBufferConfig::default();
let step_proc_config = SimpleStepProcessorConfig::default();
let actor_man_config = ActorManagerConfig::default();
let async_trainer_config = AsyncTrainerConfig::default();
let mut recorder: Box<dyn Recorder<_, _>> = Box::new(NullRecorder::new());
let mut evaluator = DefaultEvaluator::<TestEnv>::new(&env_config_eval, 0, 1).unwrap();
border_async_trainer::util::train_async::<TestAgent2, _, _, StepProcessor>(
    &agent_config(),
    &agent_configs,
    &env_config_train,
    &env_config_eval,
    &step_proc_config,
    &replay_buffer_config,
    &actor_man_config,
    &async_trainer_config,
    &mut recorder,
    &mut evaluator,
);

The training process consists of the following two components:
- ActorManager manages Actors, each of which runs a thread where an Agent interacts with an Env to collect samples. Those samples are sent to the replay buffer in AsyncTrainer.
- AsyncTrainer is responsible for training the agent. It also runs a thread that pushes samples received from ActorManager into the replay buffer.
The Agent must implement the SyncModel trait in order to synchronize the model of the agent in an Actor with the trained agent in AsyncTrainer. The trait provides the ability to export and import the model's information as SyncModel::ModelInfo.
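As a self-contained illustration of this export/import pattern (using only the standard library; ToyAgent and the ModelInfo struct below are hypothetical stand-ins, not the actual SyncModel trait), the two methods might look like this:

```rust
// Hypothetical toy agent sketching the idea behind SyncModel:
// the trainer exports its parameters, and an actor imports them.
#[derive(Clone, Debug, PartialEq)]
struct ModelInfo {
    opt_steps: usize, // optimization step count of the exporting agent
    params: Vec<f32>, // flattened model parameters
}

struct ToyAgent {
    opt_steps: usize,
    params: Vec<f32>,
}

impl ToyAgent {
    /// Exports the current parameters; the first element of the tuple
    /// is the number of optimization steps, as in SyncModel::model_info.
    fn model_info(&self) -> (usize, ModelInfo) {
        (
            self.opt_steps,
            ModelInfo {
                opt_steps: self.opt_steps,
                params: self.params.clone(),
            },
        )
    }

    /// Overwrites the local model with the trained agent's parameters.
    fn sync_model(&mut self, info: &ModelInfo) {
        self.params = info.params.clone();
        self.opt_steps = info.opt_steps;
    }
}

fn main() {
    let trainer = ToyAgent { opt_steps: 100, params: vec![0.5, -1.0] };
    let mut actor = ToyAgent { opt_steps: 0, params: vec![0.0, 0.0] };

    let (_steps, info) = trainer.model_info();
    actor.sync_model(&info);
    println!("synced at step {}", actor.opt_steps); // prints: synced at step 100
}
```

A real implementation would carry the network weights (e.g. tensors) in ModelInfo instead of a plain Vec<f32>.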
The Agent in AsyncTrainer is responsible for training, typically with a GPU, while the Agents in the Actors managed by ActorManager are responsible for sampling using the CPU.
Both AsyncTrainer and ActorManager run on the same machine and communicate through channels.
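This channel-based communication between a sampling thread and a training thread can be sketched with std::sync::mpsc alone (the message types Sample and ModelParams below are illustrative; the real crate defines its own, such as PushedItemMessage for replay-buffer items):

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative message types, not part of border_async_trainer.
struct Sample(u32);           // a transition collected by an actor
struct ModelParams(Vec<f32>); // parameters broadcast by the trainer

/// Runs one round trip: the actor thread pushes samples to the trainer,
/// and the trainer answers with updated model parameters.
/// Returns (number of samples received, number of parameters synced).
fn run() -> (usize, usize) {
    // Samples flow from the actor thread to the trainer...
    let (sample_tx, sample_rx) = mpsc::channel::<Sample>();
    // ...while model parameters flow back to the actor.
    let (model_tx, model_rx) = mpsc::channel::<ModelParams>();

    let actor = thread::spawn(move || {
        for i in 0..3 {
            sample_tx.send(Sample(i)).unwrap(); // sampling thread -> trainer
        }
        let ModelParams(params) = model_rx.recv().unwrap(); // wait for sync
        params.len()
    });

    let n_samples = sample_rx.iter().take(3).count(); // trainer consumes samples
    model_tx.send(ModelParams(vec![0.0; 4])).unwrap(); // trainer broadcasts params

    (n_samples, actor.join().unwrap())
}

fn main() {
    let (n_samples, n_params) = run();
    println!("received {} samples, synced {} params", n_samples, n_params);
}
```

The actual crate runs many actors concurrently and batches samples, but the flow is the same: samples in one direction, model information in the other.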
Modules§

- util - Utility function.

Structs§

- Actor - Generates transitions by running an Agent in an Env.
- ActorManager - Manages Actors.
- ActorManagerConfig - Configuration of ActorManager.
- ActorStat - Stats of the sampling process in an Actor.
- AsyncTrainStat - Stats of AsyncTrainer::train().
- AsyncTrainer - Manages the asynchronous training loop in a single machine.
- AsyncTrainerConfig - Configuration of AsyncTrainer.
- PushedItemMessage - Message containing a ReplayBufferBase::Item.
- ReplayBufferProxy - A wrapper of a replay buffer for the asynchronous trainer.
- ReplayBufferProxyConfig - Configuration of ReplayBufferProxy.

Enums§

Traits§

- SyncModel - Synchronizes the model of the agent in asynchronous training.

Functions§

- actor_stats_fmt - Returns a formatted string of the set of ActorStat for reporting.