Asynchronous trainer with parallel sampling processes.
Code using this trainer might look like the following:
```rust
type Env = TestEnv;
type ObsBatch = TestObsBatch;
type ActBatch = TestActBatch;
type ReplayBuffer = SimpleReplayBuffer<ObsBatch, ActBatch>;
type StepProcessor = SimpleStepProcessor<Env, ObsBatch, ActBatch>;

// Create a new agent by wrapping the existing agent in order to implement SyncModel.
struct TestAgent2(TestAgent);

impl border_core::Configurable for TestAgent2 {
    type Config = TestAgentConfig;

    fn build(config: Self::Config) -> Self {
        Self(TestAgent::build(config))
    }
}

impl border_core::Agent<Env, ReplayBuffer> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    fn train(&mut self) {
        self.0.train();
    }

    // ...and likewise for the other methods.
}

impl border_core::Policy<Env> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    // ...
}

impl border_async_trainer::SyncModel for TestAgent2 {
    // Self::ModelInfo should include the model parameters.
    type ModelInfo = usize;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        // Extracts the model parameters and returns them as Self::ModelInfo.
        // The first element of the tuple is the number of optimization steps.
        (0, 0)
    }

    fn sync_model(&mut self, _model_info: &Self::ModelInfo) {
        // Implements synchronization of the model based on _model_info.
    }
}

let agent_configs: Vec<_> = vec![agent_config()];
let env_config_train = env_config();
let env_config_eval = env_config();
let replay_buffer_config = SimpleReplayBufferConfig::default();
let step_proc_config = SimpleStepProcessorConfig::default();
let actor_man_config = ActorManagerConfig::default();
let async_trainer_config = AsyncTrainerConfig::default();
let mut recorder: Box<dyn Recorder<_, _>> = Box::new(NullRecorder::new());
let mut evaluator = DefaultEvaluator::<TestEnv>::new(&env_config_eval, 0, 1).unwrap();

border_async_trainer::util::train_async::<TestAgent2, _, _, StepProcessor>(
    &agent_config(),
    &agent_configs,
    &env_config_train,
    &env_config_eval,
    &step_proc_config,
    &replay_buffer_config,
    &actor_man_config,
    &async_trainer_config,
    &mut recorder,
    &mut evaluator,
);
```
The training process consists of the following two components:

- `ActorManager` manages `Actor`s, each of which runs a thread in which an `Agent` interacts with an `Env` to collect samples. Those samples are sent to the replay buffer in `AsyncTrainer`.
- `AsyncTrainer` is responsible for training the agent. It also runs a thread that pushes samples received from `ActorManager` into a replay buffer.
The `Agent` must implement the `SyncModel` trait in order to synchronize the model of the agent in each `Actor` with the trained agent in `AsyncTrainer`. The trait provides the ability to export and import the information of the model as `SyncModel::ModelInfo`.
The `Agent` in `AsyncTrainer` is responsible for training, typically with a GPU, while the `Agent`s in the `Actor`s of `ActorManager` are responsible for sampling using the CPU.
Both `AsyncTrainer` and `ActorManager` run on the same machine and communicate through channels.
Modules§

- util - Utility function.
Structs§

- Actor - Generates transitions by running an `Agent` in an `Env`.
- ActorManager - Manages `Actor`s.
- ActorManagerConfig - Configuration of `ActorManager`.
- ActorStat - Stats of the sampling process in an `Actor`.
- AsyncTrainStat - Stats of `AsyncTrainer::train()`.
- AsyncTrainer - Manages the asynchronous training loop on a single machine.
- AsyncTrainerConfig - Configuration of `AsyncTrainer`.
- PushedItemMessage - Message containing a `ReplayBufferBase::Item`.
- ReplayBufferProxy - A wrapper of a replay buffer for the asynchronous trainer.
- ReplayBufferProxyConfig - Configuration of `ReplayBufferProxy`.
Traits§

- SyncModel - Synchronizes the model of the agent in asynchronous training.

Functions§

- actor_stats_fmt - Returns a formatted string of a set of `ActorStat`s for reporting.