Border is a reinforcement learning library.
This crate is a collection of examples using the crates below.
- border-core provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
- border-py-gym-env is a wrapper of the Gym environments written in Python, with support for pybullet-gym and atari.
- border-atari-env is a wrapper of atari-env, which is part of gym-rs.
- border-tch-agent is a collection of RL agents based on tch. Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC) are included.
- border-async-trainer defines some traits and functions for asynchronous training of RL agents by multiple actors, each of which runs a sampling process of an agent and an environment in parallel.
You can use any subset of these crates for your own purposes.
Environment
border-core abstracts environments as Env. Env has the associated types Env::Obs and Env::Act for the observation and action of the environment, and Env::Config for the configuration of the concrete type.
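As a rough sketch, the shape of such a trait might look like the following. This is not the actual border-core definition; the method names and signatures are assumptions for illustration only.

```rust
// A hypothetical, simplified environment trait in the spirit of
// border_core::Env; the real trait differs in methods and signatures.
pub trait Env {
    type Obs;    // observation emitted by the environment
    type Act;    // action accepted by the environment
    type Config; // configuration used to build a concrete environment

    /// Builds an environment from its configuration (assumed signature).
    fn build(config: &Self::Config) -> Self;

    /// Resets the environment and returns the initial observation.
    fn reset(&mut self) -> Self::Obs;

    /// Applies an action, returning the next observation and the reward.
    fn step(&mut self, act: &Self::Act) -> (Self::Obs, f32);
}
```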
Policy and agent
In this crate, Policy is a controller for an environment implementing the Env trait. The Agent trait abstracts a trainable Policy and adds methods for saving and loading its parameters and for training.
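Under the same illustrative assumptions, the relation between the two traits might be sketched as follows. The method set shown here mirrors the usage in the examples below (eval, load) but is not the exact border-core API.

```rust
use anyhow::Result; // assuming the anyhow-style Result used in the examples

// Hypothetical shapes of the Policy and Agent traits; illustrative only.
pub trait Policy<E: Env> {
    /// Samples an action for the given observation.
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

pub trait Agent<E: Env>: Policy<E> {
    /// Switches between training and evaluation modes (assumed methods).
    fn train(&mut self);
    fn eval(&mut self);

    /// Saves/loads the agent's parameters under the given directory.
    fn save(&self, model_dir: &str) -> Result<()>;
    fn load(&mut self, model_dir: &str) -> Result<()>;
}
```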
Evaluation
Structs that implement the Evaluator trait can be used to run episodes with a given Env and Policy. The code might look like the following, using DefaultEvaluator, a built-in implementation of Evaluator.
```rust
type E = TYPE_OF_ENV;    // concrete environment type (placeholder)
type P = TYPE_OF_POLICY; // concrete policy/agent type (placeholder)

fn eval(model_dir: &str, render: bool) -> Result<()> {
    let env_config: E::Config = {
        let env_config = env_config() // user-defined configuration builder
            .render_mode(Some("human".to_string()))
            .set_wait_in_millis(10);
        env_config
    };
    let mut agent: P = {
        let mut agent = create_agent(); // user-defined agent constructor
        agent.load(model_dir)?;         // load trained parameters
        agent.eval();                   // switch the agent to evaluation mode
        agent
    };

    // Build a DefaultEvaluator from the environment config and run
    // evaluation episodes with the loaded agent.
    let _ = DefaultEvaluator::new(&env_config, 0, 5)?.evaluate(&mut agent);
    Ok(())
}
```
Users can customize the way the policy is evaluated by implementing a custom Evaluator.
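As an illustration of that flexibility, below is a minimal sketch of a custom evaluation routine built on the simplified Env and Policy shapes sketched above. BestOfN and its method are hypothetical and do not implement the actual border-core Evaluator trait.

```rust
/// A hypothetical evaluator reporting the best episode return instead of
/// the average. `max_steps` bounds episode length because the simplified
/// `Env` sketched above exposes no termination flag.
struct BestOfN {
    n_episodes: usize,
    max_steps: usize,
}

impl BestOfN {
    fn evaluate<E: Env, P: Policy<E>>(&self, env: &mut E, policy: &mut P) -> f32 {
        let mut best = f32::NEG_INFINITY;
        for _ in 0..self.n_episodes {
            let mut obs = env.reset();
            let mut ret = 0.0;
            for _ in 0..self.max_steps {
                let act = policy.sample(&obs);
                let (next_obs, reward) = env.step(&act);
                ret += reward;
                obs = next_obs;
            }
            best = best.max(ret);
        }
        best
    }
}
```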
Training
You can train RL Agents by using the Trainer struct.
```rust
fn train(max_opts: usize, model_dir: &str) -> Result<()> {
    let mut trainer = {
        let env_config = env_config(); // configuration of the environment
        let step_proc_config = SimpleStepProcessorConfig {};
        let replay_buffer_config =
            SimpleReplayBufferConfig::default().capacity(REPLAY_BUFFER_CAPACITY);
        let config = TrainerConfig::default()
            .max_opts(max_opts);
            // followed by methods to set other training parameters

        // The block evaluates to the built trainer.
        Trainer::<Env, StepProc, ReplayBuffer>::build(
            config,
            env_config,
            step_proc_config,
            replay_buffer_config,
        )
    };

    let mut agent = create_agent(); // user-defined agent constructor
    let mut recorder = TensorboardRecorder::new(model_dir);
    let mut evaluator = create_evaluator(&env_config())?;

    trainer.train(&mut agent, &mut recorder, &mut evaluator)?;
    Ok(())
}
```
In the above code, SimpleStepProcessorConfig is the configuration of SimpleStepProcessor, which implements the StepProcessorBase trait. StepProcessorBase abstracts how a Step object is processed before being pushed to a replay buffer. Users can customize the implementation of StepProcessorBase for their purposes. For example, n-step TD samples, or samples carrying Monte Carlo returns computed after the end of an episode, can be produced with a stateful implementation of StepProcessorBase, as sketched below.
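To make the stateful idea concrete, here is a small, self-contained sketch of the reward bookkeeping behind n-step returns. NStepReturn is a hypothetical helper, not part of border; a real step processor would wrap logic like this behind the StepProcessorBase trait.

```rust
use std::collections::VecDeque;

/// Hypothetical helper computing discounted n-step returns.
struct NStepReturn {
    n: usize,
    gamma: f32,
    rewards: VecDeque<f32>,
}

impl NStepReturn {
    fn new(n: usize, gamma: f32) -> Self {
        Self { n, gamma, rewards: VecDeque::new() }
    }

    /// Pushes the latest reward. Once `n` rewards are buffered, pops the
    /// oldest and returns the discounted n-step return starting there.
    fn push(&mut self, reward: f32) -> Option<f32> {
        self.rewards.push_back(reward);
        if self.rewards.len() < self.n {
            return None;
        }
        let ret = self
            .rewards
            .iter()
            .enumerate()
            .map(|(i, r)| self.gamma.powi(i as i32) * r)
            .sum::<f32>();
        self.rewards.pop_front();
        Some(ret)
    }
}
```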
It should be noted that the replay buffer is not part of the Agent but is owned by the Trainer; in the above code, the configuration of the replay buffer is given to the Trainer. This design choice allows Agents to separate the sampling and optimization processes.
Modules
- Utilities