Crate border


Border is a reinforcement learning library.

This crate is a collection of examples using the crates below.

  • border-core provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
  • border-py-gym-env is a wrapper of the Gym environments written in Python, with support for pybullet-gym and atari.
  • border-atari-env is a wrapper of atari-env, which is a part of gym-rs.
  • border-tch-agent is a collection of RL agents based on tch. Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC) are included.
  • border-async-trainer defines some traits and functions for asynchronous training of RL agents by multiple actors, each of which runs a sampling process of an agent and an environment in parallel.

You can use any subset of these crates for your own purposes.

Environment

border-core abstracts environments as Env. Env has associated types Env::Obs and Env::Act for the observation and action of the environment. Env::Config represents the configuration of the concrete environment type.
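As a rough illustration, the abstraction might be sketched as below. The trait and type names here (MyEnv, MyStep) are simplified placeholders, not the actual border-core definitions, which have additional associated types and methods.

use anyhow::Result;

/// Minimal step outcome used by the sketches on this page.
pub struct MyStep<O> {
    pub obs: O,
    pub reward: f32,
    pub is_done: bool,
}

/// A simplified stand-in for the Env abstraction.
pub trait MyEnv: Sized {
    type Obs;    // observation type of the environment
    type Act;    // action type of the environment
    type Config; // configuration used to build the environment

    /// Builds an environment from its configuration and a random seed.
    fn build(config: &Self::Config, seed: i64) -> Result<Self>;

    /// Resets the environment and returns the initial observation.
    fn reset(&mut self) -> Result<Self::Obs>;

    /// Applies an action and returns the resulting step.
    fn step(&mut self, act: &Self::Act) -> MyStep<Self::Obs>;
}

A concrete environment, such as a Gym wrapper, would fill in these associated types with its own observation, action, and configuration structs.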

Policy and agent

In this crate, Policy is a controller for an environment implementing the Env trait. The Agent trait abstracts a trainable Policy and adds methods for training and for saving and loading its parameters.
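Continuing the simplified sketch above (again with placeholder traits, not the exact border-core definitions), the relationship between the two might look like this:

use anyhow::Result;

/// A simplified stand-in for Policy: maps observations to actions.
pub trait MyPolicy<E: MyEnv> {
    /// Samples an action for the given observation.
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

/// A simplified stand-in for Agent: a trainable Policy.
pub trait MyAgent<E: MyEnv>: MyPolicy<E> {
    /// Switches the agent to training mode.
    fn train(&mut self);

    /// Switches the agent to evaluation mode.
    fn eval(&mut self);

    /// Performs a single optimization step.
    fn opt(&mut self);

    /// Saves the agent's parameters under `model_dir`.
    fn save(&self, model_dir: &str) -> Result<()>;

    /// Loads the agent's parameters from `model_dir`.
    fn load(&mut self, model_dir: &str) -> Result<()>;
}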

Evaluation

Structs that implement the Evaluator trait can be used to run episodes with a given Env and Policy. The code might look like the example below, which uses DefaultEvaluator, a built-in implementation of Evaluator. The helpers env_config() and create_agent() stand in for user-defined constructors.

type E = TYPE_OF_ENV;    // concrete environment type
type P = TYPE_OF_POLICY; // concrete policy (agent) type

fn eval(model_dir: &str, render: bool) -> Result<()> {
    let env_config: E::Config = {
        let mut env_config = env_config();
        if render {
            // Render the environment and slow it down for human viewing.
            env_config = env_config
                .render_mode(Some("human".to_string()))
                .set_wait_in_millis(10);
        }
        env_config
    };
    let mut agent: P = {
        let mut agent = create_agent();
        agent.load(model_dir)?; // load trained parameters
        agent.eval();           // switch to evaluation mode
        agent
    };

    // Run 5 evaluation episodes with the built-in evaluator.
    let _ = DefaultEvaluator::new(&env_config, 0, 5)?.evaluate(&mut agent);

    Ok(())
}

Users can customize the way the policy is evaluated by implementing a custom Evaluator.
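As a rough sketch of what such a customization could look like, the code below implements an evaluator over the placeholder MyEnv/MyPolicy traits from the earlier sketches; the actual Evaluator trait in border-core has its own signature and bounds, so treat this only as an outline of the idea.

use anyhow::Result;

/// A sketch of a custom evaluator over the placeholder traits above.
pub struct MyEvaluator<E: MyEnv> {
    env: E,
    n_episodes: usize,
}

impl<E: MyEnv> MyEvaluator<E> {
    pub fn new(config: &E::Config, seed: i64, n_episodes: usize) -> Result<Self> {
        Ok(Self {
            env: E::build(config, seed)?,
            n_episodes,
        })
    }

    /// Runs `n_episodes` episodes and returns the mean episode return.
    pub fn evaluate<P: MyPolicy<E>>(&mut self, policy: &mut P) -> Result<f32> {
        let mut total = 0.0f32;
        for _ in 0..self.n_episodes {
            let mut obs = self.env.reset()?;
            loop {
                let act = policy.sample(&obs);
                let step = self.env.step(&act);
                total += step.reward;
                if step.is_done {
                    break;
                }
                obs = step.obs;
            }
        }
        Ok(total / self.n_episodes as f32)
    }
}

A custom evaluator along these lines could, for example, record additional per-episode metrics or evaluate under a different environment configuration than the one used for training.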

Training

You can train RL agents by using the Trainer struct.

fn train(max_opts: usize, model_dir: &str) -> Result<()> {
    let mut trainer = {
        let env_config = env_config(); // configuration of the environment
        let step_proc_config = SimpleStepProcessorConfig {};
        let replay_buffer_config =
            SimpleReplayBufferConfig::default().capacity(REPLAY_BUFFER_CAPACITY);
        let config = TrainerConfig::default()
            .max_opts(max_opts);
        // followed by methods to set other training parameters

        Trainer::<Env, StepProc, ReplayBuffer>::build(
            config,
            env_config,
            step_proc_config,
            replay_buffer_config,
        )
    };
    let mut agent = create_agent();
    let mut recorder = TensorboardRecorder::new(model_dir);
    let mut evaluator = create_evaluator(&env_config())?;

    trainer.train(&mut agent, &mut recorder, &mut evaluator)?;

    Ok(())
}

In the above code, SimpleStepProcessorConfig is the configuration of SimpleStepProcessor, which implements the StepProcessorBase trait. StepProcessorBase abstracts how a Step object is processed before being pushed to a replay buffer. Users can customize the implementation of StepProcessorBase for their purposes. For example, n-step TD samples, or samples carrying Monte Carlo returns computed after the end of an episode, can be produced with a stateful implementation of StepProcessorBase.
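To make the idea of such stateful processing concrete, here is a standalone sketch that accumulates discounted n-step returns over buffered transitions. The Transition type and the NStepProcessor struct are hypothetical illustrations, not border-core types, and a real StepProcessorBase implementation would wrap this kind of logic behind the trait's actual interface.

/// A hypothetical raw transition (illustration only, not a border-core type).
pub struct Transition {
    pub reward: f32,
    pub is_done: bool,
}

/// Accumulates discounted n-step returns over a sliding window of transitions.
pub struct NStepProcessor {
    n: usize,
    gamma: f32,
    buffer: Vec<Transition>, // the most recent raw transitions
}

impl NStepProcessor {
    pub fn new(n: usize, gamma: f32) -> Self {
        Self { n, gamma, buffer: Vec::new() }
    }

    /// Pushes a raw transition. Once `n` transitions are buffered (or the
    /// episode ends), returns the discounted n-step reward sum for the oldest
    /// buffered transition; the bootstrap value term is omitted for brevity.
    pub fn push(&mut self, tr: Transition) -> Option<f32> {
        let done = tr.is_done;
        self.buffer.push(tr);
        if self.buffer.len() == self.n || done {
            let ret: f32 = self
                .buffer
                .iter()
                .enumerate()
                .map(|(i, t)| self.gamma.powi(i as i32) * t.reward)
                .sum();
            self.buffer.remove(0); // drop the oldest transition
            if done {
                // A full implementation would also flush returns for the
                // remaining buffered transitions at the episode boundary.
                self.buffer.clear();
            }
            Some(ret)
        } else {
            None
        }
    }
}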

Note that a replay buffer is not part of an Agent but is owned by the Trainer. In the above code, the configuration of the replay buffer is given to the Trainer. This design choice allows Agents to separate the sampling and optimization processes.

Modules