# Border

Border is a reinforcement learning library in Rust.

## Status

Border is currently under development.

## Prerequisites

To run the examples, install Python (>= 3.7) and [gym](https://gym.openai.com). Gym is the only built-in environment; the library itself can work with any kind of environment.
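
Before installing gym, you can confirm that your interpreter meets the version requirement. This check is a convenience sketch, not part of the examples:

```python
import sys

# Border's examples require Python 3.7 or newer; fail fast if the
# interpreter is older.
assert sys.version_info >= (3, 7), "Python >= 3.7 is required"
print("Python", ".".join(map(str, sys.version_info[:3])), "is recent enough")
```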

## Examples

* Random policy: the following command runs a random controller (policy) for 5 episodes in [CartPole-v0](https://gym.openai.com/envs/CartPole-v0/):

  ```bash
  $ cargo run --example random_cartpole
  ```

  It renders the environment during the episodes and writes a CSV file to `examples/model` containing the sequences of observation and reward values over the episodes.

  ```bash
  $ head -n3 examples/model/random_cartpole_eval.csv
  0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
  0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
  0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
  ```
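
  A plausible reading of these rows (inferred from the output above, not documented): episode index, step index, reward, followed by the four CartPole observation values. The snippet below parses one row copied from the example output under that assumption:

  ```python
  import csv
  import io

  # One row copied verbatim from the example output above.
  sample = "0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637"

  for row in csv.reader(io.StringIO(sample)):
      episode, step, reward = int(row[0]), int(row[1]), float(row[2])
      observation = [float(v) for v in row[3:]]  # four CartPole state values
      print(episode, step, reward, observation)
  ```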

* DQN agent: the following command trains a DQN agent:

  ```bash
  $ RUST_LOG=info cargo run --example dqn_cartpole
  ```

  After training, the trained agent runs for 5 episodes. The code saves the parameters of the trained Q-network (and the target network) in `examples/model/dqn_cartpole`, then loads them back to demonstrate saving and loading trained models.
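
  The learning rule behind any DQN agent bottoms out in the standard one-step target. This standalone sketch is illustrative only, using made-up Q-values rather than Border's API:

  ```python
  # One-step DQN target: y = r + gamma * max_a Q_target(s', a),
  # with the bootstrap term dropped on terminal transitions.
  def dqn_target(reward, next_q_values, done, gamma=0.99):
      if done:
          return reward
      return reward + gamma * max(next_q_values)

  y = dqn_target(reward=1.0, next_q_values=[0.2, 0.5, -0.1], done=False)
  print(y)  # 1.0 + 0.99 * 0.5
  ```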

* SAC agent: the following command trains a SAC agent on [Pendulum-v0](https://gym.openai.com/envs/Pendulum-v0/), which takes continuous action:

  ```bash
  $ RUST_LOG=info cargo run --example sac_pendulum
  ```

  The code defines an action filter that doubles the torque in the environment.
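
  The filter idea can be sketched as a plain function. Pendulum-v0's torque bound of ±2.0 comes from Gym; the clipping step here is an assumption for the sketch, not necessarily what the example's filter does:

  ```python
  # Double the policy's torque, then clip to Pendulum-v0's action range [-2, 2].
  def double_torque(action, low=-2.0, high=2.0):
      scaled = 2.0 * action
      return max(low, min(high, scaled))

  print(double_torque(0.7))   # doubled, still within bounds
  print(double_torque(1.5))   # doubled, then clipped to the upper bound
  ```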

* Pong: the following command trains a DQN agent on [PongNoFrameskip-v4](https://gym.openai.com/envs/Pong-v0/):

  ```bash
  $ PYTHONPATH=$REPO/examples RUST_LOG=info cargo run --example dqn_pong_vecenv
  ```

  This demonstrates how to use vectorized environments, in which 4 environments run synchronously (see the code). Training took about 11 hours for 2M steps on a `g3s.xlarge` EC2 instance. The hyperparameter values, tuned specifically for Pong rather than for all Atari games, are adapted from the book [Deep Reinforcement Learning Hands-On](https://www.packtpub.com/product/deep-reinforcement-learning-hands-on-second-edition/9781838826994). The learning curve is shown below.

  <img src="learning_curve.png" width="256">

  After training, you can watch the agent play:

  ```bash
  $ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
  ```
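
  The synchronous-vectorization idea — step N environments in lockstep and batch the results — can be sketched without Gym. `DummyEnv` here is a stand-in for illustration, not Border's vectorized environment type:

  ```python
  class DummyEnv:
      """Stand-in environment: counts steps and terminates after 3 of them."""
      def __init__(self):
          self.t = 0

      def step(self, action):
          self.t += 1
          return self.t, 1.0, self.t >= 3  # (observation, reward, done)

  # Four environments stepped synchronously, as in the Pong example.
  envs = [DummyEnv() for _ in range(4)]
  actions = [0, 1, 0, 1]
  observations, rewards, dones = zip(*(env.step(a) for env, a in zip(envs, actions)))
  print(observations, rewards, dones)
  ```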

## Features

* Environments that wrap [gym](https://gym.openai.com) using [PyO3](https://crates.io/crates/pyo3) and [ndarray](https://crates.io/crates/ndarray)
* Interfaces for recording quantities during training or evaluation
  * Tensorboard support via [tensorboard-rs](https://crates.io/crates/tensorboard-rs)
* Vectorized environment using a tweaked [atari_wrappers.py](https://github.com/taku-y/border/blob/main/examples/atari_wrappers.py), adapted from the RL example in [tch](https://crates.io/crates/tch)
* Agents based on [tch](https://crates.io/crates/tch)
  * Currently includes [DQN](https://arxiv.org/abs/1312.5602), [DDPG](https://arxiv.org/abs/1509.02971), and [SAC](https://arxiv.org/abs/1801.01290)

## Roadmap

* More tests and documentation
* Investigate a performance issue (https://github.com/taku-y/border/issues/5)
* More environments
  * [pybullet-gym](https://github.com/benelot/pybullet-gym), [rogue-gym](https://github.com/kngwyu/rogue-gym), [ViZDoom](https://github.com/mwydmuch/ViZDoom), [gym-minecraft](https://github.com/tambetm/gym-minecraft)
* More RL algorithms
  * [DDQN](https://arxiv.org/abs/1509.06461), [Dueling network](https://arxiv.org/abs/1511.06581), [PPO](https://arxiv.org/abs/1707.06347), [QRDQN](https://arxiv.org/abs/1710.10044), [IQN](https://arxiv.org/abs/1806.06923), [TD3](https://arxiv.org/abs/1802.09477)
  * [Prioritized experience replay](https://arxiv.org/abs/1511.05952), [parameter noise](https://arxiv.org/abs/1706.01905)

## License

Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).