# Border

Border is a reinforcement learning library in Rust.

## Status

Border is currently under development.

## Prerequisites

To run the examples, install Python (>= 3.7) and [gym](https://gym.openai.com). Gym is the only built-in environment; the library itself can work with any kind of environment.
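
Before installing gym, you can confirm that your interpreter meets the version requirement. This check is a convenience sketch, not part of the examples:

```python
import sys

# Border's examples require Python 3.7 or newer; fail fast if the
# interpreter is older.
assert sys.version_info >= (3, 7), "Python >= 3.7 is required"
print("Python", ".".join(map(str, sys.version_info[:3])), "is recent enough")
```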

## Examples

* Random policy: the following command runs a random controller (policy) for 5 episodes in [CartPole-v0](https://gym.openai.com/envs/CartPole-v0/):

  ```bash
  $ cargo run --example random_cartpole
  ```

  It renders the environment during the episodes and writes a CSV file to `examples/model` containing the sequences of observation and reward values over the episodes.

  ```bash
  $ head -n3 examples/model/random_cartpole_eval.csv
  0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
  0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
  0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
  ```
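
  A plausible reading of these rows (inferred from the output above, not documented): episode index, step index, reward, followed by the four CartPole observation values. The snippet below parses one row copied from the example output under that assumption:

  ```python
  import csv
  import io

  # One row copied verbatim from the example output above.
  sample = "0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637"

  for row in csv.reader(io.StringIO(sample)):
      episode, step, reward = int(row[0]), int(row[1]), float(row[2])
      observation = [float(v) for v in row[3:]]  # four CartPole state values
      print(episode, step, reward, observation)
  ```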

* DQN agent: the following command trains a DQN agent:

  ```bash
  $ RUST_LOG=info cargo run --example dqn_cartpole
  ```

  After training, the trained agent runs for 5 episodes. The code saves the parameters of the trained Q-network (and the target network) in `examples/model/dqn_cartpole`, then loads them back to demonstrate saving and loading trained models.
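
  The learning rule behind any DQN agent bottoms out in the standard one-step target. This standalone sketch is illustrative only, using made-up Q-values rather than Border's API:

  ```python
  # One-step DQN target: y = r + gamma * max_a Q_target(s', a),
  # with the bootstrap term dropped on terminal transitions.
  def dqn_target(reward, next_q_values, done, gamma=0.99):
      if done:
          return reward
      return reward + gamma * max(next_q_values)

  y = dqn_target(reward=1.0, next_q_values=[0.2, 0.5, -0.1], done=False)
  print(y)  # 1.0 + 0.99 * 0.5
  ```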

* SAC agent: the following command trains a SAC agent on [Pendulum-v0](https://gym.openai.com/envs/Pendulum-v0/), which takes continuous action:

  ```bash
  $ RUST_LOG=info cargo run --example sac_pendulum
  ```

  The code defines an action filter that doubles the torque in the environment.
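
  The filter idea can be sketched as a plain function. Pendulum-v0's torque bound of ±2.0 comes from Gym; the clipping step here is an assumption for the sketch, not necessarily what the example's filter does:

  ```python
  # Double the policy's torque, then clip to Pendulum-v0's action range [-2, 2].
  def double_torque(action, low=-2.0, high=2.0):
      scaled = 2.0 * action
      return max(low, min(high, scaled))

  print(double_torque(0.7))   # doubled, still within bounds
  print(double_torque(1.5))   # doubled, then clipped to the upper bound
  ```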

* Pong: the following command trains a DQN agent on [PongNoFrameskip-v4](https://gym.openai.com/envs/Pong-v0/):

  ```bash
  $ PYTHONPATH=$REPO/examples RUST_LOG=info cargo run --example dqn_pong_vecenv
  ```

  This demonstrates how to use vectorized environments, in which 4 environments run synchronously (see the code). Training took about 11 hours for 2M steps on a `g3s.xlarge` EC2 instance. The hyperparameter values, tuned specifically for Pong rather than for all Atari games, are adapted from the book [Deep Reinforcement Learning Hands-On](https://www.packtpub.com/product/deep-reinforcement-learning-hands-on-second-edition/9781838826994). The learning curve is shown below.

  <img src="learning_curve.png" width="256">

  After training, you can watch the agent play:

  ```bash
  $ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
  ```
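
  The synchronous-vectorization idea — step N environments in lockstep and batch the results — can be sketched without Gym. `DummyEnv` here is a stand-in for illustration, not Border's vectorized environment type:

  ```python
  class DummyEnv:
      """Stand-in environment: counts steps and terminates after 3 of them."""
      def __init__(self):
          self.t = 0

      def step(self, action):
          self.t += 1
          return self.t, 1.0, self.t >= 3  # (observation, reward, done)

  # Four environments stepped synchronously, as in the Pong example.
  envs = [DummyEnv() for _ in range(4)]
  actions = [0, 1, 0, 1]
  observations, rewards, dones = zip(*(env.step(a) for env, a in zip(envs, actions)))
  print(observations, rewards, dones)
  ```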

## Features

* Environments that wrap [gym](https://gym.openai.com) using [PyO3](https://crates.io/crates/pyo3) and [ndarray](https://crates.io/crates/ndarray)
* Interfaces for recording quantities during training or evaluation
  * Tensorboard support via [tensorboard-rs](https://crates.io/crates/tensorboard-rs)
* Vectorized environment using a tweaked [atari_wrappers.py](https://github.com/taku-y/border/blob/main/examples/atari_wrappers.py), adapted from the RL example in [tch](https://crates.io/crates/tch)
* Agents based on [tch](https://crates.io/crates/tch)
  * Currently includes [DQN](https://arxiv.org/abs/1312.5602), [DDPG](https://arxiv.org/abs/1509.02971), and [SAC](https://arxiv.org/abs/1801.01290)

## Roadmap

* More tests and documentation
* Investigate a performance issue (https://github.com/taku-y/border/issues/5)
* More environments
  * [pybullet-gym](https://github.com/benelot/pybullet-gym), [rogue-gym](https://github.com/kngwyu/rogue-gym), [ViZDoom](https://github.com/mwydmuch/ViZDoom), [gym-minecraft](https://github.com/tambetm/gym-minecraft)
* More RL algorithms
  * [DDQN](https://arxiv.org/abs/1509.06461), [Dueling network](https://arxiv.org/abs/1511.06581), [PPO](https://arxiv.org/abs/1707.06347), [QRDQN](https://arxiv.org/abs/1710.10044), [IQN](https://arxiv.org/abs/1806.06923), [TD3](https://arxiv.org/abs/1802.09477)
  * [Prioritized experience replay](https://arxiv.org/abs/1511.05952), [parameter noise](https://arxiv.org/abs/1706.01905)

## License

Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).