
Border

Border is a reinforcement learning library in Rust.

Status

Border is currently under development.

Prerequisites

To run the examples, install Python (>= 3.7) and gym; the library provides a wrapper for gym environments using PyO3.
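
gym can typically be installed with pip; the exact version this release expects is not documented here, so treat the command below as an assumption:

$ pip install gym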

Examples

Random policy on the CartPole environment

The following command runs a random controller (policy) for 5 episodes in CartPole-v0:

$ cargo run --example random_cartpole

It renders the environment during the episodes and writes a CSV file to examples/model containing, for each step, the episode and step indices, the reward, and the observation values.

$ head -n3 examples/model/random_cartpole_eval.csv
0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
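
Conceptually, a random policy just samples an action uniformly at each step and logs the results. The sketch below is illustrative plain Rust; the Env trait and all names are hypothetical, not border's actual API:

use rand::Rng;

// Hypothetical environment interface; not border's actual API.
trait Env {
    fn reset(&mut self) -> Vec<f64>;
    // Returns (observation, reward, done).
    fn step(&mut self, action: i64) -> (Vec<f64>, f64, bool);
}

fn run_random_policy<E: Env>(env: &mut E, episodes: usize) {
    let mut rng = rand::thread_rng();
    for ep in 0..episodes {
        env.reset();
        let mut step = 0;
        loop {
            // CartPole has two discrete actions: push left (0) or right (1).
            let action = rng.gen_range(0..2);
            let (_obs, reward, done) = env.step(action);
            // One CSV row per step: episode, step, reward, observation...
            println!("{},{},{}", ep, step, reward);
            step += 1;
            if done {
                break;
            }
        }
    }
}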

Deep Q-network (DQN) on the CartPole environment

The following command trains a DQN agent:

$ cargo run --example dqn_cartpole

After training, the agent runs for 5 evaluation episodes. The parameters of the trained Q-network (and of the target network) are saved in examples/model/dqn_cartpole.
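
At the core of DQN is the temporal-difference target r + γ · max_a′ Q(s′, a′), computed from a separate target network. A minimal plain-Rust sketch of that computation, independent of border's internals:

// Computes the DQN temporal-difference target for one transition.
// `next_q` holds the target network's Q-values for the next state;
// terminal transitions do not bootstrap.
fn td_target(reward: f64, done: bool, next_q: &[f64], gamma: f64) -> f64 {
    let max_next_q = next_q.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if done { reward } else { reward + gamma * max_next_q }
}

For example, a transition with reward 1.0, γ = 0.99, and next-state Q-values [0.5, 1.2] yields a target of 1.0 + 0.99 × 1.2 = 2.188; the Q-network is then regressed toward such targets.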

Soft actor-critic (SAC) on the Pendulum environment

The following command trains a SAC agent on Pendulum-v0, which has a continuous action space:

$ cargo run --example sac_pendulum

The example also defines an action filter that doubles the torque applied to the environment.
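
An action filter of this kind post-processes the agent's action before it reaches the environment. A hypothetical sketch; the trait and names are illustrative, not border's actual filter interface:

// Illustrative action-filter interface; not border's actual API.
trait ActionFilter {
    fn filt(&self, act: Vec<f64>) -> Vec<f64>;
}

// Doubles the torque command before it is applied to Pendulum.
struct DoubleTorque;

impl ActionFilter for DoubleTorque {
    fn filt(&self, act: Vec<f64>) -> Vec<f64> {
        act.into_iter().map(|a| 2.0 * a).collect()
    }
}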

Atari games

The following command trains a DQN agent on PongNoFrameskip-v4:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4

During training, the program saves the model parameters whenever the evaluation reward reaches a new maximum. The agent can be trained on other Atari games (e.g., SeaquestNoFrameskip-v4) by replacing the environment name in the above command, as shown below.
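
For example, to train on Seaquest:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- SeaquestNoFrameskip-v4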

For Pong, you can download a pretrained agent from my Google Drive and watch it play with the following command:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive

The pretrained agent will be saved locally in $HOME/.border/model.

Vectorized environment for Atari games

(The code might be broken due to recent changes; it will be fixed in the future. The description below applies to an older version.)

The following command trains a DQN agent in a vectorized environment of Pong:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv

The code demonstrates how to use vectorized environments, in which 4 environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples) on an EC2 g3s.xlarge instance. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On.
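
Conceptually, a synchronous vectorized environment steps a batch of environments in lockstep and returns batched transitions, which is why 2M vectorized steps yield 8M samples with 4 environments. A simplified sketch, reusing the hypothetical Env trait from the random-policy example above (again, not border's actual implementation):

// Steps n environments in lockstep; finished environments are reset
// so the batch keeps running. Illustrative only.
struct VecEnv<E: Env> {
    envs: Vec<E>,
}

impl<E: Env> VecEnv<E> {
    fn step(&mut self, actions: &[i64]) -> Vec<(Vec<f64>, f64, bool)> {
        self.envs
            .iter_mut()
            .zip(actions)
            .map(|(env, &a)| {
                let (obs, reward, done) = env.step(a);
                let obs = if done { env.reset() } else { obs };
                (obs, reward, done)
            })
            .collect()
    }
}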

After training, you can watch the agent play:

$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval

License

Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).