Border

Border is a reinforcement learning library in Rust.

Status

Border is currently under development.

Prerequisites

To run the examples, install Python (>= 3.7) and Gym. Gym is the only built-in environment; the library itself works with any kind of environment.
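
For example, one common way to install Gym is with pip; the exact installation method (pip, conda, a virtual environment, etc.) is not prescribed here:

    $ pip install gym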

Examples

  • Random policy: the following command runs a random controller (policy) for 5 episodes in CartPole-v0:

    $ cargo run --example random_cartpole
    

    It renders the environment during the episodes and writes a CSV file to examples/model containing the sequences of observation and reward values for each episode.

    $ head -n3 examples/model/random_cartpole_eval.csv
    0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
    0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
    0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
    
  • DQN agent: the following command trains a DQN agent:

    $ RUST_LOG=info cargo run --example dqn_cartpole
    

    After training, the trained agent runs for 5 episodes. In the code, the parameters of the trained Q-network (and of the target network) are saved in examples/model/dqn_cartpole and then loaded again, to demonstrate how trained models are saved and loaded (a rough illustration appears after this list).

  • SAC agent: the following command trains a SAC agent on Pendulum-v0, an environment with a continuous action space:

    $ RUST_LOG=info cargo run --example sac_pendulum
    

    The code defines an action filter that doubles the torque applied to the environment (a minimal sketch of the idea appears after this list).

  • Pong: the following command trains a DQN agent on PongNoFrameskip-v4:

    $ PYTHONPATH=$REPO/examples RUST_LOG=info cargo run --example dqn_pong_vecenv
    

    This demonstrates how to use vectorized environments, in which 4 environments run synchronously (see the code). Training took about 11 hours for 2M steps on a g3s.xlarge instance on EC2. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.

    After the training, you can see how the agent plays:

    $ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
    
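As a rough illustration of the saving/loading step in the DQN example above: border's agents are built on tch (see Features below), and tch keeps network parameters in a VarStore that can be written to and read from disk. The sketch below uses tch directly with a made-up two-layer Q-network; it is not border's actual API, and the file name is arbitrary.

    use tch::{nn, nn::Module, Device, Kind, Tensor};

    fn q_net(vs: &nn::Path) -> impl Module {
        // A made-up two-layer Q-network for CartPole-sized inputs (4 observations, 2 actions).
        nn::seq()
            .add(nn::linear(vs / "l1", 4, 64, Default::default()))
            .add_fn(|xs| xs.relu())
            .add(nn::linear(vs / "l2", 64, 2, Default::default()))
    }

    fn main() -> Result<(), tch::TchError> {
        // Parameters live in a VarStore; saving writes them all to one file.
        let vs = nn::VarStore::new(Device::Cpu);
        let net = q_net(&vs.root());
        vs.save("q_net_sketch.ot")?;

        // Loading fills a second VarStore built over the same architecture.
        let mut vs2 = nn::VarStore::new(Device::Cpu);
        let net2 = q_net(&vs2.root());
        vs2.load("q_net_sketch.ot")?;

        // Both networks now return the same values for the same observation.
        let obs = Tensor::zeros(&[1, 4], (Kind::Float, Device::Cpu));
        net.forward(&obs).print();
        net2.forward(&obs).print();
        Ok(())
    }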
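
The action filter in the SAC example can be pictured as a small transformation sitting between the policy and the environment. The trait and struct below are hypothetical names, not border's actual types; they only illustrate the torque doubling.

    /// Hypothetical interface: transforms an action before it reaches the environment.
    trait ActionFilter {
        fn filt(&self, act: Vec<f32>) -> Vec<f32>;
    }

    /// Doubles the torque produced by the policy, as in the Pendulum example.
    struct DoubleTorque;

    impl ActionFilter for DoubleTorque {
        fn filt(&self, act: Vec<f32>) -> Vec<f32> {
            act.into_iter().map(|a| 2.0 * a).collect()
        }
    }

    fn main() {
        let filter = DoubleTorque;
        // A torque of 0.7 from the policy becomes 1.4 in the environment.
        println!("{:?}", filter.filt(vec![0.7]));
    }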

Features

  • Environments which wrap gym using PyO3 and ndarray (see the sketch below)
  • Interfaces for recording quantities during training or evaluation
  • Vectorized environment using a tweaked atari_wrapper.py, adapted from the RL example in tch
  • Agents based on tch
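
The gym wrapping mentioned in the first feature is built on PyO3, which lets Rust drive a Python interpreter. The snippet below is a minimal sketch of that mechanism using PyO3 directly, not border's actual environment type; exact PyO3 calls vary a little between versions (this assumes a version that provides Python::with_gil).

    use pyo3::prelude::*;

    fn main() -> PyResult<()> {
        Python::with_gil(|py| {
            // Import the Python gym module and create a CartPole environment.
            let gym = py.import("gym")?;
            let env = gym.call_method1("make", ("CartPole-v0",))?;

            // Reset and take one random action (return values differ across gym versions).
            let obs = env.call_method0("reset")?;
            let act = env.getattr("action_space")?.call_method0("sample")?;
            let step = env.call_method1("step", (act,))?;

            println!("obs: {:?}", obs);
            println!("step: {:?}", step);
            Ok(())
        })
    }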

Roadmap

Licence

Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).