# Forger - Reinforcement Learning Library in Rust

## Introduction
Forger is a Reinforcement Learning (RL) library in Rust, offering a robust and efficient framework for implementing RL algorithms. It features a modular design with components for agents, environments, policies, and utilities, facilitating easy experimentation and development of RL models.
## Features
- Modular Components: Includes agents, environments, and policies as separate modules.
- Efficient and Safe: Built in Rust, ensuring high performance and safety.
- Customizable Environments: Provides a framework to create and manage different RL environments.
- Flexible Agent Implementations: Supports various agent strategies and learning algorithms.
- Extensible Policy Framework: Allows for the implementation of diverse action selection policies.
## Modules

- Policy (`policy`): Defines the interface for action selection policies. Includes an implementation of an Epsilon Greedy (with decay) policy.
- Agent (`agent`): Outlines the structure for RL agents. Implements Value Iteration Every-Visit Monte Carlo (`VEveryVisitMC`) and Q-Learning Every-Visit Monte Carlo (`QEveryVisitMC`).
- Environment (`env`): Provides the `Env` trait to define RL environments. Contains `LineWorld`, a simple linear world environment for experimentation.
- Prelude (`prelude`): Re-exports commonly used items from the `env`, `agent`, and `policy` modules for convenient access.
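To make the policy module's role concrete, here is a minimal sketch of an epsilon-greedy policy with multiplicative decay. The names (`EGreedy`, `select`, `decay`) and the tiny inline PRNG are illustrative assumptions, not forger's actual API:

```rust
// Illustrative epsilon-greedy policy with decay; not forger's real types.

/// Tiny deterministic LCG so the example needs no external crates.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

struct EGreedy {
    epsilon: f64, // exploration probability
    decay: f64,   // multiplied into epsilon after each episode
    rng: Lcg,
}

impl EGreedy {
    /// Explore with probability `epsilon`, otherwise exploit the
    /// highest-valued action.
    fn select(&mut self, q_values: &[f64]) -> usize {
        if self.rng.next_f64() < self.epsilon {
            (self.rng.next_f64() * q_values.len() as f64) as usize % q_values.len()
        } else {
            q_values
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
                .map(|(i, _)| i)
                .unwrap()
        }
    }

    fn decay(&mut self) {
        self.epsilon *= self.decay;
    }
}

fn main() {
    let mut policy = EGreedy { epsilon: 1.0, decay: 0.9, rng: Lcg(42) };
    let q = [0.1, 0.5, 0.2];
    for _ in 0..10 {
        let a = policy.select(&q);
        assert!(a < q.len());
        policy.decay();
    }
    // After 10 decays, epsilon = 0.9^10 ≈ 0.3487.
    assert!((policy.epsilon - 0.9f64.powi(10)).abs() < 1e-12);
    println!("epsilon after 10 decays: {:.4}", policy.epsilon);
}
```

Decaying epsilon over episodes is what lets the agent explore early and exploit later, which is the pattern the examples below follow.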
## Getting Started

### Prerequisites

- Rust Programming Environment

### Installation

In your project directory, run the following command:

```sh
cargo add forger
```
### Basic Usage

```rust
use forger::prelude::*;

pub type S = usize;            // State
pub type A = LineWorldAction;  // Action
pub type P = EGreedyPolicy<A>; // Policy
pub type E = LineWorld;        // Environment
```
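To show the shape of what an `Env` implementation like `LineWorld` does, here is a self-contained sketch of a linear world where the agent walks left or right and is rewarded on reaching the rightmost cell. The trait and method signatures here are illustrative assumptions, not the crate's actual definitions:

```rust
// Illustrative LineWorld-style environment; not forger's real `Env` trait.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action { Left, Right }

trait Env {
    type State;
    type Action;
    /// Apply `action`; return (next_state, reward, done).
    fn step(&mut self, action: Self::Action) -> (Self::State, f64, bool);
    fn reset(&mut self) -> Self::State;
}

struct LineWorld {
    pos: usize,
    len: usize,
}

impl Env for LineWorld {
    type State = usize;
    type Action = Action;

    fn step(&mut self, action: Action) -> (usize, f64, bool) {
        match action {
            Action::Left => self.pos = self.pos.saturating_sub(1),
            Action::Right => self.pos = (self.pos + 1).min(self.len - 1),
        }
        // Reward 1.0 only on reaching the rightmost (terminal) cell.
        let done = self.pos == self.len - 1;
        (self.pos, if done { 1.0 } else { 0.0 }, done)
    }

    fn reset(&mut self) -> usize {
        self.pos = self.len / 2; // start in the middle cell
        self.pos
    }
}

fn main() {
    let mut env = LineWorld { pos: 0, len: 5 };
    let mut s = env.reset(); // state 2
    let mut total = 0.0;
    loop {
        let (next, r, done) = env.step(Action::Right);
        total += r;
        s = next;
        if done { break; }
    }
    assert_eq!(s, 4);
    assert_eq!(total, 1.0);
    println!("reached terminal state {s} with return {total}");
}
```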
## Examples

- Monte Carlo with Epsilon Decay in `LineWorld`: Demonstrates the Q-Learning Every-Visit Monte Carlo (`QEveryVisitMC`) agent with an Epsilon Greedy Policy (with decay) in the `LineWorld` environment. Runs multiple episodes, selecting actions, updating the agent after each episode, and decaying the epsilon value over time.
- TD0 with Epsilon Decay in `GridWorld`: Demonstrates the TD0 (`TD0`) agent with an Epsilon Greedy Policy (with decay) in the `GridWorld` environment. Runs multiple episodes, selecting actions, updating the agent at every step within each episode, and decaying the epsilon value over time. Also includes a test run of the trained agent.
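The two examples differ mainly in *when* the value estimates are updated: Monte Carlo waits for a full episode and uses the observed return, while TD(0) updates after every step by bootstrapping from the next state's value. A small self-contained sketch of both tabular updates (constants and state numbering are illustrative, not taken from the examples):

```rust
// Contrast of every-visit Monte Carlo and TD(0) tabular value updates.

fn main() {
    let gamma = 0.9; // discount factor
    let alpha = 0.1; // learning rate

    // A finished episode: (state, reward received after leaving it).
    let episode = [(0usize, 0.0f64), (1, 0.0), (2, 1.0)];

    // Every-visit Monte Carlo: walk the episode backwards, accumulate the
    // discounted return G, and move each visited state's value toward G.
    let mut v_mc = [0.0f64; 3];
    let mut g = 0.0;
    for &(s, r) in episode.iter().rev() {
        g = r + gamma * g;
        v_mc[s] += alpha * (g - v_mc[s]);
    }

    // TD(0): after each step (s, r, s'), update V(s) toward r + gamma * V(s').
    // The last transition self-loops on the terminal state for simplicity.
    let mut v_td = [0.0f64; 3];
    let transitions = [(0usize, 0.0f64, 1usize), (1, 0.0, 2), (2, 1.0, 2)];
    for &(s, r, s_next) in &transitions {
        let target = r + gamma * v_td[s_next];
        v_td[s] += alpha * (target - v_td[s]);
    }

    println!("MC values: {v_mc:?}");
    println!("TD values: {v_td:?}");
    // States closer to the reward are valued higher after the MC pass.
    assert!(v_mc[2] > v_mc[0]);
}
```

This is why the first example updates the agent once per episode while the second updates at every step.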
## Contributing

Contributions to Forger are welcome! To contribute, please fork the repository, create a feature branch, and open a pull request.
## License

Forger is licensed under the MIT License or the Apache 2.0 License.