reinforcex 0.0.5

Deep Reinforcement Learning Framework

About ReinforceX

ReinforceX (ReX) is an early-stage deep reinforcement learning framework built in Rust. It is designed as a Rust-first playground for implementing, experimenting with, and eventually productionizing reinforcement learning agents without making Python the core runtime.

The project currently focuses on:

  • a small, readable core for value-based, policy-based, and actor-critic algorithms;
  • neural-network policies and Q-functions backed by tch / libtorch;
  • replay and on-policy buffers that can be shared across training workers;
  • sample Gymnasium environments exposed through a simple HTTP server;
  • an optional C ABI for embedding agents from C, C++, C#, Unity, or other runtimes.

Advantages of Rust for this project:

  • ownership and RAII make long-running training jobs easier to reason about;
  • Send / Sync boundaries make parallel training explicit;
  • native binaries are a good fit for simulators, games, robotics, and embedded integrations;
  • Rust can still use libtorch through tch, so the project can combine systems programming ergonomics with modern tensor operations.

ReinforceX is not yet a stable 1.0 API. Contributions are welcome, especially around algorithms, documentation, benchmark environments, test coverage, and safe public API design.

Package

crates.io: https://crates.io/crates/reinforcex

cargo add reinforcex

Alternatively, declare the dependency in Cargo.toml:

[dependencies]
reinforcex = "0.0.5"

The default cpu feature enables torch-sys with download-libtorch.

For CUDA experiments, build with the cuda feature and make sure your local libtorch / CUDA runtime is visible to tch. On Windows, load_cuda_dlls() also checks TORCH_CUDA_DLL when the cuda feature is enabled.

Algorithms

Implemented agents:

  • DQN: Double-DQN style target network, n-step replay, epsilon-greedy exploration, optional reward-based selector, shared replay buffer support.
  • PPO: clipped policy objective, GAE, value clipping, entropy regularization, discrete, multi-branch discrete, and Gaussian policies.
  • SAC: continuous and discrete Soft Actor-Critic, twin critics, soft target updates, automatic temperature updates for discrete policies, and component checkpointing.
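
The PPO entry above mentions GAE and the clipped policy objective. As a rough illustration of those two formulas, here is a minimal sketch in plain Rust over f64 slices. It is not the crate's implementation (which works on tch tensors); the function names gae_advantages and clipped_surrogate are hypothetical.

```rust
/// Generalized Advantage Estimation over one trajectory.
/// `values` holds one more entry than `rewards`: the last entry is the
/// bootstrap value of the final state (0.0 if the episode terminated).
fn gae_advantages(rewards: &[f64], values: &[f64], gamma: f64, lambda: f64) -> Vec<f64> {
    assert_eq!(values.len(), rewards.len() + 1);
    let mut advantages = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk backwards so each step can reuse the advantage of the next step.
    for t in (0..rewards.len()).rev() {
        // One-step TD error.
        let delta = rewards[t] + gamma * values[t + 1] - values[t];
        running = delta + gamma * lambda * running;
        advantages[t] = running;
    }
    advantages
}

/// PPO clipped surrogate for a single sample (a quantity to maximize).
/// `ratio` is new_prob / old_prob for the action that was taken.
fn clipped_surrogate(ratio: f64, advantage: f64, clip_eps: f64) -> f64 {
    let clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps);
    (ratio * advantage).min(clipped * advantage)
}
```

With clip_eps = 0.2, a ratio of 1.5 on a positive advantage is capped at 1.2, so the policy gains nothing from moving further in that direction within one update.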

Core building blocks:

  • Models: FCQNetwork, FCSoftmaxPolicy, FCSoftmaxPolicyWithValue, FCGaussianPolicy, FCGaussianPolicyWithValue.
  • Distributions: SoftmaxDistribution, MultiSoftmaxDistribution, GaussianDistribution.
  • Memory: ReplayBuffer with n-step transitions, OnPolicyBuffer.
  • Exploration and selection: EpsilonGreedy, RewardBasedSelector.
  • FFI: DQN and PPO can be created and trained through a C-compatible API.
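
The memory entry above refers to n-step transitions. The idea can be sketched as follows: several consecutive rewards are folded into one discounted return, and the remaining discount is applied to the bootstrapped value of the state reached after those steps. This is a generic illustration in plain Rust, not the ReplayBuffer internals; n_step_return and n_step_target are hypothetical names.

```rust
/// Fold consecutive rewards into one discounted return.
/// Returns (return, gamma^k), where k is the number of rewards folded.
fn n_step_return(rewards: &[f64], gamma: f64) -> (f64, f64) {
    let mut ret = 0.0;
    let mut discount = 1.0;
    for r in rewards {
        ret += discount * r;
        discount *= gamma;
    }
    (ret, discount)
}

/// Learning target for a value function: the n-step return plus the
/// discounted bootstrap value (drop the bootstrap on terminal transitions).
fn n_step_target(rewards: &[f64], gamma: f64, bootstrap_value: f64, terminal: bool) -> f64 {
    let (ret, discount) = n_step_return(rewards, gamma);
    if terminal { ret } else { ret + discount * bootstrap_value }
}
```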

API

Instantiate a DQN agent.

use reinforcex::agents::{BaseAgent, DQN};
use reinforcex::explorers::EpsilonGreedy;
use reinforcex::memory::ReplayBuffer;
use reinforcex::models::FCQNetwork;
use std::sync::Arc;
use tch::{nn, nn::OptimizerConfig, Device};

let device = Device::cuda_if_available();
let vs = nn::VarStore::new(device);
let optimizer = nn::Adam::default().build(&vs, 3e-4).unwrap();

let n_input_channels = 4;
let action_size = 2;
let n_hidden_layers = 2;
let n_hidden_channels = 128;

let model = Box::new(FCQNetwork::new(
    vs,
    n_input_channels,
    action_size,
    n_hidden_layers,
    n_hidden_channels,
));

let gamma = 0.97;
let n_steps = 3;
let batch_size = 16;
let update_interval = 8;
let target_update_interval = 100;
let replay_buffer_capacity = 2_000;

let explorer = EpsilonGreedy::new(0.5, 0.1, 50_000);
let transition_buffer = Arc::new(ReplayBuffer::new(replay_buffer_capacity, n_steps));

let mut agent = DQN::new(
    model,
    transition_buffer,
    optimizer,
    action_size as usize,
    batch_size,
    update_interval,
    target_update_interval,
    Box::new(explorer),
    None,
    gamma,
    Some("models/dqn_latest.ot".to_string()),
    None,
);
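
The call EpsilonGreedy::new(0.5, 0.1, 50_000) above appears to take a start value, an end value, and a number of decay steps. The exact schedule is not documented here, so the following is only a sketch that assumes linear decay; linear_epsilon is a hypothetical helper, not part of the crate.

```rust
/// Assumed schedule: epsilon moves linearly from `start` to `end`
/// over `decay_steps` steps and stays at `end` afterwards.
fn linear_epsilon(start: f64, end: f64, decay_steps: u64, step: u64) -> f64 {
    if step >= decay_steps {
        // Also covers decay_steps == 0, so the division below is safe.
        return end;
    }
    let fraction = step as f64 / decay_steps as f64;
    start + (end - start) * fraction
}
```

Under this assumption the agent explores with probability 0.5 at step 0, about 0.3 halfway through, and 0.1 from step 50,000 onward.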

Common agent methods are provided by BaseAgent.

fn act(&self, obs: &Tensor) -> Tensor;
fn act_and_train(&mut self, obs: &Tensor, reward: f64) -> Tensor;
fn stop_episode_and_train(&mut self, obs: &Tensor, reward: f64);
fn get_statistics(&self) -> Vec<(String, f64)>;
fn save(&self);
fn load(&mut self);

Pseudo code for training:

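// `env` is a placeholder for your environment client; `reset` and `step` are assumed methods.
for episode in 0..max_episode {
    // Start every episode from a fresh observation and a zero reward.
    let mut obs = env.reset();
    let mut reward = 0.0;

    for step in 0..max_step {
        // Feed back the previous reward and select the next action.
        let action = agent.act_and_train(&obs, reward);
        let (next_obs, next_reward, done) = env.step(action);

        obs = next_obs;
        reward = next_reward;

        if done {
            // Pass the terminal observation and reward for the final training step.
            agent.stop_episode_and_train(&obs, reward);
            break;
        }
    }
}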
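
To make the call order in the loop concrete, here is a dependency-free sketch that runs as written. ToyEnv, ToyAgent, and the Vec<f32> observation are hypothetical stand-ins for an environment client, a ReinforceX agent, and tch::Tensor; only the method names act_and_train and stop_episode_and_train come from the API above.

```rust
/// Hypothetical 1-D environment: reaching position 3 ends the episode.
struct ToyEnv {
    position: i32,
}

impl ToyEnv {
    fn reset(&mut self) -> Vec<f32> {
        self.position = 0;
        vec![self.position as f32]
    }

    /// Action 1 moves right, any other action moves left.
    fn step(&mut self, action: usize) -> (Vec<f32>, f64, bool) {
        self.position += if action == 1 { 1 } else { -1 };
        let done = self.position >= 3;
        let reward = if done { 1.0 } else { -0.1 };
        (vec![self.position as f32], reward, done)
    }
}

/// Hypothetical agent that only counts calls and always moves right.
#[derive(Default)]
struct ToyAgent {
    transitions_seen: usize,
    episodes_finished: usize,
}

impl ToyAgent {
    fn act_and_train(&mut self, _obs: &[f32], _reward: f64) -> usize {
        self.transitions_seen += 1;
        1
    }

    fn stop_episode_and_train(&mut self, _obs: &[f32], _reward: f64) {
        self.episodes_finished += 1;
    }
}

fn run(agent: &mut ToyAgent, env: &mut ToyEnv, max_episode: usize, max_step: usize) {
    for _episode in 0..max_episode {
        let mut obs = env.reset();
        let mut reward = 0.0;
        for _step in 0..max_step {
            let action = agent.act_and_train(&obs, reward);
            let (next_obs, next_reward, done) = env.step(action);
            obs = next_obs;
            reward = next_reward;
            if done {
                break;
            }
        }
        // Design choice in this sketch: close the episode on termination
        // and also when the step limit is reached.
        agent.stop_episode_and_train(&obs, reward);
    }
}
```

Calling run(&mut agent, &mut env, 2, 10) with a default ToyAgent and ToyEnv { position: 0 } yields three act_and_train calls per episode followed by one stop_episode_and_train call.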

Pseudo code for parallel learning:

use rayon::prelude::*;
use std::sync::Arc;

let buffer = Arc::new(ReplayBuffer::new(1_000, 1));

(0..n_threads).into_par_iter().for_each(|agent_id| {
    let (model, optimizer, explorer) = build_agent_components();

    let mut agent = DQN::new(
        model,
        Arc::clone(&buffer),
        optimizer,
        action_size,
        batch_size,
        update_interval,
        target_update_interval,
        Box::new(explorer),
        None,
        gamma,
        Some(format!("models/dqn_{agent_id}.ot")),
        None,
    );

    for episode in 0..max_episode {
        // Run the same training loop as above.
    }
});

build_agent_components() is a placeholder for creating a separate model, optimizer, and explorer per worker. Share only the replay buffer or other explicitly thread-safe state.
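
The sharing rule above (one buffer behind an Arc, everything else per worker) can be illustrated without any external crates. The sketch below uses std::thread instead of rayon and a hypothetical SharedBuffer backed by a Mutex; it says nothing about how ReplayBuffer synchronizes internally.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a thread-safe replay buffer.
/// Each entry is (worker_id, reward).
struct SharedBuffer {
    transitions: Mutex<Vec<(usize, f64)>>,
}

impl SharedBuffer {
    fn push(&self, worker_id: usize, reward: f64) {
        self.transitions.lock().unwrap().push((worker_id, reward));
    }

    fn len(&self) -> usize {
        self.transitions.lock().unwrap().len()
    }
}

fn collect_in_parallel(n_workers: usize, steps_per_worker: usize) -> Arc<SharedBuffer> {
    let buffer = Arc::new(SharedBuffer {
        transitions: Mutex::new(Vec::new()),
    });

    let handles: Vec<_> = (0..n_workers)
        .map(|worker_id| {
            // Each worker gets its own handle to the same buffer.
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || {
                // Per-worker state (model, optimizer, explorer) would be built here.
                for step in 0..steps_per_worker {
                    buffer.push(worker_id, step as f64);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    buffer
}
```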

Sample experiments

The sample experiments call Gymnasium environments through FastAPI servers. Docker Compose starts ten environment servers on ports 8001 to 8010.

docker compose -f sample_env/docker-compose.yml up -d --build

Run CartPole with DQN:

cargo run -p reinforcex --features cpu -- --env cartpole --algo dqn

Run CartPole with PPO:

cargo run -p reinforcex --features cpu -- --env cartpole --algo ppo

Run CartPole with discrete SAC using four parallel environment servers:

cargo run -p reinforcex --features cpu -- --env cartpole --algo sac --parallel 4

Run LunarLanderContinuous with continuous SAC:

cargo run -p reinforcex --features cpu -- --env lunar --algo sac --parallel 4

Run Ant with PPO:

cargo run -p reinforcex --features cpu -- --env ant --algo ppo

Use --save-path and --load-path to persist models. Multi-agent samples can include {agent_id} in the path.

cargo run -p reinforcex --features cpu -- \
  --env cartpole \
  --algo dqn \
  --save-path "models/cartpole_dqn_{agent_id}.ot" \
  --load-path "models/cartpole_dqn_{agent_id}.ot"
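
How the samples substitute {agent_id} is not spelled out here. One plausible reading is a literal placeholder replacement per worker, sketched below; expand_agent_path is a hypothetical helper and not a function exported by the crate.

```rust
/// Replace every literal "{agent_id}" in the template with the worker's ID.
/// Templates without the placeholder are returned unchanged.
fn expand_agent_path(template: &str, agent_id: usize) -> String {
    template.replace("{agent_id}", &agent_id.to_string())
}
```

Under this assumption, worker 3 would read and write models/cartpole_dqn_3.ot for the command above.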

For SAC, a single save path expands into component checkpoints such as actor, critic1, critic2, and temperature files.

Stop the sample environment servers:

docker compose -f sample_env/docker-compose.yml down

Unit tests

Run all Rust unit tests from the workspace root:

cargo test --workspace

The core unit tests exercise agents, models, probability distributions, memory buffers, selectors, and the FFI wrapper. The Docker-based Gymnasium server is only required for the sample experiments above.

FFI

ReinforceX also provides a small Foreign Function Interface (FFI) crate for embedding agents from external runtimes such as C, C++, C#, or Unity.

Build the dynamic library:

cargo build -p reinforcex_ffi --release

The generated library is named reinforcex, with the platform-specific prefix and dynamic library extension: reinforcex.dll on Windows, libreinforcex.so on Linux, or libreinforcex.dylib on macOS.

Overview

  • All agents are managed internally and referenced through a u64 ID.
  • The public FFI functions catch panics and return silently on invalid inputs.
  • All sizes use u64 for ABI-friendly boundaries.
  • The caller owns input and output buffer allocation.
  • agent_type = 0 creates DQN; any other value creates PPO.
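
The ID-based design in the list above can be pictured as a registry that hands out u64 handles and treats 0 as the failure value. The following is a sketch of that pattern only, under the assumption that IDs start at 1; Registry and its methods are hypothetical and do not mirror the reinforcex_ffi source.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical handle table: agents live inside, callers only see IDs.
struct Registry<T> {
    next_id: u64,
    agents: HashMap<u64, T>,
}

impl<T> Registry<T> {
    fn new() -> Self {
        // Start at 1 so that 0 can signal failure.
        Self { next_id: 1, agents: HashMap::new() }
    }

    /// Build an agent and return its ID, or 0 if construction panics.
    fn create(&mut self, build: impl FnOnce() -> T) -> u64 {
        match panic::catch_unwind(AssertUnwindSafe(build)) {
            Ok(agent) => {
                let id = self.next_id;
                self.next_id += 1;
                self.agents.insert(id, agent);
                id
            }
            // A panic must not unwind across the FFI boundary.
            Err(_) => 0,
        }
    }

    /// Look up an agent; unknown IDs yield None, so callers can return silently.
    fn get_mut(&mut self, id: u64) -> Option<&mut T> {
        self.agents.get_mut(&id)
    }

    /// Removing an unknown ID is a no-op.
    fn destroy(&mut self, id: u64) {
        self.agents.remove(&id);
    }
}
```

Keeping agents behind integer handles means the foreign caller never holds a Rust pointer, which avoids dangling-pointer bugs after destroy.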

Data Structures

AgentConfig

typedef struct {
    uint32_t agent_type;

    uint64_t obs_size;
    uint64_t action_size;
    double learning_rate;
    double gamma;

    uint64_t batch_size;
    uint64_t buffer_size;
    double epsilon_start;
    double epsilon_end;
    uint64_t epsilon_decay;

    double lambda;
    uint64_t update_interval;
    uint64_t epoch;
    uint64_t minibatch_size;
    double clip_eps;
} AgentConfig;

Field            Description
agent_type       0 = DQN, otherwise PPO
obs_size         Observation vector size
action_size      Action space size
learning_rate    Optimizer learning rate
gamma            Discount factor
batch_size       DQN batch size
buffer_size      DQN replay buffer size
epsilon_start    Initial epsilon for DQN
epsilon_end      Final epsilon for DQN
epsilon_decay    Epsilon decay steps for DQN
lambda           PPO GAE lambda
update_interval  PPO update interval
epoch            PPO training epochs
minibatch_size   PPO minibatch size
clip_eps         PPO clipping epsilon
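
For callers written in Rust, or for checking the layout from the Rust side, the C struct above can be mirrored with #[repr(C)]. The sketch below copies the field order and types from the typedef. The helper cartpole_dqn_config and the PPO values it fills in are illustrative assumptions, and whether PPO-only fields are ignored for DQN is not stated here.

```rust
/// Rust mirror of the C `AgentConfig`; field order must match the header exactly.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct AgentConfig {
    pub agent_type: u32,

    pub obs_size: u64,
    pub action_size: u64,
    pub learning_rate: f64,
    pub gamma: f64,

    pub batch_size: u64,
    pub buffer_size: u64,
    pub epsilon_start: f64,
    pub epsilon_end: f64,
    pub epsilon_decay: u64,

    pub lambda: f64,
    pub update_interval: u64,
    pub epoch: u64,
    pub minibatch_size: u64,
    pub clip_eps: f64,
}

/// Hypothetical helper: a DQN configuration reusing the CartPole numbers
/// from the Rust example earlier in this document.
pub fn cartpole_dqn_config() -> AgentConfig {
    AgentConfig {
        agent_type: 0, // 0 = DQN
        obs_size: 4,
        action_size: 2,
        learning_rate: 3e-4,
        gamma: 0.97,
        batch_size: 16,
        buffer_size: 2_000,
        epsilon_start: 0.5,
        epsilon_end: 0.1,
        epsilon_decay: 50_000,
        // PPO fields: placeholder values, chosen only for illustration.
        lambda: 0.95,
        update_interval: 128,
        epoch: 4,
        minibatch_size: 32,
        clip_eps: 0.2,
    }
}
```

Because agent_type is a 32-bit field followed by 64-bit fields, compilers on common 64-bit targets insert four bytes of padding after it. Zero-initializing the struct in C before filling it in keeps that padding deterministic.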

Functions

rx_agent_create

uint64_t rx_agent_create(const AgentConfig* config);

Creates a new agent and returns its ID. Returns 0 on failure.

rx_agent_act_and_train

void rx_agent_act_and_train(
    uint64_t id,
    const float* obs,
    uint64_t obs_len,
    float reward,
    float* out,
    uint64_t out_len
);

Performs action selection and one training step. DQN writes one scalar action. PPO writes a vector action and truncates to out_len if the output buffer is smaller than the action tensor.
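
The truncation rule described above amounts to copying at most out_len elements into the caller's buffer. A minimal sketch over safe slices follows; write_action is a hypothetical helper, and the real function operates on raw pointers received from the caller.

```rust
/// Copy as many action components as fit into `out`.
/// Returns the number of elements written; the rest of `out` is left untouched.
fn write_action(action: &[f32], out: &mut [f32]) -> usize {
    let n = action.len().min(out.len());
    out[..n].copy_from_slice(&action[..n]);
    n
}
```

A caller that needs the full PPO action should therefore allocate an output buffer of at least action_size floats.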

rx_agent_stop_episode

void rx_agent_stop_episode(
    uint64_t id,
    const float* obs,
    uint64_t obs_len,
    float reward
);

Signals the end of an episode and performs the final training step.

rx_agent_destroy

void rx_agent_destroy(uint64_t id);

Destroys the agent for the given ID. Calling it with an unknown ID is a no-op.

Contributing

ReinforceX is a good place to contribute if you are interested in Rust, reinforcement learning, libtorch bindings, simulator integration, or FFI.

Useful contribution areas:

  • algorithm implementations and correctness tests;
  • benchmark scripts and reproducible training results;
  • safer public APIs around tensor shapes, device placement, and errors;
  • documentation for model construction and environment integration;
  • CI for Rust tests, formatting, and platform-specific FFI builds.

Before opening a pull request, please run:

cargo fmt --all -- --check
cargo test --workspace

License

MIT License (https://github.com/kakky-hacker/reinforcex/blob/master/LICENSE)