About ReinforceX
ReinforceX (ReX) is an early-stage deep reinforcement learning framework built in Rust. It is designed as a Rust-first playground for implementing, experimenting with, and eventually productionizing reinforcement learning agents without making Python the core runtime.
The project currently focuses on:
- a small, readable core for value-based, policy-based, and actor-critic algorithms;
- neural-network policies and Q-functions backed by tch / libtorch;
- replay and on-policy buffers that can be shared across training workers;
- sample Gymnasium environments exposed through a simple HTTP server;
- an optional C ABI for embedding agents from C, C++, C#, Unity, or other runtimes.
Advantages of Rust for this project:
- ownership and RAII make long-running training jobs easier to reason about;
- Send/Sync boundaries make parallel training explicit;
- native binaries are a good fit for simulators, games, robotics, and embedded integrations;
- Rust can still use libtorch through tch, so the project can combine systems programming ergonomics with modern tensor operations.
ReinforceX is not yet a stable 1.0 API. Contributions are welcome, especially around algorithms, documentation, benchmark environments, test coverage, and safe public API design.
Package
crates.io: https://crates.io/crates/reinforcex
The default cpu feature enables torch-sys with download-libtorch.
[dependencies]
reinforcex = "0.0.4"
For CUDA experiments, build with the cuda feature and make sure your local
libtorch / CUDA runtime is visible to tch. On Windows, load_cuda_dlls() also
checks TORCH_CUDA_DLL when the cuda feature is enabled.
Algorithms
Implemented agents:
- DQN: Double-DQN style target network, n-step replay, epsilon-greedy exploration, optional reward-based selector, shared replay buffer support.
- PPO: clipped policy objective, GAE, value clipping, entropy regularization, discrete, multi-branch discrete, and Gaussian policies.
- SAC: continuous and discrete Soft Actor-Critic, twin critics, soft target updates, automatic temperature updates for discrete policies, and component checkpointing.
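The PPO entry above mentions GAE and a clipped policy objective. As a rough illustration of those two ideas, here is a minimal sketch in plain Rust on `f32` slices. It does not use the crate's types or tch tensors; the function names `gae_advantages` and `clipped_surrogate` are hypothetical and chosen only for this example.

```rust
/// Generalized Advantage Estimation over one trajectory.
/// `values` holds one more entry than `rewards`: the last entry is the
/// bootstrap value of the state after the final step (0.0 if terminal).
/// Hypothetical helper, not part of the ReinforceX API.
fn gae_advantages(rewards: &[f32], values: &[f32], gamma: f32, lambda: f32) -> Vec<f32> {
    assert_eq!(values.len(), rewards.len() + 1);
    let mut advantages = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk backwards so each step can reuse the discounted sum after it.
    for t in (0..rewards.len()).rev() {
        let delta = rewards[t] + gamma * values[t + 1] - values[t];
        running = delta + gamma * lambda * running;
        advantages[t] = running;
    }
    advantages
}

/// PPO clipped surrogate for a single sample (the quantity to maximize).
/// `ratio` is new_prob / old_prob for the taken action.
fn clipped_surrogate(ratio: f32, advantage: f32, clip_eps: f32) -> f32 {
    let clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps);
    (ratio * advantage).min(clipped * advantage)
}
```

With `lambda = 0.0` the estimate reduces to the one-step TD error, and with `lambda = 1.0` it becomes the discounted return minus the value baseline.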
Core building blocks:
- Models: FCQNetwork, FCSoftmaxPolicy, FCSoftmaxPolicyWithValue, FCGaussianPolicy, FCGaussianPolicyWithValue.
- Distributions: SoftmaxDistribution, MultiSoftmaxDistribution, GaussianDistribution.
- Memory: ReplayBuffer with n-step transitions, OnPolicyBuffer.
- Exploration and selection: EpsilonGreedy, RewardBasedSelector.
- FFI: DQN and PPO can be created and trained through a C-compatible API.
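To make the exploration building block more concrete, the following is a small sketch of epsilon-greedy action selection in plain Rust. It is not the crate's EpsilonGreedy type; the function names are hypothetical, and the random draws are passed in by the caller so the example stays deterministic and free of external crates.

```rust
/// Index of the largest Q-value (the first one wins on ties).
/// Hypothetical helper, not part of the ReinforceX API.
fn greedy_action(q_values: &[f32]) -> usize {
    let mut best = 0;
    for (i, &q) in q_values.iter().enumerate() {
        if q > q_values[best] {
            best = i;
        }
    }
    best
}

/// Epsilon-greedy choice over a non-empty slice of Q-values.
/// `uniform` is a sample from [0, 1) and `random_action` is a uniformly
/// drawn action index; both are supplied by the caller (an assumption made
/// here so the sketch needs no random number generator).
fn epsilon_greedy(q_values: &[f32], epsilon: f32, uniform: f32, random_action: usize) -> usize {
    if uniform < epsilon {
        // Explore: take the random action, kept in range defensively.
        random_action % q_values.len()
    } else {
        // Exploit: take the action with the highest estimated value.
        greedy_action(q_values)
    }
}
```

With `epsilon = 0.0` the policy is purely greedy, and with `epsilon = 1.0` every action is random.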
API
Instantiate a DQN agent.
use ;
use EpsilonGreedy;
use ReplayBuffer;
use FCQNetwork;
use Arc;
use ;
let device = cuda_if_available;
let vs = new;
let optimizer = default.build.unwrap;
let n_input_channels = 4;
let action_size = 2;
let n_hidden_layers = 2;
let n_hidden_channels = 128;
let model = Boxnew;
let gamma = 0.97;
let n_steps = 3;
let batch_size = 16;
let update_interval = 8;
let target_update_interval = 100;
let replay_buffer_capacity = 2_000;
let explorer = new;
let transition_buffer = new;
let mut agent = DQNnew;
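The gamma and n_steps settings above control how rewards are accumulated before bootstrapping from the target network. As a rough sketch of the usual n-step target (not the crate's ReplayBuffer code; `n_step_return` is a hypothetical name), the computation looks like this:

```rust
/// Discounted n-step return: r_0 + gamma * r_1 + ... + gamma^n * bootstrap.
/// `rewards` holds the n rewards collected along the way and
/// `bootstrap_value` is the value estimate of the state reached after them
/// (0.0 if the episode terminated). Hypothetical helper for illustration.
fn n_step_return(rewards: &[f32], bootstrap_value: f32, gamma: f32) -> f32 {
    let mut ret = bootstrap_value;
    // Fold from the last reward backwards so each step applies one discount.
    for &r in rewards.iter().rev() {
        ret = r + gamma * ret;
    }
    ret
}
```

With n_steps = 3, `rewards` would hold three entries and the bootstrap value would come from the target network evaluated three steps ahead.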
Common agent methods are provided by BaseAgent.
;
;
;
;
;
;
Pseudocode for training:
for episode in 0..max_episode
Pseudocode for parallel learning:
use *;
use Arc;
let buffer = new;
.into_par_iter.for_each;
build_agent_components() is a placeholder for creating a separate model,
optimizer, and explorer per worker. Share only the replay buffer or other
explicitly thread-safe state.
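The sharing pattern described here can be sketched with the standard library alone. The example below uses std::thread instead of a parallel iterator, and `Transition`, `SharedBuffer`, and `run_workers` are hypothetical stand-ins rather than the crate's types; it only illustrates that the buffer is the single shared piece of state.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a stored transition; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct Transition {
    worker: usize,
    step: usize,
    reward: f32,
}

/// Bounded FIFO buffer guarded by a mutex so several workers can push.
struct SharedBuffer {
    capacity: usize,
    items: Mutex<VecDeque<Transition>>,
}

impl SharedBuffer {
    fn new(capacity: usize) -> Self {
        SharedBuffer { capacity, items: Mutex::new(VecDeque::new()) }
    }

    /// Appends a transition, evicting the oldest one when the buffer is full.
    fn push(&self, transition: Transition) {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            items.pop_front();
        }
        items.push_back(transition);
    }

    fn len(&self) -> usize {
        self.items.lock().unwrap().len()
    }
}

/// Spawns `n_workers` threads that all write into one shared buffer and
/// returns the number of transitions stored once they have finished.
fn run_workers(n_workers: usize, steps: usize, capacity: usize) -> usize {
    let buffer = Arc::new(SharedBuffer::new(capacity));
    let handles: Vec<_> = (0..n_workers)
        .map(|worker| {
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || {
                // A real worker would build its own model, optimizer, and
                // explorer here; only the buffer handle is shared.
                for step in 0..steps {
                    buffer.push(Transition { worker, step, reward: 1.0 });
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    buffer.len()
}
```

A single mutex keeps the sketch simple; whether the crate's buffer locks the same way is not stated in this document.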
Sample experiments
The sample experiments call Gymnasium environments through FastAPI servers.
Docker Compose starts ten environment servers on ports 8001 to 8010.
Run CartPole with DQN:
Run CartPole with PPO:
Run CartPole with discrete SAC using four parallel environment servers:
Run LunarLanderContinuous with continuous SAC:
Run Ant with PPO:
Use --save-path and --load-path to persist models. Multi-agent samples can
include {agent_id} in the path.
For SAC, a single save path expands into component checkpoints such as actor, critic1, critic2, and temperature files.
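As an illustration of the path handling described above, here is a minimal sketch. The `{agent_id}` placeholder comes from the text, but the helper names and the `base.component` naming scheme are assumptions made for this example; the actual file names the samples produce may differ.

```rust
/// Substitutes the `{agent_id}` placeholder in a save or load path.
/// Hypothetical helper; the samples may resolve the placeholder differently.
fn agent_path(template: &str, agent_id: usize) -> String {
    template.replace("{agent_id}", &agent_id.to_string())
}

/// Expands one SAC save path into one path per component.
/// The `<base>.<component>` suffix scheme is an assumption for illustration.
fn sac_component_paths(base: &str) -> Vec<String> {
    ["actor", "critic1", "critic2", "temperature"]
        .iter()
        .map(|component| format!("{}.{}", base, component))
        .collect()
}
```

A multi-agent run would call `agent_path` once per agent before saving, so each agent writes to its own file or set of component files.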
Stop the sample environment servers:
Unit test
Run all Rust unit tests from the workspace root:
The core unit tests exercise agents, models, probability distributions, memory buffers, selectors, and the FFI wrapper. The Docker-based Gymnasium server is only required for the sample experiments above.
FFI
ReinforceX also provides a small Foreign Function Interface (FFI) crate for embedding agents from external runtimes such as C, C++, C#, or Unity.
Build the dynamic library:
The generated library is named reinforcex with the platform-specific dynamic
library extension, for example reinforcex.dll, libreinforcex.so, or
libreinforcex.dylib.
Overview
- All agents are managed internally and referenced through a u64 ID.
- The public FFI functions catch panics and return silently on invalid inputs.
- All sizes use u64 for ABI-friendly boundaries.
- The caller owns input and output buffer allocation.
- agent_type = 0 creates DQN; any other value creates PPO.
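The ID-based design in this overview can be sketched with a global registry and std::panic::catch_unwind. The code below is an illustration under assumptions, not the crate's implementation: `DemoAgent` and every `demo_agent_*` function are hypothetical names, and a real export would also need the no_mangle attribute, which is omitted here.

```rust
use std::collections::HashMap;
use std::panic::catch_unwind;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for an agent; not the crate's type.
struct DemoAgent {
    steps: u64,
}

// IDs start at 1 so that 0 can signal failure to the caller.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);
static AGENTS: OnceLock<Mutex<HashMap<u64, DemoAgent>>> = OnceLock::new();

fn agents() -> &'static Mutex<HashMap<u64, DemoAgent>> {
    AGENTS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Creates an agent and returns its ID, or 0 on invalid input or panic.
pub extern "C" fn demo_agent_create(obs_size: u64) -> u64 {
    catch_unwind(|| {
        if obs_size == 0 {
            return 0;
        }
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        agents().lock().unwrap().insert(id, DemoAgent { steps: 0 });
        id
    })
    .unwrap_or(0)
}

/// Advances the agent by one step; unknown IDs are silently ignored.
pub extern "C" fn demo_agent_step(id: u64) {
    let _ = catch_unwind(|| {
        if let Some(agent) = agents().lock().unwrap().get_mut(&id) {
            agent.steps += 1;
        }
    });
}

/// Returns the step count, or 0 for an unknown ID.
pub extern "C" fn demo_agent_steps(id: u64) -> u64 {
    catch_unwind(|| agents().lock().unwrap().get(&id).map_or(0, |a| a.steps)).unwrap_or(0)
}

/// Removes the agent; calling it with an unknown ID is a no-op.
pub extern "C" fn demo_agent_destroy(id: u64) {
    let _ = catch_unwind(|| {
        agents().lock().unwrap().remove(&id);
    });
}
```

Keeping agents behind integer handles means no Rust pointer crosses the boundary, so the caller cannot free or dereference agent memory by mistake.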
Data Structures
AgentConfig
typedef struct AgentConfig;
| Field | Description |
|---|---|
| agent_type | 0 = DQN, otherwise PPO |
| obs_size | Observation vector size |
| action_size | Action space size |
| learning_rate | Optimizer learning rate |
| gamma | Discount factor |
| batch_size | DQN batch size |
| buffer_size | DQN replay buffer size |
| epsilon_start | Initial epsilon for DQN |
| epsilon_end | Final epsilon for DQN |
| epsilon_decay | Epsilon decay steps for DQN |
| lambda | PPO GAE lambda |
| update_interval | PPO update interval |
| epoch | PPO training epochs |
| minibatch_size | PPO minibatch size |
| clip_eps | PPO clipping epsilon |
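The three epsilon fields in the table describe an exploration schedule. One common reading is a linear decay from epsilon_start to epsilon_end over epsilon_decay steps; the sketch below assumes that shape, which this document does not confirm, and `epsilon_at` is a hypothetical name.

```rust
/// Linearly interpolates epsilon from `start` to `end` over `decay_steps`
/// steps and holds it at `end` afterwards. The linear shape is an
/// assumption; the crate may use a different schedule.
fn epsilon_at(step: u64, start: f32, end: f32, decay_steps: u64) -> f32 {
    if decay_steps == 0 || step >= decay_steps {
        return end;
    }
    let fraction = step as f32 / decay_steps as f32;
    start + (end - start) * fraction
}
```

For example, with a start of 1.0, an end of 0.0, and 100 decay steps, the value at step 50 is 0.5.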
Functions
rx_agent_create
uint64_t ;
Creates a new agent and returns its ID. Returns 0 on failure.
rx_agent_act_and_train
void ;
Performs action selection and one training step. DQN writes one scalar action.
PPO writes a vector action and truncates to out_len if the output buffer is
smaller than the action tensor.
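The truncation behaviour described above amounts to copying at most out_len values into the caller-owned buffer. A minimal sketch of that copy follows; the `f32` element type and the `write_action` helper are assumptions for illustration, since the real function signature is not shown in this document.

```rust
/// Copies as many action values as fit into the caller-owned buffer and
/// returns how many were written. Hypothetical helper; `f32` elements are
/// an assumption.
///
/// # Safety
/// `out` must be null or point to at least `out_len` writable `f32` values.
unsafe fn write_action(action: &[f32], out: *mut f32, out_len: u64) -> u64 {
    if out.is_null() {
        // Invalid input: write nothing instead of crashing.
        return 0;
    }
    // Truncate to the smaller of the action length and the buffer length.
    let n = (action.len() as u64).min(out_len) as usize;
    unsafe {
        std::ptr::copy_nonoverlapping(action.as_ptr(), out, n);
    }
    n as u64
}
```

Because the caller allocates the buffer, the library never has to hand out memory that the other runtime would need to free.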
rx_agent_stop_episode
void ;
Signals the end of an episode and performs the final training step.
rx_agent_destroy
void ;
Destroys the agent for the given ID. Calling it with an unknown ID is a no-op.
Contributing
ReinforceX is a good place to contribute if you are interested in Rust, reinforcement learning, libtorch bindings, simulator integration, or FFI.
Useful contribution areas:
- algorithm implementations and correctness tests;
- benchmark scripts and reproducible training results;
- safer public APIs around tensor shapes, device placement, and errors;
- documentation for model construction and environment integration;
- CI for Rust tests, formatting, and platform-specific FFI builds.
Before opening a pull request, please run:
License
MIT License (https://github.com/kakky-hacker/reinforcex/blob/master/LICENSE)