# oxicuda-rl

Part of the OxiCUDA ecosystem — a pure-Rust CUDA replacement for the COOLJAPAN ecosystem.

## Overview
oxicuda-rl provides a comprehensive set of GPU-ready reinforcement learning building blocks, covering experience replay buffers, policy distributions, return/advantage estimators, RL algorithm loss functions, and environment abstractions. All components are designed for on-device operation to minimise host-device memory traffic, with PTX kernel sources for GPU-accelerated RL operations.
## Features
- Replay Buffers — Uniform replay (DQN, SAC, TD3), Prioritized Experience Replay with segment-tree PER and IS weight computation (PER-DQN, PER-SAC), and N-step return accumulation
- Policy Distributions — Categorical (discrete actions, Gumbel-max sampling), Gaussian (continuous actions with reparameterisation trick and optional Tanh squashing for SAC), and Deterministic (DDPG/TD3 with Ornstein-Uhlenbeck noise)
- Return & Advantage Estimators — GAE (PPO/A3C), TD(λ), V-trace off-policy correction (IMPALA), and Retrace(λ) safe off-policy Q-targets
- Loss Functions — PPO clip+value+entropy, DQN/Double-DQN Bellman MSE/Huber, SAC soft Q + actor losses with automatic temperature tuning, TD3 twin-Q critic and deterministic actor losses
- Normalization — Running mean/variance observation normalization, return-based reward normalization, Welford online statistics
- Environment Abstractions — `Env` trait, `VecEnv` vectorized wrapper with auto-reset, `LinearQuadraticEnv` reference environment
- PTX Kernels — GPU PTX source strings for TD-error, PPO ratio, SAC target, PER IS weight computation, and advantage normalization
## Usage
Add to your `Cargo.toml`:

```toml
[dependencies]
oxicuda-rl = "0.1.2"
```
```rust
// NOTE: the module paths and argument lists below are illustrative;
// consult the crate documentation for the exact signatures.
use oxicuda_rl::{compute_gae, default_handle, RlHandle, UniformReplayBuffer};

// Set up a replay buffer and a default handle
let mut buf = UniformReplayBuffer::new(10_000, /* obs_dim */ 4, /* act_dim */ 2);
let mut handle: RlHandle = default_handle();

// Push experience and sample a mini-batch
for i in 0..100_usize {
    // buf.push(..) the i-th transition here
}
let batch = buf.sample(32).unwrap();

// Compute GAE for a 5-step rollout
let rewards = vec![1.0, 1.0, 1.0, 1.0, 1.0];
let values = vec![0.5, 0.5, 0.5, 0.5, 0.5];
let next_vals = vec![0.5, 0.5, 0.5, 0.5, 0.0];
let dones = vec![0.0, 0.0, 0.0, 0.0, 1.0];
let gae = compute_gae(&rewards, &values, &next_vals, &dones, 0.99, 0.95).unwrap();
assert_eq!(gae.len(), 5);
```
## License
Apache-2.0 — © 2026 COOLJAPAN OU (Team KitaSan)