§OxiCUDA-RL — GPU-Accelerated Reinforcement Learning Primitives (Vol.9)
oxicuda-rl provides a comprehensive set of GPU-ready RL building blocks:
§Replay Buffers
- buffer::UniformReplayBuffer — fixed-capacity circular buffer with uniform random sampling (DQN, SAC, TD3).
- buffer::PrioritizedReplayBuffer — segment-tree PER with IS weight computation (PER-DQN, PER-SAC).
- buffer::NStepBuffer — n-step return accumulation with configurable discount and episode-boundary handling.
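For reference, the n-step return that NStepBuffer accumulates is the standard truncated discounted sum; a minimal plain-Rust sketch of that definition (an illustration, not the crate's internals):

/// Reference n-step return: G_t = r_t + γ·r_{t+1} + … + γ^{n−1}·r_{t+n−1},
/// truncated when a `done` flag marks an episode boundary.
fn n_step_return(rewards: &[f32], dones: &[f32], start: usize, n: usize, gamma: f32) -> f32 {
    let mut g = 0.0_f32;
    let mut discount = 1.0_f32;
    for k in 0..n {
        let t = start + k;
        if t >= rewards.len() {
            break; // ran out of recorded transitions
        }
        g += discount * rewards[t];
        if dones[t] > 0.5 {
            break; // episode boundary: do not accumulate across resets
        }
        discount *= gamma;
    }
    g
}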
§Policy Distributions
- policy::CategoricalPolicy — discrete actions with Gumbel-max sampling, log-probability, entropy, and KL-divergence.
- policy::GaussianPolicy — diagonal Gaussian for continuous actions with the reparameterisation trick and optional Tanh squashing (SAC).
- policy::DeterministicPolicy — deterministic policy for DDPG/TD3 with exploration noise and TD3 target-policy smoothing.
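For intuition, the Gumbel-max trick named for CategoricalPolicy can be written in a few lines of plain Rust; this is an illustrative sketch built on the rand crate, independent of the crate's own API:

use rand::Rng;

/// Gumbel-max sampling: argmax_i (logit_i + g_i) with g_i ~ Gumbel(0, 1)
/// draws index i with probability softmax(logits)_i, without normalising.
fn gumbel_max_sample(logits: &[f32], rng: &mut impl Rng) -> usize {
    let mut best = 0;
    let mut best_val = f32::NEG_INFINITY;
    for (i, &logit) in logits.iter().enumerate() {
        let u: f32 = rng.gen_range(f32::EPSILON..1.0);
        let g = -(-u.ln()).ln(); // Gumbel(0, 1) noise
        if logit + g > best_val {
            best_val = logit + g;
            best = i;
        }
    }
    best
}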
§Return / Advantage Estimators
- estimator::compute_gae — GAE advantages and value targets (PPO, A3C).
- estimator::compute_td_lambda — TD(λ) multi-step returns.
- estimator::compute_vtrace — V-trace off-policy correction (IMPALA).
- estimator::compute_retrace — Retrace(λ) safe off-policy Q-targets.
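compute_gae follows the standard GAE recursion; a plain-Rust reference version of that recursion (the scalar definition, not the crate's GPU path) looks like this:

/// Standard GAE recursion:
///   δ_t = r_t + γ·(1 − done_t)·V(s_{t+1}) − V(s_t)
///   A_t = δ_t + γ·λ·(1 − done_t)·A_{t+1}
fn gae_reference(
    rewards: &[f32], values: &[f32], next_values: &[f32],
    dones: &[f32], gamma: f32, lambda: f32,
) -> Vec<f32> {
    let mut advantages = vec![0.0_f32; rewards.len()];
    let mut running = 0.0_f32;
    for t in (0..rewards.len()).rev() {
        let mask = 1.0 - dones[t]; // zero out bootstrapping across episode ends
        let delta = rewards[t] + gamma * mask * next_values[t] - values[t];
        running = delta + gamma * lambda * mask * running;
        advantages[t] = running;
    }
    advantages
}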
§Loss Functions
- loss::ppo_loss — combined PPO clip + value + entropy loss.
- loss::dqn_loss / loss::double_dqn_loss — Bellman MSE / Huber losses.
- loss::sac_critic_loss / loss::sac_actor_loss — SAC soft-Q and policy losses with automatic temperature tuning.
- loss::td3_critic_loss / loss::td3_actor_loss — TD3 twin-Q critic and deterministic actor losses.
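As a reference for the clip term inside ppo_loss, the standard per-sample PPO surrogate is short in plain Rust (the crate's batched implementation may differ):

/// PPO clipped surrogate for a single sample:
///   L = −min(ρ·A, clip(ρ, 1−ε, 1+ε)·A),  ρ = exp(log π(a|s) − log π_old(a|s))
fn ppo_clip_term(log_prob: f32, old_log_prob: f32, advantage: f32, eps: f32) -> f32 {
    let ratio = (log_prob - old_log_prob).exp();
    let clipped = ratio.clamp(1.0 - eps, 1.0 + eps);
    -(ratio * advantage).min(clipped * advantage)
}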
§Normalization
- normalize::ObservationNormalizer — running mean/variance normalization with clipping.
- normalize::RewardNormalizer — return-based or clip normalization.
- normalize::RunningStats — Welford online statistics tracker.
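RunningStats is described as a Welford tracker; the textbook single-pass update that name refers to can be sketched as follows (an illustration, not the crate's struct):

/// Welford's online algorithm: numerically stable single-pass
/// mean and (population) variance updates.
#[derive(Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean); // second factor uses the updated mean
    }
    fn variance(&self) -> f64 {
        if self.count > 1 { self.m2 / self.count as f64 } else { 0.0 }
    }
}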
§Environment Abstractions
- env::Env — standard RL environment trait (reset, step).
- env::VecEnv — vectorized multi-environment wrapper with auto-reset.
- env::LinearQuadraticEnv — reference linear-quadratic (LQ) environment for testing.
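The exact Env trait signature is crate-specific, but the auto-reset pattern that vectorized wrappers like VecEnv implement can be sketched against a hypothetical trait of the same (reset, step) shape:

// Hypothetical trait shape for illustration only; the crate's env::Env may differ.
trait Env {
    fn reset(&mut self) -> Vec<f32>; // initial observation
    fn step(&mut self, action: &[f32]) -> (Vec<f32>, f32, bool); // (obs, reward, done)
}

// Auto-reset: a finished sub-environment is reset immediately,
// so every slot in the batch always holds a live observation.
fn step_all(envs: &mut [Box<dyn Env>], actions: &[Vec<f32>]) -> Vec<(Vec<f32>, f32, bool)> {
    envs.iter_mut()
        .zip(actions)
        .map(|(env, action)| {
            let (obs, reward, done) = env.step(action);
            let obs = if done { env.reset() } else { obs };
            (obs, reward, done)
        })
        .collect()
}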
§PTX Kernels
ptx_kernels — GPU PTX source strings for TD-error, PPO ratio, SAC target, PER IS-weight computation, and advantage normalisation.
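As a scalar CPU reference for what the TD-error kernel evaluates per transition (the PTX strings themselves are device code):

/// One-step TD error: δ = r + γ·(1 − done)·V(s′) − V(s).
fn td_error(reward: f32, value: f32, next_value: f32, done: f32, gamma: f32) -> f32 {
    reward + gamma * (1.0 - done) * next_value - value
}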
§Quick Start
use oxicuda_rl::buffer::UniformReplayBuffer;
use oxicuda_rl::policy::CategoricalPolicy;
use oxicuda_rl::estimator::{GaeConfig, compute_gae};
use oxicuda_rl::loss::{PpoConfig, ppo_loss};
use oxicuda_rl::handle::RlHandle;
// Set up replay buffer
let mut buf = UniformReplayBuffer::new(10_000, 8, 4);
let mut handle = RlHandle::default_handle();
// Push some experience
for i in 0..100_usize {
    buf.push(
        vec![i as f32; 8],       // state (observation dim 8)
        vec![0.0_f32; 4],        // action (action dim 4)
        1.0,                     // reward
        vec![i as f32 + 1.0; 8], // next state
        false,                   // done
    );
}
// Sample a mini-batch
let batch = buf.sample(32, &mut handle).unwrap();
assert_eq!(batch.len(), 32);
// Compute GAE for a 5-step rollout
let rewards = vec![1.0_f32; 5];
let values = vec![0.5_f32; 5];
let next_vals = vec![0.5_f32; 5];
let dones = vec![0.0_f32; 5];
let gae = compute_gae(&rewards, &values, &next_vals, &dones, GaeConfig::default()).unwrap();
assert_eq!(gae.advantages.len(), 5);

(C) 2026 COOLJAPAN OU (Team KitaSan)
Modules§
- buffer — Experience replay buffers for off-policy RL algorithms.
- env — Environment abstractions for OxiCUDA-RL.
- error — Error types and result alias for oxicuda-rl.
- estimator — Return and advantage estimators for on-policy and off-policy RL algorithms.
- handle — RL session handle: SM version, device info, seeded RNG.
- loss — RL algorithm loss functions.
- normalize — Online normalization utilities for observations and rewards.
- policy — Policy distributions for discrete and continuous action spaces.
- prelude — Convenience prelude: imports the most commonly used types.
- ptx_kernels — PTX kernel sources for GPU-accelerated RL operations.