
Crate oxicuda_rl


§OxiCUDA-RL — GPU-Accelerated Reinforcement Learning Primitives (Vol.9)

oxicuda-rl provides a comprehensive set of GPU-ready RL building blocks:

§Replay Buffers

  • buffer — Experience replay buffers for off-policy RL algorithms.

§Policy Distributions

  • policy — Policy distributions for discrete and continuous action spaces.

§Return / Advantage Estimators

  • estimator — Return and advantage estimators for on-policy and off-policy RL algorithms.

§Loss Functions

  • loss — RL algorithm loss functions.

§Normalization

  • normalize — Online normalization utilities for observations and rewards.

§Environment Abstractions

  • env — Environment abstractions for OxiCUDA-RL.

§PTX Kernels

  • ptx_kernels — GPU PTX source strings for TD-error, PPO ratio, SAC target, and PER importance-sampling (IS) weight computation, and for advantage normalization.
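
The TD-error that the first of these kernels computes can be sketched on the CPU as a scalar reference. This is an illustrative helper, not the crate's API; the actual kernels operate on GPU buffers in batch:

```rust
// Scalar reference for the one-step TD error:
//   delta = r + gamma * (1 - done) * V(s') - V(s)
// The (1 - done) mask zeroes the bootstrap term at terminal transitions.
fn td_error(reward: f32, gamma: f32, v: f32, v_next: f32, done: bool) -> f32 {
    let mask = if done { 0.0 } else { 1.0 };
    reward + gamma * mask * v_next - v
}

fn main() {
    // Non-terminal: 1.0 + 0.99 * 0.5 - 0.5 = 0.995
    assert!((td_error(1.0, 0.99, 0.5, 0.5, false) - 0.995).abs() < 1e-6);
    // Terminal: bootstrap term masked out, 1.0 - 0.5 = 0.5
    assert!((td_error(1.0, 0.99, 0.5, 0.5, true) - 0.5).abs() < 1e-6);
}
```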

§Quick Start

use oxicuda_rl::buffer::UniformReplayBuffer;
use oxicuda_rl::policy::CategoricalPolicy;
use oxicuda_rl::estimator::{GaeConfig, compute_gae};
use oxicuda_rl::loss::{PpoConfig, ppo_loss};
use oxicuda_rl::handle::RlHandle;

// Set up a replay buffer (capacity 10_000, observation dim 8, action dim 4)
let mut buf = UniformReplayBuffer::new(10_000, 8, 4);
let mut handle = RlHandle::default_handle();

// Push some experience
for i in 0..100_usize {
    buf.push(
        vec![i as f32; 8],
        vec![0.0_f32; 4],
        1.0,
        vec![i as f32 + 1.0; 8],
        false,
    );
}

// Sample a mini-batch
let batch = buf.sample(32, &mut handle).unwrap();
assert_eq!(batch.len(), 32);

// Compute GAE for a 5-step rollout
let rewards    = vec![1.0_f32; 5];
let values     = vec![0.5_f32; 5];
let next_vals  = vec![0.5_f32; 5];
let dones      = vec![0.0_f32; 5];
let gae = compute_gae(&rewards, &values, &next_vals, &dones, GaeConfig::default()).unwrap();
assert_eq!(gae.advantages.len(), 5);
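
The compute_gae call above evaluates the standard GAE backward recursion. A minimal CPU sketch follows; gamma = 0.99 and lambda = 0.95 are assumed values for illustration, not confirmed GaeConfig defaults:

```rust
// GAE backward recursion:
//   delta_t = r_t + gamma * (1 - d_t) * V(s_{t+1}) - V(s_t)
//   A_t     = delta_t + gamma * lambda * (1 - d_t) * A_{t+1}
fn gae(rewards: &[f32], values: &[f32], next_values: &[f32], dones: &[f32],
       gamma: f32, lambda: f32) -> Vec<f32> {
    let mut advantages = vec![0.0_f32; rewards.len()];
    let mut running = 0.0_f32; // A_{t+1}, zero past the end of the rollout
    for t in (0..rewards.len()).rev() {
        let mask = 1.0 - dones[t]; // cut the recursion at episode boundaries
        let delta = rewards[t] + gamma * mask * next_values[t] - values[t];
        running = delta + gamma * lambda * mask * running;
        advantages[t] = running;
    }
    advantages
}

fn main() {
    // Same 5-step rollout as the quick-start example above.
    let adv = gae(&[1.0; 5], &[0.5; 5], &[0.5; 5], &[0.0; 5], 0.99, 0.95);
    assert_eq!(adv.len(), 5);
    // Last step has no successor advantage: A_4 = delta_4 = 1 + 0.99*0.5 - 0.5
    assert!((adv[4] - 0.995).abs() < 1e-6);
    // Earlier steps accumulate discounted deltas, so advantages grow toward t = 0.
    assert!(adv[0] > adv[4]);
}
```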

(C) 2026 COOLJAPAN OU (Team KitaSan)

Re-exports§

pub use error::RlError;
pub use error::RlResult;

Modules§

buffer
Experience replay buffers for off-policy RL algorithms.
env
Environment abstractions for OxiCUDA-RL.
error
Error types and result alias for oxicuda-rl.
estimator
Return and advantage estimators for on-policy and off-policy RL algorithms.
handle
RL session handle: SM version, device info, seeded RNG.
loss
RL algorithm loss functions.
normalize
Online normalization utilities for observations and rewards.
policy
Policy distributions for discrete and continuous action spaces.
prelude
Convenience prelude: imports the most commonly used types.
ptx_kernels
PTX kernel sources for GPU-accelerated RL operations.
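
As an illustration of what the loss module's PPO machinery evaluates, the clipped surrogate for a single sample can be sketched as below. The helper name and the clip epsilon of 0.2 are assumptions for this sketch, not the crate's ppo_loss signature or PpoConfig defaults:

```rust
// Per-sample clipped PPO surrogate:
//   L = -min(r * A, clip(r, 1 - eps, 1 + eps) * A),
//   r = exp(logp_new - logp_old)
// Clipping caps the incentive to move the policy far from the old one.
fn ppo_clip_loss(logp_new: f32, logp_old: f32, advantage: f32, eps: f32) -> f32 {
    let ratio = (logp_new - logp_old).exp();
    let clipped = ratio.clamp(1.0 - eps, 1.0 + eps);
    -(ratio * advantage).min(clipped * advantage)
}

fn main() {
    // Ratio exactly 1 (unchanged policy): ordinary policy-gradient term, -1 * A.
    assert!((ppo_clip_loss(0.0, 0.0, 1.0, 0.2) + 1.0).abs() < 1e-6);
    // Large ratio (e^1 ~ 2.718) with positive advantage is clipped at 1 + eps = 1.2.
    assert!((ppo_clip_loss(1.0, 0.0, 1.0, 0.2) + 1.2).abs() < 1e-6);
}
```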