RelayRL learning algorithms and a small trainer façade for constructing them.
§Layout
- PpoTrainer — independent PPO (and the IPPO naming alias for the same type).
- ReinforceTrainer — independent REINFORCE (and the IREINFORCE alias).
- MultiagentTrainer — MAPPO / MAREINFORCE; no external step kernel type parameter.
Use RelayRLTrainer as a convenience namespace with the same constructors as those types.
After construction, drive training through AlgorithmTrait: pass trajectories whose type
implements TrajectoryData (for example RelayRL, CSV, or Arrow trajectory wrappers from
relayrl_types), then call receive_trajectory, train_model, log_epoch, and save as your
integration loop requires.
§Re-exports
This module re-exports algorithm structs and hyperparameter types (PPOParams, MAPPOParams,
…) plus kernel traits (PPOKernelTrait, StepKernelTrait, TrainableKernelTrait) so
callers can supply custom kernels without digging through submodule paths.
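As a hedged sketch of what "supplying a custom kernel" looks like: the stand-in trait below is a placeholder with a hypothetical signature, not the real StepKernelTrait re-exported here (which has its own associated types and methods); it only illustrates the shape of plugging a user-defined policy step into a trainer.

```rust
// Illustrative only: `StepKernelTrait` here is a local stand-in with a
// made-up signature, mimicking the pattern of implementing a kernel trait
// on your own type and handing it to a trainer's kernel parameter `K`.
trait StepKernelTrait {
    fn step(&mut self, obs: &[f32]) -> Vec<f32>;
}

/// A trivial deterministic "policy" for the sketch: squashes each
/// observation component through tanh.
struct TanhKernel;

impl StepKernelTrait for TanhKernel {
    fn step(&mut self, obs: &[f32]) -> Vec<f32> {
        obs.iter().map(|x| x.tanh()).collect()
    }
}

fn main() {
    let mut kernel = TanhKernel;
    let action = kernel.step(&[0.0, 1.0]);
    println!("{}", action.len()); // 2
}
```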
§Generics
B, InK, and OutK are the Burn backend and tensor kinds your environment uses. Independent
trainers also take a kernel type K. If the compiler cannot infer them, use a turbofish on the
constructor, e.g. RelayRLTrainer::mappo::<B, InK, OutK>(args, None)?, or give the result a
concrete type in a let binding.
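The same inference pattern in miniature, using a toy generic constructor rather than the crate's API (`Trainer` and `NdArrayBackend` below are hypothetical names for illustration): when the compiler cannot infer a type parameter from the arguments alone, either a turbofish on the constructor or a concrete type on the let binding pins it.

```rust
use std::marker::PhantomData;

// Toy generic wrapper standing in for a trainer parameterized over a
// backend type `B` that never appears in the constructor's arguments,
// so the compiler cannot infer it.
struct Trainer<B> {
    _backend: PhantomData<B>,
}

impl<B> Trainer<B> {
    fn new() -> Self {
        Trainer { _backend: PhantomData }
    }
}

// Placeholder backend type for the sketch.
struct NdArrayBackend;

fn main() {
    // Turbofish on the constructor pins `B`...
    let _a = Trainer::<NdArrayBackend>::new();
    // ...or annotate the binding with a concrete type; both compile.
    let _b: Trainer<NdArrayBackend> = Trainer::new();
}
```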
§Examples in this file
Fenced examples are illustrative and marked `ignore`: substitute your real B, InK, OutK, kernel
K, and async runtime. They are not run as doctests by default, so environments without optional
backends (for example libtorch) still build docs cleanly.
§End-to-end training flow
```rust
use std::path::PathBuf;

use relayrl_algorithms::{AlgorithmError, AlgorithmTrait, RelayRLTrainer, TrainerArgs};
use relayrl_types::prelude::trajectory::RelayRLTrajectory;

async fn run_training_loop<B, InK, OutK>() -> Result<(), AlgorithmError> {
    let args = TrainerArgs {
        env_dir: PathBuf::from("./env"),
        save_model_path: PathBuf::from("./checkpoints"),
        obs_dim: 64,
        act_dim: 8,
        buffer_size: 1_000_000,
    };
    let mut trainer = RelayRLTrainer::mappo::<B, InK, OutK>(args, None)?;

    let mut trajectory = RelayRLTrajectory::new(1024);
    // Populate `trajectory` from your environment loop before handing it to the trainer.
    trainer.receive_trajectory(trajectory).await?;

    trainer.train_model();
    trainer.log_epoch();
    trainer.save("epoch-0001");
    Ok(())
}
```

Re-exports§
pub use algorithms::PPO::IPPOAlgorithm;
pub use algorithms::PPO::IPPOParams;
pub use algorithms::PPO::MAPPOAlgorithm;
pub use algorithms::PPO::MAPPOParams;
pub use algorithms::PPO::PPOAlgorithm;
pub use algorithms::PPO::PPOKernelTrait;
pub use algorithms::PPO::PPOParams;
pub use algorithms::REINFORCE::IREINFORCEAlgorithm;
pub use algorithms::REINFORCE::IREINFORCEParams;
pub use algorithms::REINFORCE::MAREINFORCEAlgorithm;
pub use algorithms::REINFORCE::MAREINFORCEParams;
pub use algorithms::REINFORCE::REINFORCEParams;
pub use algorithms::REINFORCE::ReinforceAlgorithm;
pub use templates::base_algorithm::AlgorithmError;
pub use templates::base_algorithm::AlgorithmTrait;
pub use templates::base_algorithm::StepKernelTrait;
pub use templates::base_algorithm::TrainableKernelTrait;
pub use templates::base_algorithm::TrajectoryData;
pub use templates::base_algorithm::WeightProvider;
Structs§
- RelayRLTrainer - Namespace type with static constructors for each trainer family.
- TrainerArgs - Shared filesystem and shape arguments for every trainer constructor in this module.
Enums§
- MultiagentTrainer - Runtime wrapper for multi-agent MAPPO and MAREINFORCE.
- MultiagentTrainerSpec - Describes which multi-agent trainer to build (MAPPO or MAREINFORCE).
- PpoTrainer - Runtime wrapper for independent PPO-family algorithms, parameterized by your step kernel K.
- PpoTrainerSpec - Describes which independent PPO trainer to build, before you supply a kernel K.
- ReinforceTrainer - Runtime wrapper for independent REINFORCE-family algorithms with kernel K.
- ReinforceTrainerSpec - Describes which independent REINFORCE trainer to build, before you supply a kernel K.