Policies for an actor-critic agent.

Structs

An Actor that samples actions according to a policy module.
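For a discrete action space, "samples actions according to a policy module" can look roughly like the sketch below. It assumes the policy module outputs one logit per action. `softmax` and `sample_categorical` are hypothetical helpers, not this crate's API. To stay dependency-free, the uniform draw `u` is passed in instead of coming from an RNG crate.

```rust
/// Numerically stable softmax: converts policy logits into action probabilities.
fn softmax(logits: &[f64]) -> Vec<f64> {
    // Subtract the max logit so exp() cannot overflow.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Samples an action index by inverting the cumulative distribution,
/// given a uniform draw `u` in [0, 1).
fn sample_categorical(probs: &[f64], u: f64) -> usize {
    let mut cumulative = 0.0;
    for (action, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return action;
        }
    }
    // Guard against round-off leaving the total slightly below 1.
    probs.len() - 1
}

fn main() {
    // Hypothetical logits produced by a policy module for one observation.
    let logits = [1.0, 2.0, 0.5];
    let probs = softmax(&logits);
    let action = sample_categorical(&probs, 0.3);
    println!("probs = {:?}, action = {}", probs, action);
}
```

Passing `u` explicitly keeps sampling deterministic and testable. A real actor would draw it from a seeded RNG.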

Proximal Policy Optimization (PPO) with a clipped objective.

Configuration for Ppo
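The PPO clipped objective is $L^{\text{CLIP}} = \mathbb{E}\left[\min\left(r A,\ \operatorname{clip}(r, 1-\epsilon, 1+\epsilon)\, A\right)\right]$, where $r = \pi_{\text{new}}(a \mid s) / \pi_{\text{old}}(a \mid s)$. A minimal sketch of that computation on plain slices follows. The names `clipped_surrogate`, `ppo_loss`, and `clip_eps` are hypothetical and are not taken from `Ppo` or its configuration. The sketch also leaves out the value and entropy terms that a full actor-critic loss would normally include.

```rust
/// Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
/// with the probability ratio r computed from log-probabilities.
fn clipped_surrogate(log_prob_new: f64, log_prob_old: f64, advantage: f64, clip_eps: f64) -> f64 {
    let ratio = (log_prob_new - log_prob_old).exp();
    let clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps);
    (ratio * advantage).min(clipped * advantage)
}

/// Batch loss to minimise: the negated mean of the per-sample surrogates.
fn ppo_loss(new: &[f64], old: &[f64], adv: &[f64], clip_eps: f64) -> f64 {
    let n = new.len() as f64;
    let total: f64 = new
        .iter()
        .zip(old)
        .zip(adv)
        .map(|((&ln, &lo), &a)| clipped_surrogate(ln, lo, a, clip_eps))
        .sum();
    -total / n
}

fn main() {
    // Ratio 2 with a positive advantage is clipped down to 1.2.
    println!("{}", clipped_surrogate(2f64.ln(), 0.0, 1.0, 0.2));
    println!("{}", ppo_loss(&[0.1, -0.2], &[0.0, 0.0], &[1.0, -1.0], 0.2));
}
```

Taking the minimum means clipping only limits how far the policy is rewarded for moving. With a negative advantage, the unclipped, more pessimistic term is kept.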

REINFORCE policy gradient.

Configuration for Reinforce
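REINFORCE weights each action's log-probability by the discounted return that follows it. The loss it minimises is $-\sum_t \log \pi(a_t \mid s_t)\, G_t$, with $G_t = \sum_{k \ge t} \gamma^{k-t} r_k$. Below is a minimal sketch for one episode, assuming the log-probabilities have already been computed. `discounted_returns` and `reinforce_loss` are hypothetical names and are not this crate's `Reinforce` API. Baselines are omitted.

```rust
/// Returns-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode.
fn discounted_returns(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    for t in (0..rewards.len()).rev() {
        running = rewards[t] + gamma * running;
        returns[t] = running;
    }
    returns
}

/// REINFORCE loss: negated sum of log pi(a_t | s_t) * G_t.
fn reinforce_loss(log_probs: &[f64], rewards: &[f64], gamma: f64) -> f64 {
    let returns = discounted_returns(rewards, gamma);
    -log_probs.iter().zip(&returns).map(|(lp, g)| lp * g).sum::<f64>()
}

fn main() {
    println!("{:?}", discounted_returns(&[1.0, 1.0, 1.0], 0.5));
    println!("{}", reinforce_loss(&[-1.0, -2.0, -0.5], &[1.0, 1.0, 1.0], 0.5));
}
```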

Trust Region Policy Optimization (TRPO), which constrains the KL divergence between successive policies.

Configuration for Trpo
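TRPO limits each update to a trust region, $\mathbb{E}_s\left[D_{\mathrm{KL}}(\pi_{\text{old}} \,\|\, \pi_{\text{new}})\right] \le \delta$. The sketch below shows the backtracking line search that enforces this constraint, on a toy single-state categorical policy. It is a simplified illustration, not this crate's `Trpo` implementation. It assumes the search direction `step` is already given, whereas real TRPO derives it with conjugate gradient on the Fisher matrix. The helper names and the toy `surrogate` (expected advantage under the new policy) are hypothetical.

```rust
/// Numerically stable softmax over logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(p || q) for categorical distributions; terms with p_i = 0 contribute 0.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    let mut kl = 0.0;
    for (&pi, &qi) in p.iter().zip(q) {
        if pi > 0.0 {
            kl += pi * (pi / qi).ln();
        }
    }
    kl
}

/// Toy surrogate for one state: expected advantage under the given policy.
fn surrogate(probs: &[f64], advantages: &[f64]) -> f64 {
    probs.iter().zip(advantages).map(|(p, a)| p * a).sum()
}

/// Halve the step until KL(old || new) <= max_kl and the surrogate improves.
fn line_search(
    old_logits: &[f64],
    step: &[f64],
    advantages: &[f64],
    max_kl: f64,
    max_backtracks: usize,
) -> Option<Vec<f64>> {
    let old_probs = softmax(old_logits);
    let old_obj = surrogate(&old_probs, advantages);
    let mut scale = 1.0;
    for _ in 0..max_backtracks {
        let candidate: Vec<f64> = old_logits.iter().zip(step).map(|(l, s)| l + scale * s).collect();
        let new_probs = softmax(&candidate);
        if kl_divergence(&old_probs, &new_probs) <= max_kl
            && surrogate(&new_probs, advantages) > old_obj
        {
            return Some(candidate);
        }
        scale *= 0.5;
    }
    None // No acceptable step found; keep the old policy.
}

fn main() {
    let result = line_search(&[0.0, 0.0], &[10.0, -10.0], &[1.0, -1.0], 0.01, 10);
    println!("{:?}", result);
}
```

Returning `None` instead of forcing a step reflects the usual TRPO behaviour when the line search fails: the policy is left unchanged for that iteration.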

Traits