Policies for an actor-critic agent.
Structs
An Actor that samples actions according to a policy module.
Proximal Policy Optimization (PPO) with a clipped objective.
REINFORCE policy gradient
Configuration for Reinforce
Trust Region Policy Optimization (TRPO) with a KL-divergence trust-region constraint.
Configuration for Trpo
Traits
A policy for an actor-critic agent.
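As an illustration of what "clipped objective" means for PPO, the following is a minimal sketch of the standard clipped surrogate. It is not taken from this crate: the function names `clipped_surrogate` and `ppo_objective`, the use of plain `f64` slices, and the parameter `epsilon` are all assumptions made for the example, and the actual implementation presumably operates on tensors.

```rust
/// Clipped surrogate term for a single sample (hypothetical helper, not this crate's API).
///
/// `ratio` is the probability ratio pi_new(a|s) / pi_old(a|s),
/// `advantage` is the estimated advantage of the action,
/// `epsilon` is the clip range (assumed parameter name).
fn clipped_surrogate(ratio: f64, advantage: f64, epsilon: f64) -> f64 {
    // Restrict the ratio to [1 - epsilon, 1 + epsilon].
    let clipped_ratio = ratio.clamp(1.0 - epsilon, 1.0 + epsilon);
    // Take the more pessimistic of the unclipped and clipped terms.
    (ratio * advantage).min(clipped_ratio * advantage)
}

/// Mean clipped surrogate over a batch; this is the quantity to maximize.
fn ppo_objective(ratios: &[f64], advantages: &[f64], epsilon: f64) -> f64 {
    assert_eq!(ratios.len(), advantages.len());
    assert!(!ratios.is_empty());
    let total: f64 = ratios
        .iter()
        .zip(advantages)
        .map(|(&r, &a)| clipped_surrogate(r, a, epsilon))
        .sum();
    total / ratios.len() as f64
}
```

Taking the minimum means the objective gives no extra credit for pushing the ratio beyond the clip range in the direction the advantage favors, while a ratio that has moved in the unfavorable direction is still penalized in full.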