Critics for an actor-critic agent.
Structs
Reward-to-go critic. Estimates action values as the discounted sum of future rewards.
Configuration for RewardToGo
Critic using a gradient-optimized state value function module.
Configuration for ValuesOpt
Enums
Estimate baselined advantages from state value estimates and history features.
Target function for per-step selected-action value estimates.
Traits
Build a Critic.
A critic for an actor-critic agent.
Functions
Apply a state value function to HistoryFeatures::extended_observations.
Generalized advantage estimation
One-step targets of a state value function.
Discounted reward-to-go
One-step temporal difference residuals of a state value function
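
The four estimators listed above are closely related, and a minimal sketch may make that relationship easier to see. The code below is an illustration only, not this module's API: the function names, the slice-based signatures, and the use of f64 are all assumptions made for the example. It also assumes a single episode with no internal episode boundaries, and that `values` holds one more entry than `rewards`, the last being the value of the state after the final step (0.0 if that state is terminal).

```rust
/// Discounted reward-to-go: G_t = r_t + gamma * G_{t+1}, with nothing added
/// after the last step. (Hypothetical helper, not the crate's function.)
fn discounted_reward_to_go(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut out = vec![0.0; rewards.len()];
    let mut acc = 0.0;
    // Accumulate from the last step backwards.
    for (i, &r) in rewards.iter().enumerate().rev() {
        acc = r + gamma * acc;
        out[i] = acc;
    }
    out
}

/// One-step targets of a state value function: r_t + gamma * V(s_{t+1}).
/// Assumes `values.len() == rewards.len() + 1`.
fn one_step_targets(rewards: &[f64], values: &[f64], gamma: f64) -> Vec<f64> {
    assert_eq!(values.len(), rewards.len() + 1);
    rewards
        .iter()
        .zip(&values[1..])
        .map(|(&r, &v_next)| r + gamma * v_next)
        .collect()
}

/// One-step temporal difference residuals:
/// delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
fn td_residuals(rewards: &[f64], values: &[f64], gamma: f64) -> Vec<f64> {
    one_step_targets(rewards, values, gamma)
        .iter()
        .zip(values) // stops at the shorter iterator, so V(s_T) is skipped
        .map(|(target, v)| target - v)
        .collect()
}

/// Generalized advantage estimation: the sum of TD residuals discounted
/// by gamma * lambda.
fn generalized_advantages(
    rewards: &[f64],
    values: &[f64],
    gamma: f64,
    lambda: f64,
) -> Vec<f64> {
    let deltas = td_residuals(rewards, values, gamma);
    discounted_reward_to_go(&deltas, gamma * lambda)
}
```

In this sketch, setting `lambda` to 0.0 reduces the advantages to the one-step TD residuals, while setting it to 1.0 gives the discounted reward-to-go (bootstrapped with the final value) minus the state value baseline.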