Critics for an actor-critic agent.

Structs

Reward-to-go critic. Estimates action values as the discounted sum of future rewards.

Configuration for RewardToGo.

Critic using a gradient-optimized state value function module.

Configuration for ValuesOpt.

Enums

Estimate baselined advantages from state value estimates and history features.

Target function for per-step selected-action value estimates.

Traits

A critic for an actor-critic agent.

Functions

Apply a state value function to HistoryFeatures::extended_observations.

Generalized advantage estimation.

One-step targets of a state value function.

Discounted reward-to-go.

One-step temporal difference residuals of a state value function.
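The quantities named in the function summaries above are related by simple backward recursions. The following is a minimal sketch of those relationships, not this module's API. The function names, the plain `&[f64]` slices, the single-episode layout, and the convention that `values` holds one extra trailing entry for the state after the last step (0.0 if that state is terminal) are all assumptions made for illustration. The real functions operate on HistoryFeatures and may differ in signature and in how they handle episode boundaries.

```rust
/// Discounted reward-to-go: G_t = r_t + discount * G_{t+1}, with G_T = 0.
/// (Hypothetical helper; assumes a single complete episode.)
fn discounted_reward_to_go(rewards: &[f64], discount: f64) -> Vec<f64> {
    let mut out = vec![0.0; rewards.len()];
    let mut acc = 0.0;
    // Accumulate from the last step backwards.
    for (t, &r) in rewards.iter().enumerate().rev() {
        acc = r + discount * acc;
        out[t] = acc;
    }
    out
}

/// One-step targets: r_t + discount * V(s_{t+1}).
/// Assumes `values[t]` is V(s_t) and `values` has `rewards.len() + 1` entries.
fn one_step_targets(rewards: &[f64], values: &[f64], discount: f64) -> Vec<f64> {
    assert_eq!(values.len(), rewards.len() + 1);
    rewards
        .iter()
        .zip(&values[1..])
        .map(|(&r, &v_next)| r + discount * v_next)
        .collect()
}

/// One-step temporal difference residuals:
/// delta_t = r_t + discount * V(s_{t+1}) - V(s_t).
fn temporal_differences(rewards: &[f64], values: &[f64], discount: f64) -> Vec<f64> {
    one_step_targets(rewards, values, discount)
        .into_iter()
        .zip(values) // the extra trailing value is ignored by zip
        .map(|(target, &v)| target - v)
        .collect()
}

/// Generalized advantage estimation:
/// A_t = delta_t + discount * lambda * A_{t+1}, with A_T = 0.
fn generalized_advantage_estimation(
    rewards: &[f64],
    values: &[f64],
    discount: f64,
    lambda: f64,
) -> Vec<f64> {
    let deltas = temporal_differences(rewards, values, discount);
    let mut out = vec![0.0; deltas.len()];
    let mut acc = 0.0;
    for (t, &d) in deltas.iter().enumerate().rev() {
        acc = d + discount * lambda * acc;
        out[t] = acc;
    }
    out
}
```

Under these assumptions, setting `lambda` to 0.0 reduces the advantage estimate to the one-step residuals, and setting it to 1.0 with a terminal final state gives the reward-to-go minus the state value estimate at each step.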