Module relearn::torch::agents::critics

Critics for an actor-critic agent.

Structs

Reward-to-go critic. Estimates action values as the discounted sum of future rewards.

Configuration for RewardToGo.

Critic using a gradient-optimized state value function module.

Configuration for ValuesOpt.

Estimate baselined advantages from state value estimates and history features.

Target function for per-step selected-action value estimates.

Build a Critic.

Apply a state value function to HistoryFeatures::extended_observations.

Generalized advantage estimation.

One-step targets of a state value function.

Discounted reward-to-go.

One-step temporal difference residuals of a state value function.
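
The estimators named above (discounted reward-to-go, one-step targets, one-step temporal difference residuals, and generalized advantage estimation) are related by simple backward recursions over a single episode. The following is a minimal sketch on plain `f64` slices, not this module's tensor-based implementation. All function names and signatures are hypothetical, and it assumes that `values` holds one more entry than `rewards`, so that the last entry is the value of the state reached after the final step (0 if that state is terminal).

```rust
/// Discounted reward-to-go: G_t = r_t + gamma * G_{t+1}, with G = 0 past the end.
/// Hypothetical helper for illustration; not part of the crate's API.
fn discounted_reward_to_go(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut out = vec![0.0; rewards.len()];
    let mut acc = 0.0;
    // Accumulate from the last step backwards.
    for (t, &r) in rewards.iter().enumerate().rev() {
        acc = r + gamma * acc;
        out[t] = acc;
    }
    out
}

/// One-step targets: r_t + gamma * V(s_{t+1}).
/// Assumes `values` = [V(s_0), ..., V(s_T)], one entry longer than `rewards`.
fn one_step_targets(rewards: &[f64], values: &[f64], gamma: f64) -> Vec<f64> {
    assert_eq!(values.len(), rewards.len() + 1);
    rewards
        .iter()
        .zip(&values[1..])
        .map(|(&r, &v_next)| r + gamma * v_next)
        .collect()
}

/// One-step temporal difference residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
fn td_residuals(rewards: &[f64], values: &[f64], gamma: f64) -> Vec<f64> {
    one_step_targets(rewards, values, gamma)
        .iter()
        .zip(values)
        .map(|(&target, &v)| target - v)
        .collect()
}

/// Generalized advantage estimation: A_t = delta_t + gamma * lambda * A_{t+1}.
fn generalized_advantages(rewards: &[f64], values: &[f64], gamma: f64, lambda: f64) -> Vec<f64> {
    let deltas = td_residuals(rewards, values, gamma);
    let mut out = vec![0.0; deltas.len()];
    let mut acc = 0.0;
    // Same backward recursion as reward-to-go, with discount gamma * lambda.
    for (t, &d) in deltas.iter().enumerate().rev() {
        acc = d + gamma * lambda * acc;
        out[t] = acc;
    }
    out
}
```

In this sketch, `lambda = 0` reduces the advantages to the one-step residuals, while `lambda = 1` with a terminal value of 0 gives the discounted reward-to-go minus the state value baseline.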