Module rsrl::control::td

Structs

Action probability-weighted variant of SARSA (aka “summation Q-learning”).
Persistent Advantage Learning.
Watkins’ Q-learning with eligibility traces.
Watkins’ Q-learning.
General multi-step temporal-difference learning algorithm.
On-policy variant of Watkins’ Q-learning (aka “modified Q-learning”).
On-policy variant of Watkins’ Q-learning with eligibility traces (aka “modified Q-learning”).
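The one-step entries above differ mainly in how they bootstrap from the next state. The following is a minimal tabular sketch of that difference, not this crate's API: every function and parameter name is hypothetical, and it assumes a discrete action set with the next state's action values held in a slice. The action probability-weighted variant of SARSA is the one commonly called Expected SARSA.

```rust
// Illustrative sketch only: these helpers are assumptions made for this
// example and are not part of rsrl. `q_next` holds Q(s', a') for each
// action a', and `pi_next` holds the policy probabilities pi(a' | s').

/// Watkins' Q-learning: bootstrap from the largest next action value.
/// Assumes `q_next` is non-empty.
fn q_learning_target(r: f64, gamma: f64, q_next: &[f64]) -> f64 {
    let max_q = q_next.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    r + gamma * max_q
}

/// SARSA (on-policy): bootstrap from the action actually selected next.
fn sarsa_target(r: f64, gamma: f64, q_next: &[f64], a_next: usize) -> f64 {
    r + gamma * q_next[a_next]
}

/// Probability-weighted SARSA: bootstrap from the expectation of the
/// next action values under the policy.
fn weighted_sarsa_target(r: f64, gamma: f64, q_next: &[f64], pi_next: &[f64]) -> f64 {
    let expected: f64 = q_next.iter().zip(pi_next).map(|(q, p)| q * p).sum();
    r + gamma * expected
}

/// Shared update step: move the current estimate towards the target
/// by a fraction `alpha` of the TD error.
fn td_update(q: f64, target: f64, alpha: f64) -> f64 {
    q + alpha * (target - q)
}
```

With `q_next = [1.0, 3.0, 2.0]`, reward `1.0` and discount `0.5`, the Q-learning target uses the maximum `3.0`, the SARSA target uses whichever entry the policy actually picked, and the weighted target uses the policy average. Only the target changes between the three; the update step is the same.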