Tabular MDP algorithms: value/policy iteration, Q-learning, SARSA.
All algorithms operate on finite MDPs represented by explicit transition and reward matrices.
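As an illustration of what "explicit transition and reward matrices" can look like, the following is a minimal, self-contained sketch of value iteration over dense arrays. The `TinyMdp` type, its field layout (`p[a][s][t]`, `r[a][s]`), and the functions `value_iteration_sketch` and `two_state_example` are hypothetical names made up for this example. They are not this module's `Mdp` type or its `value_iteration` function, whose exact signatures are not shown here.

```rust
/// Hypothetical dense MDP layout (an assumption, not this module's `Mdp`):
/// p[a][s][t] = probability of moving from state s to state t under action a,
/// r[a][s]    = expected immediate reward for taking action a in state s.
struct TinyMdp {
    p: Vec<Vec<Vec<f64>>>,
    r: Vec<Vec<f64>>,
    gamma: f64,
}

/// Repeated Bellman optimality backups until the largest change in any
/// state value falls below `tol` or `max_iters` sweeps have run.
/// Returns the value estimates and the greedy policy from the last sweep.
fn value_iteration_sketch(mdp: &TinyMdp, tol: f64, max_iters: usize) -> (Vec<f64>, Vec<usize>) {
    let n_actions = mdp.p.len();
    let n_states = mdp.r[0].len();
    let mut v = vec![0.0; n_states];
    let mut policy = vec![0usize; n_states];
    for _ in 0..max_iters {
        let mut delta: f64 = 0.0;
        let mut next = vec![0.0; n_states];
        for s in 0..n_states {
            let mut best = f64::NEG_INFINITY;
            for a in 0..n_actions {
                // Expected value of the successor state under action a.
                let expected: f64 = (0..n_states).map(|t| mdp.p[a][s][t] * v[t]).sum();
                let q = mdp.r[a][s] + mdp.gamma * expected;
                if q > best {
                    best = q;
                    policy[s] = a;
                }
            }
            next[s] = best;
            delta = delta.max((best - v[s]).abs());
        }
        v = next;
        if delta < tol {
            break;
        }
    }
    (v, policy)
}

/// Two states, two actions: action 0 stays put, action 1 switches state.
/// Only staying in state 1 pays a reward of 1.
fn two_state_example() -> TinyMdp {
    TinyMdp {
        p: vec![
            vec![vec![1.0, 0.0], vec![0.0, 1.0]], // action 0: stay
            vec![vec![0.0, 1.0], vec![1.0, 0.0]], // action 1: switch
        ],
        r: vec![vec![0.0, 1.0], vec![0.0, 0.0]],
        gamma: 0.9,
    }
}

fn main() {
    let mdp = two_state_example();
    let (v, policy) = value_iteration_sketch(&mdp, 1e-10, 10_000);
    println!("values = {:?}, policy = {:?}", v, policy);
}
```

In this toy problem the values approach 9 for state 0 and 10 for state 1, and the greedy policy switches out of state 0 and stays in state 1.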
Structs
- Mdp
- A finite Markov Decision Process.
- MdpSolution
- Solution returned by MDP solvers.
- QLearning
- Tabular Q-learning agent (model-free, off-policy TD).
- Sarsa
- Tabular SARSA agent (on-policy TD learning).
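The off-policy versus on-policy distinction between the two agents comes down to the bootstrap target in the temporal-difference update. The sketch below shows the two textbook update rules on a plain Q-table. The names `TdParams`, `q_learning_update`, and `sarsa_update` are hypothetical and chosen for illustration; they are not the methods of the `QLearning` or `Sarsa` structs, whose interfaces are not shown here.

```rust
/// Hypothetical parameter bundle: learning rate and discount factor.
struct TdParams {
    alpha: f64,
    gamma: f64,
}

/// Q-learning (off-policy): bootstrap from the best action value in the
/// next state, regardless of which action the agent actually takes next.
fn q_learning_update(
    q: &mut [Vec<f64>],
    s: usize,
    a: usize,
    reward: f64,
    s_next: usize,
    params: &TdParams,
) {
    let max_next = q[s_next].iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let target = reward + params.gamma * max_next;
    let old = q[s][a];
    q[s][a] = old + params.alpha * (target - old);
}

/// SARSA (on-policy): bootstrap from the value of the action `a_next`
/// that the behaviour policy actually selected in the next state.
fn sarsa_update(
    q: &mut [Vec<f64>],
    s: usize,
    a: usize,
    reward: f64,
    s_next: usize,
    a_next: usize,
    params: &TdParams,
) {
    let target = reward + params.gamma * q[s_next][a_next];
    let old = q[s][a];
    q[s][a] = old + params.alpha * (target - old);
}

fn main() {
    let params = TdParams { alpha: 0.5, gamma: 0.9 };

    // Same transition applied to two identical tables.
    let mut q_off = vec![vec![0.0, 0.0], vec![1.0, 2.0]];
    let mut q_on = q_off.clone();

    q_learning_update(&mut q_off, 0, 0, 1.0, 1, &params);
    sarsa_update(&mut q_on, 0, 0, 1.0, 1, 0, &params);

    println!("Q-learning: {:?}", q_off[0][0]);
    println!("SARSA:      {:?}", q_on[0][0]);
}
```

With the same transition, Q-learning bootstraps from the larger next-state value (2.0) while SARSA uses the value of the action actually chosen (1.0), so the two tables diverge after a single step.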
Functions
- evaluate_policy
- Evaluate a fixed deterministic policy iteratively.
- lp_solve_mdp
- Solve an MDP via its Linear Programming formulation.
- modified_policy_iteration
- Modified Policy Iteration (k-step partial evaluation).
- policy_iteration
- Policy Iteration.
- simulate
- Simulate an MDP with a fixed deterministic policy.
- value_iteration
- Value Iteration.