Module tabular

Tabular MDP algorithms: value/policy iteration, Q-learning, SARSA.

All algorithms operate on finite MDPs represented by explicit transition and reward matrices.
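As a rough illustration of what "explicit transition and reward matrices" can look like, the sketch below runs value iteration over nested `Vec`s. The `ToyMdp` struct, its field layout (`transitions[s][a][s2]`, `rewards[s][a]`), and `value_iteration_sketch` are hypothetical names chosen for this example. They are not this module's `Mdp` type or its `value_iteration` signature, which may differ.

```rust
/// Hypothetical container for illustration only; NOT this module's `Mdp` type.
struct ToyMdp {
    /// transitions[s][a][s2] = probability of reaching s2 from s under action a.
    transitions: Vec<Vec<Vec<f64>>>,
    /// rewards[s][a] = expected immediate reward for taking a in s.
    rewards: Vec<Vec<f64>>,
    /// Discount factor in [0, 1).
    gamma: f64,
}

/// One-step lookahead: r(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2).
fn q_value(mdp: &ToyMdp, values: &[f64], s: usize, a: usize) -> f64 {
    let expected_next: f64 = mdp.transitions[s][a]
        .iter()
        .zip(values)
        .map(|(p, v)| p * v)
        .sum();
    mdp.rewards[s][a] + mdp.gamma * expected_next
}

/// Repeats the Bellman optimality backup until the largest change falls
/// below `tol` (or `max_iters` is hit), then extracts a greedy policy.
fn value_iteration_sketch(mdp: &ToyMdp, tol: f64, max_iters: usize) -> (Vec<f64>, Vec<usize>) {
    let n_states = mdp.rewards.len();
    let mut values = vec![0.0_f64; n_states];
    for _ in 0..max_iters {
        let mut delta: f64 = 0.0;
        let mut next = vec![0.0_f64; n_states];
        for s in 0..n_states {
            let best = (0..mdp.rewards[s].len())
                .map(|a| q_value(mdp, &values, s, a))
                .fold(f64::NEG_INFINITY, f64::max);
            delta = delta.max((best - values[s]).abs());
            next[s] = best;
        }
        values = next;
        if delta < tol {
            break;
        }
    }
    // Greedy policy: first action attaining the highest one-step lookahead value.
    let policy: Vec<usize> = (0..n_states)
        .map(|s| {
            let mut best_a = 0;
            let mut best_q = f64::NEG_INFINITY;
            for a in 0..mdp.rewards[s].len() {
                let q = q_value(mdp, &values, s, a);
                if q > best_q {
                    best_q = q;
                    best_a = a;
                }
            }
            best_a
        })
        .collect();
    (values, policy)
}
```

The dense `[s][a][s2]` layout keeps the backup a plain dot product per state-action pair, at the cost of storing zeros for unreachable successors.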

Structs§

Mdp
A finite Markov Decision Process.
MdpSolution
Solution returned by MDP solvers.
QLearning
Tabular Q-learning agent (model-free, off-policy TD).
Sarsa
Tabular SARSA agent (on-policy TD learning).
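The off-policy versus on-policy distinction between the two agents comes down to the bootstrap target in the temporal-difference update. The following minimal sketch shows the two textbook update rules on a plain Q-table. The function names and parameters are assumptions made for illustration and do not reflect the actual methods on `QLearning` or `Sarsa`.

```rust
/// Textbook Q-learning update (off-policy): bootstraps from the best
/// action value in the next state, whatever action is actually taken next.
/// Hypothetical helper, not a method of this module's `QLearning`.
fn q_learning_update(
    q: &mut [Vec<f64>],
    s: usize,
    a: usize,
    reward: f64,
    s_next: usize,
    alpha: f64,
    gamma: f64,
) {
    let max_next = q[s_next]
        .iter()
        .cloned()
        .fold(f64::NEG_INFINITY, f64::max);
    let old = q[s][a];
    let target = reward + gamma * max_next;
    q[s][a] = old + alpha * (target - old);
}

/// Textbook SARSA update (on-policy): bootstraps from the value of the
/// action `a_next` that the behaviour policy actually selected in `s_next`.
/// Hypothetical helper, not a method of this module's `Sarsa`.
fn sarsa_update(
    q: &mut [Vec<f64>],
    s: usize,
    a: usize,
    reward: f64,
    s_next: usize,
    a_next: usize,
    alpha: f64,
    gamma: f64,
) {
    let old = q[s][a];
    let target = reward + gamma * q[s_next][a_next];
    q[s][a] = old + alpha * (target - old);
}
```

With the same transition, the two rules differ only when the action chosen in the next state is not the greedy one, which is typically the case during exploration.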

Functions§

evaluate_policy
Evaluate a fixed deterministic policy iteratively.
lp_solve_mdp
Solve an MDP via its Linear Programming formulation.
modified_policy_iteration
Modified Policy Iteration (k-step partial evaluation).
policy_iteration
Policy Iteration.
simulate
Simulate an MDP with a fixed deterministic policy.
value_iteration
Value Iteration.
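For orientation, iterative evaluation of a fixed deterministic policy can be sketched as repeated application of the Bellman expectation backup. The function below is a standalone illustration under assumed conventions (`transitions[s][a][s2]`, `rewards[s][a]`, `policy[s]` giving the action index). It is not the signature of this module's `evaluate_policy`.

```rust
/// Iterative policy evaluation for a deterministic policy.
/// Hypothetical sketch; argument layout is an assumption, not this module's API.
fn evaluate_policy_sketch(
    transitions: &[Vec<Vec<f64>>],
    rewards: &[Vec<f64>],
    policy: &[usize],
    gamma: f64,
    tol: f64,
    max_iters: usize,
) -> Vec<f64> {
    let n_states = policy.len();
    let mut values = vec![0.0_f64; n_states];
    for _ in 0..max_iters {
        let mut delta: f64 = 0.0;
        let mut next = vec![0.0_f64; n_states];
        for s in 0..n_states {
            // Only the action prescribed by the policy is backed up.
            let a = policy[s];
            let expected_next: f64 = transitions[s][a]
                .iter()
                .zip(&values)
                .map(|(p, v)| p * v)
                .sum();
            next[s] = rewards[s][a] + gamma * expected_next;
            delta = delta.max((next[s] - values[s]).abs());
        }
        values = next;
        // Stop once the largest per-state change is below the tolerance.
        if delta < tol {
            break;
        }
    }
    values
}
```

For a single state that loops to itself with reward 1 and a discount of 0.5, the values converge toward 1 / (1 - 0.5) = 2. Policy iteration alternates this evaluation step with a greedy improvement step, and a k-step variant simply caps the number of evaluation sweeps.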