pub struct MDP {
pub num_states: usize,
pub num_actions: usize,
pub transitions: Vec<Vec<Vec<f64>>>,
pub rewards: Vec<Vec<f64>>,
pub discount: f64,
}
A finite Markov Decision Process (S, A, P, R, γ).
num_states: |S|
num_actions: |A|
transitions[s][a][s']: P(s' | s, a)
rewards[s][a]: R(s, a)
discount: γ ∈ [0, 1)
Fields§
num_states: usize
Number of states.
num_actions: usize
Number of actions.
transitions: Vec<Vec<Vec<f64>>>
Transition probabilities: transitions[s][a] is a probability vector over next states.
rewards: Vec<Vec<f64>>
Expected reward: rewards[s][a].
discount: f64
Discount factor γ ∈ [0, 1).
Implementations§
impl MDP
pub fn new(
    num_states: usize,
    num_actions: usize,
    transitions: Vec<Vec<Vec<f64>>>,
    rewards: Vec<Vec<f64>>,
    discount: f64,
) -> Self
Construct a new MDP.
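As a self-contained sketch of how the constructor would be used, the struct and `new` are re-declared locally below so the example compiles on its own; the 2-state, 2-action chain it builds is hypothetical, not part of this crate.

```rust
// Local re-declaration of the documented struct, so this sketch is standalone.
pub struct MDP {
    pub num_states: usize,
    pub num_actions: usize,
    pub transitions: Vec<Vec<Vec<f64>>>, // transitions[s][a][s'] = P(s'|s,a)
    pub rewards: Vec<Vec<f64>>,          // rewards[s][a] = R(s,a)
    pub discount: f64,                   // γ ∈ [0, 1)
}

impl MDP {
    pub fn new(
        num_states: usize,
        num_actions: usize,
        transitions: Vec<Vec<Vec<f64>>>,
        rewards: Vec<Vec<f64>>,
        discount: f64,
    ) -> Self {
        // A fuller constructor might check that each transitions[s][a] sums to 1;
        // this sketch just stores the fields.
        MDP { num_states, num_actions, transitions, rewards, discount }
    }
}

fn main() {
    // Hypothetical 2-state, 2-action chain: action 0 stays put, action 1 switches state.
    let mdp = MDP::new(
        2,
        2,
        vec![
            vec![vec![1.0, 0.0], vec![0.0, 1.0]], // from state 0
            vec![vec![0.0, 1.0], vec![1.0, 0.0]], // from state 1
        ],
        vec![vec![0.0, 1.0], vec![0.0, 2.0]],
        0.9,
    );
    assert_eq!(mdp.num_states, 2);
    assert!(mdp.discount < 1.0);
}
```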
pub fn bellman_operator(&self, v: &[f64]) -> Vec<f64>
Apply the Bellman operator: (TV)(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')].
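The backup can be sketched as a free function outside this crate (the toy 2-state, 2-action MDP in main is hypothetical):

```rust
/// Standalone sketch of the Bellman operator
/// (TV)(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')].
fn bellman_operator(
    transitions: &[Vec<Vec<f64>>], // transitions[s][a][s'] = P(s'|s,a)
    rewards: &[Vec<f64>],          // rewards[s][a] = R(s,a)
    discount: f64,                 // γ
    v: &[f64],                     // current value function V
) -> Vec<f64> {
    (0..transitions.len())
        .map(|s| {
            (0..rewards[s].len())
                .map(|a| {
                    // Expected next-state value Σ_{s'} P(s'|s,a) V(s').
                    let expected: f64 =
                        transitions[s][a].iter().zip(v).map(|(p, vs)| p * vs).sum();
                    rewards[s][a] + discount * expected
                })
                .fold(f64::NEG_INFINITY, f64::max) // max over actions
        })
        .collect()
}

fn main() {
    // Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
    let transitions = vec![
        vec![vec![1.0, 0.0], vec![0.0, 1.0]], // from state 0
        vec![vec![0.0, 1.0], vec![1.0, 0.0]], // from state 1
    ];
    let rewards = vec![vec![0.0, 1.0], vec![0.0, 2.0]];
    // With V = 0, one backup just picks the best immediate reward in each state.
    let tv = bellman_operator(&transitions, &rewards, 0.9, &[0.0, 0.0]);
    assert_eq!(tv, vec![1.0, 2.0]);
}
```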
pub fn value_iteration(&self, tol: f64, max_iter: usize) -> Vec<f64>
Value iteration: iterate the Bellman operator until successive iterates differ by less than tol, or for at most max_iter iterations.
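The loop can be sketched as follows, assuming the iterate starts from V = 0 and convergence is measured in the sup-norm (both standard choices, not confirmed by this crate's docs):

```rust
/// Standalone sketch of value iteration: repeatedly apply the Bellman backup
/// until successive iterates differ by less than tol, or max_iter is reached.
fn value_iteration(
    transitions: &[Vec<Vec<f64>>], // transitions[s][a][s'] = P(s'|s,a)
    rewards: &[Vec<f64>],          // rewards[s][a] = R(s,a)
    discount: f64,
    tol: f64,
    max_iter: usize,
) -> Vec<f64> {
    let mut v = vec![0.0; transitions.len()];
    for _ in 0..max_iter {
        // One Bellman backup: (TV)(s) = max_a [R(s,a) + γ Σ P(s'|s,a) V(s')].
        let tv: Vec<f64> = (0..transitions.len())
            .map(|s| {
                (0..rewards[s].len())
                    .map(|a| {
                        let ev: f64 =
                            transitions[s][a].iter().zip(&v).map(|(p, x)| p * x).sum();
                        rewards[s][a] + discount * ev
                    })
                    .fold(f64::NEG_INFINITY, f64::max)
            })
            .collect();
        // Sup-norm distance between successive iterates.
        let diff = tv
            .iter()
            .zip(&v)
            .map(|(a, b)| (a - b).abs())
            .fold(0.0_f64, f64::max);
        v = tv;
        if diff < tol {
            break;
        }
    }
    v
}

fn main() {
    // Single state, single action, self-loop with reward 1: V* = 1 / (1 - γ) = 10.
    let v = value_iteration(&[vec![vec![1.0]]], &[vec![1.0]], 0.9, 1e-10, 10_000);
    assert!((v[0] - 10.0).abs() < 1e-7);
}
```

Because T is a γ-contraction in the sup-norm, this loop converges geometrically for any γ ∈ [0, 1).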
pub fn policy_improvement(&self, v: &[f64]) -> Vec<usize>
Extract the greedy policy from a value function.
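Greedy extraction is the argmax counterpart of the Bellman backup; a standalone sketch (with a hypothetical 2-state example in main):

```rust
/// Standalone sketch of greedy-policy extraction:
/// π(s) = argmax_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')].
fn policy_improvement(
    transitions: &[Vec<Vec<f64>>], // transitions[s][a][s'] = P(s'|s,a)
    rewards: &[Vec<f64>],          // rewards[s][a] = R(s,a)
    discount: f64,
    v: &[f64],
) -> Vec<usize> {
    (0..transitions.len())
        .map(|s| {
            let mut best_a = 0;
            let mut best_q = f64::NEG_INFINITY;
            for a in 0..rewards[s].len() {
                // Q(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) V(s').
                let ev: f64 = transitions[s][a].iter().zip(v).map(|(p, x)| p * x).sum();
                let q = rewards[s][a] + discount * ev;
                if q > best_q {
                    best_q = q;
                    best_a = a;
                }
            }
            best_a
        })
        .collect()
}

fn main() {
    // Hypothetical 2-state MDP: in state 0 action 1 pays 1, in state 1 action 0 pays 3.
    let transitions = vec![vec![vec![1.0, 0.0], vec![0.0, 1.0]]; 2];
    let rewards = vec![vec![0.0, 1.0], vec![3.0, 0.0]];
    let policy = policy_improvement(&transitions, &rewards, 0.9, &[0.0, 0.0]);
    assert_eq!(policy, vec![1, 0]);
}
```

Running this after value iteration yields an optimal policy; interleaving it with policy evaluation instead gives policy iteration.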
Trait Implementations§
Auto Trait Implementations§
impl Freeze for MDP
impl RefUnwindSafe for MDP
impl Send for MDP
impl Sync for MDP
impl Unpin for MDP
impl UnsafeUnpin for MDP
impl UnwindSafe for MDP
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.