pub struct MdpSolver {
pub num_states: usize,
pub num_actions: usize,
pub discount: f64,
pub rewards: Vec<Vec<f64>>,
pub transitions: Vec<Vec<Vec<f64>>>,
}
Discrete-state, discrete-action MDP solver.
Implements value iteration and policy iteration for discounted MDPs.
Fields

num_states: usize
Number of states.

num_actions: usize
Number of actions.

discount: f64
Discount factor γ ∈ [0, 1).

rewards: Vec<Vec<f64>>
Reward matrix R[s][a].

transitions: Vec<Vec<Vec<f64>>>
Transition probabilities P[s][a][s'].
Implementations

impl MdpSolver
pub fn new(
    num_states: usize,
    num_actions: usize,
    discount: f64,
    rewards: Vec<Vec<f64>>,
    transitions: Vec<Vec<Vec<f64>>>,
) -> Self

Create a new MDP solver.
pub fn bellman_update(&self, v: &[f64]) -> Vec<f64>

Apply the Bellman operator once: (T V)(s) = max_a [ R(s,a) + γ Σ_{s'} P(s'|s,a) V(s') ].
pub fn value_iteration(&self, tol: f64, max_iter: usize) -> (Vec<f64>, usize)

Run value iteration until successive iterates differ by less than tol, or until max_iter iterations have run.
Returns (V*, iterations), where V*[s] is the optimal value of state s.
pub fn extract_policy(&self, v: &[f64]) -> Vec<usize>

Extract the greedy policy from a value function V.
Returns policy[s] = argmax_a [ R(s,a) + γ Σ_{s'} P(s'|s,a) V(s') ].
Auto Trait Implementations
impl Freeze for MdpSolver
impl RefUnwindSafe for MdpSolver
impl Send for MdpSolver
impl Sync for MdpSolver
impl Unpin for MdpSolver
impl UnwindSafe for MdpSolver
Blanket Implementations

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.