pub struct QLearningSolver {
pub q: Vec<Vec<f64>>,
pub alpha: f64,
pub gamma: f64,
pub epsilon: f64,
pub visit_count: Vec<Vec<u64>>,
}
Q-learning solver with epsilon-greedy exploration and decaying step size.
Fields
q: Vec<Vec<f64>>
Q-value table Q[s][a].
alpha: f64
Learning rate α.
gamma: f64
Discount factor γ.
epsilon: f64
Exploration rate ε.
visit_count: Vec<Vec<u64>>
Step counter per (s,a) pair (for step-size decay).
Implementations
impl QLearningSolver
pub fn new(
    num_states: usize,
    num_actions: usize,
    alpha: f64,
    gamma: f64,
    epsilon: f64,
) -> Self
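Construct a Q-learning solver for num_states states and num_actions actions, with learning rate alpha, discount factor gamma, and exploration rate epsilon.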
pub fn update(&mut self, s: usize, a: usize, r: f64, s_next: usize)
Perform a Q-learning update for the transition (s, a, r, s_next) with harmonic step size 1/(1 + n(s,a)), where n(s,a) is the visit count of the pair (s,a).
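The update rule described above can be sketched as a stand-alone function. This is a minimal illustration and not the crate's actual source: the name q_update_sketch is hypothetical, the step size is assumed to use the visit count before it is incremented, and the role of the alpha field is not stated on this page, so it is left out. Every state is assumed to have at least one action.

```rust
/// Hypothetical stand-alone sketch of a tabular Q-learning update.
/// Assumption: the step size is 1 / (1 + n), where n is the number of
/// earlier visits to (s, a); the real method may also involve `alpha`.
fn q_update_sketch(
    q: &mut [Vec<f64>],
    visit_count: &mut [Vec<u64>],
    gamma: f64,
    s: usize,
    a: usize,
    r: f64,
    s_next: usize,
) {
    // Harmonic step size based on the visits recorded so far.
    let step = 1.0 / (1.0 + visit_count[s][a] as f64);
    // Largest Q-value available from the next state.
    let max_next = q[s_next].iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Temporal-difference target, then move Q(s, a) toward it.
    let target = r + gamma * max_next;
    q[s][a] += step * (target - q[s][a]);
    // Record this visit so the next step size is smaller.
    visit_count[s][a] += 1;
}
```

Under these assumptions the first visit to a pair uses step size 1 and overwrites the entry with the target; later visits average in new targets with shrinking weight.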
pub fn select_action(&self, s: usize, rng_val: f64) -> usize
Select an action using an epsilon-greedy policy (deterministic tie-breaking).
rng_val ∈ [0,1) is a uniform random value supplied by the caller.
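One way an epsilon-greedy choice can be driven by a single caller-supplied uniform value is sketched below. The function name select_action_sketch is hypothetical, and two details are assumptions rather than documented behaviour: the exploratory action is obtained by rescaling rng_val from [0, ε) to an action index, and ties in the greedy branch go to the lowest index. The row of Q-values is assumed to be non-empty.

```rust
/// Hypothetical sketch of epsilon-greedy selection over one row Q[s][..].
/// Assumptions: exploration reuses `rng_val` to pick the random action,
/// and greedy ties are broken in favour of the lowest action index.
fn select_action_sketch(q_row: &[f64], epsilon: f64, rng_val: f64) -> usize {
    let num_actions = q_row.len();
    if rng_val < epsilon {
        // Explore: rescale rng_val from [0, epsilon) to [0, 1), then map
        // it onto an action index; clamp to guard against rounding.
        let idx = ((rng_val / epsilon) * num_actions as f64) as usize;
        return idx.min(num_actions - 1);
    }
    // Exploit: argmax with a strict comparison, so the first maximum wins.
    let mut best = 0;
    for (a, &v) in q_row.iter().enumerate() {
        if v > q_row[best] {
            best = a;
        }
    }
    best
}
```

Taking the random value as a parameter keeps the selection free of hidden state, which makes it straightforward to test with fixed inputs.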
pub fn has_converged(&self, prev_q: &[Vec<f64>], tol: f64) -> bool
Check convergence: max |Q(s,a) - Q_prev(s,a)| < tol.
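The max-norm test above can be written as a short stand-alone function. The name has_converged_sketch is hypothetical, and both tables are assumed to have the same shape; this is an illustration of the stated criterion, not the crate's source.

```rust
/// Hypothetical sketch of the convergence check: the largest absolute
/// change between two Q-tables of identical shape must be below `tol`.
fn has_converged_sketch(q: &[Vec<f64>], prev_q: &[Vec<f64>], tol: f64) -> bool {
    let max_diff = q
        .iter()
        .zip(prev_q)
        // Pair up corresponding entries of each row.
        .flat_map(|(row, prev_row)| row.iter().zip(prev_row))
        .map(|(x, y)| (x - y).abs())
        // Running maximum, starting from 0 for empty tables.
        .fold(0.0_f64, f64::max);
    max_diff < tol
}
```

Because q is a public field and the struct implements Clone, a caller could snapshot the table before a batch of updates and pass that snapshot as prev_q afterwards.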
pub fn greedy_policy(&self) -> Vec<usize>
Return the current greedy policy: for each state, the action with the highest Q-value.
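Extracting a greedy policy from a Q-table amounts to a per-state argmax, sketched below. The name greedy_policy_sketch is hypothetical, and breaking ties toward the lowest action index is an assumption; the page does not say how this method resolves ties.

```rust
/// Hypothetical sketch: for each state, return the index of the action
/// with the largest Q-value (ties go to the lowest index by assumption).
fn greedy_policy_sketch(q: &[Vec<f64>]) -> Vec<usize> {
    q.iter()
        .map(|row| {
            let mut best = 0;
            for (a, &v) in row.iter().enumerate() {
                // Strict comparison keeps the first maximum encountered.
                if v > row[best] {
                    best = a;
                }
            }
            best
        })
        .collect()
}
```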
Trait Implementations
impl Clone for QLearningSolver
fn clone(&self) -> QLearningSolver
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for QLearningSolver
impl RefUnwindSafe for QLearningSolver
impl Send for QLearningSolver
impl Sync for QLearningSolver
impl Unpin for QLearningSolver
impl UnsafeUnpin for QLearningSolver
impl UnwindSafe for QLearningSolver
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.