pub struct QLearning {
    pub q_table: Array2<f64>,
    pub alpha: f64,
    pub epsilon: f64,
    pub gamma: f64,
}
Tabular Q-learning agent (model-free, off-policy TD).
Fields§
q_table: Array2<f64>
Q-value table (n_states × n_actions).
alpha: f64
Learning rate α ∈ (0, 1].
epsilon: f64
ε-greedy exploration probability.
gamma: f64
Discount factor γ.
Implementations§
impl QLearning
pub fn new(
    n_states: usize,
    n_actions: usize,
    alpha: f64,
    epsilon: f64,
    gamma: f64,
) -> Self
Create a new Q-learning agent with a zero-initialised Q-table of shape n_states × n_actions.
pub fn update(
    &mut self,
    state: usize,
    action: usize,
    reward: f64,
    next_state: usize,
)
Apply a single Q-learning update for the transition (state, action, reward, next_state):
Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') − Q(s,a) ]
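The update rule above can be sketched as a free function. This is a minimal illustration, not the crate's implementation: it assumes a plain `Vec<Vec<f64>>` table indexed as `q[state][action]` in place of the documented `Array2<f64>`, and the name `q_update` is hypothetical.

```rust
/// One tabular Q-learning step on a table indexed as `q[state][action]`.
/// Hypothetical stand-in for the documented `update`; the real method
/// operates on the agent's own `Array2<f64>` and stored alpha/gamma.
fn q_update(
    q: &mut [Vec<f64>],
    alpha: f64,
    gamma: f64,
    state: usize,
    action: usize,
    reward: f64,
    next_state: usize,
) {
    // Greedy bootstrap value: max over a' of Q(s', a').
    let max_next = q[next_state]
        .iter()
        .copied()
        .fold(f64::NEG_INFINITY, f64::max);
    // TD error: r + gamma * max_a' Q(s', a') - Q(s, a).
    let td_error = reward + gamma * max_next - q[state][action];
    // Move Q(s, a) a fraction alpha of the way towards the TD target.
    q[state][action] += alpha * td_error;
}
```

Because the target uses the maximum over next actions rather than the action the behaviour policy actually takes next, the update is off-policy, matching the description of the agent.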
pub fn epsilon_greedy(&self, state: usize, rng_seed: u64) -> usize
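Select an action for state via the ε-greedy policy: with probability epsilon pick a random action, otherwise pick the action with the highest Q-value. The choice is deterministic for a given rng_seed.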
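A seed-driven ε-greedy choice might look like the sketch below. The documentation does not say which generator or tie-breaking rule the crate uses, so the SplitMix64 generator, the first-maximum tie-break, and the free-function form taking a single Q-table row are all assumptions made for illustration.

```rust
/// SplitMix64 step: advances `state` and returns a pseudo-random u64.
/// (Assumed generator; the crate's actual RNG is not documented here.)
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Epsilon-greedy selection over one non-empty row of Q-values.
/// The same `(q_row, epsilon, rng_seed)` always yields the same action.
fn epsilon_greedy(q_row: &[f64], epsilon: f64, rng_seed: u64) -> usize {
    let mut s = rng_seed;
    // Uniform draw in [0, 1) built from the top 53 bits.
    let u = (splitmix64(&mut s) >> 11) as f64 / (1u64 << 53) as f64;
    if u < epsilon {
        // Explore: pick an action index pseudo-randomly
        // (modulo bias is negligible for small action counts).
        (splitmix64(&mut s) % q_row.len() as u64) as usize
    } else {
        // Exploit: first index holding the maximal Q-value.
        let mut best = 0;
        for (a, &v) in q_row.iter().enumerate() {
            if v > q_row[best] {
                best = a;
            }
        }
        best
    }
}
```

Since the function takes a seed rather than a mutable generator, a caller that wants varied exploration has to supply a different seed on each call, for example one derived from the episode and step counters.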
pub fn train(
    &mut self,
    mdp: &Mdp,
    n_episodes: usize,
    max_steps_per_episode: usize,
    seed: u64,
) -> Result<Vec<f64>, OptimizeError>
Train the agent on a known MDP for n_episodes episodes, each limited to max_steps_per_episode steps.
Returns the discounted return of each episode.
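The documentation does not spell out how each episode's return is computed. Assuming the usual definition G = Σ_t γ^t r_t over the rewards collected in one episode, a sketch with the hypothetical helper name `discounted_return` could be:

```rust
/// Discounted return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
/// for the rewards of a single episode, in the order they were received.
/// (Assumed definition; hypothetical helper, not part of the documented API.)
fn discounted_return(rewards: &[f64], gamma: f64) -> f64 {
    // Backward accumulation: G_t = r_t + gamma * G_{t+1}, with G_T = 0.
    rewards.iter().rev().fold(0.0, |g, &r| r + gamma * g)
}
```

Under this assumption, a training run would produce one such value per episode, so the returned vector would have n_episodes entries and can be used as a learning curve.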
pub fn value_function(&self) -> Vec<f64>
Estimate the value function: V(s) = max_a Q(s,a).
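The row-wise maximum V(s) = max_a Q(s,a) can be sketched as follows, again assuming a `Vec<Vec<f64>>` table indexed as `q[state][action]` rather than the documented `Array2<f64>`:

```rust
/// Greedy state values: for each state, the largest Q-value in its row.
/// Illustrative free function; the documented method reads `self.q_table`.
fn value_function(q: &[Vec<f64>]) -> Vec<f64> {
    q.iter()
        .map(|row| row.iter().copied().fold(f64::NEG_INFINITY, f64::max))
        .collect()
}
```

The result has one entry per state. In this sketch an empty row yields negative infinity, which is a property of the illustration only.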
Trait Implementations§
Auto Trait Implementations§
impl Freeze for QLearning
impl RefUnwindSafe for QLearning
impl Send for QLearning
impl Sync for QLearning
impl Unpin for QLearning
impl UnsafeUnpin for QLearning
impl UnwindSafe for QLearning
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.