Struct QLearning

pub struct QLearning {
    pub q_table: Array2<f64>,
    pub alpha: f64,
    pub epsilon: f64,
    pub gamma: f64,
}

Tabular Q-learning agent (model-free, off-policy TD).

Fields§

§q_table: Array2<f64>

Q-value table (n_states × n_actions).

§alpha: f64

Learning rate α ∈ (0, 1].

§epsilon: f64

ε-greedy exploration probability.

§gamma: f64

Discount factor γ.

Implementations§

impl QLearning

pub fn new(n_states: usize, n_actions: usize, alpha: f64, epsilon: f64, gamma: f64) -> Self

Create a new Q-learning agent with a zero-initialised Q-table of shape n_states × n_actions.

pub fn update(&mut self, state: usize, action: usize, reward: f64, next_state: usize)

Apply a single Q-learning update.

Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') − Q(s,a) ]
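The update rule above can be sketched in plain Rust. This is a minimal illustration, not the crate's implementation: the free function `q_update` is hypothetical, and a `Vec<Vec<f64>>` stands in for the `Array2<f64>` Q-table so that the example needs no external dependencies.

```rust
/// Apply one tabular Q-learning update in place.
/// `q[s][a]` is a stand-in for the crate's `Array2<f64>` table (assumption).
fn q_update(
    q: &mut [Vec<f64>],
    alpha: f64,
    gamma: f64,
    state: usize,
    action: usize,
    reward: f64,
    next_state: usize,
) {
    // max_{a'} Q(s', a'): the best value available from the next state.
    let max_next = q[next_state]
        .iter()
        .copied()
        .fold(f64::NEG_INFINITY, f64::max);
    // TD error: r + γ max_{a'} Q(s',a') − Q(s,a)
    let td_error = reward + gamma * max_next - q[state][action];
    // Move Q(s,a) a fraction α of the way towards the TD target.
    q[state][action] += alpha * td_error;
}

fn main() {
    // 2 states × 2 actions, zero-initialised.
    let mut q = vec![vec![0.0_f64; 2]; 2];
    q[1][0] = 2.0;
    // Target = 1.0 + 0.9 * 2.0 = 2.8, so Q(0,1) moves from 0 to 0.5 * 2.8.
    q_update(&mut q, 0.5, 0.9, 0, 1, 1.0, 1);
    println!("Q(0,1) = {}", q[0][1]);
}
```

Because the target uses the maximum over next-state actions rather than the action actually taken next, the update is off-policy.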

pub fn epsilon_greedy(&self, state: usize, rng_seed: u64) -> usize

Select an action via the ε-greedy policy: with probability ε a random action, otherwise the greedy action. The result is deterministic for a given rng_seed.
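One way seed-determined ε-greedy selection could work is sketched below. The random number generator the crate actually uses is not documented here, so the SplitMix64 generator and the free functions `splitmix64`, `unit_f64`, `greedy` and `epsilon_greedy` are assumptions made for illustration only; they operate on a single Q-table row passed as a slice.

```rust
/// SplitMix64 step: a small, well-known seedable generator (assumed here;
/// the crate's actual RNG is not documented).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Uniform sample in [0, 1) built from the top 53 bits.
fn unit_f64(state: &mut u64) -> f64 {
    (splitmix64(state) >> 11) as f64 / (1u64 << 53) as f64
}

/// Index of the largest Q-value in a row; the first maximum wins ties.
fn greedy(row: &[f64]) -> usize {
    let mut best = 0;
    for (a, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = a;
        }
    }
    best
}

/// ε-greedy choice over one non-empty Q-table row.
/// The same (row, epsilon, rng_seed) always yields the same action.
fn epsilon_greedy(row: &[f64], epsilon: f64, rng_seed: u64) -> usize {
    let mut s = rng_seed;
    if unit_f64(&mut s) < epsilon {
        // Explore: pick an action index from the generator.
        (splitmix64(&mut s) % row.len() as u64) as usize
    } else {
        // Exploit: pick the action with the highest Q-value.
        greedy(row)
    }
}

fn main() {
    let row = [0.1, 0.9, 0.3];
    for seed in 0..5u64 {
        println!("seed {seed}: action {}", epsilon_greedy(&row, 0.3, seed));
    }
}
```

Passing the seed explicitly keeps the method callable through `&self` with no mutable RNG state, at the cost of the caller having to vary the seed between steps.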

pub fn greedy(&self, state: usize) -> usize

Select the greedy action (no exploration).

pub fn train(&mut self, mdp: &Mdp, n_episodes: usize, max_steps_per_episode: usize, seed: u64) -> Result<Vec<f64>, OptimizeError>

Train the agent on a known MDP for n_episodes episodes, each capped at max_steps_per_episode steps.

Returns the discounted return of each episode.
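For reference, the discounted return of one episode is conventionally G = Σ_t γ^t r_t over the rewards collected in that episode. The sketch below assumes that this is the quantity reported per episode; the helper `discounted_return` is hypothetical and not part of the documented API.

```rust
/// Discounted return of one episode: G = Σ_t γ^t r_t.
/// Accumulated backwards using G_t = r_t + γ G_{t+1}.
fn discounted_return(rewards: &[f64], gamma: f64) -> f64 {
    rewards.iter().rev().fold(0.0, |g, &r| r + gamma * g)
}

fn main() {
    // Three steps with reward 1 and γ = 0.5: 1 + 0.5 + 0.25.
    let g = discounted_return(&[1.0, 1.0, 1.0], 0.5);
    println!("G = {g}");
}
```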

pub fn policy(&self) -> Vec<usize>

Extract the greedy policy from the Q-table: π(s) = argmax_a Q(s,a).

pub fn value_function(&self) -> Vec<f64>

Estimate the value function: V(s) = max_a Q(s,a).
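The relationship between the Q-table, the greedy policy and the value function can be sketched as follows. As before, a `Vec<Vec<f64>>` stands in for `Array2<f64>`, and the free functions `argmax`, `policy` and `value_function` are illustrative stand-ins rather than the crate's methods. Ties are broken in favour of the lowest action index, which is an assumption.

```rust
/// Index of the largest value in a row; the first maximum wins ties (assumption).
fn argmax(row: &[f64]) -> usize {
    let mut best = 0;
    for (a, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = a;
        }
    }
    best
}

/// Greedy policy: π(s) = argmax_a Q(s,a) for every state.
fn policy(q: &[Vec<f64>]) -> Vec<usize> {
    q.iter().map(|row| argmax(row)).collect()
}

/// Value estimate: V(s) = max_a Q(s,a) for every state.
fn value_function(q: &[Vec<f64>]) -> Vec<f64> {
    q.iter()
        .map(|row| row.iter().copied().fold(f64::NEG_INFINITY, f64::max))
        .collect()
}

fn main() {
    // 2 states × 2 actions.
    let q = vec![vec![0.1, 0.7], vec![0.4, 0.2]];
    println!("policy = {:?}", policy(&q));
    println!("values = {:?}", value_function(&q));
}
```

By construction, V(s) equals Q(s, π(s)), so the two outputs always describe the same greedy behaviour.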

Trait Implementations§

impl Clone for QLearning

fn clone(&self) -> QLearning

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for QLearning

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl UnwindSafe for QLearning

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.