pub struct PolicyGradient {
pub theta: Vec<Vec<f64>>,
pub alpha: f64,
pub gamma: f64,
}
Policy gradient agent (softmax parameterisation).
Policy: π_θ(a|s) = exp(θ[s][a]) / Σ_a' exp(θ[s][a'])
Update: θ[s][a] += α · ∇_θ log π_θ(a|s) · G, where G is the return.
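The softmax policy above can be sketched as a function over one row of the parameter table. This is a minimal sketch, not the crate's implementation: the free function `softmax` is a hypothetical helper, and subtracting the row maximum is an assumption made here for numerical stability (it leaves the probabilities unchanged).

```rust
/// Hypothetical helper: softmax over one row θ[s][·], giving π_θ(·|s).
/// Subtracting the row maximum before exponentiating avoids overflow
/// and does not change the resulting probabilities.
fn softmax(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}
```

With an all-zero row, such as a freshly initialised table might contain, every action gets the same probability.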
Fields§
theta: Vec<Vec<f64>>
Policy parameter table θ[s][a].
alpha: f64
Learning rate α.
gamma: f64
Discount factor γ.
Implementations§
impl PolicyGradient
pub fn new(
    num_states: usize,
    num_actions: usize,
    alpha: f64,
    gamma: f64,
) -> Self
Construct a policy gradient agent.
pub fn update(&mut self, s: usize, a: usize, g: f64)
Update θ using a single (s, a, G) sample from a trajectory.
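One way to read this update for a tabular softmax policy is as a REINFORCE-style step. The sketch below rests on stated assumptions rather than the crate's source: `Agent` is a hypothetical stand-in for `PolicyGradient`, the whole row θ[s][·] is updated because ∂ log π_θ(a|s) / ∂θ[s][b] = 1{a = b} − π_θ(b|s), and `discounted_returns` is a hypothetical helper showing how γ might produce G from an episode's rewards.

```rust
/// Hypothetical stand-in for the agent; field names mirror the documented struct.
struct Agent {
    theta: Vec<Vec<f64>>,
    alpha: f64,
}

/// Softmax over one row θ[s][·] (max-subtracted for numerical stability).
fn softmax(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

impl Agent {
    /// One gradient step for a single (s, a, G) sample.
    /// Assumption: every entry of row s moves by α · (1{a = b} − π(b|s)) · G.
    fn update(&mut self, s: usize, a: usize, g: f64) {
        let alpha = self.alpha;
        let pi = softmax(&self.theta[s]);
        for (b, theta_sb) in self.theta[s].iter_mut().enumerate() {
            let indicator = if b == a { 1.0 } else { 0.0 };
            *theta_sb += alpha * (indicator - pi[b]) * g;
        }
    }
}

/// Hypothetical helper: discounted returns G_t = r_t + γ · G_{t+1},
/// computed backwards over the rewards of one episode.
fn discounted_returns(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut g = 0.0;
    for t in (0..rewards.len()).rev() {
        g = rewards[t] + gamma * g;
        returns[t] = g;
    }
    returns
}
```

Under these assumptions, a positive return raises the parameter of the action taken and lowers the others in the same state, so the row's entries always shift by amounts that sum to zero.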
pub fn expected_return(&self, s: usize, q: &ActionValueFunction) -> f64
Expected return from state s as E_π[Q(s,a)].
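The expectation E_π[Q(s,a)] is the policy-weighted sum Σ_a π_θ(a|s) · Q(s,a). The sketch below illustrates that sum only; the layout of `ActionValueFunction` is not shown on this page, so a plain slice `q_row` holding Q(s,·) is used as a hypothetical substitute, and the free function takes the row θ[s][·] directly.

```rust
/// Softmax over one row θ[s][·] (max-subtracted for numerical stability).
fn softmax(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Hypothetical sketch: Σ_a π_θ(a|s) · Q(s,a) for one state.
/// `theta_row` is θ[s][·]; `q_row` stands in for Q(s,·).
fn expected_return(theta_row: &[f64], q_row: &[f64]) -> f64 {
    softmax(theta_row)
        .iter()
        .zip(q_row)
        .map(|(p, q)| p * q)
        .sum()
}
```

For a uniform policy this reduces to the plain average of the action values in that state.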
pub fn convergence_rate(&self, q: &ActionValueFunction) -> f64
Convergence rate estimate: max |∇J| across states (gradient norm).
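One possible interpretation of this estimate, given as a sketch under explicit assumptions: for a softmax policy the per-state gradient of the expected return with respect to θ[s][a] is π_θ(a|s) · (Q(s,a) − V(s)), where V(s) = Σ_a π_θ(a|s) · Q(s,a). The code takes the Euclidean norm of that vector for each state and returns the largest. The choice of norm, the omission of any state-visitation weighting, and the nested-`Vec` stand-in for `ActionValueFunction` are all assumptions, not details confirmed by this page.

```rust
/// Softmax over one row θ[s][·] (max-subtracted for numerical stability).
fn softmax(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Hypothetical sketch: largest per-state gradient norm.
/// `theta[s]` is θ[s][·]; `q[s]` stands in for Q(s,·).
fn max_gradient_norm(theta: &[Vec<f64>], q: &[Vec<f64>]) -> f64 {
    theta
        .iter()
        .zip(q)
        .map(|(theta_row, q_row)| {
            let pi = softmax(theta_row);
            // V(s) = Σ_a π(a|s) · Q(s,a)
            let v: f64 = pi.iter().zip(q_row).map(|(p, q_sa)| p * q_sa).sum();
            // Euclidean norm of the vector π(a|s) · (Q(s,a) − V(s))
            pi.iter()
                .zip(q_row)
                .map(|(p, q_sa)| (p * (q_sa - v)).powi(2))
                .sum::<f64>()
                .sqrt()
        })
        .fold(0.0, f64::max)
}
```

Under this reading the value is zero whenever all actions in every state have equal Q-values, since no parameter change would then alter the expected return.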
Trait Implementations§
impl Clone for PolicyGradient
fn clone(&self) -> PolicyGradient
Returns a duplicate of the value.
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for PolicyGradient
impl RefUnwindSafe for PolicyGradient
impl Send for PolicyGradient
impl Sync for PolicyGradient
impl Unpin for PolicyGradient
impl UnsafeUnpin for PolicyGradient
impl UnwindSafe for PolicyGradient
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.