Struct relearn::torch::critic::Gae
Generalized Advantage Estimator critic.
Note
Episodes that end in a non-terminal state are not yet handled properly.
All episodes are assumed to end with a reward of 0.
Reference
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel. High-Dimensional Continuous Control Using Generalized Advantage Estimation. ICLR 2016. https://arxiv.org/pdf/1506.02438.pdf
Fields
gamma: f64
Clips the environment discount factor to be no more than this.
lambda: f64
Advantage interpolation factor between one-step residuals (=0) and full return (=1).
value_fn: V
State value function module.
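Example
To illustrate how gamma and lambda interact, here is a minimal standalone sketch of the estimator from the referenced paper, written over plain slices. It is not this crate's implementation: the function names `clipped_discount` and `gae_advantages` are hypothetical, the value estimates are passed in directly rather than computed by value_fn, and the value past the end of the episode is assumed to be 0, in line with the note above.

```rust
/// Hypothetical helper, not part of `relearn`: limit the environment
/// discount factor so that it is no more than the critic's `gamma`.
fn clipped_discount(env_discount: f64, gamma: f64) -> f64 {
    env_discount.min(gamma)
}

/// Hypothetical helper, not part of `relearn`: generalized advantage
/// estimates for a single finished episode.
///
/// `rewards[t]` is the reward received at step `t` and `values[t]` is the
/// estimated state value V(s_t). The value after the last step is assumed
/// to be 0.
fn gae_advantages(rewards: &[f64], values: &[f64], discount: f64, lambda: f64) -> Vec<f64> {
    assert_eq!(rewards.len(), values.len());
    let n = rewards.len();
    let mut advantages = vec![0.0; n];
    let mut next_value = 0.0; // V(s_{t+1}); zero past the end of the episode
    let mut next_advantage = 0.0; // A_{t+1}; zero past the end of the episode
    // Walk the episode backwards so each step can reuse the next step's result.
    for t in (0..n).rev() {
        // One-step residual: delta_t = r_t + discount * V(s_{t+1}) - V(s_t)
        let delta = rewards[t] + discount * next_value - values[t];
        // Recurrence: A_t = delta_t + discount * lambda * A_{t+1}
        next_advantage = delta + discount * lambda * next_advantage;
        advantages[t] = next_advantage;
        next_value = values[t];
    }
    advantages
}
```

With lambda = 0 each advantage reduces to the one-step residual; with lambda = 1 it equals the discounted return from that step minus the value estimate. Intermediate values blend the two.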
Trait Implementations
Auto Trait Implementations
impl<V> RefUnwindSafe for Gae<V> where
V: RefUnwindSafe,
impl<V> UnwindSafe for Gae<V> where
V: UnwindSafe,
Blanket Implementations
Mutably borrows from an owned value.