pub struct RewardNormalizer { /* private fields */ }
Reward normalizer with optional return-based normalisation and clipping.
Implementations
impl RewardNormalizer
pub fn new(n_envs: usize, gamma: f32, clip: f32, mode: RewardNormMode) -> Self
Create a reward normalizer.
n_envs: number of parallel environments.
gamma: discount factor for return accumulation.
clip: symmetric clip range.
mode: normalisation mode.
pub fn process(&mut self, rewards: &[f32], dones: &[f32]) -> Vec<f32>
Processes one step of rewards across the n_envs parallel environments.
Updates the running return estimates and normalises the rewards.
Returns the normalised and clipped rewards.
Panics
Panics if rewards.len() != n_envs or dones.len() != n_envs.
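The update-then-normalise step described above can be illustrated with a minimal sketch. This is not the crate's implementation: `SketchNormalizer` and `RunningStats` are hypothetical stand-ins, and the sketch assumes that rewards are divided by the running standard deviation of the discounted returns, that a small epsilon guards against division by zero, and that `dones[i] > 0.5` marks the end of an episode in environment `i`. The normalisation mode is omitted because its variants are not documented here.

```rust
/// Running mean and variance tracker (Welford's online algorithm).
#[derive(Clone, Debug)]
struct RunningStats {
    count: f64,
    mean: f64,
    m2: f64,
}

impl RunningStats {
    fn new() -> Self {
        Self { count: 0.0, mean: 0.0, m2: 0.0 }
    }

    fn update(&mut self, x: f64) {
        self.count += 1.0;
        let delta = x - self.mean;
        self.mean += delta / self.count;
        self.m2 += delta * (x - self.mean);
    }

    /// Population variance; falls back to 1.0 until two samples are seen.
    fn variance(&self) -> f64 {
        if self.count < 2.0 { 1.0 } else { self.m2 / self.count }
    }
}

/// Hypothetical stand-in, NOT the crate's `RewardNormalizer`.
struct SketchNormalizer {
    returns: Vec<f32>, // one discounted return per environment
    stats: RunningStats,
    gamma: f32,
    clip: f32,
}

impl SketchNormalizer {
    fn new(n_envs: usize, gamma: f32, clip: f32) -> Self {
        Self { returns: vec![0.0; n_envs], stats: RunningStats::new(), gamma, clip }
    }

    fn process(&mut self, rewards: &[f32], dones: &[f32]) -> Vec<f32> {
        assert_eq!(rewards.len(), self.returns.len(), "rewards.len() must equal n_envs");
        assert_eq!(dones.len(), self.returns.len(), "dones.len() must equal n_envs");

        // Accumulate the discounted return per environment and track its spread.
        for i in 0..rewards.len() {
            self.returns[i] = self.gamma * self.returns[i] + rewards[i];
            self.stats.update(self.returns[i] as f64);
        }

        // Assumption: scale by the std of the returns (no mean subtraction), then clip.
        let std = (self.stats.variance() as f32 + 1e-8).sqrt();
        let out: Vec<f32> = rewards
            .iter()
            .map(|&r| (r / std).clamp(-self.clip, self.clip))
            .collect();

        // Assumption: a done flag above 0.5 ends the episode, so its return restarts at zero.
        for (ret, &d) in self.returns.iter_mut().zip(dones) {
            if d > 0.5 {
                *ret = 0.0;
            }
        }
        out
    }
}
```

Calling `process` once per environment step keeps one running return per environment, which is why both slices must have length `n_envs`.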
pub fn normalise_eval(&self, rewards: &[f32]) -> Vec<f32>
Normalise a batch of rewards without updating the running statistics (for use during evaluation).
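As a rough illustration of normalising with frozen statistics, the sketch below scales and clips a batch using a fixed standard deviation. The helper `normalise_frozen`, its parameters, and the epsilon term are assumptions made for this example and are not part of the documented API. Because nothing is mutated, repeated calls with the same inputs give the same result.

```rust
/// Hypothetical helper: scale rewards by a fixed standard deviation, then clip.
/// It takes no mutable state, mirroring the `&self` receiver of an eval-only method.
fn normalise_frozen(rewards: &[f32], std: f32, clip: f32) -> Vec<f32> {
    let eps = 1e-8_f32; // guards against division by zero when std is 0
    rewards
        .iter()
        .map(|&r| (r / (std + eps)).clamp(-clip, clip))
        .collect()
}
```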
pub fn reset_returns(&mut self)
Reset running returns (call at episode start).
Trait Implementations
impl Clone for RewardNormalizer
fn clone(&self) -> RewardNormalizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for RewardNormalizer
impl RefUnwindSafe for RewardNormalizer
impl Send for RewardNormalizer
impl Sync for RewardNormalizer
impl Unpin for RewardNormalizer
impl UnsafeUnpin for RewardNormalizer
impl UnwindSafe for RewardNormalizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.