pub struct NAdamW {
pub lr: f32,
pub beta1: f32,
pub beta2: f32,
pub eps: f32,
pub weight_decay: f32,
/* private fields */
}
Nesterov AdamW. Per-tensor state: two f32 buffers.
Fields§
lr: f32
Learning rate.
beta1: f32
First-moment EMA decay β₁. Default 0.9.
beta2: f32
Second-moment EMA decay β₂. Default 0.999.
eps: f32
Denominator stability constant. Default 1e-8.
weight_decay: f32
Decoupled weight-decay coefficient λ. Default 0.01.
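To make the roles of these fields concrete, here is a minimal sketch of one NAdamW-style update. It assumes a common simplified Nesterov-Adam formulation with decoupled weight decay; this page does not show the crate's actual arithmetic, so the exact variant may differ. The names `Hyper`, `NAdamWState`, and `nadamw_update` are hypothetical and not part of this crate. The two buffers `m` and `v` are one plausible reading of "two f32 buffers".

```rust
/// Hypothetical per-tensor state: two f32 buffers (assumed to be the moment EMAs).
struct NAdamWState {
    m: Vec<f32>, // first-moment EMA
    v: Vec<f32>, // second-moment EMA
}

/// Hypothetical hyperparameter bundle mirroring the public fields listed above.
struct Hyper {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
}

/// One update at 1-based step `t`, under the assumed formulation:
///   m <- b1*m + (1-b1)*g,   v <- b2*v + (1-b2)*g^2
///   m_hat = b1*m / (1 - b1^(t+1)) + (1-b1)*g / (1 - b1^t)   (Nesterov look-ahead)
///   v_hat = v / (1 - b2^t)
///   p <- p - lr*lambda*p - lr*m_hat / (sqrt(v_hat) + eps)
fn nadamw_update(h: &Hyper, st: &mut NAdamWState, t: i32, param: &mut [f32], grad: &[f32]) {
    assert!(t >= 1, "step index is 1-based");
    assert_eq!(param.len(), grad.len());
    assert_eq!(param.len(), st.m.len());
    assert_eq!(param.len(), st.v.len());

    // Bias-correction denominators.
    let bc1 = 1.0 - h.beta1.powi(t);
    let bc1_next = 1.0 - h.beta1.powi(t + 1);
    let bc2 = 1.0 - h.beta2.powi(t);

    for i in 0..param.len() {
        let g = grad[i];
        st.m[i] = h.beta1 * st.m[i] + (1.0 - h.beta1) * g;
        st.v[i] = h.beta2 * st.v[i] + (1.0 - h.beta2) * g * g;

        let m_hat = h.beta1 * st.m[i] / bc1_next + (1.0 - h.beta1) * g / bc1;
        let v_hat = st.v[i] / bc2;

        // Decoupled weight decay acts on the parameter directly, not via the gradient.
        param[i] -= h.lr * h.weight_decay * param[i];
        param[i] -= h.lr * m_hat / (v_hat.sqrt() + h.eps);
    }
}
```

In this sketch `eps` is added outside the square root, and the decay term is scaled by `lr`; both are assumptions borrowed from the usual AdamW convention.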
Implementations§
Trait Implementations§
impl Optimizer for NAdamW
fn step( &mut self, name: &str, _shape: &[usize], param: &mut [f32], grad: &[f32], )
fn end_iteration(&mut self)
Advance the global step counter. Most algorithms increment it on every
call to [step], so most implementations leave this a no-op.
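The intended call pattern can be sketched as follows: `step` once per named tensor, then `end_iteration` once per training iteration. The trait below is a local mirror reconstructed from the signatures on this page, and its default method bodies are inferred from the descriptions rather than taken from the crate source. `CountingSgd`, `Tensor`, and `run_iteration` are hypothetical helpers used only for illustration.

```rust
/// Local mirror of the trait surface shown on this page (an assumption,
/// not the crate's actual definition).
trait Optimizer {
    fn step(&mut self, name: &str, shape: &[usize], param: &mut [f32], grad: &[f32]);
    /// Assumed default: a no-op.
    fn end_iteration(&mut self) {}
    /// Assumed default: 1.0 for every name.
    fn lr_scale(&self, _name: &str) -> f32 {
        1.0
    }
}

/// Hypothetical optimizer that counts whole iterations rather than `step`
/// calls, so it needs to override `end_iteration`.
struct CountingSgd {
    lr: f32,
    iteration: u64,
}

impl Optimizer for CountingSgd {
    fn step(&mut self, _name: &str, _shape: &[usize], param: &mut [f32], grad: &[f32]) {
        // Plain SGD update, used here only to keep the example small.
        for (p, g) in param.iter_mut().zip(grad) {
            *p -= self.lr * g;
        }
    }

    fn end_iteration(&mut self) {
        self.iteration += 1;
    }
}

/// Hypothetical container for one named parameter tensor and its gradient.
struct Tensor {
    name: &'static str,
    shape: Vec<usize>,
    param: Vec<f32>,
    grad: Vec<f32>,
}

/// One training iteration: update every tensor, then close the iteration.
fn run_iteration(opt: &mut dyn Optimizer, tensors: &mut [Tensor]) {
    for t in tensors.iter_mut() {
        opt.step(t.name, &t.shape, &mut t.param, &t.grad);
    }
    opt.end_iteration();
}
```

An optimizer that advances its counter inside `step` would simply not override `end_iteration`, and the loop above would work unchanged.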
fn lr_scale(&self, _name: &str) -> f32
Per-tensor multiplier on the effective learning rate. Default is 1.0 for
every name. Override when wrapping this crate to support per-name LR
schedules (e.g. embedding-vs-attention splits, or the Gaussian-splat
attribute-typed LR setup). The CPU impls in this crate do not apply this
multiplier themselves: it takes effect only when the caller passes a
pre-scaled lr for the relevant call. Backends are encouraged to consult it
inside their fused kernel.
Auto Trait Implementations§
impl Freeze for NAdamW
impl RefUnwindSafe for NAdamW
impl Send for NAdamW
impl Sync for NAdamW
impl Unpin for NAdamW
impl UnsafeUnpin for NAdamW
impl UnwindSafe for NAdamW
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.