pub struct AdamW {
pub lr: f32,
pub beta1: f32,
pub beta2: f32,
pub eps: f32,
pub weight_decay: f32,
/* private fields */
}
Adam with decoupled weight decay.
Per-tensor state identical to crate::Adam (two f32 buffers).
Fields§
lr: f32
Learning rate. Typical LLM pre-training value: 1e-4 to 3e-4.
beta1: f32
First-moment EMA decay. Default 0.9.
beta2: f32
Second-moment EMA decay. Default 0.999 (matches Adam);
0.95 is common for very long pre-training runs.
eps: f32
Denominator stability constant. Default 1e-8.
weight_decay: f32
Decoupled weight-decay coefficient λ. Multiplies the
parameter directly inside the update; 0.01–0.1 typical.
Defaults to 0.01.
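To make the roles of these fields concrete, here is a minimal sketch of one AdamW update over a flat parameter slice. It is not taken from this crate's source. The `Moments` struct, the `adamw_update` function, the explicit step count `t`, the bias correction, and the choice to scale the decay term by `lr` are all assumptions that follow the common textbook formulation; the crate's actual kernel may differ in any of these details.

```rust
/// Hypothetical per-tensor state: the two f32 buffers (first and second moment).
pub struct Moments {
    pub m: Vec<f32>,
    pub v: Vec<f32>,
}

/// One AdamW update (illustrative sketch, not the crate's implementation).
/// `t` is the 1-based step count used for bias correction (assumed).
#[allow(clippy::too_many_arguments)]
pub fn adamw_update(
    param: &mut [f32],
    grad: &[f32],
    state: &mut Moments,
    t: i32,
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
) {
    assert_eq!(param.len(), grad.len());
    assert_eq!(param.len(), state.m.len());
    assert_eq!(param.len(), state.v.len());

    // Bias-correction factors for the zero-initialised moment estimates.
    let bc1 = 1.0 - beta1.powi(t);
    let bc2 = 1.0 - beta2.powi(t);

    for i in 0..param.len() {
        let g = grad[i];
        // Exponential moving averages of the gradient and its square.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g;
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g;
        let m_hat = state.m[i] / bc1;
        let v_hat = state.v[i] / bc2;
        // Decoupled decay: weight_decay multiplies the parameter itself
        // and never passes through the moment estimates.
        // Scaling the decay by lr is an assumption of this sketch.
        param[i] -= lr * (m_hat / (v_hat.sqrt() + eps) + weight_decay * param[i]);
    }
}
```

Because the decay term bypasses the moment buffers, a parameter still shrinks when its gradient is exactly zero, which is the practical difference from adding an L2 penalty to the gradient.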
Implementations§
impl AdamW
pub fn new(lr: f32) -> Self
Construct with the given learning rate and the standard (β₁, β₂, ε, λ) = (0.9, 0.999, 1e-8, 0.01) defaults.
pub fn with_betas(self, b1: f32, b2: f32) -> Self
Override (β₁, β₂).
pub fn with_weight_decay(self, wd: f32) -> Self
Override the decoupled-decay coefficient.
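The constructor and the two `with_*` methods chain in builder style. The sketch below uses a local stand-in struct that mirrors only the public fields and signatures documented here, so that it compiles on its own. The real type also has private fields, and the method bodies shown are assumptions. `pretraining_config` is a hypothetical helper name.

```rust
/// Stand-in mirroring the documented public fields; the real struct
/// also carries private fields that are not reproduced here.
#[derive(Debug, Clone, PartialEq)]
pub struct AdamW {
    pub lr: f32,
    pub beta1: f32,
    pub beta2: f32,
    pub eps: f32,
    pub weight_decay: f32,
}

impl AdamW {
    /// Documented defaults: (β₁, β₂, ε, λ) = (0.9, 0.999, 1e-8, 0.01).
    pub fn new(lr: f32) -> Self {
        Self { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, weight_decay: 0.01 }
    }

    /// Consumes self and returns it with (β₁, β₂) replaced (assumed body).
    pub fn with_betas(mut self, b1: f32, b2: f32) -> Self {
        self.beta1 = b1;
        self.beta2 = b2;
        self
    }

    /// Consumes self and returns it with λ replaced (assumed body).
    pub fn with_weight_decay(mut self, wd: f32) -> Self {
        self.weight_decay = wd;
        self
    }
}

/// Hypothetical helper: a long pre-training configuration using values
/// from the ranges mentioned in the field docs.
pub fn pretraining_config() -> AdamW {
    AdamW::new(3e-4).with_betas(0.9, 0.95).with_weight_decay(0.1)
}
```

Each builder takes `self` by value, so the chain moves the optimizer through every call and unset fields such as `eps` keep their defaults.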
Trait Implementations§
impl Optimizer for AdamW
fn step(&mut self, name: &str, _shape: &[usize], param: &mut [f32], grad: &[f32])
fn end_iteration(&mut self)
Advance the global step counter. Most algorithms increment per
call to step, so most implementations leave this a no-op.
fn lr_scale(&self, _name: &str) -> f32
Per-tensor multiplier on the effective learning rate. Default
is 1.0 for every name. Override when wrapping this crate to
support per-name LR schedules (e.g. embedding-vs-attention
splits, or the Gaussian-splat attribute-typed LR setup). The
CPU impls in this crate currently honor this only when the
caller passes a pre-scaled lr for the relevant call;
backends are encouraged to consult it inside their fused
kernel.
Auto Trait Implementations§
impl Freeze for AdamW
impl RefUnwindSafe for AdamW
impl Send for AdamW
impl Sync for AdamW
impl Unpin for AdamW
impl UnsafeUnpin for AdamW
impl UnwindSafe for AdamW
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.