RLX training-step optimizers.
Host-side f32 step functions for the families surveyed in
“A Systematic Review of Optimization Algorithms for Modern Deep
Learning” (arXiv:2509.02046v1). Each algorithm exposes a small
state struct keyed by parameter name (so the same struct holds
moments for every tensor in a model) and a step method that
consumes (name, shape, &mut params, &grads).
The API is deliberately minimal: it operates on flat &mut [f32]
/ &[f32] slices plus a &[usize] shape — matching the
rlx_umap::adam pattern. Backends
that already ship a fused step kernel (see e.g.
rlx_metal::splat_adam) are free to bypass this crate for their
hot path. This crate serves as the portable reference, the CPU
fallback, and the implementation used when no backend provides a
fused kernel for the requested algorithm.
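The name-keyed state pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's actual code: the type `SketchSgd`, its fields, and the use of a `HashMap<String, Vec<f32>>` are hypothetical, and only the argument order `(name, shape, &mut params, &grads)` is taken from the description.

```rust
use std::collections::HashMap;

/// Hypothetical momentum-SGD state: one velocity buffer per parameter name.
/// Illustrative only; this is not the crate's `Sgd` type.
pub struct SketchSgd {
    pub lr: f32,
    pub momentum: f32,
    velocity: HashMap<String, Vec<f32>>,
}

impl SketchSgd {
    pub fn new(lr: f32, momentum: f32) -> Self {
        Self { lr, momentum, velocity: HashMap::new() }
    }

    /// One update for the tensor called `name`.
    /// `shape` is used here only to validate the flat slice lengths.
    pub fn step(&mut self, name: &str, shape: &[usize], params: &mut [f32], grads: &[f32]) {
        let numel: usize = shape.iter().product();
        assert_eq!(params.len(), numel, "params length must match shape");
        assert_eq!(grads.len(), numel, "grads length must match shape");

        // Lazily allocate the velocity buffer the first time a name is seen,
        // so the same struct can hold state for every tensor in a model.
        let v = self
            .velocity
            .entry(name.to_string())
            .or_insert_with(|| vec![0.0; numel]);

        for ((p, g), vi) in params.iter_mut().zip(grads).zip(v.iter_mut()) {
            *vi = self.momentum * *vi + *g; // v <- mu * v + g
            *p -= self.lr * *vi;            // p <- p - lr * v
        }
    }
}
```

Keying the state by name means a caller can loop over a model's tensors and call `step` once per tensor with a single optimizer value, without allocating per-tensor optimizer objects up front.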
§Algorithms
| Type | Family |
|---|---|
| Sgd | SGD ± momentum / Nesterov |
| Adam | Adam |
| AdamW | AdamW (decoupled decay) |
| NAdamW | Nesterov AdamW |
| RAdam | Rectified Adam |
| QHAdamW | Quasi-hyperbolic AdamW |
| Lamb | LAMB (layer-wise adaptive) |
| Adafactor | Adafactor (factored 2nd mom.) |
| Lion | Lion (sign of EMA) |
| Soap | SOAP (Shampoo-in-Adam-basis) |
| KronPsgd | Kron / PSGD |
| Muon | Muon (Newton–Schulz orth.) |
| Sophia | Sophia-H |
| Mars | MARS (variance-reduced) |
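To make one table entry concrete, the "sign of EMA" idea behind Lion can be sketched on flat slices. This follows the published Lion update rule rather than this crate's source, so the function name `lion_step_sketch`, its parameter list, and the caller-owned momentum buffer are all assumptions made for illustration.

```rust
/// Hypothetical Lion step on flat f32 slices (not the crate's `Lion` type).
/// `m` is the caller-owned momentum (EMA) buffer, same length as `params`.
pub fn lion_step_sketch(
    params: &mut [f32],
    grads: &[f32],
    m: &mut [f32],
    lr: f32,
    beta1: f32,
    beta2: f32,
    weight_decay: f32,
) {
    assert_eq!(params.len(), grads.len());
    assert_eq!(params.len(), m.len());

    for ((p, g), mi) in params.iter_mut().zip(grads).zip(m.iter_mut()) {
        // Interpolate momentum and gradient, then keep only the sign.
        let c = beta1 * *mi + (1.0 - beta1) * *g;
        // Explicit three-way sign: f32::signum would map 0.0 to 1.0.
        let sign = if c > 0.0 {
            1.0
        } else if c < 0.0 {
            -1.0
        } else {
            0.0
        };
        // Decoupled weight decay is applied alongside the sign update.
        *p -= lr * (sign + weight_decay * *p);
        // The momentum EMA is refreshed with a separate coefficient.
        *mi = beta2 * *mi + (1.0 - beta2) * *g;
    }
}
```

Because every element moves by the same magnitude `lr` (before weight decay), Lion needs only one state buffer per tensor, compared with the two moment buffers used by the Adam variants.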
Structs§
- Adafactor
- Adafactor — factored-second-moment optimizer.
- Adam
- Bias-corrected first/second moment optimizer.
- AdamW
- Adam with decoupled weight decay.
- KronPsgd
- Kron-PSGD — Kronecker-factored preconditioned SGD.
- Lamb
- Layer-wise Adaptive Moments for Batch training.
- Lion
- EvoLved sign-momentum optimizer.
- Mars
- MARS — variance-reduced AdamW. Per-tensor state: three f32 buffers (m, v, previous-gradient cache).
- Muon
- Muon — Momentum-Orthogonalized-by-Newton-Schulz.
- NAdamW
- Nesterov AdamW. Per-tensor state: two f32 buffers.
- QHAdamW
- Quasi-hyperbolic AdamW. Per-tensor state: two f32 buffers.
- RAdam
- Rectified Adam. Per-tensor state: two f32 buffers.
- Sgd
- SGD with momentum / Nesterov / L2 weight decay.
- Soap
- SOAP — Shampoo-in-Adam-basis optimizer.
- Sophia
- Sophia-H — Hessian-diagonal second-order optimizer.
Traits§
- Optimizer
- Common parameter-update interface.
Functions§
- global_grad_clip_scale
- Global L2-norm clip across many tensors. Returns the scale factor (<= 1.0) to multiply every gradient by; callers can pre-scale before passing to Optimizer::step. Identical to rlx_umap::adam::global_grad_clip_scale but generic over any iterator yielding slices.
- l2_norm
- L2 norm across a slice (skipping non-finite entries).
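The two helpers above can be approximated with the sketch below. It is written from the one-line descriptions only, so the names `l2_norm_sketch` and `clip_scale_sketch`, the `max_norm` parameter, the f64 accumulation, and the choice to skip non-finite entries in the global norm as well are all assumptions, not the crate's confirmed behavior.

```rust
/// Sum of squares over the finite entries of a slice, accumulated in f64.
fn finite_sq_sum(xs: &[f32]) -> f64 {
    xs.iter()
        .filter(|x| x.is_finite())
        .map(|&x| (x as f64) * (x as f64))
        .sum()
}

/// Hypothetical stand-in for `l2_norm`: NaN and infinite entries are skipped.
pub fn l2_norm_sketch(xs: &[f32]) -> f32 {
    finite_sq_sum(xs).sqrt() as f32
}

/// Hypothetical stand-in for `global_grad_clip_scale`.
/// Returns a factor <= 1.0 that brings the joint L2 norm of all gradient
/// slices down to `max_norm`, or 1.0 when no clipping is needed.
pub fn clip_scale_sketch<'a, I>(grads: I, max_norm: f32) -> f32
where
    I: IntoIterator<Item = &'a [f32]>,
{
    // The global norm is the square root of the summed per-tensor squares.
    let total: f64 = grads.into_iter().map(finite_sq_sum).sum();
    let norm = total.sqrt() as f32;
    if norm > max_norm && norm > 0.0 {
        max_norm / norm
    } else {
        1.0
    }
}
```

A caller would compute the scale once over all gradient tensors, multiply each gradient by it, and only then run the per-tensor optimizer steps, so that every tensor is clipped by the same factor.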