pub enum AdjustLrFn {
    Original,
    MatchRmsAdamW,
}
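Learning rate adjustment method for the Muon optimizer.
Muon scales the learning rate according to the shape of each weight matrix so that the update RMS stays consistent across rectangular matrices. For a weight matrix of shape [A, B] (A rows, B columns), each variant below computes that scale from A and B.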
References
- Original (Keller Jordan): “Muon: An optimizer for hidden layers”
- Moonshot: “Muon is Scalable for LLM Training”
Variants
Original
Keller Jordan’s original method: lr * sqrt(max(1, A/B))
This scales the learning rate by the square root of the weight matrix's aspect ratio A/B. Tall matrices (more rows than columns) get a larger learning rate, while square and wide matrices keep the base rate because the ratio is clamped to at least 1.
Example
For a [1024, 512] matrix: lr * sqrt(1024/512) = lr * sqrt(2) ≈ lr * 1.414
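A minimal sketch of the Original rule as a standalone Rust function, assuming A is the number of rows and B the number of columns (consistent with the [1024, 512] example). The name `original_adjusted_lr` is hypothetical and not part of this crate's API.

```rust
/// Hypothetical helper (not the crate's API): Keller Jordan's `Original` rule,
/// `lr * sqrt(max(1, A / B))`, assuming `rows = A` and `cols = B`.
fn original_adjusted_lr(lr: f64, rows: usize, cols: usize) -> f64 {
    // Aspect ratio A / B, clamped below at 1 so square and wide matrices keep `lr`.
    let ratio = (rows as f64 / cols as f64).max(1.0);
    lr * ratio.sqrt()
}

fn main() {
    let lr = 0.02;
    // Tall [1024, 512]: ratio 2, so lr is multiplied by sqrt(2).
    println!("tall: {:.6}", original_adjusted_lr(lr, 1024, 512));
    // Wide [512, 1024]: ratio 0.5 is clamped to 1, so lr is unchanged.
    println!("wide: {:.6}", original_adjusted_lr(lr, 512, 1024));
}
```

The clamp is what separates the two cases: without it, wide matrices would receive a learning rate smaller than the base rate.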
MatchRmsAdamW
Moonshot’s method: lr * 0.2 * sqrt(max(A, B))
This method is designed to match the update RMS of AdamW, allowing Muon to directly reuse learning rates and weight decay values tuned for AdamW without retuning.
Example
For a [1024, 512] matrix: lr * 0.2 * sqrt(max(1024, 512)) = lr * 0.2 * 32 = lr * 6.4
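The sketch below implements the MatchRmsAdamW rule as a hypothetical free function `match_rms_adamw_adjusted_lr` (not this crate's API), again assuming A = rows and B = columns. It also illustrates why the result has a shape-independent update RMS, under the idealized assumption that Muon's orthogonalized update is exactly semi-orthogonal (every nonzero singular value equal to 1). Such an A x B matrix has squared Frobenius norm min(A, B) and therefore RMS 1/sqrt(max(A, B)), so scaling by 0.2 * sqrt(max(A, B)) gives an update RMS of 0.2 * lr for every shape.

```rust
/// Hypothetical helper (not the crate's API): Moonshot's `MatchRmsAdamW` rule,
/// `lr * 0.2 * sqrt(max(A, B))`, assuming `rows = A` and `cols = B`.
fn match_rms_adamw_adjusted_lr(lr: f64, rows: usize, cols: usize) -> f64 {
    lr * 0.2 * (rows.max(cols) as f64).sqrt()
}

/// RMS of an idealized semi-orthogonal `rows x cols` update scaled by `scaled_lr`.
/// Assumption: all nonzero singular values are exactly 1, so the squared
/// Frobenius norm is `min(rows, cols)` and the unscaled RMS is `1 / sqrt(max(rows, cols))`.
fn idealized_update_rms(scaled_lr: f64, rows: usize, cols: usize) -> f64 {
    let frob_sq = rows.min(cols) as f64;
    let numel = (rows * cols) as f64;
    scaled_lr * (frob_sq / numel).sqrt()
}

fn main() {
    let lr = 0.02;
    for &(rows, cols) in &[(1024usize, 512usize), (512, 1024), (4096, 4096)] {
        let scaled = match_rms_adamw_adjusted_lr(lr, rows, cols);
        let rms = idealized_update_rms(scaled, rows, cols);
        // Under the idealized assumption, the update RMS is 0.2 * lr = 0.004 for every shape.
        println!("[{rows}, {cols}] -> lr {scaled:.4}, update RMS {rms:.4}");
    }
}
```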
Trait Implementations
impl Clone for AdjustLrFn
fn clone(&self) -> AdjustLrFn
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for AdjustLrFn
impl Default for AdjustLrFn
fn default() -> AdjustLrFn
Returns the “default value” for a type.
impl<'de> Deserialize<'de> for AdjustLrFn
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Deserialize this value from the given Serde deserializer.
impl PartialEq for AdjustLrFn
impl Serialize for AdjustLrFn
impl Copy for AdjustLrFn
impl Eq for AdjustLrFn
impl StructuralPartialEq for AdjustLrFn
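Because AdjustLrFn is Copy and Eq, a selected variant can be passed by value and compared directly. The sketch below uses a local stand-in enum with the same variant names and the std-only subset of the derives listed above (Clone, Copy, Debug, PartialEq, Eq). Default is omitted because this page does not say which variant is the default, and Serialize/Deserialize are omitted because serde is an external crate. The `scale` method is hypothetical and only illustrates dispatching on the variant with the two formulas documented above.

```rust
/// Local stand-in for `AdjustLrFn` with the std-only subset of its derives.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AdjustLrFn {
    Original,
    MatchRmsAdamW,
}

impl AdjustLrFn {
    /// Hypothetical method: the factor by which `lr` is multiplied for a
    /// `rows x cols` (A x B) weight matrix under each variant.
    fn scale(self, rows: usize, cols: usize) -> f64 {
        let (a, b) = (rows as f64, cols as f64);
        match self {
            AdjustLrFn::Original => (a / b).max(1.0).sqrt(),
            AdjustLrFn::MatchRmsAdamW => 0.2 * a.max(b).sqrt(),
        }
    }
}

fn main() {
    let method = AdjustLrFn::MatchRmsAdamW;
    // `scale` takes `self` by value; since the type is `Copy`, `method` stays usable.
    let factor = method.scale(1024, 512);
    // `PartialEq`/`Eq`: variants can be compared directly.
    assert_eq!(method, AdjustLrFn::MatchRmsAdamW);
    assert_ne!(method, AdjustLrFn::Original);
    // `Debug`: the variant name is printable with `{:?}`.
    println!("{method:?} scales lr by {factor:.3}");
}
```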
Auto Trait Implementations
impl Freeze for AdjustLrFn
impl RefUnwindSafe for AdjustLrFn
impl Send for AdjustLrFn
impl Sync for AdjustLrFn
impl Unpin for AdjustLrFn
impl UnwindSafe for AdjustLrFn
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.