Struct burn_optim::optim::MuonConfig

pub struct MuonConfig { /* private fields */ }

Muon configuration.

Muon is an optimizer specifically designed for 2D parameters of neural network hidden layers (weight matrices). Other parameters such as biases and embeddings should be optimized using a standard method such as AdamW.

§Learning Rate Adjustment

Muon adjusts the learning rate based on parameter shape to maintain consistent RMS across rectangular matrices. Two methods are available:

Original: Scales the learning rate by sqrt(max(1, A/B)), where A and B are the first two dimensions of the parameter. This is Keller Jordan's method and is the default.
MatchRmsAdamW: Scales the learning rate by 0.2 * sqrt(max(A, B)). This is Moonshot's method, designed to match the update RMS of AdamW so that AdamW hyperparameters can be reused directly.
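
To make the two formulas concrete, the sketch below computes the factor each method applies to the base learning rate for a given shape. It is an illustration only, not the code burn_optim runs: the function names `original_factor` and `match_rms_adamw_factor` are hypothetical, and A and B are assumed to be the row and column counts of a weight matrix.

```rust
/// Hypothetical helper: the `Original` factor, sqrt(max(1, A/B)).
fn original_factor(a: usize, b: usize) -> f64 {
    (a as f64 / b as f64).max(1.0).sqrt()
}

/// Hypothetical helper: the `MatchRmsAdamW` factor, 0.2 * sqrt(max(A, B)).
fn match_rms_adamw_factor(a: usize, b: usize) -> f64 {
    0.2 * (a.max(b) as f64).sqrt()
}
```

For a 4096 x 1024 matrix, `original_factor` gives sqrt(4) = 2 and `match_rms_adamw_factor` gives 0.2 * sqrt(4096) = 12.8. For the transposed 1024 x 4096 shape the first drops to 1, while the second stays at 12.8 because it depends only on the larger dimension.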

§Example

use burn_optim::{MuonConfig, AdjustLrFn};

// Using default (Original) method
let optimizer = MuonConfig::new().init();

// Using MatchRmsAdamW for AdamW-compatible hyperparameters
let optimizer = MuonConfig::new()
    .with_adjust_lr_fn(AdjustLrFn::MatchRmsAdamW)
    .init();

Implementations§

impl MuonConfig

pub fn new() -> Self

Create a new instance of the config.

impl MuonConfig

pub fn with_momentum(self, momentum: MomentumConfig) -> Self

Momentum config.

pub fn with_ns_coefficients(self, ns_coefficients: (f32, f32, f32)) -> Self

Newton-Schulz iteration coefficients (a, b, c).

pub fn with_epsilon(self, epsilon: f32) -> Self

Epsilon for numerical stability.

pub fn with_ns_steps(self, ns_steps: usize) -> Self

Number of Newton-Schulz iteration steps.
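
The following sketch shows how the coefficients (a, b, c), the epsilon, and the step count typically interact in a Newton-Schulz orthogonalization of the kind Muon is built on. It is a minimal plain-Rust illustration under stated assumptions, not burn_optim's implementation: the helper names `matmul`, `transpose`, and `newton_schulz` are hypothetical, matrices are nested `Vec<f64>` rather than tensors (the config itself takes `f32` values), and the update rule X <- aX + (bA + cA²)X with A = XXᵀ follows the commonly published form of the iteration.

```rust
/// Multiplies an (n x k) matrix by a (k x m) matrix.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

/// Returns the transpose of a matrix.
fn transpose(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, m) = (a.len(), a[0].len());
    (0..m).map(|j| (0..n).map(|i| a[i][j]).collect()).collect()
}

/// Hypothetical helper: approximately orthogonalizes `g` with `steps`
/// Newton-Schulz iterations using coefficients (a, b, c).
fn newton_schulz(
    g: &[Vec<f64>],
    coeffs: (f64, f64, f64),
    steps: usize,
    eps: f64,
) -> Vec<Vec<f64>> {
    let (a, b, c) = coeffs;
    // Divide by the Frobenius norm so every singular value is at most 1;
    // eps guards against division by zero for an all-zero input.
    let norm = g.iter().flatten().map(|v| v * v).sum::<f64>().sqrt();
    let mut x: Vec<Vec<f64>> = g
        .iter()
        .map(|row| row.iter().map(|v| v / (norm + eps)).collect())
        .collect();
    for _ in 0..steps {
        let gram = matmul(&x, &transpose(&x)); // A = X Xᵀ
        let gram2 = matmul(&gram, &gram); // A²
        // poly = b A + c A²
        let poly: Vec<Vec<f64>> = gram
            .iter()
            .zip(&gram2)
            .map(|(r1, r2)| r1.iter().zip(r2).map(|(u, v)| b * u + c * v).collect())
            .collect();
        let px = matmul(&poly, &x);
        // X <- a X + (b A + c A²) X
        x = x
            .iter()
            .zip(&px)
            .map(|(xr, pr)| xr.iter().zip(pr).map(|(xv, pv)| a * xv + pv).collect())
            .collect();
    }
    x
}
```

With the classic cubic coefficients (1.5, -0.5, 0.0), which are used here only because their convergence is easy to check by hand, calling `newton_schulz` on the diagonal matrix [[3, 0], [0, 4]] for 10 steps drives both diagonal entries towards 1, that is, towards the nearest orthogonal matrix. More steps give a closer approximation at the cost of extra matrix multiplications per update.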

pub fn with_adjust_lr_fn(self, adjust_lr_fn: AdjustLrFn) -> Self

Learning rate adjustment method.

pub fn with_weight_decay(self, weight_decay: Option<WeightDecayConfig>) -> Self

Optional weight decay config; pass None to disable weight decay.

impl MuonConfig

pub fn init<B: AutodiffBackend, M: AutodiffModule<B>>(&self) -> OptimizerAdaptor<Muon<B::InnerBackend>, M, B>

Initialize Muon optimizer.

§Returns

Returns an optimizer adaptor that can be used to optimize a module.

§Example

use burn_optim::{MuonConfig, AdjustLrFn, decay::WeightDecayConfig, momentum::MomentumConfig};

// Basic configuration with default (Original) LR adjustment
let optimizer = MuonConfig::new()
    .with_weight_decay(Some(WeightDecayConfig::new(0.01)))
    .init();

// With AdamW-compatible settings using MatchRmsAdamW
let optimizer = MuonConfig::new()
    .with_adjust_lr_fn(AdjustLrFn::MatchRmsAdamW)
    .with_weight_decay(Some(WeightDecayConfig::new(0.1)))
    .init();

// Custom momentum and NS settings
let optimizer = MuonConfig::new()
    .with_momentum(MomentumConfig {
        momentum: 0.9,
        dampening: 0.1,
        nesterov: false,
    })
    .with_ns_steps(7)
    .init();

Trait Implementations§

impl Clone for MuonConfig

fn clone(&self) -> Self

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Config for MuonConfig

fn save<P>(&self, file: P) -> Result<(), Error>
where P: AsRef<Path>,

Available on crate feature std only.

Saves the configuration to a file.

fn load<P>(file: P) -> Result<Self, ConfigError>
where P: AsRef<Path>,

Available on crate feature std only.

Loads the configuration from a file.

fn load_binary(data: &[u8]) -> Result<Self, ConfigError>

Loads the configuration from a binary buffer.

impl Debug for MuonConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for MuonConfig

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Display for MuonConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Serialize for MuonConfig

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

impl Freeze for MuonConfig

impl RefUnwindSafe for MuonConfig

impl Send for MuonConfig

impl Sync for MuonConfig

impl Unpin for MuonConfig

impl UnwindSafe for MuonConfig

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,