Crate axonml_optim

axonml-optim - Optimization Algorithms

Provides optimizers, learning-rate schedulers, and mixed-precision gradient scaling for training neural networks.

§Optimizers

  • SGD - Stochastic Gradient Descent with momentum and Nesterov acceleration (configuration sketch after this list)
  • Adam - Adaptive Moment Estimation
  • AdamW - Adam with decoupled weight decay
  • RMSprop - Root Mean Square Propagation
  • LAMB - Layer-wise Adaptive Moments for large batch training (BERT-scale)
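
As a minimal sketch of configuring SGD with Nesterov momentum, assuming a constructor and builder methods in the style of the `Adam` and `LAMB` examples below (the `momentum` and `nesterov` methods are assumptions, not confirmed by this page):

use axonml_optim::SGD;

// Hypothetical sketch: the builder methods below are assumptions modeled
// on the LAMB builder shown later; check the `sgd` module docs for the
// actual API. `model` is assumed to be defined as in the basic example.
let mut optimizer = SGD::new(model.parameters(), 0.01)
    .momentum(0.9)    // classical momentum
    .nesterov(true);  // enable Nesterov acceleration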

§Learning Rate Schedulers

  • StepLR - Step decay at fixed intervals (usage sketch after this list)
  • MultiStepLR - Decay at specified milestones
  • ExponentialLR - Exponential decay
  • CosineAnnealingLR - Cosine annealing
  • OneCycleLR - 1cycle policy (super-convergence)
  • WarmupLR - Linear warmup
  • ReduceLROnPlateau - Reduce on metric plateau
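
This page does not show a scheduler in use. A minimal sketch, assuming `StepLR::new(initial_lr, step_size, gamma)` plus `step()`/`get_lr()` on the scheduler and `set_lr()` on the optimizer (all of these signatures are assumptions; see the `lr_scheduler` module for the real interface):

use axonml_optim::{Adam, StepLR};

// Hypothetical sketch: decay the learning rate by 10x every 30 epochs.
// The constructor arguments and the step()/get_lr()/set_lr() calls are
// assumptions about the API, not taken from this page.
let mut optimizer = Adam::new(model.parameters(), 0.1);
let mut scheduler = StepLR::new(0.1, 30, 0.1); // initial lr, step size, decay factor

for epoch in 0..90 {
    // ... train one epoch ...
    scheduler.step();
    optimizer.set_lr(scheduler.get_lr());
}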

§Mixed Precision Support

  • GradScaler - Gradient scaling for F16 training to prevent underflow

§Basic Example

use axonml_optim::prelude::*;
use axonml_nn::{Linear, Module, Sequential};

// Create model
let model = Sequential::new()
    .add(Linear::new(784, 128))
    .add(Linear::new(128, 10));

// Create optimizer
let mut optimizer = Adam::new(model.parameters(), 0.001);

// Training loop (`input`, `target`, and `compute_loss` are assumed to be defined elsewhere)
for epoch in 0..100 {
    let output = model.forward(&input);
    let loss = compute_loss(&output, &target);

    optimizer.zero_grad();
    loss.backward();
    optimizer.step();
}

§Mixed Precision Training with GradScaler

use axonml_optim::{Adam, GradScaler};

// `params`, `model`, `dataloader`, and `grads` are assumed to be defined elsewhere.
let mut optimizer = Adam::new(params, 0.001);
let mut scaler = GradScaler::new();

for batch in dataloader {
    // Forward pass (with autocast in F16)
    let loss = model.forward(&batch);

    // Scale loss for backward
    let scaled_loss = scaler.scale_loss(loss);

    // Backward
    optimizer.zero_grad();
    scaled_loss.backward();

    // Unscale gradients and check for inf/nan
    if scaler.unscale_grads(&mut grads) {
        optimizer.step();
    }

    // Update scale factor
    scaler.update();
}
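
In scalers of this kind, `update()` typically lowers the scale factor when inf/NaN gradients were detected and grows it again after a run of successful steps, so the scale tracks the loss magnitude over training; consult the `grad_scaler` module (and `GradScalerState`) for this crate's exact policy.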

§LAMB for Large Batch Training

use axonml_optim::LAMB;

// LAMB enables training with very large batches (32K+)
let optimizer = LAMB::new(params, 0.001)
    .betas(0.9, 0.999)
    .weight_decay(0.01);
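
Large-batch recipes of this kind are usually paired with a learning-rate warmup phase; the `WarmupLR` scheduler re-exported below is the natural companion here, though this page does not show the two wired together.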

@version 0.2.6
@author AutomataNexus Development Team

Re-exports§

pub use adam::Adam;
pub use adam::AdamW;
pub use grad_scaler::GradScaler;
pub use grad_scaler::GradScalerState;
pub use lamb::LAMB;
pub use lr_scheduler::CosineAnnealingLR;
pub use lr_scheduler::ExponentialLR;
pub use lr_scheduler::LRScheduler;
pub use lr_scheduler::MultiStepLR;
pub use lr_scheduler::OneCycleLR;
pub use lr_scheduler::ReduceLROnPlateau;
pub use lr_scheduler::StepLR;
pub use lr_scheduler::WarmupLR;
pub use optimizer::Optimizer;
pub use rmsprop::RMSprop;
pub use sgd::SGD;

Modules§

adam
Adam Optimizer - Adaptive Moment Estimation
grad_scaler
Gradient Scaler for Mixed Precision Training
lamb
LAMB Optimizer - Layer-wise Adaptive Moments
lr_scheduler
Learning Rate Schedulers
optimizer
Optimizer Trait - Core Optimizer Interface
prelude
Common imports for optimization.
rmsprop
RMSprop Optimizer
sgd
SGD Optimizer - Stochastic Gradient Descent