Crate axonml_optim

Optimization algorithms for AxonML neural network training.

The crate centers on the Optimizer trait, which defines step, zero_grad, get_lr, and set_lr. It provides:

- Optimizers: SGD (with optional momentum and Nesterov acceleration), Adam, AdamW (decoupled weight decay), RMSprop, and LAMB (layer-wise adaptive moments for large-batch training).
- GradScaler for dynamic loss scaling in automatic mixed-precision (AMP) training.
- Seven learning rate schedulers: StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, OneCycleLR, WarmupLR, and ReduceLROnPlateau.
- A Training Health Monitor (the health module) for real-time detection of NaNs, exploding gradients, and vanishing gradients, plus loss trend analysis, dead neuron tracking, convergence scoring, and automatic learning rate suggestions.
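To make the decoupled weight decay in AdamW concrete, here is a minimal, self-contained sketch of a single AdamW-style parameter update in plain Rust. It is an independent illustration of the standard algorithm, not code from this crate; the names (adamw_step, AdamWState) and the hyperparameter values are hypothetical.

```rust
/// Per-parameter state for the sketch: first/second moment estimates and a step counter.
struct AdamWState {
    m: Vec<f32>, // exponential moving average of gradients
    v: Vec<f32>, // exponential moving average of squared gradients
    t: u64,      // step counter, used for bias correction
}

/// One AdamW update: Adam moment estimates plus *decoupled* weight decay,
/// i.e. the decay term shrinks the parameter directly instead of being
/// folded into the gradient (the key difference from Adam + L2 regularization).
fn adamw_step(
    params: &mut [f32],
    grads: &[f32],
    state: &mut AdamWState,
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
) {
    state.t += 1;
    let bc1 = 1.0 - beta1.powi(state.t as i32); // bias-correction denominators
    let bc2 = 1.0 - beta2.powi(state.t as i32);
    for i in 0..params.len() {
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * grads[i];
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * grads[i] * grads[i];
        let m_hat = state.m[i] / bc1;
        let v_hat = state.v[i] / bc2;
        // Decoupled weight decay: applied to the parameter, not the gradient.
        params[i] -= lr * weight_decay * params[i];
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}

fn main() {
    let mut params = vec![1.0_f32, -0.5, 0.25];
    let grads = vec![0.1_f32, -0.2, 0.05];
    let mut state = AdamWState { m: vec![0.0; 3], v: vec![0.0; 3], t: 0 };
    adamw_step(&mut params, &grads, &mut state, 1e-3, 0.9, 0.999, 1e-8, 0.01);
    println!("{:?}", params);
}
```

The only behavioral difference from Adam with L2 regularization is where the decay term enters: here it scales the parameter directly, so the adaptive moment estimates see only the raw gradient.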

§File

crates/axonml-optim/src/lib.rs

§Author

Andrew Jewell Sr. — AutomataNexus LLC. ORCID: 0009-0005-2158-7060

§Updated

April 14, 2026 11:15 PM EST

§Disclaimer

Use at your own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.

Re-exports§

pub use adam::Adam;
pub use adam::AdamW;
pub use grad_scaler::GradScaler;
pub use grad_scaler::GradScalerState;
pub use health::AlertKind;
pub use health::AlertSeverity;
pub use health::HealthReport;
pub use health::LossTrend;
pub use health::MonitorConfig;
pub use health::TrainingAlert;
pub use health::TrainingMonitor;
pub use lamb::LAMB;
pub use lr_scheduler::CosineAnnealingLR;
pub use lr_scheduler::ExponentialLR;
pub use lr_scheduler::LRScheduler;
pub use lr_scheduler::MultiStepLR;
pub use lr_scheduler::OneCycleLR;
pub use lr_scheduler::ReduceLROnPlateau;
pub use lr_scheduler::StepLR;
pub use lr_scheduler::WarmupLR;
pub use optimizer::Optimizer;
pub use rmsprop::RMSprop;
pub use sgd::SGD;

Modules§

adam
Adam and AdamW — adaptive moment estimation optimizers.
grad_scaler
GradScaler — dynamic loss scaling for AMP (mixed-precision) training.
health
Training Health Monitor — a novel AxonML feature for real-time diagnostics.
lamb
LAMB — Layer-wise Adaptive Moments for large-batch training.
lr_scheduler
Learning rate schedulers — seven strategies for LR annealing (see the sketch after this list).
optimizer
Optimizer trait — the core interface for all gradient-based optimizers.
prelude
Common imports for optimization.
rmsprop
RMSprop — root mean square propagation optimizer.
sgd
SGD — Stochastic Gradient Descent with optional momentum and Nesterov acceleration.
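
As a rough, self-contained illustration of what two of the schedulers listed above compute, the StepLR and CosineAnnealingLR schedules can be written as pure functions of the epoch index. This is not the crate's implementation; the function names and signatures below are invented for the sketch.

```rust
use std::f64::consts::PI;

/// StepLR: multiply the base LR by `gamma` once every `step_size` epochs.
fn step_lr(base_lr: f64, gamma: f64, step_size: u32, epoch: u32) -> f64 {
    base_lr * gamma.powi((epoch / step_size) as i32)
}

/// CosineAnnealingLR: anneal from `base_lr` down to `eta_min` over `t_max`
/// epochs, following half a cosine wave.
fn cosine_annealing_lr(base_lr: f64, eta_min: f64, t_max: u32, epoch: u32) -> f64 {
    eta_min + 0.5 * (base_lr - eta_min) * (1.0 + (PI * epoch as f64 / t_max as f64).cos())
}

fn main() {
    // Print the first ten epochs of each schedule for a base LR of 0.1.
    for epoch in 0..10 {
        println!(
            "epoch {epoch}: step_lr = {:.5}, cosine = {:.5}",
            step_lr(0.1, 0.5, 3, epoch),
            cosine_annealing_lr(0.1, 0.0, 10, epoch)
        );
    }
}
```

Both schedules depend only on the current epoch and a few constants, which is why schedulers can be layered on top of any optimizer that exposes get_lr and set_lr.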