pub struct TrainingMonitor { /* private fields */ }
Real-time training health monitor.
Attaches to any optimizer to detect pathological training behavior, including exploding or vanishing gradients, loss divergence, and dead neurons, and to track convergence. Generates alerts with severity levels and provides actionable suggestions.
Implementations
impl TrainingMonitor
pub fn with_config(config: MonitorConfig) -> Self
Creates a new training monitor with the given configuration.
pub fn record_step(&mut self, loss: f32, grad_norms: &[(&str, f32)], lr: f32)
Records a single training step.
Arguments
loss - The loss value for this step.
grad_norms - Slice of (parameter_name, gradient_norm) pairs.
lr - The current learning rate.
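The grad_norms argument expects one (parameter_name, gradient_norm) pair per parameter. As a minimal sketch of how such pairs might be assembled, the helpers below compute an L2 norm per parameter from raw gradient slices. The names l2_norm and grad_norm_pairs are hypothetical and not part of this crate, and the choice of the L2 norm is an assumption; the documentation does not say which norm the monitor expects.

```rust
/// Hypothetical helper: L2 norm of one parameter's gradient values.
fn l2_norm(grad: &[f32]) -> f32 {
    grad.iter().map(|g| g * g).sum::<f32>().sqrt()
}

/// Hypothetical helper: builds (parameter_name, gradient_norm) pairs
/// in the shape that `record_step` takes for `grad_norms`.
fn grad_norm_pairs<'a>(grads: &[(&'a str, &[f32])]) -> Vec<(&'a str, f32)> {
    grads
        .iter()
        .map(|(name, grad)| (*name, l2_norm(grad)))
        .collect()
}
```

The resulting vector can be borrowed as a slice and passed along with the loss and learning rate, for example `monitor.record_step(loss, &pairs, lr)`.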
pub fn check_health(&self) -> HealthReport
Returns a full health report for the current training state.
pub fn is_healthy(&self) -> bool
Returns true if training appears healthy (no critical alerts, loss not diverging).
pub fn alerts(&self) -> &[TrainingAlert]
Returns the accumulated alerts.
pub fn clear_alerts(&mut self)
Clears all accumulated alerts.
pub fn loss_trend(&self) -> LossTrend
Analyzes the loss trajectory over the recent window.
Compares the rolling average of the most recent window to the rolling average of the previous window to classify the trend.
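The two-window comparison described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the crate's implementation: the Trend enum, its variant names, the window parameter, and the relative tolerance are all hypothetical, since the variants of LossTrend and the monitor's thresholds are not documented here.

```rust
/// Hypothetical stand-in for `LossTrend`; the real variants are not documented here.
#[derive(Debug, PartialEq)]
enum Trend {
    Improving,
    Plateau,
    Worsening,
    Insufficient,
}

/// Compares the mean of the last `window` losses with the mean of the
/// `window` losses before them. `tolerance` is an assumed relative threshold.
fn classify_trend(losses: &[f32], window: usize, tolerance: f32) -> Trend {
    // Two full windows are needed before any comparison is meaningful.
    if window == 0 || losses.len() < 2 * window {
        return Trend::Insufficient;
    }
    let n = losses.len();
    let mean = |s: &[f32]| s.iter().sum::<f32>() / s.len() as f32;
    let recent = mean(&losses[n - window..]);
    let previous = mean(&losses[n - 2 * window..n - window]);
    // Relative change, guarded against division by zero.
    let rel = (recent - previous) / previous.abs().max(f32::EPSILON);
    if rel < -tolerance {
        Trend::Improving
    } else if rel > tolerance {
        Trend::Worsening
    } else {
        Trend::Plateau
    }
}
```

Using a relative rather than absolute change keeps the classification independent of the loss scale, which is one reasonable design choice among several.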
pub fn suggest_lr(&self) -> Option<f32>
Suggests a learning rate adjustment based on current training dynamics.
Returns None if no adjustment is needed or training has converged.
pub fn grad_norm_stats(&self) -> (f32, f32, f32)
Returns (mean, std, max) of gradient norms over the recent window.
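For readers who want to see what a (mean, std, max) summary looks like in practice, here is a minimal sketch over a plain slice of norms. The function name norm_stats is hypothetical, and two details are assumptions: it uses the population standard deviation (dividing by n) and returns zeros for empty input. The monitor may differ on both points.

```rust
/// Hypothetical helper: (mean, std, max) over a slice of gradient norms.
/// Assumes population standard deviation and (0.0, 0.0, 0.0) for empty input.
fn norm_stats(norms: &[f32]) -> (f32, f32, f32) {
    if norms.is_empty() {
        return (0.0, 0.0, 0.0);
    }
    let n = norms.len() as f32;
    let mean = norms.iter().sum::<f32>() / n;
    // Population variance: average squared deviation from the mean.
    let variance = norms.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    let max = norms.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    (mean, variance.sqrt(), max)
}
```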
pub fn convergence_score(&self) -> f32
Returns a convergence score between 0.0 and 1.0.
1.0 indicates full convergence (no loss change over the window). 0.0 indicates the loss is still actively changing.
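One way a score with these endpoints could be computed is shown below. This is only a sketch: the formula (one minus the loss range over the window, relative to the mean absolute loss) and the name convergence_score_sketch are assumptions chosen to match the documented behavior at 0.0 and 1.0, not the crate's actual method.

```rust
/// Hypothetical sketch of a convergence score in [0.0, 1.0].
/// Returns 1.0 when the loss is constant over the window and falls toward
/// 0.0 as the spread of losses grows relative to their magnitude.
fn convergence_score_sketch(losses: &[f32], window: usize) -> f32 {
    // Assumption: without a full window there is no evidence of convergence.
    if window == 0 || losses.len() < window {
        return 0.0;
    }
    let recent = &losses[losses.len() - window..];
    let min = recent.iter().copied().fold(f32::INFINITY, f32::min);
    let max = recent.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mean_abs = recent.iter().map(|l| l.abs()).sum::<f32>() / window as f32;
    // Spread of the window relative to its typical magnitude.
    let relative_range = (max - min) / mean_abs.max(f32::EPSILON);
    (1.0 - relative_range).clamp(0.0, 1.0)
}
```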
Trait Implementations
Auto Trait Implementations
impl Freeze for TrainingMonitor
impl RefUnwindSafe for TrainingMonitor
impl Send for TrainingMonitor
impl Sync for TrainingMonitor
impl Unpin for TrainingMonitor
impl UnsafeUnpin for TrainingMonitor
impl UnwindSafe for TrainingMonitor
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.