Module info_geometry

Information Geometry for Attention

Natural gradient methods using the Fisher information metric.

§Key Concepts

  1. Fisher Metric: F = diag(p) - p * p^T for a probability vector p on the simplex
  2. Natural Gradient: Solve F * delta = grad, then update params -= lr * delta
  3. Conjugate Gradient: Efficient iterative solver for the Fisher system F * delta = grad
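
The three concepts above can be sketched in a few lines of plain Rust. This is an illustrative sketch only, not this module's API: the function names `fisher_vec`, `solve_cg`, and `natural_gradient_step`, and the damping parameter `eps`, are assumptions made for the example. Damping is included because F = diag(p) - p * p^T is singular (it maps the all-ones vector to zero), so the sketch solves (F + eps * I) * delta = grad instead.

```rust
/// Fisher-vector product for F = diag(p) - p * p^T, plus damping `eps * v`.
/// Uses (F v)_i = p_i * (v_i - <p, v>), so the matrix is never formed.
/// (Hypothetical helper, not part of the documented API.)
fn fisher_vec(p: &[f64], v: &[f64], eps: f64) -> Vec<f64> {
    let pv: f64 = p.iter().zip(v).map(|(pi, vi)| pi * vi).sum();
    p.iter()
        .zip(v)
        .map(|(pi, vi)| pi * (vi - pv) + eps * vi)
        .collect()
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Conjugate gradient for (F + eps * I) * delta = grad, starting from delta = 0.
/// (Hypothetical helper; `max_iters` and `tol` are assumed parameters.)
fn solve_cg(p: &[f64], grad: &[f64], eps: f64, max_iters: usize, tol: f64) -> Vec<f64> {
    let n = grad.len();
    let mut delta = vec![0.0; n];
    let mut r = grad.to_vec(); // residual: grad - A * 0
    let mut d = r.clone(); // search direction
    let mut rs_old = dot(&r, &r);
    for _ in 0..max_iters {
        if rs_old.sqrt() < tol {
            break;
        }
        let ad = fisher_vec(p, &d, eps);
        let alpha = rs_old / dot(&d, &ad);
        for i in 0..n {
            delta[i] += alpha * d[i];
            r[i] -= alpha * ad[i];
        }
        let rs_new = dot(&r, &r);
        let beta = rs_new / rs_old;
        for i in 0..n {
            d[i] = r[i] + beta * d[i];
        }
        rs_old = rs_new;
    }
    delta
}

/// One natural gradient step: solve for delta, then params -= lr * delta.
/// The damping, iteration cap, and tolerance here are arbitrary example values.
fn natural_gradient_step(params: &mut [f64], p: &[f64], grad: &[f64], lr: f64) {
    let delta = solve_cg(p, grad, 1e-6, 50, 1e-10);
    for (w, d) in params.iter_mut().zip(&delta) {
        *w -= lr * d;
    }
}
```

Because the Fisher-vector product costs O(n), each conjugate gradient iteration is linear in the number of probabilities, which is the main reason to prefer it over forming and inverting F explicitly.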

§Use Cases

  • Training attention weights with proper geometry
  • Routing probabilities in MoE
  • Softmax logit optimization

Structs§

FisherConfig
Fisher metric configuration
FisherMetric
Fisher metric operations
NaturalGradient
Natural gradient optimizer
NaturalGradientConfig
Natural gradient configuration