Transformer models for advanced pattern recognition in sparse matrices
This module contains transformer-based architectures for learning complex patterns in sparse matrix operations and optimizing them adaptively.
Structs
- AttentionGradients
- FFGradients
- HeadGradients
- LayerGradients
- TransformerGradients: Gradient structures for transformer training