ModernBERT architecture (nomic-ai/modernbert-embed-base).
22-layer transformer with alternating local/global attention, gated GELU
(GeGLU) MLP, two RoPE frequency caches, and pre-norm layer structure.
No biases anywhere, no position embeddings (RoPE only), mean pooling.
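As a rough illustration of two of the pieces named above, here is a minimal sketch of a GeGLU gate and masked mean pooling on plain `f32` slices. Everything in it is an assumption made for illustration: the function names are hypothetical, the tanh approximation of GELU is swapped in because Rust's standard library has no `erf`, and the choice of which half of the projection is activated may differ from this crate.

```rust
/// Tanh approximation of GELU (an assumption; the crate may use exact GELU).
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

/// GeGLU gate: split the up-projection output into two equal halves and
/// return gelu(first_half) * second_half, element by element.
fn geglu(projected: &[f32]) -> Vec<f32> {
    assert!(projected.len() % 2 == 0, "projection width must be even");
    let (value, gate) = projected.split_at(projected.len() / 2);
    value
        .iter()
        .zip(gate)
        .map(|(&v, &g)| gelu_tanh(v) * g)
        .collect()
}

/// Mean pooling over the tokens whose mask entry is true (padding is skipped).
fn mean_pool(tokens: &[Vec<f32>], mask: &[bool]) -> Vec<f32> {
    assert_eq!(tokens.len(), mask.len());
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for (tok, &keep) in tokens.iter().zip(mask) {
        if keep {
            for (s, &x) in sum.iter_mut().zip(tok) {
                *s += x;
            }
            count += 1;
        }
    }
    // Guard against an all-padding sequence to avoid dividing by zero.
    let denom = count.max(1) as f32;
    sum.iter().map(|s| s / denom).collect()
}
```

Note that no bias term appears anywhere in the sketch, which matches the "no biases" statement above; the linear projections around the gate are omitted for brevity.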
Weight structures are generic over the tensor type T, which is Driver::Tensor
when wired to a backend. The ModelArch implementation composes Driver
primitives into the full forward pass.
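The pattern described here, weights generic over `T` and a forward pass built only from backend primitives, can be sketched as follows. This is not the crate's actual API: the trait below only borrows the `Driver` name and its `Tensor` associated type from the text, while the methods `mul` and `add`, the `ToyLayerWeights` struct, the `CpuDriver` backend, and the `forward` function are all hypothetical.

```rust
/// Hypothetical stand-in for the backend trait; only the associated
/// `Tensor` type is taken from the text, the methods are invented.
trait Driver {
    type Tensor;
    fn mul(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn add(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
}

/// Toy per-layer weights, generic over the tensor type.
struct ToyLayerWeights<T> {
    scale: T,
}

/// Toy CPU backend whose tensor type is a flat Vec<f32>.
struct CpuDriver;

impl Driver for CpuDriver {
    type Tensor = Vec<f32>;

    fn mul(&self, a: &Vec<f32>, b: &Vec<f32>) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x * y).collect()
    }

    fn add(&self, a: &Vec<f32>, b: &Vec<f32>) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// Forward pass composed purely from driver primitives:
/// each toy layer computes x + x * scale (a residual update).
fn forward<D: Driver>(
    driver: &D,
    layers: &[ToyLayerWeights<D::Tensor>],
    input: D::Tensor,
) -> D::Tensor {
    let mut x = input;
    for layer in layers {
        let h = driver.mul(&x, &layer.scale);
        x = driver.add(&x, &h);
    }
    x
}
```

Because `forward` names only `D::Tensor`, the same code would run on any backend that implements the trait, which is presumably the motivation for keeping the weight structures generic.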
Structs
- ModernBertArch - ModernBERT architecture: nomic-ai/modernbert-embed-base.
- ModernBertLayerWeights - Weights for one ModernBERT encoder layer.
- ModernBertWeights - Full ModernBERT model weights, generic over tensor type.
- RopeCache - Pre-computed RoPE cos/sin cache for one frequency base.