Module modern_bert

ModernBERT architecture (nomic-ai/modernbert-embed-base).

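A 22-layer transformer encoder with alternating local and global attention, a gated-GELU (GeGLU) MLP, two RoPE frequency caches (one per frequency base), and a pre-norm layer structure. The model has no bias terms and no learned position embeddings (positions are encoded by RoPE only), and the final embedding is produced by mean pooling.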
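Two of the pieces named above, the GeGLU gate and mean pooling, can be illustrated on plain `Vec<f32>` data. This is a minimal sketch, not this module's code: the function names are hypothetical, GELU is computed with the tanh approximation, and the convention of activating the first half of the projection and multiplying by the second half is an assumption.

```rust
/// Tanh approximation of GELU (assumption: the real kernel may use the exact erf form).
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

/// GeGLU gating on one token: split the projected vector into two halves,
/// apply GELU to the first half, and multiply element-wise by the second half.
/// (Which half is activated is an assumption made for this sketch.)
fn geglu(projected: &[f32]) -> Vec<f32> {
    assert!(projected.len() % 2 == 0, "projection must have even length");
    let half = projected.len() / 2;
    let (act, gate) = projected.split_at(half);
    act.iter().zip(gate).map(|(&a, &g)| gelu_tanh(a) * g).collect()
}

/// Masked mean pooling: average the hidden states of the tokens whose mask is true.
fn mean_pool(hidden: &[Vec<f32>], mask: &[bool]) -> Vec<f32> {
    assert_eq!(hidden.len(), mask.len(), "one mask entry per token");
    let dim = hidden.first().map_or(0, |row| row.len());
    let mut sum = vec![0.0_f32; dim];
    let mut count = 0usize;
    for (row, &keep) in hidden.iter().zip(mask) {
        if keep {
            for (s, &v) in sum.iter_mut().zip(row) {
                *s += v;
            }
            count += 1;
        }
    }
    // Guard against an all-false mask to avoid dividing by zero.
    let denom = count.max(1) as f32;
    sum.iter().map(|s| s / denom).collect()
}
```

In this sketch padding tokens are excluded by the mask, so they do not dilute the pooled embedding.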

Weight structures are generic over the tensor type T, which is Driver::Tensor when wired to a backend. The ModelArch implementation composes Driver primitives into the full forward pass.
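The pattern of weights that are generic over `T` and become `Driver::Tensor` once wired to a backend might look roughly like the following. Everything here is a hypothetical stand-in: the `Driver` trait below has a single made-up `upload` method, and `LayerWeights`, its field names, `CpuDriver`, and `wire` are illustrative only and do not describe the crate's real definitions.

```rust
/// Hypothetical stand-in for a backend trait with an associated tensor type.
trait Driver {
    type Tensor;
    /// Made-up method for this sketch: copy host data into a backend tensor.
    fn upload(&self, host: &[f32]) -> Self::Tensor;
}

/// Illustrative per-layer weights, generic over the tensor type.
/// Field names are assumptions, not the real struct layout.
struct LayerWeights<T> {
    qkv: T,
    out_proj: T,
}

impl<T> LayerWeights<T> {
    /// Convert every tensor in the struct with the same function.
    fn map<U>(self, mut f: impl FnMut(T) -> U) -> LayerWeights<U> {
        LayerWeights {
            qkv: f(self.qkv),
            out_proj: f(self.out_proj),
        }
    }
}

/// Toy backend whose tensor type is just a host vector.
struct CpuDriver;

impl Driver for CpuDriver {
    type Tensor = Vec<f32>;
    fn upload(&self, host: &[f32]) -> Vec<f32> {
        host.to_vec()
    }
}

/// Turn host-side weights into backend weights, so that T = D::Tensor.
fn wire<D: Driver>(driver: &D, host: LayerWeights<Vec<f32>>) -> LayerWeights<D::Tensor> {
    host.map(|w| driver.upload(&w))
}
```

Keeping the weight structs generic means the same definitions can hold host buffers during loading and backend tensors during inference.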

Structs

ModernBertArch
ModernBERT architecture: nomic-ai/modernbert-embed-base.
ModernBertLayerWeights
Weights for one ModernBERT encoder layer.
ModernBertWeights
Full ModernBERT model weights, generic over tensor type.
RopeCache
Pre-computed RoPE cos/sin cache for one frequency base.
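A pre-computed cos/sin cache for one frequency base can be sketched as below. This is an assumption-laden illustration rather than the layout of `RopeCache`: the type `RopeTable`, its fields, the row-major `[position][pair]` storage, and the standard inverse-frequency formula `base^(-2i/head_dim)` are all choices made for this example.

```rust
/// Hypothetical cos/sin table for one RoPE frequency base.
struct RopeTable {
    half_dim: usize,
    cos: Vec<f32>, // row-major: [position][pair index]
    sin: Vec<f32>,
}

impl RopeTable {
    /// Precompute cos/sin of `pos * base^(-2i / head_dim)` for every
    /// position below `max_seq` and every pair index `i < head_dim / 2`.
    fn new(base: f32, head_dim: usize, max_seq: usize) -> Self {
        assert!(head_dim % 2 == 0, "head_dim must be even");
        let half_dim = head_dim / 2;
        let mut cos = Vec::with_capacity(max_seq * half_dim);
        let mut sin = Vec::with_capacity(max_seq * half_dim);
        for pos in 0..max_seq {
            for i in 0..half_dim {
                let inv_freq = base.powf(-(2.0 * i as f32) / head_dim as f32);
                let angle = pos as f32 * inv_freq;
                cos.push(angle.cos());
                sin.push(angle.sin());
            }
        }
        Self { half_dim, cos, sin }
    }

    /// Look up (cos, sin) for a position and pair index.
    fn at(&self, pos: usize, i: usize) -> (f32, f32) {
        let idx = pos * self.half_dim + i;
        (self.cos[idx], self.sin[idx])
    }
}
```

A model with two RoPE frequency caches would build two such tables with different `base` values, presumably one for the local-attention layers and one for the global-attention layers, and pick the matching table per layer.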