Module modern_bert

ModernBERT architecture (nomic-ai/modernbert-embed-base).

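A 22-layer transformer encoder with alternating local/global attention, a gated GELU (GeGLU) MLP, two RoPE frequency caches, and a pre-norm layer structure. The model has no bias terms anywhere and no position embeddings (position information comes from RoPE only), and it produces its output embedding by mean pooling.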
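To make two of the components named above concrete, here is a minimal sketch of GeGLU gating and mean pooling on plain `f32` slices. It is not the crate's implementation: the function names, the tanh approximation of GELU, the order in which the projected vector is split, and the padding mask are all assumptions made for illustration.

```rust
/// Tanh approximation of GELU (assumption: the real model may use the exact erf form).
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

/// GeGLU gating on an already-projected vector of length 2 * d.
/// Assumed split order: the first half is passed through GELU,
/// the second half is the linear gate it is multiplied with.
fn geglu(projected: &[f32]) -> Vec<f32> {
    assert!(projected.len() % 2 == 0, "expected an even-length projection");
    let d = projected.len() / 2;
    let (act, gate) = projected.split_at(d);
    act.iter().zip(gate).map(|(&a, &g)| gelu_tanh(a) * g).collect()
}

/// Mean pooling over token vectors. The `mask` marks real (non-padding)
/// tokens; masking is an assumption, the source only says "mean pooling".
fn mean_pool(hidden: &[Vec<f32>], mask: &[bool]) -> Vec<f32> {
    let dim = hidden.first().map_or(0, |h| h.len());
    let mut out = vec![0.0_f32; dim];
    let mut count = 0usize;
    for (h, &keep) in hidden.iter().zip(mask) {
        if keep {
            for (o, &v) in out.iter_mut().zip(h) {
                *o += v;
            }
            count += 1;
        }
    }
    if count > 0 {
        for o in out.iter_mut() {
            *o /= count as f32;
        }
    }
    out
}
```

In a real backend both operations would run on tensors through the driver rather than on `Vec<f32>`; the slices here only keep the arithmetic visible.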

Weight structures are generic over the tensor type T, which is Driver::Tensor when wired to a backend. The ModelArch implementation composes Driver primitives into the full forward pass.
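The pattern described above can be sketched roughly as follows. Everything in this block is hypothetical: the trait below is a one-method stand-in for the crate's `Driver`, the field names are invented, and `forward` is a toy composition rather than the actual `ModelArch` forward pass. It only illustrates how weights generic over `T` line up with a backend's associated tensor type.

```rust
/// Hypothetical stand-in for the crate's `Driver` trait: a backend that
/// owns a tensor type and exposes primitives over it.
trait Driver {
    type Tensor;
    fn add(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
}

/// Hypothetical per-layer weights, generic over the tensor type `T`.
struct LayerWeights<T> {
    attn_out: T,
    mlp_out: T,
}

/// Hypothetical full-model weights: one entry per encoder layer.
struct Weights<T> {
    layers: Vec<LayerWeights<T>>,
}

/// Toy CPU backend whose tensor is a flat `Vec<f32>`.
struct CpuDriver;

impl Driver for CpuDriver {
    type Tensor = Vec<f32>;
    fn add(&self, a: &Vec<f32>, b: &Vec<f32>) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// Toy "forward pass": each layer adds its two tensors to the running
/// hidden state, standing in for the attention and MLP residual updates.
/// `T` is fixed to `D::Tensor` once a driver is chosen.
fn forward<D: Driver>(driver: &D, weights: &Weights<D::Tensor>, input: D::Tensor) -> D::Tensor {
    weights.layers.iter().fold(input, |h, layer| {
        let h = driver.add(&h, &layer.attn_out);
        driver.add(&h, &layer.mlp_out)
    })
}
```

Keeping the weight structs generic means the same definitions can hold host buffers during loading and device tensors at inference time, with the choice made where `T` is instantiated.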

Structs

ModernBertArch: ModernBERT architecture for nomic-ai/modernbert-embed-base.
ModernBertLayerWeights: Weights for one ModernBERT encoder layer.
ModernBertWeights: Full ModernBERT model weights, generic over tensor type.
RopeCache: Pre-computed RoPE cos/sin cache for one frequency base.
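As an illustration of what a pre-computed cos/sin cache for one frequency base might look like, here is a minimal sketch. The type name `RopeCacheSketch`, its fields, the flat row-major layout, and the half-split pairing of dimensions are assumptions; the actual `RopeCache` may be laid out differently.

```rust
/// Hypothetical cos/sin cache for a single RoPE frequency base.
/// Both tables are stored row-major as [max_len][head_dim / 2].
struct RopeCacheSketch {
    cos: Vec<f32>,
    sin: Vec<f32>,
    half_dim: usize,
}

impl RopeCacheSketch {
    /// Pre-compute angles pos * base^(-2i / head_dim) for every position
    /// and every rotated pair index i.
    fn new(base: f32, head_dim: usize, max_len: usize) -> Self {
        let half_dim = head_dim / 2;
        let mut cos = Vec::with_capacity(max_len * half_dim);
        let mut sin = Vec::with_capacity(max_len * half_dim);
        for pos in 0..max_len {
            for i in 0..half_dim {
                let inv_freq = base.powf(-(2.0 * i as f32) / head_dim as f32);
                let angle = pos as f32 * inv_freq;
                cos.push(angle.cos());
                sin.push(angle.sin());
            }
        }
        Self { cos, sin, half_dim }
    }

    /// Rotate one head vector in place for position `pos`, pairing
    /// element i with element i + half_dim (assumed pairing convention).
    fn apply(&self, pos: usize, x: &mut [f32]) {
        let row = pos * self.half_dim;
        for i in 0..self.half_dim {
            let (c, s) = (self.cos[row + i], self.sin[row + i]);
            let (a, b) = (x[i], x[i + self.half_dim]);
            x[i] = a * c - b * s;
            x[i + self.half_dim] = a * s + b * c;
        }
    }
}
```

A model with two RoPE frequency caches would build two such values with different `base` arguments, presumably one for the local-attention layers and one for the global-attention layers.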