Common model components shared across text model architectures.
Re-exports§
pub use chatml_history::*;
Modules§
Structs§
- Cache
- Abstraction over cosine and sine tables, KV caching, and attention masking.
- CausalSelfAttention
- Config
- Generalized LLM configuration shared by all decoder-only text models.
- LinearAttnConfig
- Configuration for linear (recurrent) attention layers (e.g. Gated DeltaNet in Qwen3.5).
- MLP
- Multi-layer perceptron implementation with fused gate+up projection.
- RopeScaling
- RoPE scaling configuration for models with extended context (e.g. LLaMA 3.1+).
- Transformer
- Transformer block with causal self-attention and several caching strategies.
Enums§
- EosTokenId
- EOS token ID(s); deserializes from either a single u32 or an array of u32.
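A type like this is commonly modeled as a two-variant enum. The sketch below is a minimal illustration of handling both shapes, with a hypothetical `as_vec` helper for callers that want a flat list; the crate's actual definition and API may differ.

```rust
// Illustrative two-variant enum: a config may give one EOS id or several.
#[derive(Debug, Clone, PartialEq)]
enum EosTokenId {
    Single(u32),
    Multiple(Vec<u32>),
}

impl EosTokenId {
    // Hypothetical helper: normalize either form into a flat list of ids.
    fn as_vec(&self) -> Vec<u32> {
        match self {
            EosTokenId::Single(id) => vec![*id],
            EosTokenId::Multiple(ids) => ids.clone(),
        }
    }
}

fn main() {
    // A single id and a list of ids normalize the same way.
    assert_eq!(EosTokenId::Single(2).as_vec(), vec![2]);
    assert_eq!(
        EosTokenId::Multiple(vec![2, 32000]).as_vec(),
        vec![2, 32000]
    );
    println!("ok");
}
```

With serde, this dual-shape deserialization is typically achieved with `#[serde(untagged)]`, which tries each variant in order against the JSON value.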
Functions§
- detect_text_model_arch
- Auto-detect text model architecture from config.json’s “architectures” field.
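Detection of this kind usually matches the first entry of the `architectures` array against known class names. The sketch below is a hypothetical version: the enum variants and the fallback behavior are assumptions, not the crate's actual signature, though the architecture strings shown are the standard ones found in Hugging Face `config.json` files.

```rust
// Hypothetical architecture enum; the real crate's variants may differ.
#[derive(Debug, PartialEq)]
enum TextModelArch {
    Llama,
    Qwen3,
    Unknown,
}

// Match the first "architectures" entry against known class names.
fn detect_text_model_arch(architectures: &[&str]) -> TextModelArch {
    match architectures.first() {
        Some(&"LlamaForCausalLM") => TextModelArch::Llama,
        Some(&"Qwen3ForCausalLM") => TextModelArch::Qwen3,
        _ => TextModelArch::Unknown,
    }
}

fn main() {
    assert_eq!(
        detect_text_model_arch(&["LlamaForCausalLM"]),
        TextModelArch::Llama
    );
    // Unrecognized or empty lists fall back to Unknown in this sketch.
    assert_eq!(detect_text_model_arch(&[]), TextModelArch::Unknown);
    println!("ok");
}
```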
- load_rms_norm
- Load an RMS norm, optionally applying the residual weight pattern (1 + weight). When `residual` is true (Qwen3.5), the stored weight is treated as a residual and 1.0 is added.
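The residual weight pattern is simple arithmetic on the loaded tensor. This is a minimal sketch of just that transformation (the real function loads tensors from a checkpoint; the function name here is illustrative):

```rust
// Apply the residual RMS-norm weight pattern: when `residual` is true,
// the stored weight w is treated as a residual and the effective scale
// is 1.0 + w; otherwise the stored weight is used directly.
fn effective_rms_norm_weight(stored: &[f32], residual: bool) -> Vec<f32> {
    stored
        .iter()
        .map(|&w| if residual { 1.0 + w } else { w })
        .collect()
}

fn main() {
    // Under the residual pattern (e.g. Qwen3.5), a stored weight of 0.0
    // corresponds to an effective scale of 1.0 (identity).
    assert_eq!(
        effective_rms_norm_weight(&[0.0, -0.25], true),
        vec![1.0, 0.75]
    );
    // Without the pattern, weights pass through unchanged.
    assert_eq!(effective_rms_norm_weight(&[0.5], false), vec![0.5]);
    println!("ok");
}
```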