Common model components shared across text model architectures.
Re-exports§
pub use chatml_history::*;
Modules§
Structs§
- Cache
- Abstraction over cosine and sine tables, KV caching, and attention masking.
- CausalSelfAttention
- Config
- Generalized LLM configuration shared by all decoder-only text models.
- LinearAttnConfig
- Configuration for linear (recurrent) attention layers (e.g. Gated DeltaNet in Qwen3.5).
- MLP
- Multi-layer perceptron implementation with fused gate+up projection.
- RopeScaling
- RoPE scaling configuration for models with extended context (e.g. LLaMA 3.1+).
- Transformer
- Transformer block with causal self-attention and several caching strategies.
Enums§
- EosTokenId
- EOS token ID(s); deserializes from either a single u32 or an array of u32.
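A type like this is commonly modeled as a two-variant enum. The sketch below is a minimal illustration of handling both shapes, with a hypothetical `as_vec` helper for callers that want a flat list; the crate's actual definition and API may differ.

```rust
// Illustrative two-variant enum: a config may give one EOS id or several.
#[derive(Debug, Clone, PartialEq)]
enum EosTokenId {
    Single(u32),
    Multiple(Vec<u32>),
}

impl EosTokenId {
    // Hypothetical helper: normalize either form into a flat list of ids.
    fn as_vec(&self) -> Vec<u32> {
        match self {
            EosTokenId::Single(id) => vec![*id],
            EosTokenId::Multiple(ids) => ids.clone(),
        }
    }
}

fn main() {
    // A single id and a list of ids normalize the same way.
    assert_eq!(EosTokenId::Single(2).as_vec(), vec![2]);
    assert_eq!(
        EosTokenId::Multiple(vec![2, 32000]).as_vec(),
        vec![2, 32000]
    );
    println!("ok");
}
```

With serde, this dual-shape deserialization is typically achieved with `#[serde(untagged)]`, which tries each variant in order against the JSON value.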
Functions§
- detect_text_model_arch
- Auto-detect text model architecture from config.json’s “architectures” field.
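Detection of this kind usually matches the first entry of the `architectures` array against known class names. The sketch below is a hypothetical version: the enum variants and the fallback behavior are assumptions, not the crate's actual signature, though the architecture strings shown are the standard ones found in Hugging Face `config.json` files.

```rust
// Hypothetical architecture enum; the real crate's variants may differ.
#[derive(Debug, PartialEq)]
enum TextModelArch {
    Llama,
    Qwen3,
    Unknown,
}

// Match the first "architectures" entry against known class names.
fn detect_text_model_arch(architectures: &[&str]) -> TextModelArch {
    match architectures.first() {
        Some(&"LlamaForCausalLM") => TextModelArch::Llama,
        Some(&"Qwen3ForCausalLM") => TextModelArch::Qwen3,
        _ => TextModelArch::Unknown,
    }
}

fn main() {
    assert_eq!(
        detect_text_model_arch(&["LlamaForCausalLM"]),
        TextModelArch::Llama
    );
    // Unrecognized or empty lists fall back to Unknown in this sketch.
    assert_eq!(detect_text_model_arch(&[]), TextModelArch::Unknown);
    println!("ok");
}
```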
- load_rms_norm
- Load an RMS norm, optionally applying the residual weight pattern (1 + weight). When `residual` is true (Qwen3.5), the stored weight is treated as a residual and 1.0 is added.
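The residual weight pattern is simple arithmetic on the loaded tensor. This is a minimal sketch of just that transformation (the real function loads tensors from a checkpoint; the function name here is illustrative):

```rust
// Apply the residual RMS-norm weight pattern: when `residual` is true,
// the stored weight w is treated as a residual and the effective scale
// is 1.0 + w; otherwise the stored weight is used directly.
fn effective_rms_norm_weight(stored: &[f32], residual: bool) -> Vec<f32> {
    stored
        .iter()
        .map(|&w| if residual { 1.0 + w } else { w })
        .collect()
}

fn main() {
    // Under the residual pattern (e.g. Qwen3.5), a stored weight of 0.0
    // corresponds to an effective scale of 1.0 (identity).
    assert_eq!(
        effective_rms_norm_weight(&[0.0, -0.25], true),
        vec![1.0, 0.75]
    );
    // Without the pattern, weights pass through unchanged.
    assert_eq!(effective_rms_norm_weight(&[0.5], false), vec![0.5]);
    println!("ok");
}
```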