Pre-trained model architectures (Qwen2, etc.) for inference.
This module provides ready-to-use model implementations that combine
the primitives from nn into complete architectures.
§Available Models
- Qwen2Model - Qwen2-0.5B-Instruct decoder-only transformer
§Design Philosophy
Models follow the "assembly pattern": they compose existing primitives from nn (attention, normalization, feedforward) rather than duplicating code.
┌─────────────────────────────────────────────────────────────────┐
│ Model Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────────────┐ ┌────────────┐ │
│ │ Embedding │ -> │ N × DecoderLayer │ -> │ LM Head │ │
│ │ (vocab→d) │ │ (GQA + FFN + Norm) │ │ (d→vocab) │ │
│ └─────────────┘ └─────────────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
§References
- Bai et al. (2023). “Qwen Technical Report”
- Vaswani et al. (2017). “Attention Is All You Need”
§Re-exports
pub use qwen2::Qwen2Model;
§Modules
- qwen2
- Qwen2-0.5B-Instruct Model Implementation