Pre-trained model architectures (Qwen2, etc.) for inference.
This module provides ready-to-use model implementations that combine
the primitives from nn into complete architectures.
§Available Models
- Qwen2Model - Qwen2-0.5B-Instruct decoder-only transformer
§Design Philosophy
Models follow the "assembly pattern": they compose existing primitives from nn (attention, normalization, feedforward) rather than duplicating code.
┌─────────────────────────────────────────────────────────────────┐
│ Model Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────────────┐ ┌────────────┐ │
│ │ Embedding │ -> │ N × DecoderLayer │ -> │ LM Head │ │
│ │ (vocab→d) │ │ (GQA + FFN + Norm) │ │ (d→vocab) │ │
│ └─────────────┘ └─────────────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
§References
- Bai et al. (2023). “Qwen Technical Report”
- Vaswani et al. (2017). “Attention Is All You Need”
§Re-exports
pub use qwen2::Qwen2Model;
§Modules
- qwen2
- Qwen2-0.5B-Instruct Model Implementation