//! Mixture-of-Experts (MoE) configuration types.
//!
//! Phase 2A: data types only. The runtime (router + expert dispatch) lands
//! in Phase 2B/2C/2D. This file's job is to give the project a single
//! unambiguous representation of MoE hyperparameters so subsequent PRs
//! can wire it into the model code, the loader, and the benchmark suite
//! without each making up their own shape.
//!
//! ## Design choice: composition, not inheritance
//!
//! [`Qwen3MoeConfig`] **wraps** [`LlamaFamilyConfig`] rather than adding
//! `Option<MoeConfig>` fields to it. Reasons:
//!
//! 1. Every existing dense call site (Qwen3 / Llama / Mistral / TinyLlama
//!    via `*_from_def`) keeps working unchanged: no `..Default::default()`
//!    breakage, no "MoE field always present even for dense" awkwardness.
//! 2. The MoE forward path is structurally different (per-token router,
//!    per-token expert subset, weighted sum), so it will live in a
//!    separate `Qwen3MoeModel<B>` rather than branching inside
//!    `LlamaFamilyModel`. Sharing a config type would force the two
//!    models to coevolve.
//! 3. `Qwen3MoeConfig::base` reuses every field that genuinely is the
//!    same (hidden_size, num_layers, attention dims, RoPE, vocab), so we
//!    aren't duplicating dense fields.
//!
//! Trade-off: callers that just want "either dense or MoE config" will
//! need an `enum`. We'll add that wrapper if/when it earns its keep.
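//!
//! A minimal sketch of what that wrapper could look like, if it is ever
//! needed. The two structs below are trimmed stand-ins, not the real
//! definitions; `ModelConfig` and its `base` accessor are hypothetical
//! names, and the numeric values are purely illustrative.
//!
//! ```
//! // Stand-ins for the real types (assumed shapes, most fields omitted).
//! struct LlamaFamilyConfig {
//!     hidden_size: usize,
//!     intermediate_size: usize,
//! }
//! struct Qwen3MoeConfig {
//!     base: LlamaFamilyConfig,
//!     expert_intermediate_size: usize,
//! }
//!
//! // Hypothetical wrapper for callers that accept either kind of config.
//! enum ModelConfig {
//!     Dense(LlamaFamilyConfig),
//!     Moe(Qwen3MoeConfig),
//! }
//!
//! impl ModelConfig {
//!     // Because the MoE config wraps the dense one, the shared dense
//!     // fields are reachable from either variant.
//!     fn base(&self) -> &LlamaFamilyConfig {
//!         match self {
//!             ModelConfig::Dense(cfg) => cfg,
//!             ModelConfig::Moe(cfg) => &cfg.base,
//!         }
//!     }
//! }
//!
//! let moe = Qwen3MoeConfig {
//!     base: LlamaFamilyConfig { hidden_size: 2048, intermediate_size: 768 },
//!     expert_intermediate_size: 768,
//! };
//! // Compatibility convention: the base FFN width mirrors the expert width.
//! assert_eq!(moe.base.intermediate_size, moe.expert_intermediate_size);
//!
//! let cfg = ModelConfig::Moe(moe);
//! assert_eq!(cfg.base().hidden_size, 2048);
//! ```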

use crate::LlamaFamilyConfig;

/// Configuration for Qwen3-MoE family models (Qwen3-30B-A3B and friends).
///
/// All MoE-specific hyperparameters live here; dense fields are inherited
/// via [`Qwen3MoeConfig::base`]. For compatibility,
/// `base.intermediate_size` is set to [`Self::expert_intermediate_size`]:
/// Qwen3-MoE has no shared dense FFN, and every layer is MoE.