pub struct Qwen3Config {
pub vocab_size: usize,
pub hidden_size: usize,
pub intermediate_size: usize,
pub num_hidden_layers: usize,
pub num_attention_heads: usize,
pub num_key_value_heads: usize,
pub head_dim: usize,
pub max_position_embeddings: usize,
pub rms_norm_eps: f64,
pub rope_theta: f64,
pub hidden_act: String,
pub tie_word_embeddings: bool,
pub attention_bias: bool,
pub qk_norm: bool,
pub sliding_window: Option<usize>,
pub max_window_layers: usize,
pub use_sliding_window: bool,
pub num_experts: usize,
pub num_experts_used: usize,
pub expert_ffn_size: usize,
pub shared_expert_ffn_size: usize,
pub expert_weights_scale: f32,
}
Fields§
§vocab_size: usize
§hidden_size: usize
§intermediate_size: usize
§num_hidden_layers: usize
§num_attention_heads: usize
§num_key_value_heads: usize
§head_dim: usize
§max_position_embeddings: usize
§rms_norm_eps: f64
§rope_theta: f64
§hidden_act: String
§tie_word_embeddings: bool
§attention_bias: bool
§qk_norm: bool
Whether the model applies per-head RMS-norm to Q/K before RoPE
(a.k.a. “QK-norm”). Qwen 3 has it; Qwen 2 does not. Defaults to
true to match the historical Qwen 3 build path.
§sliding_window: Option<usize>
Sliding-window size; None (or absent) means full causal attention.
§max_window_layers: usize
Number of leading layers that use full causal attention; layers
[max_window_layers, num_hidden_layers) use sliding window when
use_sliding_window is true. HF default: all layers full.
§use_sliding_window: bool
§num_experts: usize
Total number of routed experts per MoE layer (0 = dense model).
HF key: num_experts / n_routed_experts. GGUF key:
qwen3.expert_count.
§num_experts_used: usize
Number of experts activated per token (top-k routing).
HF key: num_experts_per_tok. GGUF key: qwen3.expert_used_count.
§expert_ffn_size: usize
FFN inner width for each routed expert. When 0, falls back to
intermediate_size / num_experts_used to match upstream defaults.
GGUF key: qwen3.expert_feed_forward_length.
§shared_expert_ffn_size: usize
FFN inner width for the always-on shared expert (0 = no shared
expert). GGUF key: qwen3.expert_shared_feed_forward_length.
§expert_weights_scale: f32
Multiplier applied to routed-expert logits before softmax
(default 1.0). GGUF key: qwen3.expert_weights_scale.
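The routing fields above (num_experts_used as the top-k count, expert_weights_scale as a pre-softmax multiplier) can be illustrated with a minimal sketch. The `route` function below is hypothetical and not part of this crate. It also assumes the softmax runs over all experts before the top-k cut and that the kept weights are not renormalized; the field docs do not state either point.

```rust
/// Hypothetical helper, not part of this crate: scale the router logits,
/// softmax over all experts, then keep the `top_k` largest weights.
/// Assumption: no renormalization of the kept weights afterwards.
fn route(logits: &[f32], top_k: usize, scale: f32) -> Vec<(usize, f32)> {
    // Apply the multiplier before the softmax, as the field doc describes.
    let scaled: Vec<f32> = logits.iter().map(|l| l * scale).collect();

    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Pair each probability with its expert index, sort descending, keep top-k.
    let mut probs: Vec<(usize, f32)> = exps.iter().map(|e| e / sum).enumerate().collect();
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(top_k);
    probs
}

fn main() {
    // Illustrative logits for three experts; values are made up.
    let picked = route(&[1.0, 3.0, 2.0], 2, 1.0);
    for (expert, weight) in &picked {
        println!("expert {expert}: weight {weight:.3}");
    }
}
```

With these inputs the two selected experts are index 1 and then index 2, because their logits are the largest.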
Implementations§
impl Qwen3Config
pub fn from_file(path: &Path) -> Result<Qwen3Config, Error>
pub fn kv_group_size(&self) -> usize
Repetition factor for GQA: how many Q heads share each KV head.
pub fn q_proj_dim(&self) -> usize
Q projection output width (num_attention_heads * head_dim).
pub fn kv_proj_dim(&self) -> usize
K/V projection output width (num_key_value_heads * head_dim).
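The three GQA helpers above can be sketched on a stand-in struct. `AttnDims` is hypothetical and carries only the fields these helpers read. The two projection formulas come from the method docs. The division in `kv_group_size` is an assumption, since the docs describe the result but not the formula.

```rust
/// Hypothetical stand-in for the attention-related fields of the config.
#[derive(Debug, Clone)]
struct AttnDims {
    num_attention_heads: usize,
    num_key_value_heads: usize,
    head_dim: usize,
}

impl AttnDims {
    /// How many Q heads share each KV head.
    /// Assumption: num_attention_heads is a multiple of num_key_value_heads.
    fn kv_group_size(&self) -> usize {
        self.num_attention_heads / self.num_key_value_heads
    }

    /// Q projection output width (num_attention_heads * head_dim).
    fn q_proj_dim(&self) -> usize {
        self.num_attention_heads * self.head_dim
    }

    /// K/V projection output width (num_key_value_heads * head_dim).
    fn kv_proj_dim(&self) -> usize {
        self.num_key_value_heads * self.head_dim
    }
}

fn main() {
    // Illustrative values, not taken from any specific checkpoint.
    let dims = AttnDims {
        num_attention_heads: 32,
        num_key_value_heads: 8,
        head_dim: 128,
    };
    println!("kv_group_size = {}", dims.kv_group_size()); // 32 / 8 = 4
    println!("q_proj_dim    = {}", dims.q_proj_dim()); // 32 * 128 = 4096
    println!("kv_proj_dim   = {}", dims.kv_proj_dim()); // 8 * 128 = 1024
}
```

Because head_dim is a separate field, q_proj_dim does not have to equal hidden_size, which is why the projection widths are computed from the head counts and not read from hidden_size.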
pub fn is_moe(&self) -> bool
True when the config carries MoE routing (num_experts > 0).
qwen3-30b-a3b-instruct and qwen3-coder-next MoE variants
will return true; dense Qwen3 returns false.
pub fn expert_ffn_dim(&self) -> usize
Routed-expert SwiGLU inner width. Falls back to
intermediate_size / num_experts_used when the explicit
expert_ffn_size is absent (matches upstream defaults).
Shared-expert SwiGLU inner width (0 when there’s no shared expert).
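The MoE check and the routed-expert width fallback described above can be sketched as follows. `MoeDims` is a hypothetical stand-in with only the relevant fields. Returning 0 when num_experts_used is 0 is an assumption made here to avoid a division by zero on dense configs; the docs do not say how the crate handles that case.

```rust
/// Hypothetical stand-in for the MoE-related fields of the config.
#[derive(Debug, Clone)]
struct MoeDims {
    intermediate_size: usize,
    num_experts: usize,
    num_experts_used: usize,
    expert_ffn_size: usize,
}

impl MoeDims {
    /// True when the config carries MoE routing (num_experts > 0).
    fn is_moe(&self) -> bool {
        self.num_experts > 0
    }

    /// Routed-expert inner width, using 0 as the "not set" sentinel.
    fn expert_ffn_dim(&self) -> usize {
        if self.expert_ffn_size != 0 {
            self.expert_ffn_size
        } else {
            // Fallback from the docs: intermediate_size / num_experts_used.
            // Assumption: yield 0 when num_experts_used is 0 (dense model).
            self.intermediate_size
                .checked_div(self.num_experts_used)
                .unwrap_or(0)
        }
    }
}

fn main() {
    // Illustrative values only.
    let moe = MoeDims {
        intermediate_size: 6144,
        num_experts: 128,
        num_experts_used: 8,
        expert_ffn_size: 0,
    };
    println!("is_moe = {}", moe.is_moe());
    println!("expert_ffn_dim = {}", moe.expert_ffn_dim()); // 6144 / 8 = 768
}
```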
pub fn layer_uses_swa(&self, idx: usize) -> bool
Does layer idx use sliding-window attention?
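A minimal sketch of the per-layer decision, based on the field docs for sliding_window, max_window_layers and use_sliding_window. `SwaDims` is a hypothetical stand-in. Requiring sliding_window to be Some is an assumption drawn from the note that None means full causal attention; the crate's actual check may differ.

```rust
/// Hypothetical stand-in for the sliding-window fields of the config.
#[derive(Debug, Clone)]
struct SwaDims {
    num_hidden_layers: usize,
    sliding_window: Option<usize>,
    max_window_layers: usize,
    use_sliding_window: bool,
}

impl SwaDims {
    /// Does layer `idx` use sliding-window attention?
    /// Layers in [max_window_layers, num_hidden_layers) qualify when
    /// use_sliding_window is true. Assumption: a window size must be set.
    fn layer_uses_swa(&self, idx: usize) -> bool {
        self.use_sliding_window
            && self.sliding_window.is_some()
            && idx >= self.max_window_layers
            && idx < self.num_hidden_layers
    }
}

fn main() {
    // Illustrative values only: the first two layers stay full causal.
    let cfg = SwaDims {
        num_hidden_layers: 4,
        sliding_window: Some(4096),
        max_window_layers: 2,
        use_sliding_window: true,
    };
    for idx in 0..cfg.num_hidden_layers {
        println!("layer {idx}: swa = {}", cfg.layer_uses_swa(idx));
    }
}
```

Setting max_window_layers equal to num_hidden_layers makes the half-open range empty, which matches the "all layers full" default mentioned in the field docs.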
Trait Implementations§
impl Clone for Qwen3Config
fn clone(&self) -> Qwen3Config
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for Qwen3Config
impl<'de> Deserialize<'de> for Qwen3Config
fn deserialize<__D>(
__deserializer: __D,
) -> Result<Qwen3Config, <__D as Deserializer<'de>>::Error>
where
__D: Deserializer<'de>,
Auto Trait Implementations§
impl Freeze for Qwen3Config
impl RefUnwindSafe for Qwen3Config
impl Send for Qwen3Config
impl Sync for Qwen3Config
impl Unpin for Qwen3Config
impl UnsafeUnpin for Qwen3Config
impl UnwindSafe for Qwen3Config
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
T: Clone,
impl<T> DeserializeOwned for T
where
T: for<'de> Deserialize<'de>,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.