
Module models


Model-as-Code implementations.

Each module defines one model family as explicit Rust code: structs that hold the weights, plus forward methods that call the Backend and Linear traits directly. This replaces the earlier “generic ModelRunner + TransformerConfig” approach, which could not cleanly express MoE, MLA, multimodal models, or quantization.
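
As a rough illustration of the model-as-code idea, the sketch below pairs a weight struct with an explicit forward method. The `Backend` and `Linear` traits here are simplified, hypothetical stand-ins (their real signatures are not shown on this page), and `CpuBackend`, `DenseLinear`, and `TinyMlp` are invented purely for the example.

```rust
/// Hypothetical stand-in for the crate's Backend trait (real signature not shown here).
trait Backend {
    /// Computes y = W x for a row-major `rows x cols` matrix.
    fn matvec(&self, w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32>;
}

/// Hypothetical stand-in for the crate's Linear trait.
trait Linear<B: Backend> {
    fn forward(&self, backend: &B, x: &[f32]) -> Vec<f32>;
}

/// Naive CPU backend, assumed for illustration only.
struct CpuBackend;

impl Backend for CpuBackend {
    fn matvec(&self, w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
        assert_eq!(w.len(), rows * cols);
        assert_eq!(x.len(), cols);
        (0..rows)
            .map(|r| {
                w[r * cols..(r + 1) * cols]
                    .iter()
                    .zip(x)
                    .map(|(a, b)| a * b)
                    .sum::<f32>()
            })
            .collect()
    }
}

/// Plain dense layer. A quantized layer could implement the same trait
/// without any change to the model code below.
struct DenseLinear {
    weight: Vec<f32>,
    rows: usize,
    cols: usize,
}

impl<B: Backend> Linear<B> for DenseLinear {
    fn forward(&self, backend: &B, x: &[f32]) -> Vec<f32> {
        backend.matvec(&self.weight, self.rows, self.cols, x)
    }
}

/// One model family = one struct of weights plus an explicit forward method.
struct TinyMlp<L> {
    up: L,
    down: L,
}

impl<L> TinyMlp<L> {
    /// Forward pass written out step by step: up projection, ReLU, down projection.
    fn forward<B: Backend>(&self, backend: &B, x: &[f32]) -> Vec<f32>
    where
        L: Linear<B>,
    {
        let hidden: Vec<f32> = self
            .up
            .forward(backend, x)
            .into_iter()
            .map(|v| v.max(0.0))
            .collect();
        self.down.forward(backend, &hidden)
    }
}
```

Because the forward pass is ordinary code rather than a config-driven runner, a family with an unusual layer (for instance expert routing) can simply write that step inline instead of extending a shared configuration schema.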

Current coverage:

  • llama_family — Llama / Llama-2 / Llama-3 / Qwen2 / Qwen2.5 / Qwen3 (standard GQA + SwiGLU + RoPE, optional QK-norm).
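
Two of the building blocks named above can be sketched in a few lines. This is a minimal, generic illustration of grouped-query attention head mapping and SwiGLU gating, not the code in `llama_family`; the function names `kv_head_for`, `silu`, and `swiglu` are hypothetical.

```rust
/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// SwiGLU gating, applied elementwise to the gate and up projections:
/// out[i] = silu(gate[i]) * up[i].
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    assert_eq!(gate.len(), up.len());
    gate.iter().zip(up).map(|(g, u)| silu(*g) * u).collect()
}

/// Grouped-query attention: consecutive query heads share one KV head.
/// Returns the KV head index that `q_head` reads from.
fn kv_head_for(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert!(q_head < num_q_heads);
    let group_size = num_q_heads / num_kv_heads;
    q_head / group_size
}
```

With 32 query heads and 8 KV heads, for example, query heads 0 through 3 all read KV head 0 and query heads 4 through 7 read KV head 1, which is what shrinks the KV cache relative to full multi-head attention.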

Planned (Phase D):

  • mistral — sliding-window attention variant.
  • deepseek_v3 — MLA compressed KV + MoE expert routing.
  • qwen_vl — ViT backbone + LLM (multimodal).
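
For readers unfamiliar with the expert routing mentioned above, here is a generic top-k softmax router. It is a sketch under stated assumptions: the function name `route_top_k` is hypothetical, and the scoring and normalization rules used by any particular model family may differ from this textbook form.

```rust
/// Picks the `k` highest-scoring experts for one token and returns
/// `(expert_index, weight)` pairs whose weights sum to 1.
/// Generic illustration only; real routers may score or normalize differently.
fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    assert!(k > 0 && k <= logits.len());

    // Numerically stable softmax over the router logits.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Sort experts by probability, highest first, and keep the top k.
    let mut scored: Vec<(usize, f32)> =
        exps.iter().map(|e| e / total).enumerate().collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);

    // Renormalize so the kept weights sum to 1.
    let kept: f32 = scored.iter().map(|(_, p)| p).sum();
    scored.iter().map(|&(i, p)| (i, p / kept)).collect()
}
```

The token's output is then the weighted sum of the selected experts' outputs, so only `k` expert MLPs run per token regardless of how many experts the layer holds.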

Re-exports

pub use llama_family::LlamaFamilyConfig;
pub use llama_family::LlamaFamilyModel;
pub use llama_family_pipeline::LlamaFamilyPipelineModel;
pub use llama_family_pipeline::LlamaPipelineMode;
pub use llama_family_pipeline::LlamaPipelinePlacement;
pub use llama_family_pipeline::LlamaPipelineStageBridge;
pub use llama_family_pipeline::LlamaPipelineStagePlacement;
pub use llama_family_pipeline::LlamaPipelineTransport;
pub use qwen3_moe::Qwen3MoeModel;

Modules

llama_family
Llama-family decoder model as explicit code.
llama_family_forward_batched
Batched-decode forward methods for LlamaFamilyModel.
llama_family_pipeline
qwen3_moe
Qwen3MoeModel<B> — Qwen3-MoE family decoder (Qwen3-30B-A3B and friends).
qwen3_moe_forward_unified
Qwen3-MoE unified mixed-batch forward — vLLM-style.
qwen3_moe_forward_unified_layer
One Qwen3-MoE unified forward transformer layer.
qwen3_moe_profile
Profiling counters shared by the Qwen3-MoE forward paths.
qwen3_moe_runtime
Runtime environment snapshot for Qwen3-MoE model paths.