Module models


Model-as-Code implementations.

Each module defines one model family as explicit Rust code: structs that hold the weights, plus forward methods that call the Backend and Linear traits directly. This replaces the earlier “generic ModelRunner + TransformerConfig” approach, which could not cleanly express MoE, MLA, multimodal, or quantized models.
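As a rough illustration of the "structs for weights + forward methods" pattern, here is a minimal sketch. The `Backend` and `Linear` traits below are hypothetical stand-ins written for this example; the crate's real trait signatures are not shown on this page and will differ. `CpuBackend`, `DenseLinear`, and `TinyMlpModel` are likewise invented names.

```rust
// Hypothetical stand-ins for the crate's `Backend` and `Linear` traits.
// The real signatures are assumptions here, not the crate's API.
trait Backend {
    /// Row-major matrix-vector product: `w` is `rows x cols`, `x` has `cols` entries.
    fn matvec(&self, w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32>;
}

trait Linear<B: Backend> {
    fn forward(&self, backend: &B, x: &[f32]) -> Vec<f32>;
}

/// Naive CPU backend used only for this sketch.
struct CpuBackend;

impl Backend for CpuBackend {
    fn matvec(&self, w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
        assert_eq!(w.len(), rows * cols);
        assert_eq!(x.len(), cols);
        (0..rows)
            .map(|r| {
                w[r * cols..(r + 1) * cols]
                    .iter()
                    .zip(x)
                    .map(|(a, b)| a * b)
                    .sum::<f32>()
            })
            .collect()
    }
}

/// Unquantized linear layer; a quantized layer could implement the same trait.
struct DenseLinear {
    weight: Vec<f32>,
    out_features: usize,
    in_features: usize,
}

impl<B: Backend> Linear<B> for DenseLinear {
    fn forward(&self, backend: &B, x: &[f32]) -> Vec<f32> {
        backend.matvec(&self.weight, self.out_features, self.in_features, x)
    }
}

/// One model family = one struct of weights plus an explicit forward method.
struct TinyMlpModel<L> {
    up: L,
    down: L,
}

impl<L> TinyMlpModel<L> {
    /// The control flow is written out by hand instead of being driven by a config.
    fn forward<B: Backend>(&self, backend: &B, x: &[f32]) -> Vec<f32>
    where
        L: Linear<B>,
    {
        let hidden: Vec<f32> = self
            .up
            .forward(backend, x)
            .into_iter()
            .map(|v| v.max(0.0)) // ReLU, chosen only to keep the sketch short
            .collect();
        self.down.forward(backend, &hidden)
    }
}
```

Because the forward pass is ordinary code, a family that needs expert routing or a compressed KV path can simply write different control flow in its own `forward`, rather than adding flags to a shared config.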

Current coverage:

llama_family — Llama / Llama-2 / Llama-3 / Qwen2 / Qwen2.5 / Qwen3 (standard GQA + SwiGLU + RoPE, optional QK-norm).
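Two of the building blocks named above, GQA and SwiGLU, can be sketched in a few lines. This is an illustrative sketch under common conventions (contiguous query-head groups, SiLU gate), not code taken from `llama_family`; the function names are hypothetical.

```rust
/// Index of the KV head shared by a given query head under grouped-query
/// attention, assuming query heads are grouped contiguously.
fn kv_head_for_query_head(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert!(q_head < num_q_heads);
    let group_size = num_q_heads / num_kv_heads;
    q_head / group_size
}

/// SiLU (swish) activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// SwiGLU gating: silu(gate) * up, elementwise.
/// The down projection that normally follows is omitted.
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    assert_eq!(gate.len(), up.len());
    gate.iter().zip(up).map(|(g, u)| silu(*g) * u).collect()
}
```

With 32 query heads and 8 KV heads, for example, query heads 0 to 3 read KV head 0 and query heads 28 to 31 read KV head 7.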

Planned (Phase D):

mistral — sliding-window attention variant.
deepseek_v3 — MLA compressed KV + MoE expert routing.
qwen_vl — ViT backbone + LLM (multimodal).
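For orientation, the sliding-window variant planned for `mistral` differs from plain causal attention only in which key positions a query may see. The sketch below assumes a window that counts the query position itself; the exact convention the planned module will use is not stated here, and the function name is hypothetical.

```rust
/// Whether query position `q` may attend to key position `k` under causal
/// sliding-window attention. Assumption: the window includes `q` itself,
/// so `window = 1` means "attend only to the current token".
fn sliding_window_allows(q: usize, k: usize, window: usize) -> bool {
    // `k <= q` is checked first, so `q - k` cannot underflow.
    k <= q && q - k < window
}
```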

Re-exports

pub use llama_family::LlamaFamilyConfig;
pub use llama_family::LlamaFamilyModel;
pub use llama_family_pipeline::LlamaFamilyPipelineModel;
pub use llama_family_pipeline::LlamaPipelineMode;
pub use llama_family_pipeline::LlamaPipelinePlacement;
pub use llama_family_pipeline::LlamaPipelineStageBridge;
pub use llama_family_pipeline::LlamaPipelineStagePlacement;
pub use llama_family_pipeline::LlamaPipelineTransport;
pub use qwen3_moe::Qwen3MoeModel;

Modules

llama_family: Llama-family decoder model as explicit code.
llama_family_forward_batched: Batched-decode forward methods for LlamaFamilyModel.
llama_family_pipeline
qwen3_moe: Qwen3MoeModel<B> — Qwen3-MoE family decoder (Qwen3-30B-A3B and friends).
qwen3_moe_forward_unified: Qwen3-MoE unified mixed-batch forward — vLLM-style.
qwen3_moe_forward_unified_layer: One Qwen3-MoE unified forward transformer layer.
qwen3_moe_profile: Profiling counters shared by the Qwen3-MoE forward paths.
qwen3_moe_runtime: Runtime environment snapshot for Qwen3-MoE model paths.
