Module runner_weights

Generic CUDA decode runner weight loader.

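Loads transformer weights from safetensors files, fuses the separate Q/K/V projections into a single QKV tensor and the gate/up projections into a single gate_up tensor, then uploads the result to the GPU on a CUDA stream. Architecture-agnostic: works for Llama, Qwen2, Mistral, and any other model that uses the same standard weight naming.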
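
The fusion step can be illustrated with a minimal host-side sketch. It assumes each projection is a row-major `[out_features, in_features]` matrix and that fusing means concatenating along the output dimension; the layout this module actually uses is not stated here. `Matrix` and `fuse_rows` are hypothetical names for illustration, not part of this module's API.

```rust
/// Row-major weight matrix with shape [rows, cols].
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// Concatenate matrices along the row (output-feature) dimension.
/// With row-major storage this is a plain append of the buffers,
/// so Q, K and V become one contiguous QKV buffer.
fn fuse_rows(parts: &[&Matrix]) -> Result<Matrix, String> {
    let cols = match parts.first() {
        Some(m) => m.cols,
        None => return Err("no matrices to fuse".to_string()),
    };
    let mut rows = 0;
    let mut data = Vec::new();
    for m in parts {
        // All parts must share the input dimension.
        if m.cols != cols {
            return Err(format!("column mismatch: expected {}, got {}", cols, m.cols));
        }
        // Guard against a buffer that does not match its declared shape.
        if m.data.len() != m.rows * m.cols {
            return Err("buffer length does not match shape".to_string());
        }
        rows += m.rows;
        data.extend_from_slice(&m.data);
    }
    Ok(Matrix { rows, cols, data })
}
```

Under the same assumptions, `fuse_rows(&[&q, &k, &v])` would yield the QKV matrix and `fuse_rows(&[&gate, &up])` the gate_up matrix. K and V may have fewer rows than Q (as in grouped-query attention), which is why only the column count is checked.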