Module cpu_marlin_stack

MarlinExpertStack<CpuBackend> implementation on top of the CPU backend's dequant-on-load GptqStore. It is a facade: it delegates to the existing BackendQuantMarlin::moe_gemm_phase_* methods (the default trait impl, which on CPU loops calling gemm_gptq_with_offset_strided) and to the make_stacked_expert_linear methods.

CPU has no real Marlin tiles; the impl exists so the bucketed MoE path’s parity test (tests/moe_bucketed_parity_test.rs) still compiles after the Phase C trait-object migration. Phase C step 4 inlines the kernel calls here and deletes the trait methods.
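
The delegation pattern described above can be illustrated with a minimal, self-contained sketch. Every name in it (QuantBackend, gemm_with_offset, moe_gemm, ExpertStack, CpuExpertStack) is a hypothetical stand-in, not the crate's actual API, and plain f32 weights replace the dequantized GPTQ data. It only shows the shape of the idea: a trait with a default method that loops over experts calling a single-expert GEMM at an offset into a stacked weight buffer, and a facade type that forwards to that default.

```rust
/// Hypothetical stand-in for a backend trait whose default MoE method
/// loops over experts. Not the real BackendQuantMarlin signature.
trait QuantBackend {
    /// Single-expert GEMM: y[m x n] = x[m x k] * w[k x n], where `w` is read
    /// from a stacked buffer starting at element `w_offset`.
    fn gemm_with_offset(
        &self, x: &[f32], w: &[f32], w_offset: usize, m: usize, k: usize, n: usize,
    ) -> Vec<f32>;

    /// Default impl: tokens are assumed to be already bucketed by expert, so
    /// expert `e` owns the next `tokens_per_expert[e]` rows of `x`.
    fn moe_gemm(
        &self, x: &[f32], stacked_w: &[f32], tokens_per_expert: &[usize], k: usize, n: usize,
    ) -> Vec<f32> {
        let total: usize = tokens_per_expert.iter().sum();
        let mut out = Vec::with_capacity(total * n);
        let mut row = 0;
        for (e, &m) in tokens_per_expert.iter().enumerate() {
            let xs = &x[row * k..(row + m) * k];
            // Each expert's weights occupy a contiguous k*n slab.
            out.extend(self.gemm_with_offset(xs, stacked_w, e * k * n, m, k, n));
            row += m;
        }
        out
    }
}

/// Hypothetical CPU backend: a naive row-major GEMM, no tiling.
struct CpuBackend;

impl QuantBackend for CpuBackend {
    fn gemm_with_offset(
        &self, x: &[f32], w: &[f32], w_offset: usize, m: usize, k: usize, n: usize,
    ) -> Vec<f32> {
        let w = &w[w_offset..w_offset + k * n];
        let mut y = vec![0.0f32; m * n];
        for i in 0..m {
            for p in 0..k {
                let a = x[i * k + p];
                for j in 0..n {
                    y[i * n + j] += a * w[p * n + j];
                }
            }
        }
        y
    }
}

/// Hypothetical object-safe trait that callers hold as a trait object.
trait ExpertStack {
    fn forward(&self, x: &[f32], tokens_per_expert: &[usize]) -> Vec<f32>;
}

/// Facade: owns the stacked weights and forwards to the backend's default loop.
struct CpuExpertStack {
    backend: CpuBackend,
    weights: Vec<f32>, // experts stacked back to back, each k*n elements
    k: usize,
    n: usize,
}

impl ExpertStack for CpuExpertStack {
    fn forward(&self, x: &[f32], tokens_per_expert: &[usize]) -> Vec<f32> {
        self.backend.moe_gemm(x, &self.weights, tokens_per_expert, self.k, self.n)
    }
}
```

In this sketch the facade adds no logic of its own, so a caller holding a Box<dyn ExpertStack> gets the same result as calling the backend loop directly. Inlining the loop into forward and dropping moe_gemm from the trait would correspond, loosely, to the later migration step mentioned above.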

Structs

CpuMarlinExpertStack