pub fn moe_accumulate_encode_offset(
    encoder: &mut CommandEncoder,
    registry: &mut KernelRegistry,
    device: &DeviceRef,
    accumulator: &MlxBuffer,
    expert_output: &MlxBuffer,
    src_byte_offset: usize,
    routing_weight: f32,
    n_elements: usize,
) -> Result<()>
Like moe_accumulate_encode, but reads expert_output starting src_byte_offset bytes into the buffer.
This lets the kernel read a slice of a larger buffer, for example the output of the down_id kernel, which holds top_k rows of hidden-state data.
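The following is a minimal CPU sketch of the assumed semantics, not the actual Metal kernel. It assumes the accumulate step computes accumulator[i] += routing_weight * expert_output[offset + i] on f32 data, and that expert_output is a row-major [top_k, hidden] buffer, so row k starts at byte k * hidden * 4. The names accumulate_at_offset and row_byte_offset are hypothetical and not part of this crate's API.

```rust
use std::mem::size_of;

/// Hypothetical CPU reference (assumed semantics, f32 elements):
/// accumulator[i] += routing_weight * expert_output[src_byte_offset / 4 + i]
fn accumulate_at_offset(
    accumulator: &mut [f32],
    expert_output: &[f32],
    src_byte_offset: usize,
    routing_weight: f32,
    n_elements: usize,
) {
    // A byte offset into an f32 buffer must land on an element boundary.
    assert_eq!(src_byte_offset % size_of::<f32>(), 0, "offset must be f32-aligned");
    let start = src_byte_offset / size_of::<f32>();
    let src = &expert_output[start..start + n_elements];
    for (acc, &x) in accumulator[..n_elements].iter_mut().zip(src) {
        *acc += routing_weight * x;
    }
}

/// Hypothetical helper: byte offset of row `k` in a row-major [top_k, hidden] f32 buffer.
fn row_byte_offset(k: usize, hidden: usize) -> usize {
    k * hidden * size_of::<f32>()
}

fn main() {
    let hidden = 4;
    let top_k = 2;
    // Stand-in for a down_id-style output: row 0 = [1, 2, 3, 4], row 1 = [10, 20, 30, 40].
    let down_out: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0];
    let weights = [0.25_f32, 0.75];
    let mut acc = vec![0.0_f32; hidden];
    // One call per selected expert, each reading its own row via a byte offset.
    for k in 0..top_k {
        accumulate_at_offset(&mut acc, &down_out, row_byte_offset(k, hidden), weights[k], hidden);
    }
    println!("{:?}", acc); // prints [7.75, 15.5, 23.25, 31.0]
}
```

Passing a byte offset instead of copying each row into its own buffer keeps all top_k expert outputs in one allocation; each accumulate call just starts reading at a different position.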