Function dispatch_dequant_to_f16 

pub fn dispatch_dequant_to_f16(
    encoder: &mut CommandEncoder,
    registry: &mut KernelRegistry,
    device: &DeviceRef,
    weight: &MlxBuffer,
    f16_shadow: &MlxBuffer,
    n_rows: u32,
    n_cols: u32,
    ggml_type: GgmlType,
) -> Result<()>

Dispatch the whole-tensor dequant-to-F16 kernel.

weight is the source quantized buffer (caller-allocated; it holds the GGUF-format bytes for n_rows × n_cols elements of ggml_type). f16_shadow is the destination buffer and must be at least n_rows * n_cols * 2 bytes (F16 = 2 bytes/elem).
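As an illustration of what "GGUF-format bytes" means for the source side, the sketch below computes the packed size of a tensor for two well-known GGML block formats (Q4_0: 32 elements in 18 bytes, Q8_0: 32 elements in 34 bytes). `BlockFormat` and `quantized_bytes` are hypothetical helpers, not part of this crate, and whether this kernel accepts these particular types is an assumption.

```rust
/// Hypothetical stand-in for two GGML block formats; NOT the crate's `GgmlType`.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
enum BlockFormat {
    Q4_0,
    Q8_0,
}

impl BlockFormat {
    /// Returns (elements per block, bytes per block) as defined by GGML.
    fn block_layout(self) -> (usize, usize) {
        match self {
            // f16 scale (2 bytes) + 32 four-bit quants (16 bytes)
            BlockFormat::Q4_0 => (32, 18),
            // f16 scale (2 bytes) + 32 eight-bit quants (32 bytes)
            BlockFormat::Q8_0 => (32, 34),
        }
    }
}

/// Packed size in bytes of an `n_rows x n_cols` tensor in `fmt`.
/// Returns `None` if a row is not a whole number of blocks or the size overflows.
fn quantized_bytes(n_rows: u32, n_cols: u32, fmt: BlockFormat) -> Option<usize> {
    let (elems, bytes) = fmt.block_layout();
    let n_cols = n_cols as usize;
    if n_cols % elems != 0 {
        return None;
    }
    let blocks = (n_rows as usize).checked_mul(n_cols / elems)?;
    blocks.checked_mul(bytes)
}

fn main() {
    // 2 rows of 64 elements = 4 blocks in either format.
    println!("{:?}", quantized_bytes(2, 64, BlockFormat::Q4_0));
    println!("{:?}", quantized_bytes(2, 64, BlockFormat::Q8_0));
}
```

A caller could compare a figure like this against the length of weight before dispatching, rather than relying on the InvalidArgument error alone.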

n_rows / n_cols are the logical tensor shape; the kernel writes n_rows * n_cols F16 values into f16_shadow in row-major order (matching the row-major dequant layout the matmul kernels expect).
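The destination size and row-major layout described above can be sketched as plain arithmetic. The helper names `f16_shadow_min_bytes` and `f16_byte_offset` are hypothetical and not part of this crate; they only restate the documented n_rows * n_cols * 2 requirement and the usual row-major indexing rule.

```rust
/// Size of one F16 element in bytes.
const F16_BYTES: usize = 2;

/// Minimum byte length of the F16 destination for an `n_rows x n_cols` tensor.
/// Returns `None` if the product does not fit in `usize`.
fn f16_shadow_min_bytes(n_rows: u32, n_cols: u32) -> Option<usize> {
    (n_rows as usize)
        .checked_mul(n_cols as usize)?
        .checked_mul(F16_BYTES)
}

/// Byte offset of element (`row`, `col`) in a row-major F16 buffer
/// with `n_cols` elements per row. Returns `None` on overflow.
fn f16_byte_offset(row: u32, col: u32, n_cols: u32) -> Option<usize> {
    (row as usize)
        .checked_mul(n_cols as usize)?
        .checked_add(col as usize)?
        .checked_mul(F16_BYTES)
}

fn main() {
    let (n_rows, n_cols) = (4096u32, 11008u32);
    let bytes = f16_shadow_min_bytes(n_rows, n_cols).expect("size overflows usize");
    println!("f16_shadow needs at least {bytes} bytes");
    // The first element of row 1 starts one full row past the buffer start.
    println!("row 1 starts at byte {:?}", f16_byte_offset(1, 0, n_cols));
}
```

Using checked arithmetic keeps the calculation safe on targets where usize is 32 bits, since n_rows * n_cols * 2 can exceed that range for large tensors.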

Returns InvalidArgument if ggml_type is unsupported (F32 / F16 / I16, which need no dequantization) or if the buffer sizes don’t match.