pub fn dispatch_dequant_to_f16(
encoder: &mut CommandEncoder,
registry: &mut KernelRegistry,
device: &DeviceRef,
weight: &MlxBuffer,
f16_shadow: &MlxBuffer,
n_rows: u32,
n_cols: u32,
ggml_type: GgmlType,
) -> Result<()>Expand description
Dispatch the whole-tensor dequant-to-F16 kernel.
weight is the source quantized buffer (caller-allocated, holds the
GGUF-format bytes for n_rows × n_cols elements of ggml_type).
f16_shadow is the destination buffer, must be at least
n_rows * n_cols * 2 bytes (F16 = 2 bytes/elem).
n_rows / n_cols are the logical tensor shape; the kernel writes
n_rows * n_cols F16 values into f16_shadow in row-major order
(matching the row-major dequant layout the matmul kernels expect).
Returns InvalidArgument if ggml_type is unsupported (F32 / F16 /
I16 — no dequant needed) or if buffer sizes don’t match.