pub fn dispatch_strided_copy_f32(
    encoder: &mut CommandEncoder,
    registry: &mut KernelRegistry,
    device: &DeviceRef,
    src: &MlxBuffer,
    dst: &MlxBuffer,
    params: &StridedCopyParams,
) -> Result<()>
Dispatch a strided copy operation on the GPU.
Copies a 2D strided tensor into a contiguous (row-major) layout. For each row in 0..rows and col in 0..cols:
dst[row * cols + col] = src[row * stride_row + col * stride_col]
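To make the indexing concrete, here is a minimal CPU reference of the formula above, the kind of helper one might use to check GPU output in tests. It is a sketch only: strided_copy_ref is a hypothetical function, not part of this crate, and it assumes the strides are counted in f32 elements rather than bytes.

```rust
/// CPU reference for the strided copy formula (hypothetical helper, not part of the crate).
/// Assumption: stride_row and stride_col are measured in f32 elements, not bytes.
pub fn strided_copy_ref(
    src: &[f32],
    rows: usize,
    cols: usize,
    stride_row: usize,
    stride_col: usize,
) -> Vec<f32> {
    let mut dst = vec![0.0f32; rows * cols];
    for row in 0..rows {
        for col in 0..cols {
            // Contiguous destination, strided source: same indexing as the kernel description.
            dst[row * cols + col] = src[row * stride_row + col * stride_col];
        }
    }
    dst
}

fn main() {
    // A 2x3 row-major matrix [[0, 1, 2], [3, 4, 5]].
    let src = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
    // Reading it with stride_row = 1, stride_col = 3 views it as its 3x2 transpose.
    let t = strided_copy_ref(&src, 3, 2, 1, 3);
    println!("{:?}", t); // prints [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
}
```

Swapping the strides like this is how a transpose (or any other permuted 2D view) becomes a contiguous buffer without a dedicated transpose kernel.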
§Arguments
encoder - Command encoder to record the dispatch into.
registry - Kernel registry (must have strided_copy_f32 registered).
device - Metal device for pipeline compilation.
src - Source buffer (f32, strided layout).
dst - Destination buffer (f32, contiguous output).
params - Copy parameters (rows, cols, strides).
§Errors
Returns MlxError::InvalidArgument if rows or cols is 0, or if src or dst is
too small for the requested copy.
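One way to read these error conditions is sketched below as a standalone check. check_strided_copy is hypothetical and assumes that buffer lengths and strides are counted in f32 elements; the crate's actual validation may measure buffers in bytes or apply different rules. The key observation is that the largest source index read is (rows - 1) * stride_row + (cols - 1) * stride_col, while the destination needs rows * cols elements.

```rust
/// Hypothetical pre-dispatch check mirroring the documented error conditions.
/// Assumption: lengths and strides are counted in f32 elements, not bytes.
fn check_strided_copy(
    src_len: usize,
    dst_len: usize,
    rows: usize,
    cols: usize,
    stride_row: usize,
    stride_col: usize,
) -> Result<(), String> {
    if rows == 0 || cols == 0 {
        return Err(format!("rows and cols must be non-zero (got {rows}x{cols})"));
    }
    // Largest source index the copy reads, computed with overflow checks.
    let max_src_index = (rows - 1)
        .checked_mul(stride_row)
        .and_then(|a| (cols - 1).checked_mul(stride_col).and_then(|b| a.checked_add(b)))
        .ok_or_else(|| "source extent overflows usize".to_string())?;
    if src_len <= max_src_index {
        return Err(format!(
            "src holds {src_len} elements but the copy reads index {max_src_index}"
        ));
    }
    // The contiguous destination must hold every output element.
    let dst_needed = rows
        .checked_mul(cols)
        .ok_or_else(|| "destination size overflows usize".to_string())?;
    if dst_len < dst_needed {
        return Err(format!("dst holds {dst_len} elements, needs at least {dst_needed}"));
    }
    Ok(())
}

fn main() {
    // 3x2 transposed view of a 2x3 buffer: max index = 2*1 + 1*3 = 5, so 6 elements suffice.
    assert!(check_strided_copy(6, 6, 3, 2, 1, 3).is_ok());
    // Zero rows is rejected.
    assert!(check_strided_copy(6, 6, 0, 2, 1, 3).is_err());
    // A 5-element source is one element short.
    assert!(check_strided_copy(5, 6, 3, 2, 1, 3).is_err());
    println!("checks passed");
}
```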