Module copy

GPU-accelerated strided copy for making tensors contiguous.

Copies a 2D strided tensor into a contiguous row-major layout, where each destination element is given by:

    dst[row * cols + col] = src[row * stride_row + col * stride_col]

Used after transpose/permute operations to produce contiguous memory.
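
The indexing rule can be illustrated with a CPU-side reference loop. This is only a sketch of the arithmetic, not the GPU kernel or this module's API: the function name `strided_copy_reference` and its signature are hypothetical, and it assumes strides are expressed in elements rather than bytes.

```rust
/// CPU reference for the strided-to-contiguous copy rule.
/// Hypothetical helper for illustration; not part of this module.
/// Assumes `stride_row` and `stride_col` are measured in elements.
fn strided_copy_reference(
    src: &[f32],
    rows: usize,
    cols: usize,
    stride_row: usize,
    stride_col: usize,
) -> Vec<f32> {
    // The destination is dense and row-major: rows * cols elements.
    let mut dst = vec![0.0f32; rows * cols];
    for row in 0..rows {
        for col in 0..cols {
            // Same formula as in the module description.
            dst[row * cols + col] = src[row * stride_row + col * stride_col];
        }
    }
    dst
}
```

For example, a 2x3 row-major buffer `[1, 2, 3, 4, 5, 6]` viewed as its 3x2 transpose has `stride_row = 1` and `stride_col = 3`; calling `strided_copy_reference(&src, 3, 2, 1, 3)` materializes that view as `[1, 4, 2, 5, 3, 6]`.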

Structs§

StridedCopyParams
Parameters for a strided copy operation.

Statics§

COPY_SHADER_SOURCE
Metal Shading Language (MSL) source for the strided copy kernel (embedded at compile time).

Functions§

dispatch_copy_f32
Copy `count` f32 elements from `src[src_offset..]` to `dst[dst_offset..]`.
dispatch_strided_copy_f32
Dispatch a strided copy operation on the GPU.
register
Register strided copy shader source with the given kernel registry.