GPU-accelerated strided copy for making tensors contiguous.
Copies a 2D strided tensor to a contiguous layout:
dst[row * cols + col] = src[row * stride_row + col * stride_col]
Used after transpose/permute operations to produce contiguous memory.
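The indexing formula above can be sketched as a plain CPU loop. This is only a reference for the semantics, not the GPU kernel itself; the name `strided_copy_ref` and its parameter list are hypothetical and are not part of this module, and strides are assumed to be measured in elements rather than bytes.

```rust
/// CPU reference for the strided copy formula (hypothetical helper,
/// not part of this module's API). Strides are assumed to be in elements.
pub fn strided_copy_ref(
    src: &[f32],
    rows: usize,
    cols: usize,
    stride_row: usize,
    stride_col: usize,
) -> Vec<f32> {
    let mut dst = vec![0.0f32; rows * cols];
    for row in 0..rows {
        for col in 0..cols {
            // dst is written densely in row-major order; src is read through the strides.
            dst[row * cols + col] = src[row * stride_row + col * stride_col];
        }
    }
    dst
}
```

For example, a 2x3 row-major buffer viewed as its 3x2 transpose has `stride_row = 1` and `stride_col = 3`, so calling the sketch with `rows = 3, cols = 2` materializes the transposed matrix in contiguous memory.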
Structs§
- StridedCopyParams - Parameters for a strided copy operation.
Statics§
- COPY_SHADER_SOURCE - MSL source for the strided copy kernel (embedded at compile time).
Functions§
- dispatch_copy_f32 - Copy `count` f32 elements from `src[src_offset..]` to `dst[dst_offset..]`.
- dispatch_strided_copy_f32 - Dispatch a strided copy operation on the GPU.
- register - Register strided copy shader source with the given kernel registry.
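The contiguous copy performed by dispatch_copy_f32 has a simple CPU analogue, sketched below. The function `copy_f32_ref` is a hypothetical illustration and does not reflect the real function's signature, which operates on GPU buffers; offsets are assumed to be counted in f32 elements, as the slice notation in the description suggests.

```rust
/// CPU analogue of a contiguous f32 copy with offsets (hypothetical helper,
/// not this module's API). Offsets and count are assumed to be in elements.
pub fn copy_f32_ref(
    src: &[f32],
    src_offset: usize,
    dst: &mut [f32],
    dst_offset: usize,
    count: usize,
) {
    // Panics if either range is out of bounds, like any slice indexing.
    dst[dst_offset..dst_offset + count]
        .copy_from_slice(&src[src_offset..src_offset + count]);
}
```

Unlike the strided variant, both ranges here are dense, so the operation reduces to a single block copy.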