pub fn launch_copy_perpendicular_ref<R: Runtime>(
    client: &ComputeClient<R>,
    input: TensorBinding<R>,
    output: TensorBinding<R>,
    dtype: StorageType,
)
Launches the perpendicular contiguous kernel.
It is used when the input tensor’s memory layout places the stride-1 dimension (the vectorized dimension) somewhere other than the last dimension. The kernel speeds up the copy by combining hardware vectorization (Vectors) with an in-register transpose.
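
To make the layout condition and the transpose idea concrete, here is a minimal CPU-side sketch in plain Rust. It is not the kernel's implementation and uses none of this crate's API: `unit_stride_dim`, `is_perpendicular`, `copy_perpendicular_2d`, and the `TILE` size are hypothetical names chosen for illustration, and the small stack array only stands in for the registers a GPU kernel would use. It assumes a 2D input whose first dimension has stride 1 (column-major) and produces a row-major output.

```rust
/// Hypothetical helper: index of the first dimension whose stride is 1, if any.
/// (Simplification: size-1 dimensions may also report a stride of 1.)
fn unit_stride_dim(strides: &[usize]) -> Option<usize> {
    strides.iter().position(|&s| s == 1)
}

/// Hypothetical helper: true when a stride-1 dimension exists and is not the last one.
fn is_perpendicular(strides: &[usize]) -> bool {
    match unit_stride_dim(strides) {
        Some(dim) => dim + 1 != strides.len(),
        None => false,
    }
}

/// Illustrative tile edge length; a real kernel would derive this from the vector width.
const TILE: usize = 4;

/// Copies a `rows x cols` matrix stored with strides `[1, rows]` (column-major)
/// into a freshly allocated row-major buffer, one small tile at a time.
fn copy_perpendicular_2d(input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(input.len(), rows * cols);
    let mut output = vec![0.0f32; rows * cols];

    for r0 in (0..rows).step_by(TILE) {
        for c0 in (0..cols).step_by(TILE) {
            let rn = TILE.min(rows - r0);
            let cn = TILE.min(cols - c0);
            // Stand-in for registers: tile[c][r] holds element (r0 + r, c0 + c).
            let mut tile = [[0.0f32; TILE]; TILE];

            // Load: for each column, read along the input's stride-1 dimension.
            for c in 0..cn {
                for r in 0..rn {
                    tile[c][r] = input[(r0 + r) + (c0 + c) * rows];
                }
            }

            // Store: for each row, write along the output's stride-1 dimension.
            for r in 0..rn {
                for c in 0..cn {
                    output[(r0 + r) * cols + (c0 + c)] = tile[c][r];
                }
            }
        }
    }
    output
}
```

In this sketch both the reads and the writes walk a stride-1 dimension, which is the property that lets a vectorized kernel use wide loads and stores on both sides; the transpose happens entirely inside the small tile.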