pub fn launch_into_contiguous_perpendicular<R: Runtime>(
client: &ComputeClient<R>,
input: &TensorHandleRef<'_, R>,
dtype: StorageType,
) -> Result<TensorHandle<R>, LaunchError>
Launches the perpendicular contiguous kernel.
Use this when the input tensor's dimension with a stride of 1 (the vectorized dimension) is not its last dimension. The kernel optimizes the copy by using hardware vectorization (Lines) and an in-register transpose.
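
To make the "perpendicular" case concrete, here is a minimal CPU-side sketch in plain Rust. It is an analogy only, not the crate's GPU kernel: `unit_stride_axis`, `is_perpendicular`, `copy_to_row_major`, and `TILE` are hypothetical names introduced for illustration, and the small `tile` array merely stands in for the registers that would hold the vectorized loads. The sketch assumes a 2D tensor whose first dimension has stride 1 (strides `[1, rows]`).

```rust
/// Hypothetical tile width, standing in for the vectorization factor.
const TILE: usize = 4;

/// Hypothetical helper: index of the first axis whose stride is 1, if any.
fn unit_stride_axis(strides: &[usize]) -> Option<usize> {
    strides.iter().position(|&s| s == 1)
}

/// Hypothetical helper: true when a unit-stride axis exists but is not the last axis.
fn is_perpendicular(strides: &[usize]) -> bool {
    match unit_stride_axis(strides) {
        Some(axis) => axis + 1 != strides.len(),
        None => false,
    }
}

/// Copies a `rows x cols` tensor stored with strides `[1, rows]` into a
/// row-major (contiguous) buffer, one TILE x TILE block at a time.
fn copy_to_row_major(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut dst = vec![0.0; rows * cols];
    for r0 in (0..rows).step_by(TILE) {
        for c0 in (0..cols).step_by(TILE) {
            let rh = TILE.min(rows - r0);
            let cw = TILE.min(cols - c0);
            let mut tile = [[0.0f32; TILE]; TILE];
            // Load: each tile[c] is a contiguous run in `src` (along the stride-1 axis).
            for c in 0..cw {
                for r in 0..rh {
                    tile[c][r] = src[(c0 + c) * rows + (r0 + r)];
                }
            }
            // Store transposed: each run written to `dst` is contiguous along columns.
            for r in 0..rh {
                for c in 0..cw {
                    dst[(r0 + r) * cols + (c0 + c)] = tile[c][r];
                }
            }
        }
    }
    dst
}
```

For example, a `2 x 3` tensor with strides `[1, 2]` satisfies `is_perpendicular(&[1, 2])`, while a row-major layout with strides `[3, 1]` does not. Both the loads and the stores in the sketch touch memory in contiguous runs, which is the property the transpose step is meant to preserve.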