pub fn tensor_vector_size_parallel(
optimized_vector_sizes: impl Iterator<Item = usize>,
shape: &Shape,
strides: &Strides,
axis: usize,
) -> usize
Finds the largest of the supported vector sizes that is usable for parallel vectorization along the given axis, or returns 1 if vectorization is impossible.
This function is designed to be conservative: it never returns a vector size above 1 by error, but it does not guarantee to always return the actual maximum possible vector size. That is, it may be overly strict.
Currently, this checks three conditions for each candidate vector size: that the
stride of the axis is 1, that the shape along the axis is divisible by the
candidate, and that every non-broadcast stride outside the axis is divisible by
the candidate.
The last condition ensures a vectorized read along the axis stays contiguous in the
source buffer as coordinates in other dimensions change.
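
The three checks described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `vector_size_sketch` is a hypothetical name, plain `&[usize]` slices stand in for the `Shape` and `Strides` types, and a stride of 0 is assumed to mark a broadcast dimension.

```rust
/// Hypothetical stand-in for the documented function. It takes plain slices
/// instead of `Shape` and `Strides`, which are not reproduced here.
fn vector_size_sketch(
    candidates: impl Iterator<Item = usize>,
    shape: &[usize],
    strides: &[usize],
    axis: usize,
) -> usize {
    // Check 1: the axis must be contiguous (stride 1). An out-of-range axis
    // also falls back to 1, i.e. no vectorization.
    if strides.get(axis) != Some(&1) {
        return 1;
    }
    let Some(&axis_len) = shape.get(axis) else {
        return 1;
    };

    candidates
        // Sizes of 0 or 1 never count as vectorization.
        .filter(|&v| v > 1)
        // Check 2: the length of the axis must be divisible by the candidate.
        .filter(|&v| axis_len % v == 0)
        // Check 3: every non-broadcast stride outside the axis must be
        // divisible by the candidate.
        .filter(|&v| {
            strides.iter().enumerate().all(|(i, &s)| {
                // Assumption: stride 0 means the dimension is broadcast.
                i == axis || s == 0 || s % v == 0
            })
        })
        .max()
        .unwrap_or(1)
}
```

Under these assumptions, a row-major tensor with shape `[4, 6]` and strides `[6, 1]`, vectorized along axis 1 with candidates 8, 4 and 2, yields 2: neither 8 nor 4 divides the axis length 6. The same call on a transposed layout with strides `[1, 4]` yields 1, because the stride of the axis is no longer 1.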