tensor_vector_size_parallel

Function tensor_vector_size_parallel 

pub fn tensor_vector_size_parallel(
    optimized_vector_sizes: impl Iterator<Item = usize>,
    shape: &Shape,
    strides: &Strides,
    axis: usize,
) -> usize

Find the largest of the supported vector sizes that is usable for parallel vectorization along the given axis, or return 1 if vectorization is impossible.

This function is designed never to return a vector size above 1 in error, but it does not guarantee that it returns the largest vector size actually possible. That is, it may be overly strict.

Currently, this checks that the stride of the axis is 1, that the shape of the axis is divisible by the candidate vector size, and that every non-broadcast stride outside the axis is divisible by the vector size. The last condition ensures that a vectorized read along the axis stays contiguous in the source buffer as coordinates in other dimensions change.
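
The three checks described above can be sketched as follows. This is an illustrative approximation, not the crate's actual implementation: `vector_size_sketch` is a hypothetical name, plain `&[usize]` slices stand in for `Shape` and `Strides`, and a stride of 0 is assumed to mark a broadcast dimension.

```rust
/// Hypothetical sketch of the checks described in the documentation.
/// `shape` and `strides` are plain slices standing in for `Shape` and `Strides`.
fn vector_size_sketch(
    candidates: impl Iterator<Item = usize>,
    shape: &[usize],
    strides: &[usize],
    axis: usize,
) -> usize {
    // Check 1: the axis must exist and have a stride of exactly 1.
    if strides.get(axis) != Some(&1) {
        return 1;
    }
    let Some(&axis_len) = shape.get(axis) else {
        return 1;
    };

    candidates
        // Guard against a zero candidate, which would panic in `%`.
        .filter(|&size| size > 0)
        // Check 2: the extent of the axis must be divisible by the candidate.
        .filter(|&size| axis_len % size == 0)
        // Check 3: every non-broadcast stride outside the axis must be
        // divisible by the candidate (assumption: stride 0 means broadcast).
        .filter(|&size| {
            strides
                .iter()
                .enumerate()
                .filter(|&(dim, &stride)| dim != axis && stride != 0)
                .all(|(_, &stride)| stride % size == 0)
        })
        // Keep the largest surviving candidate, falling back to 1.
        .max()
        .unwrap_or(1)
}
```

Under these assumptions, a row-major tensor with shape `[4, 8]` and strides `[8, 1]` vectorized along axis 1 accepts a candidate of 8, while the same shape with padded rows (strides `[12, 1]`) only accepts 4, because 12 is not divisible by 8.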