Module simd_copy

SIMD-accelerated copy for non-contiguous-to-contiguous coercion.

Use these routines when a Python caller passes a strided (non-contiguous) NumPy array that must be gathered into a flat, contiguous buffer before it can be used as an ndarray::ArrayView. These routines provide a fast path for that gather operation.

§Dispatch strategy

| Platform | Condition | Implementation |
|---|---|---|
| x86_64 | avx2 detected at runtime | AVX2 256-bit gather |
| x86_64 | no avx2, or fallback required | scalar loop |
| all others | always | scalar loop |

When stride == 1 the memory is already contiguous; [ptr::copy_nonoverlapping] is used for the fastest possible copy.
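The dispatch described above can be sketched roughly as follows. The function name and body here are illustrative stand-ins, not the module's actual implementation; the AVX2 branch is shown only as a comment, with the scalar loop doing the work in all cases.

```rust
use std::ptr;

/// Illustrative sketch of the dispatch strategy (not the real kernel).
unsafe fn copy_strided_f32(src: *const f32, stride: usize, dst: &mut [f32]) {
    if stride == 1 {
        // stride == 1: the source is already contiguous, so one bulk
        // copy via ptr::copy_nonoverlapping is the fastest path.
        ptr::copy_nonoverlapping(src, dst.as_mut_ptr(), dst.len());
        return;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // The real module would call its 256-bit AVX2 gather kernel
            // here; this sketch falls through to the scalar loop instead.
        }
    }
    // Scalar fallback (all other platforms, or no AVX2 at runtime):
    // one strided load per destination element.
    for (i, out) in dst.iter_mut().enumerate() {
        *out = *src.add(i * stride);
    }
}
```

Because `is_x86_feature_detected!` runs at runtime, a single binary can take the AVX2 path on capable machines and the scalar path elsewhere.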

§Safety contract

Both public functions are unsafe because they operate on raw pointers. The caller must guarantee:

  • src points to a valid, aligned allocation of at least n_elements * stride * size_of::<T>() bytes.
  • dst.len() >= n_elements.
  • The source and destination ranges do not overlap.
  • stride * (n_elements.saturating_sub(1)) fits in isize (i.e., no pointer overflow on the source side).
  • All n_elements source elements are properly initialised.
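One way to uphold this contract is to accept slices rather than raw pointers at the call site. The wrapper below is a hypothetical sketch (with a scalar loop standing in for the module's unsafe copy routine) showing how each bullet can be checked or guaranteed by construction:

```rust
/// Hypothetical safe wrapper: taking slices lets every bullet of the
/// safety contract be checked or guaranteed before the copy runs.
fn gather_checked_f64(src: &[f64], stride: usize, dst: &mut [f64]) {
    let n = dst.len();
    // Validity, initialisation, and no isize overflow: the last source
    // index, stride * (n - 1), must lie inside the source slice.
    assert!(n == 0 || stride.checked_mul(n - 1).map_or(false, |last| last < src.len()));
    // Distinct `&[_]` / `&mut [_]` borrows can never alias, so the
    // non-overlap bullet holds by construction; dst.len() >= n trivially.
    unsafe {
        // Scalar stand-in for copy_strided_to_contiguous_f64.
        for (i, out) in dst.iter_mut().enumerate() {
            *out = *src.as_ptr().add(i * stride);
        }
    }
}
```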

§Functions

copy_strided_to_contiguous_f32
Copy n_elements strided f32 values from src into the contiguous slice dst.
copy_strided_to_contiguous_f64
Copy n_elements strided f64 values from src into the contiguous slice dst.