Module simd_copy

SIMD-accelerated copy for non-contiguous-to-contiguous coercion.

Use these routines when a Python caller passes a strided (non-contiguous) NumPy array that must be gathered into a flat, contiguous buffer before it can be used as an ndarray::ArrayView. These routines provide a fast path for that gather operation.

§Dispatch strategy

| Platform | Condition | Implementation |
|---|---|---|
| x86_64 | avx2 detected at runtime | AVX2 256-bit gather |
| x86_64 | no avx2, or fallback required | scalar loop |
| all others | always | scalar loop |

When stride == 1 the memory is already contiguous; [ptr::copy_nonoverlapping] is used for the fastest possible copy.
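The dispatch described above can be sketched roughly as follows. The function name and body here are illustrative stand-ins, not the module's actual implementation; the AVX2 branch is shown only as a comment, with the scalar loop doing the work in all cases.

```rust
use std::ptr;

/// Illustrative sketch of the dispatch strategy (not the real kernel).
unsafe fn copy_strided_f32(src: *const f32, stride: usize, dst: &mut [f32]) {
    if stride == 1 {
        // stride == 1: the source is already contiguous, so one bulk
        // copy via ptr::copy_nonoverlapping is the fastest path.
        ptr::copy_nonoverlapping(src, dst.as_mut_ptr(), dst.len());
        return;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // The real module would call its 256-bit AVX2 gather kernel
            // here; this sketch falls through to the scalar loop instead.
        }
    }
    // Scalar fallback (all other platforms, or no AVX2 at runtime):
    // one strided load per destination element.
    for (i, out) in dst.iter_mut().enumerate() {
        *out = *src.add(i * stride);
    }
}
```

Because `is_x86_feature_detected!` runs at runtime, a single binary can take the AVX2 path on capable machines and the scalar path elsewhere.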

§Safety contract

Both public functions are unsafe because they operate on raw pointers. The caller must guarantee:

  • src points to a valid, aligned allocation of at least n_elements * stride * size_of::<T>() bytes.
  • dst.len() >= n_elements.
  • The source and destination ranges do not overlap.
  • stride * (n_elements.saturating_sub(1)) fits in isize (i.e., no pointer overflow on the source side).
  • All n_elements source elements are properly initialised.
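One way to uphold this contract is to accept slices rather than raw pointers at the call site. The wrapper below is a hypothetical sketch (with a scalar loop standing in for the module's unsafe copy routine) showing how each bullet can be checked or guaranteed by construction:

```rust
/// Hypothetical safe wrapper: taking slices lets every bullet of the
/// safety contract be checked or guaranteed before the copy runs.
fn gather_checked_f64(src: &[f64], stride: usize, dst: &mut [f64]) {
    let n = dst.len();
    // Validity, initialisation, and no isize overflow: the last source
    // index, stride * (n - 1), must lie inside the source slice.
    assert!(n == 0 || stride.checked_mul(n - 1).map_or(false, |last| last < src.len()));
    // Distinct `&[_]` / `&mut [_]` borrows can never alias, so the
    // non-overlap bullet holds by construction; dst.len() >= n trivially.
    unsafe {
        // Scalar stand-in for copy_strided_to_contiguous_f64.
        for (i, out) in dst.iter_mut().enumerate() {
            *out = *src.as_ptr().add(i * stride);
        }
    }
}
```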

§Functions

copy_strided_to_contiguous_f32
Copy n_elements strided f32 values from src into the contiguous slice dst.
copy_strided_to_contiguous_f64
Copy n_elements strided f64 values from src into the contiguous slice dst.