SIMD-accelerated copy for non-contiguous-to-contiguous coercion.
When Python passes a non-contiguous (strided) NumPy array, it must be gathered
into a contiguous buffer before it can be used as an `ndarray::ArrayView`.
These routines provide a fast path for that gather operation.
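At its core, that gather is a strided copy loop. Here is a minimal sketch of the fallback form, assuming a hypothetical helper name and signature (`gather_scalar` is not part of this module's public API):

```rust
/// Gather `n_elements` values from `src`, stepping `stride` elements
/// between consecutive reads, into the contiguous slice `dst`.
///
/// Illustrative sketch only; the caller must uphold the safety
/// contract described under "Safety contract" below.
unsafe fn gather_scalar<T: Copy>(src: *const T, stride: usize, dst: &mut [T], n_elements: usize) {
    debug_assert!(dst.len() >= n_elements);
    for i in 0..n_elements {
        // Safety: the caller guarantees `src + i * stride` is in
        // bounds and initialised.
        *dst.get_unchecked_mut(i) = *src.add(i * stride);
    }
}
```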
§Dispatch strategy
| Platform | Condition | Implementation |
|---|---|---|
| `x86_64` | `avx2` detected at runtime | AVX2 256-bit gather |
| `x86_64` | no `avx2`, or fallback required | scalar loop |
| all others | always | scalar loop |
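To make the table's first row concrete, below is a hedged sketch of what an AVX2 256-bit gather kernel for `f32` can look like. This is an assumption about implementation style, not the crate's actual code: it uses the `_mm256_i32gather_ps` intrinsic to read eight lanes per iteration and finishes with a scalar tail. Note the sketch assumes the per-lane offsets fit in `i32`.

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather_avx2_f32(src: *const f32, stride: usize, dst: &mut [f32], n_elements: usize) {
    use std::arch::x86_64::*;

    // Per-lane element offsets 0, stride, 2*stride, ..., 7*stride.
    // `_mm256_set_epi32` takes the highest lane first. Assumes
    // 7 * stride fits in i32; a real kernel would guard against this.
    let offsets = _mm256_set_epi32(
        (7 * stride) as i32,
        (6 * stride) as i32,
        (5 * stride) as i32,
        (4 * stride) as i32,
        (3 * stride) as i32,
        (2 * stride) as i32,
        stride as i32,
        0,
    );

    let mut i = 0;
    // Main loop: gather eight strided f32 values per iteration.
    while i + 8 <= n_elements {
        // SCALE = 4: offsets are element indices, each element 4 bytes.
        let v = _mm256_i32gather_ps::<4>(src.add(i * stride), offsets);
        _mm256_storeu_ps(dst.as_mut_ptr().add(i), v);
        i += 8;
    }
    // Scalar tail for the final 0..=7 elements.
    while i < n_elements {
        *dst.get_unchecked_mut(i) = *src.add(i * stride);
        i += 1;
    }
}
```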
When `stride == 1` the memory is already contiguous; [`ptr::copy_nonoverlapping`]
is used for the fastest possible copy.
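Putting the pieces together, the dispatch layer can be sketched as below. The helper names (`gather_scalar`, `gather_avx2_f32`) follow the sketches above and are assumptions about structure, not the module's real internals:

```rust
use std::ptr;

unsafe fn copy_strided_f32(src: *const f32, stride: usize, dst: &mut [f32], n_elements: usize) {
    // Fast path: stride 1 means the source is already contiguous,
    // so a single bulk copy is optimal.
    if stride == 1 {
        ptr::copy_nonoverlapping(src, dst.as_mut_ptr(), n_elements);
        return;
    }
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime feature detection; std caches the result after the
        // first query, so repeated calls are cheap.
        if std::arch::is_x86_feature_detected!("avx2") {
            return gather_avx2_f32(src, stride, dst, n_elements);
        }
    }
    // All other platforms (and x86_64 without AVX2): scalar loop.
    gather_scalar(src, stride, dst, n_elements);
}
```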
§Safety contract
Both public functions are `unsafe` because they operate on raw pointers. The
caller must guarantee:

- `src` points to a valid, aligned allocation of at least `n_elements * stride * size_of::<T>()` bytes.
- `dst.len() >= n_elements`.
- The source and destination ranges do not overlap.
- `stride * (n_elements.saturating_sub(1))` fits in `isize` (i.e., no pointer overflow on the source side).
- All `n_elements` source elements are properly initialised.
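As a usage sketch, a safe wrapper can enforce each clause of this contract at runtime before delegating. The wrapper name `gather_f32_checked` and the `(src, stride, dst, n_elements)` parameter order are assumptions for illustration; the actual signature may differ:

```rust
fn gather_f32_checked(src: &[f32], stride: usize, n_elements: usize) -> Vec<f32> {
    // `src` must cover at least `n_elements * stride` elements, so every
    // read at `i * stride` for i < n_elements stays in bounds.
    assert!(stride >= 1, "stride must be at least 1");
    assert!(
        src.len() >= n_elements * stride,
        "source too short for requested gather"
    );
    // `dst.len() >= n_elements` holds by construction.
    let mut dst = vec![0.0f32; n_elements];
    unsafe {
        // Safety: bounds asserted above; `src` and `dst` are distinct
        // allocations, so they cannot overlap; slice elements are
        // initialised; slice lengths already fit in `isize`.
        copy_strided_to_contiguous_f32(src.as_ptr(), stride, &mut dst, n_elements);
    }
    dst
}
```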
§Functions
- `copy_strided_to_contiguous_f32`⚠ - Copy `n_elements` strided `f32` values from `src` into the contiguous slice `dst`.
- `copy_strided_to_contiguous_f64`⚠ - Copy `n_elements` strided `f64` values from `src` into the contiguous slice `dst`.
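For example, gathering every other element of a flat buffer (`stride == 2`) could look like the following; as above, the `(src, stride, dst, n_elements)` parameter order is an assumption, not confirmed by the crate:

```rust
let src: Vec<f32> = (0..8).map(|x| x as f32).collect();
let mut dst = vec![0.0f32; 4];
unsafe {
    // Safety: src has 8 elements >= n_elements * stride = 4 * 2,
    // dst.len() == n_elements == 4, and the buffers do not overlap.
    copy_strided_to_contiguous_f32(src.as_ptr(), 2, &mut dst, 4);
}
assert_eq!(dst, [0.0, 2.0, 4.0, 6.0]);
```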