Module simd

Re-exports

pub use super::simd_conv::conv1d_simd;

Functions

dot_simd
SIMD dot product implementation using FMA (fused multiply-add).
elementwise_simd
Prototype SIMD path that currently delegates to the scalar implementation.
elementwise_simd_supported
Placeholder for the SIMD CPU execution strategy. A full implementation would contain vectorized loops and checks for AVX/NEON availability.
matmul_simd
SIMD-accelerated matrix multiplication. Uses a blocked (tiled) algorithm with SIMD vectorization for the inner loops.
reduce_simd
SIMD-accelerated reduction (sum, max, min, mean). For the full-sum case (axis None), an AVX2 vectorized loop accumulates into an __m256 register and then reduces it horizontally. On other architectures, or when AVX2 is absent, it falls back to the scalar implementation.
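
The full-sum path described for reduce_simd can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the names sum_f32, sum_avx2, and sum_scalar are hypothetical, the real reduce_simd signature is not shown on this page, and the horizontal reduction here simply stores the eight lanes to an array and adds them, which may differ from what the implementation does.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback (hypothetical name): plain sequential sum.
fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// AVX2 path (hypothetical name): accumulate 8 lanes at a time into an
/// __m256 register, then reduce the register horizontally.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[f32]) -> f32 {
    let mut acc = _mm256_setzero_ps();
    let chunks = data.chunks_exact(8);
    let tail = chunks.remainder();

    for chunk in chunks {
        // Unaligned load of 8 consecutive f32 values; the chunk is
        // guaranteed to hold exactly 8 elements.
        let v = unsafe { _mm256_loadu_ps(chunk.as_ptr()) };
        acc = _mm256_add_ps(acc, v);
    }

    // Horizontal reduction: spill the 8 lanes and add them up.
    let mut lanes = [0.0f32; 8];
    unsafe { _mm256_storeu_ps(lanes.as_mut_ptr(), acc) };
    let vector_sum: f32 = lanes.iter().sum();

    // Elements that did not fill a whole 8-wide chunk are summed in scalar.
    vector_sum + tail.iter().sum::<f32>()
}

/// Dispatcher (hypothetical name): use AVX2 when the CPU reports it at
/// runtime, otherwise fall back to the scalar loop.
pub fn sum_f32(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

Calling sum_f32(&values) on any f32 slice picks the path at runtime. Note that the vectorized path adds elements in a different order than the scalar loop, so results can differ in the last bits for inputs that are not exactly representable.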