SIMD acceleration for tensor operations (AVX2, 4-wide f64).
Provides AVX2-accelerated kernels for:
- Element-wise binary operations (add, sub, mul, div)
- Element-wise unary operations (relu, abs, neg, sqrt)
- Inner loop of tiled matrix multiplication (axpy: c += a * b)
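The axpy inner loop can be sketched as below. This is a hedged sketch, not the crate's actual code: the helper names (`axpy`, `axpy_avx2`, `axpy_scalar`) are assumptions. It shows the three ingredients the list above implies: four f64 lanes per iteration, a separate multiply then add (no FMA), and a scalar tail for the remainder.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: c[0..len] += scalar * b[0..len].
fn axpy_scalar(c: &mut [f64], scalar: f64, b: &[f64]) {
    for (ci, bi) in c.iter_mut().zip(b) {
        *ci += scalar * *bi; // separate mul + add, never FMA
    }
}

/// AVX2 body: 4 lanes per iteration, scalar tail for the remainder.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn axpy_avx2(c: &mut [f64], scalar: f64, b: &[f64]) {
    let len = c.len().min(b.len());
    let s = _mm256_set1_pd(scalar);
    let mut i = 0;
    while i + 4 <= len {
        let cv = _mm256_loadu_pd(c.as_ptr().add(i));
        let bv = _mm256_loadu_pd(b.as_ptr().add(i));
        // Multiply, then add, matching scalar rounding exactly
        // (deliberately NOT _mm256_fmadd_pd).
        let prod = _mm256_mul_pd(s, bv);
        _mm256_storeu_pd(c.as_mut_ptr().add(i), _mm256_add_pd(cv, prod));
        i += 4;
    }
    axpy_scalar(&mut c[i..len], scalar, &b[i..len]); // tail elements
}

/// Dispatch: AVX2 when available at runtime, scalar otherwise.
fn axpy(c: &mut [f64], scalar: f64, b: &[f64]) {
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") {
        return unsafe { axpy_avx2(c, scalar, b) };
    }
    axpy_scalar(c, scalar, b);
}

fn main() {
    let mut c = vec![1.0; 6];
    let b: Vec<f64> = (0..6).map(|i| i as f64).collect();
    axpy(&mut c, 2.0, &b);
    println!("{c:?}");
}
```

Because both paths use the same operation order per element, the two branches of `axpy` produce bit-identical output.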
§Determinism
All SIMD paths produce bit-identical results to scalar paths because:
- IEEE 754 mandates identical rounding for scalar and SIMD add/sub/mul/div/sqrt.
- No FMA instructions are used (`_mm256_fmadd_pd` changes rounding vs. separate mul+add, so we explicitly avoid it).
- Element-wise ops are independent: no cross-lane reductions.
- Tiled matmul SIMD processes multiple j-columns simultaneously, but each `C[i,j]` accumulates the same values in the same order.
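The FMA point is worth a concrete demonstration. A fused multiply-add rounds once, while a separate multiply and add rounds twice, so the two can produce different bits. The example below is illustrative only and uses the standard `f64::mul_add` (which is fused) against the plain expression:

```rust
fn main() {
    // (1 + eps) * (1 - eps) = 1 - eps^2 exactly, with eps = 2^-52.
    let a = 1.0 + f64::EPSILON;
    let b = 1.0 - f64::EPSILON;
    let c = -1.0;

    // Separate mul + add: a*b rounds to 1.0 first, so the sum is 0.0.
    let separate = a * b + c;

    // Fused mul-add: a single rounding keeps the -2^-104 term.
    let fused = a.mul_add(b, c);

    println!("separate = {separate:e}, fused = {fused:e}");
    assert_ne!(separate, fused);
}
```

Rust never contracts `a * b + c` into an FMA on its own, and this module applies the same discipline to its SIMD kernels, which is what keeps SIMD and scalar paths bit-identical.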
§Fallback
On non-x86_64 platforms or CPUs without AVX2, all functions fall back to scalar implementations that produce identical results.
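The fallback gate can be sketched as a `has_avx2`-style check; the body below is an assumption about how such a helper is typically written, not this crate's exact implementation. On x86_64 it queries CPU features at runtime; on every other architecture it is compile-time `false`, which routes callers to the scalar path:

```rust
/// Hypothetical sketch of a runtime AVX2 check.
fn has_avx2() -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime CPUID-based detection from the standard library.
        is_x86_feature_detected!("avx2")
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        // Non-x86_64: no AVX2, always take the scalar fallback.
        false
    }
}

fn main() {
    // Either answer is fine: the scalar fallback produces identical results.
    println!("AVX2 available: {}", has_avx2());
}
```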
Enums§
- BinOp
- Dispatch tag for SIMD-able binary operations.
- UnaryOp
- Dispatch tag for SIMD-able unary operations.
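The dispatch tags can be pictured as plain enums selecting an element-wise operation. The variant names below are guesses based on the operation lists above, not the crate's definitions, and the bodies give the scalar reference semantics each tag selects:

```rust
/// Assumed shape of the binary dispatch tag.
#[derive(Clone, Copy)]
enum BinOp { Add, Sub, Mul, Div }

/// Assumed shape of the unary dispatch tag.
#[derive(Clone, Copy)]
enum UnaryOp { Relu, Abs, Neg, Sqrt }

/// Scalar semantics a BinOp tag selects for one element pair.
fn apply_bin(op: BinOp, a: f64, b: f64) -> f64 {
    match op {
        BinOp::Add => a + b,
        BinOp::Sub => a - b,
        BinOp::Mul => a * b,
        BinOp::Div => a / b,
    }
}

/// Scalar semantics a UnaryOp tag selects for one element.
fn apply_unary(op: UnaryOp, x: f64) -> f64 {
    match op {
        UnaryOp::Relu => x.max(0.0),
        UnaryOp::Abs => x.abs(),
        UnaryOp::Neg => -x,
        UnaryOp::Sqrt => x.sqrt(),
    }
}

fn main() {
    println!("{}", apply_bin(BinOp::Div, 6.0, 3.0));
    println!("{}", apply_unary(UnaryOp::Relu, -1.5));
}
```

A single tagged entry point keeps the SIMD lane logic (load, op, store) in one place instead of duplicating it per operation.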
Functions§
- has_avx2
- Runtime check for AVX2 support.
- simd_axpy
- SIMD-accelerated AXPY: `c[0..len] += scalar * b[0..len]`.
- simd_binop
- SIMD-accelerated element-wise binary operation on equal-length slices.
- simd_unary
- SIMD-accelerated element-wise unary operation.