Module tensor_simd


SIMD acceleration for tensor operations (AVX2, 4-wide f64).

Provides AVX2-accelerated kernels for:

  • Element-wise binary operations (add, sub, mul, div)
  • Element-wise unary operations (relu, abs, neg, sqrt)
  • Inner loop of tiled matrix multiplication (axpy: c += a * b)
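The axpy kernel above can be sketched in scalar form; the function name and exact slice shapes here are assumptions for illustration, and the module's SIMD path is documented to match such a scalar loop element for element:

```rust
/// Scalar reference for the AXPY inner loop described above:
/// c[i] += scalar * b[i] for every i.
/// (Hypothetical sketch; `simd_axpy` is assumed to be bit-identical to this.)
fn axpy_scalar(c: &mut [f64], scalar: f64, b: &[f64]) {
    assert_eq!(c.len(), b.len());
    for (ci, &bi) in c.iter_mut().zip(b) {
        // Separate mul then add, mirroring the no-FMA policy described below.
        *ci += scalar * bi;
    }
}

fn main() {
    let mut c = vec![1.0, 2.0, 3.0];
    let b = [10.0, 20.0, 30.0];
    axpy_scalar(&mut c, 2.0, &b);
    assert_eq!(c, vec![21.0, 42.0, 63.0]);
}
```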

§Determinism

All SIMD paths produce bit-identical results to scalar paths because:

  • IEEE 754 mandates identical rounding for scalar and SIMD add/sub/mul/div/sqrt.
  • No FMA instructions are used (_mm256_fmadd_pd changes rounding vs separate mul+add — we explicitly avoid it).
  • Element-wise ops are independent — no cross-lane reductions.
  • Tiled matmul SIMD processes multiple j-columns simultaneously but each C[i,j] accumulates the same values in the same order.
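The FMA point can be demonstrated directly: a fused multiply-add rounds once, while separate mul + add rounds twice, so the two can differ in the last bit. This sketch uses Rust's `f64::mul_add` (the portable analogue of `_mm256_fmadd_pd`) with operands chosen so the intermediate product is inexact:

```rust
// Why the module avoids FMA: fused vs. separate rounding can disagree.
fn main() {
    let a = 1.0_f64 + f64::EPSILON; // 1 + 2^-52
    let b = 1.0_f64 - f64::EPSILON; // 1 - 2^-52
    let c = -1.0_f64;

    // Exact product a*b = 1 - 2^-104, which rounds to 1.0 in f64,
    // so the separate mul + add yields exactly 0.0.
    let separate = a * b + c;

    // mul_add computes a*b + c with a single rounding,
    // preserving the tiny -2^-104 term.
    let fused = a.mul_add(b, c);

    assert_eq!(separate, 0.0);
    assert!(fused != 0.0);
}
```

Because the module never emits fused operations, its SIMD results match the `separate` path above on every element, which is what makes the bit-identical claim hold.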

§Fallback

On non-x86_64 platforms or CPUs without AVX2, all functions fall back to scalar implementations that produce identical results.
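The fallback implies a runtime-dispatch pattern like the following sketch. The helper names are hypothetical; `has_avx2` is assumed to wrap a detection check along these lines, and the AVX2 branch is elided since the scalar loop is always a bit-identical substitute:

```rust
// Hypothetical dispatch sketch: detect AVX2 at runtime on x86_64,
// fall back to scalar everywhere else.
fn has_avx2_check() -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        std::arch::is_x86_feature_detected!("avx2")
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        false
    }
}

// Element-wise add with the dispatch shape the module describes.
// The AVX2 kernel is elided; because SIMD and scalar paths produce
// bit-identical results, the scalar loop is always correct.
fn add_dispatch(a: &[f64], b: &[f64], out: &mut [f64]) {
    if has_avx2_check() {
        // AVX2 kernel would run here (elided in this sketch).
    }
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

fn main() {
    let mut out = [0.0; 2];
    add_dispatch(&[1.0, 2.0], &[3.0, 4.0], &mut out);
    assert_eq!(out, [4.0, 6.0]);
    println!("avx2 available: {}", has_avx2_check());
}
```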

Enums§

BinOp
Dispatch tag for SIMD-able binary operations.
UnaryOp
Dispatch tag for SIMD-able unary operations.

Functions§

has_avx2
Runtime check for AVX2 support.
simd_axpy
SIMD-accelerated AXPY: c[0..len] += scalar * b[0..len].
simd_binop
SIMD-accelerated element-wise binary operation on equal-length slices.
simd_unary
SIMD-accelerated element-wise unary operation.
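The dispatch-tag pattern that `UnaryOp`/`BinOp` suggest can be sketched with a local stand-in enum and a scalar reference loop; the variant names are taken from the unary ops listed above, but this enum and function are illustrations, not the module's actual definitions:

```rust
/// Local stand-in for the module's UnaryOp dispatch tag.
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum UnaryOp {
    Relu,
    Abs,
    Neg,
    Sqrt,
}

/// Scalar reference for a unary kernel; `simd_unary` is assumed to
/// dispatch on a tag like this and match these results bit for bit.
fn unary_scalar(op: UnaryOp, a: &[f64], out: &mut [f64]) {
    assert_eq!(a.len(), out.len());
    for (o, &x) in out.iter_mut().zip(a) {
        *o = match op {
            UnaryOp::Relu => x.max(0.0),
            UnaryOp::Abs => x.abs(),
            UnaryOp::Neg => -x,
            UnaryOp::Sqrt => x.sqrt(),
        };
    }
}

fn main() {
    let a = [-2.0, 3.0];
    let mut out = [0.0; 2];
    unary_scalar(UnaryOp::Relu, &a, &mut out);
    assert_eq!(out, [0.0, 3.0]);
}
```

A single tag-dispatched entry point keeps the unsafe SIMD code in one place per operation class instead of one intrinsic-laden function per op.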