Core SIMD-accelerated operations: dot product, matrix-vector multiply, and activation functions.
These are the hottest primitives across SSM, ESN, and attention/neural
forward passes. AVX2's 256-bit registers hold four f64 lanes, so each
vector instruction operates on four values at once, giving up to ~4x
throughput over scalar code on aligned inner loops.
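As a minimal sketch of how such a four-lane kernel and its runtime dispatch fit together (names like `dot_avx2_sketch` are illustrative, not this crate's actual internals), assuming `&[f64]` inputs:

```rust
// Illustrative AVX2 dot-product kernel: each 256-bit register holds
// four f64 lanes, so the main loop consumes four elements per vector
// multiply-add; a scalar tail handles the remaining len % 4 elements.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2_sketch(a: &[f64], b: &[f64]) -> f64 {
    use core::arch::x86_64::*;
    let chunks = a.len() / 4;
    let mut acc = unsafe { _mm256_setzero_pd() };
    for i in 0..chunks {
        unsafe {
            let va = _mm256_loadu_pd(a.as_ptr().add(i * 4));
            let vb = _mm256_loadu_pd(b.as_ptr().add(i * 4));
            acc = _mm256_add_pd(acc, _mm256_mul_pd(va, vb));
        }
    }
    // Horizontal sum of the four accumulator lanes.
    let mut lanes = [0.0f64; 4];
    unsafe { _mm256_storeu_pd(lanes.as_mut_ptr(), acc) };
    let mut sum: f64 = lanes.iter().sum();
    // Scalar tail for lengths not divisible by 4.
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

// Safe entry point: runtime feature detection with a portable fallback,
// mirroring the dispatch pattern described in this module.
fn simd_dot_sketch(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { dot_avx2_sketch(a, b) };
        }
    }
    // Portable scalar fallback.
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The unsafe kernel is never exposed directly: callers only see the safe wrapper, which guarantees the AVX2 path runs solely on CPUs where the feature was detected at runtime.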
§Architecture
Public API (safe) Internal dispatch
───────────────── ─────────────────
simd_dot(a, b) ──► avx2::dot_avx2 (x86_64 + AVX2 detected)
└──► dot_scalar (fallback)
simd_mat_vec(w,x,..) ──► avx2::mat_vec_avx2 (x86_64 + AVX2 detected)
└──► mat_vec_scalar (fallback)
simd_tanh(in, out) ──► avx2::tanh_avx2 (x86_64 + AVX2, Padé [2,2])
└──► tanh_scalar (fallback)
simd_exp(in, out) ──► avx2::exp_avx2 (x86_64 + AVX2, range-reduced deg-5)
└──► exp_scalar (fallback)
simd_sigmoid(in, out) ──► avx2::sigmoid_avx2 (x86_64 + AVX2, via exp)
└──► sigmoid_scalar (fallback)
simd_silu(in, out) ──► avx2::silu_avx2 (x86_64 + AVX2, via sigmoid)
└──► silu_scalar (fallback)

§Functions

- simd_dot - SIMD-accelerated dot product with runtime feature detection.
- simd_exp - SIMD-accelerated element-wise exp with runtime feature detection.
- simd_mat_vec - SIMD-accelerated matrix-vector multiply with runtime feature detection.
- simd_sigmoid - SIMD-accelerated element-wise sigmoid with runtime feature detection.
- simd_silu - SIMD-accelerated element-wise SiLU (Sigmoid Linear Unit) with runtime feature detection.
- simd_tanh - SIMD-accelerated element-wise tanh with runtime feature detection.
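The "Padé [2,2]" noted in the architecture diagram refers to a rational approximation of tanh in powers of x². As an illustration only (the crate's actual kernel and coefficients may differ), one classic Padé approximant of this form is:

```rust
// Illustrative Padé-style tanh approximation (not this crate's actual
// coefficients): tanh(x) ≈ x(945 + 105x² + x⁴) / (945 + 420x² + 15x⁴),
// a rational function of degree [2,2] in x². Rational forms like this
// drift above 1 in magnitude for large |x|, so the result is clamped;
// a production kernel would typically use a threshold instead.
fn tanh_pade_sketch(x: f64) -> f64 {
    let x2 = x * x;
    let num = x * (945.0 + x2 * (105.0 + x2));
    let den = 945.0 + x2 * (420.0 + 15.0 * x2);
    (num / den).clamp(-1.0, 1.0)
}
```

Rational approximations are attractive in SIMD code because they need only multiplies, adds, and one divide per lane, with no branches, so the same formula vectorizes directly across the four f64 lanes of an AVX2 register.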