Core SIMD-accelerated operations: dot product and matrix-vector multiply.
These are the two hottest primitives across SSM, ESN, and attention forward
passes. AVX2 operates on 256-bit registers that hold 4 f64 values, so one vector
instruction does the work of four scalar ones, giving up to ~4x throughput
on aligned inner loops.
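To illustrate where the "up to ~4x" figure comes from, here is a minimal portable sketch that processes four f64 values per loop iteration with four independent accumulators, mirroring the four lanes of a 256-bit register. The name `dot_four_lanes` is hypothetical and is not part of this module; the actual AVX2 kernel may be organized differently.

```rust
/// Hypothetical helper (not part of this module): a dot product that keeps
/// four independent partial sums, one per f64 lane of a 256-bit register.
pub fn dot_four_lanes(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");

    // Split both inputs into blocks of 4 plus a short tail (0 to 3 elements).
    let a_chunks = a.chunks_exact(4);
    let b_chunks = b.chunks_exact(4);
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    // Each accumulator slot plays the role of one SIMD lane.
    let mut acc = [0.0f64; 4];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for lane in 0..4 {
            acc[lane] += ca[lane] * cb[lane];
        }
    }

    // Horizontal reduction of the four lanes, then the scalar tail.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (x, y) in a_tail.iter().zip(b_tail) {
        sum += x * y;
    }
    sum
}
```

Because the four partial sums are added in a different order than a plain left-to-right loop, results can differ from a scalar implementation in the last bits for inputs that are not exactly representable.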
§Architecture
Public API (safe)          Internal dispatch
─────────────────          ─────────────────
simd_dot(a, b)        ──►  avx2::dot_avx2      (x86_64 + AVX2 detected)
                      └──► dot_scalar          (fallback)
simd_mat_vec(w,x,..)  ──►  avx2::mat_vec_avx2  (x86_64 + AVX2 detected)
                      └──► mat_vec_scalar      (fallback)
Functions§
- simd_dot - SIMD-accelerated dot product with runtime feature detection.
- simd_mat_vec - SIMD-accelerated matrix-vector multiply with runtime feature detection.
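The dispatch pattern shown in the Architecture diagram can be sketched as follows for the dot product. The function names `simd_dot`, `dot_scalar`, and `avx2::dot_avx2` come from the diagram; the `&[f64]` signatures and all function bodies are assumptions made for illustration and may not match the crate's actual implementation.

```rust
/// Scalar fallback, usable on every target. (Body is an assumption.)
fn dot_scalar(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
mod avx2 {
    use std::arch::x86_64::*;

    /// # Safety
    /// The caller must ensure the CPU supports AVX2 and that
    /// `a.len() == b.len()`.
    #[target_feature(enable = "avx2")]
    pub unsafe fn dot_avx2(a: &[f64], b: &[f64]) -> f64 {
        unsafe {
            let n = a.len();
            let mut acc = _mm256_setzero_pd();
            let mut i = 0;
            // Main loop: 4 f64 values per iteration, unaligned loads.
            while i + 4 <= n {
                let va = _mm256_loadu_pd(a.as_ptr().add(i));
                let vb = _mm256_loadu_pd(b.as_ptr().add(i));
                acc = _mm256_add_pd(acc, _mm256_mul_pd(va, vb));
                i += 4;
            }
            // Horizontal sum of the four lanes.
            let mut lanes = [0.0f64; 4];
            _mm256_storeu_pd(lanes.as_mut_ptr(), acc);
            let mut sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
            // Scalar tail for the remaining 0 to 3 elements.
            while i < n {
                sum += a[i] * b[i];
                i += 1;
            }
            sum
        }
    }
}

/// Safe public entry point: detects AVX2 at runtime and falls back to
/// the scalar path otherwise. (Signature is an assumption.)
pub fn simd_dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected above and the lengths were checked.
            return unsafe { avx2::dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

Keeping the `unsafe` kernel behind a safe wrapper that performs both the length check and the feature check means callers never need to reason about CPU capabilities themselves. The same shape would apply to `simd_mat_vec`, whose full parameter list is not shown here.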