
Module elementwise


SIMD-accelerated element-wise operations.

AVX2 implementations of ReLU, vector add, and scalar multiply. These are bandwidth-bound at large sizes; SIMD helps at small-to-medium sizes by reducing instruction count and enabling wider stores.

§Algorithm

ReLU: _mm256_max_ps(x, zero) — single instruction per 8 elements
Add: _mm256_add_ps(a, b) — single instruction per 8 elements
Mul scalar: _mm256_mul_ps(x, scalar_vec) — single instruction per 8 elements
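The ReLU pattern above can be sketched as a runtime-dispatched kernel. This is an illustrative implementation, not the module's actual code: the function names, signatures, and the scalar fallback are assumptions; only the `_mm256_max_ps(x, zero)` core follows the description above.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Hypothetical ReLU kernel: AVX2 path when available, scalar fallback otherwise.
pub fn relu(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was just verified at runtime.
            unsafe { relu_avx2(input, output) };
            return;
        }
    }
    // Scalar fallback for targets without AVX2.
    for (o, &x) in output.iter_mut().zip(input.iter()) {
        *o = x.max(0.0);
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn relu_avx2(input: &[f32], output: &mut [f32]) {
    let zero = unsafe { _mm256_setzero_ps() };
    let chunks = input.len() / 8;
    for i in 0..chunks {
        unsafe {
            let x = _mm256_loadu_ps(input.as_ptr().add(i * 8));
            // One _mm256_max_ps processes 8 f32 lanes.
            _mm256_storeu_ps(output.as_mut_ptr().add(i * 8), _mm256_max_ps(x, zero));
        }
    }
    // Scalar tail for the remaining n % 8 elements.
    for i in chunks * 8..input.len() {
        output[i] = input[i].max(0.0);
    }
}

fn main() {
    // 9 elements exercise both the 8-wide SIMD path and the scalar tail.
    let input: Vec<f32> = vec![-2.0, -1.0, 0.5, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0];
    let mut output = vec![0.0f32; input.len()];
    relu(&input, &mut output);
    assert_eq!(output, vec![0.0, 0.0, 0.5, 3.0, 0.0, 5.0, 0.0, 7.0, 0.0]);
    println!("ok");
}
```

The add and scalar-multiply kernels follow the same shape, substituting `_mm256_add_ps` or `_mm256_mul_ps` for the max.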

Contract: provable-contracts/contracts/activation-kernel-v1.yaml
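One way to check a kernel against a semantics contract is to compare it element-by-element with a scalar reference. The functions below are hypothetical reference implementations of the documented fused-kernel semantics; their names and signatures are illustrative and may not match the module's actual API.

```rust
// Scalar reference: output_i = max(0, a_i + b_i)
fn fused_add_relu_ref(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b.iter()).map(|(&x, &y)| (x + y).max(0.0)).collect()
}

// Scalar reference: output_i = a_i * b_i + c_i
fn fused_mul_add_ref(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    a.iter()
        .zip(b.iter())
        .zip(c.iter())
        .map(|((&x, &y), &z)| x * y + z)
        .collect()
}

// Scalar reference: output_i = max(0, input_i * scale + bias)
fn fused_scale_bias_relu_ref(input: &[f32], scale: f32, bias: f32) -> Vec<f32> {
    input.iter().map(|&x| (x * scale + bias).max(0.0)).collect()
}

fn main() {
    assert_eq!(fused_add_relu_ref(&[1.0, -3.0], &[2.0, 1.0]), vec![3.0, 0.0]);
    assert_eq!(fused_mul_add_ref(&[2.0], &[3.0], &[1.0]), vec![7.0]);
    assert_eq!(fused_scale_bias_relu_ref(&[1.0, -2.0], 2.0, 1.0), vec![3.0, 0.0]);
    println!("ok");
}
```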

Functions§

add
Element-wise add: output_i = a_i + b_i
add_alloc
Element-wise add with output allocation. Avoids zero-fill overhead.
add_inplace
In-place add: a_i += b_i
fused_add_relu
Fused add + ReLU: output_i = max(0, a_i + b_i)
fused_add_relu_inplace
In-place fused add + ReLU: a_i = max(0, a_i + b_i)
fused_mul_add
Fused multiply-add: output_i = a_i * b_i + c_i
fused_scale_bias_relu
Fused scale + bias + ReLU: output_i = max(0, input_i * scale + bias)
mul_scalar
Element-wise scalar multiply: output_i = input_i * scalar
mul_scalar_alloc
Scalar multiply with output allocation. Avoids zero-fill overhead.
relu
ReLU: output_i = max(0, input_i)
relu_alloc
ReLU with output allocation. Avoids zero-fill overhead of vec![0.0; n].
relu_inplace
In-place ReLU: data_i = max(0, data_i)
scale_inplace
In-place scale: data_i *= scalar
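The `*_alloc` variants above note that they avoid the zero-fill overhead of `vec![0.0; n]`. A minimal sketch of that pattern, assuming a `relu_alloc`-style signature (the module's actual implementation may differ, e.g. by writing SIMD stores into uninitialized capacity):

```rust
// Hypothetical alloc variant: Vec::with_capacity reserves memory without
// zero-initializing it, and extend writes each result exactly once,
// unlike vec![0.0; n] which zero-fills the buffer before the kernel runs.
fn relu_alloc(input: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(input.len()); // allocation only, no memset
    out.extend(input.iter().map(|&x| x.max(0.0)));
    out
}

fn main() {
    assert_eq!(relu_alloc(&[-1.5, 0.0, 2.5]), vec![0.0, 0.0, 2.5]);
    println!("ok");
}
```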