
Module simd


SIMD Backend Dispatch Module

Provides architecture-specific SIMD implementations with automatic dispatch:

  • AVX-512 for modern Intel/AMD (16 floats per iteration)
  • AVX2 with FMA for Intel Haswell+ / AMD Zen+ (8 floats per iteration)
  • NEON for ARM64/Apple Silicon (4 floats per iteration)
  • WASM SIMD for WebAssembly (4 floats per iteration)
  • Winograd F(2,3) for 3x3 convolutions (2.25x fewer multiplications than direct convolution)
  • Scalar fallback for all other platforms
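
The dispatch pattern above can be sketched as follows. This is a minimal illustration, not this module's actual internals; the function names (`dot_product_avx2`, `dot_product_scalar`) are hypothetical:

```rust
/// Minimal sketch of runtime SIMD dispatch (illustrative only; the
/// function names here are hypothetical, not this module's API).
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime check: only call the AVX2+FMA path if the CPU supports it.
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return unsafe { dot_product_avx2(a, b) };
        }
    }
    // Scalar fallback for all other platforms.
    dot_product_scalar(a, b)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn dot_product_avx2(a: &[f32], b: &[f32]) -> f32 {
    // A real implementation would use _mm256_* intrinsics; this sketch
    // defers to the scalar version to stay short.
    dot_product_scalar(a, b)
}

fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The `#[target_feature]` attribute lets the compiler emit AVX2/FMA instructions inside that one function, while the `is_x86_feature_detected!` guard keeps the call sound on older CPUs.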

Re-exports§

pub use winograd::conv_3x3_winograd;
pub use winograd::transform_filter;
pub use winograd::transform_input;
pub use winograd::transform_output;
pub use winograd::WinogradFilterCache;
pub use quantize::QuantParams;
pub use quantize::QuantizedTensor;
pub use quantize::QuantizationType;
pub use quantize::PerChannelQuantParams;
pub use quantize::quantize_simd;
pub use quantize::dequantize_simd;
pub use quantize::quantize_batch;
pub use quantize::dequantize_batch;
pub use quantize::pi_constants;
pub use avx2::*;
pub use scalar::*;
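
For orientation, symmetric INT8 quantization as re-exported above (`quantize_simd` / `dequantize_simd`) follows the usual scale-and-round scheme. A generic sketch, with the caveat that the crate's π-based calibration is specific to its `quantize` module and is not reproduced here:

```rust
/// Generic symmetric INT8 quantization (illustrative sketch only; the
/// crate's `quantize` module derives its scale via its own pi-based
/// calibration, which this sketch does not attempt to reproduce).
fn quantize(x: &[f32], scale: f32) -> Vec<i8> {
    x.iter()
        // Map each float to the nearest representable INT8 step,
        // clamping to the symmetric range [-127, 127].
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Inverse mapping: recover approximate floats from INT8 values.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```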

Modules§

avx2
AVX2 and AVX-512 SIMD Implementations
neon
NEON SIMD Implementations for ARM64/Apple Silicon
quantize
INT8 Quantization with π-Based Calibration
scalar
Scalar Fallback Implementations
winograd
Winograd F(2,3) Convolution Implementation
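
The 2.25x figure quoted for the `winograd` module comes from the 2D nesting F(2x2, 3x3), which needs 16 multiplies where direct convolution needs 36. The 1D building block F(2,3) produces two outputs of a 3-tap filter with 4 multiplies instead of 6. A self-contained sketch of that building block (not the `winograd` module's actual API):

```rust
/// Winograd F(2,3): two convolution outputs from four inputs and a
/// 3-tap filter using 4 multiplies instead of 6 (illustrative sketch,
/// not this crate's `winograd` API).
fn winograd_f23(d: [f32; 4], g: [f32; 3]) -> [f32; 2] {
    // Filter transform (in practice precomputed once per filter,
    // which is what a filter cache amortizes).
    let g0 = g[0];
    let g1 = 0.5 * (g[0] + g[1] + g[2]);
    let g2 = 0.5 * (g[0] - g[1] + g[2]);
    let g3 = g[2];
    // Input transform fused with the 4 elementwise multiplies.
    let m1 = (d[0] - d[2]) * g0;
    let m2 = (d[1] + d[2]) * g1;
    let m3 = (d[2] - d[1]) * g2;
    let m4 = (d[1] - d[3]) * g3;
    // Output transform: recover the two convolution results.
    [m1 + m2 + m3, m2 - m3 - m4]
}
```

Expanding the transforms confirms the outputs equal the direct convolutions `d0*g0 + d1*g1 + d2*g2` and `d1*g0 + d2*g1 + d3*g2`.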

Functions§

batch_norm_simd
SIMD-accelerated batch normalization with automatic architecture dispatch
conv_3x3_simd
SIMD-accelerated 3x3 convolution with automatic architecture dispatch
depthwise_conv_3x3_simd
SIMD-accelerated depthwise 3x3 convolution
dot_product_simd
SIMD-accelerated dot product with automatic architecture dispatch
global_avg_pool_simd
SIMD-accelerated global average pooling
max_pool_2x2_simd
SIMD-accelerated max pooling 2x2
relu6_simd
SIMD-accelerated ReLU6 activation with automatic architecture dispatch
relu_simd
SIMD-accelerated ReLU activation with automatic architecture dispatch
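
As a semantic reference for the activation helpers above: whatever backend the dispatch selects, `relu_simd` and `relu6_simd` must agree with the scalar definitions below. The in-place slice signatures are an assumption for illustration, not necessarily this crate's exact API:

```rust
/// Scalar reference semantics for ReLU: max(x, 0) elementwise
/// (illustrative; the real function dispatches to vectorized code).
fn relu_ref(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = v.max(0.0);
    }
}

/// Scalar reference semantics for ReLU6: clamp to [0, 6], the
/// activation used in MobileNet-style networks.
fn relu6_ref(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = v.clamp(0.0, 6.0);
    }
}
```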