SIMD Backend Dispatch Module
Provides architecture-specific SIMD implementations with automatic dispatch:
- AVX-512 for modern Intel/AMD (16 floats per iteration)
- AVX2 with FMA for Intel Haswell+ / AMD Zen+ (8 floats per iteration)
- NEON for ARM64/Apple Silicon (4 floats per iteration)
- WASM SIMD for WebAssembly (4 floats per iteration)
- Winograd F(2,3) for 3x3 convolutions (2.25x fewer multiplications than direct convolution)
- Scalar fallback for all other platforms
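The automatic dispatch described above typically follows the standard runtime feature-detection pattern. The sketch below is illustrative only (function names are hypothetical, not this crate's API) and falls back to the scalar path; a real implementation would call an AVX2 kernel inside the detected branch:

```rust
/// Scalar reference dot product used as the universal fallback.
fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical dispatch wrapper: probes CPU features at runtime and
/// selects the widest available kernel, falling back to scalar.
fn dot_product_dispatch(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // An AVX2 + FMA kernel would be selected here (8 floats per
            // iteration); this sketch falls through to the scalar path.
        }
    }
    dot_product_scalar(a, b)
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0];
    let b = [4.0_f32, 5.0, 6.0];
    println!("{}", dot_product_dispatch(&a, &b)); // 1*4 + 2*5 + 3*6 = 32
}
```

Feature detection happens once per call here; production code usually caches the selected kernel in a function pointer or `OnceLock` to avoid repeated CPUID probes.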
Re-exports§
pub use winograd::conv_3x3_winograd;
pub use winograd::transform_filter;
pub use winograd::transform_input;
pub use winograd::transform_output;
pub use winograd::WinogradFilterCache;
pub use quantize::QuantParams;
pub use quantize::QuantizedTensor;
pub use quantize::QuantizationType;
pub use quantize::PerChannelQuantParams;
pub use quantize::quantize_simd;
pub use quantize::dequantize_simd;
pub use quantize::quantize_batch;
pub use quantize::dequantize_batch;
pub use quantize::pi_constants;
pub use avx2::*;
pub use scalar::*;
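Among the re-exports, the `quantize` items implement INT8 quantization. The sketch below shows the general symmetric-quantization technique only; the `QuantParams` type here is illustrative and its fields are assumptions, not the crate's actual definition, and the π-based calibration the module uses is not reproduced:

```rust
/// Illustrative stand-in for a symmetric quantization parameter set.
/// (Hypothetical: the crate's real `QuantParams` may differ.)
struct QuantParams {
    scale: f32,
}

/// Map f32 values to i8 by dividing by the scale, rounding, and clamping
/// to the symmetric INT8 range [-127, 127].
fn quantize(x: &[f32], p: &QuantParams) -> Vec<i8> {
    x.iter()
        .map(|&v| (v / p.scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Recover approximate f32 values by multiplying back by the scale.
fn dequantize(q: &[i8], p: &QuantParams) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * p.scale).collect()
}

fn main() {
    let p = QuantParams { scale: 0.5 };
    let q = quantize(&[1.0, -0.5], &p);
    println!("{:?} -> {:?}", q, dequantize(&q, &p));
}
```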
Modules§
- avx2
- AVX2 and AVX-512 SIMD Implementations
- neon
- NEON SIMD Implementations for ARM64/Apple Silicon
- quantize
- INT8 Quantization with π-Based Calibration
- scalar
- Scalar Fallback Implementations
- winograd
- Winograd F(2,3) Convolution Implementation
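The 2.25x figure for the `winograd` module comes from nesting the 1D F(2,3) transform in both spatial dimensions: F(2x2,3x3) needs 16 multiplies where direct 3x3 convolution needs 36, and 36/16 = 2.25. A minimal 1D sketch (illustrative names, not the crate's API) that produces two convolution outputs with 4 multiplies instead of 6:

```rust
/// 1D Winograd F(2,3): computes two outputs of a 3-tap convolution over
/// four inputs using 4 multiplications (direct computation needs 6).
fn winograd_f23(d: [f32; 4], g: [f32; 3]) -> [f32; 2] {
    let m1 = (d[0] - d[2]) * g[0];
    let m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) * 0.5;
    let m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) * 0.5;
    let m4 = (d[1] - d[3]) * g[2];
    [m1 + m2 + m3, m2 - m3 - m4]
}

fn main() {
    let d = [1.0, 2.0, 3.0, 4.0];
    let g = [1.0, 0.5, 0.25];
    // Direct convolution for comparison.
    let y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2];
    let y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2];
    println!("{:?} vs [{}, {}]", winograd_f23(d, g), y0, y1);
}
```

In practice the filter transform (the two `* 0.5` terms) is precomputed once per filter, which is what a structure like `WinogradFilterCache` exists to amortize.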
Functions§
- batch_norm_simd - SIMD-accelerated batch normalization with automatic architecture dispatch
- conv_3x3_simd - SIMD-accelerated 3x3 convolution with automatic architecture dispatch
- depthwise_conv_3x3_simd - SIMD-accelerated depthwise 3x3 convolution
- dot_product_simd - SIMD-accelerated dot product with automatic architecture dispatch
- global_avg_pool_simd - SIMD-accelerated global average pooling
- max_pool_2x2_simd - SIMD-accelerated max pooling 2x2
- relu6_simd - SIMD-accelerated ReLU6 activation with automatic architecture dispatch
- relu_simd - SIMD-accelerated ReLU activation with automatic architecture dispatch
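For reference, the scalar semantics that the `relu_simd` and `relu6_simd` entry points accelerate can be written as follows. This sketch assumes in-place slice signatures, which is an assumption, not the crate's documented API:

```rust
/// Scalar reference for ReLU: clamp negative values to zero, in place.
fn relu_ref(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = v.max(0.0);
    }
}

/// Scalar reference for ReLU6: clamp values to the range [0, 6], in place.
fn relu6_ref(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = v.clamp(0.0, 6.0);
    }
}

fn main() {
    let mut v = vec![-1.0_f32, 3.0, 7.0];
    relu6_ref(&mut v);
    println!("{:?}", v); // [0.0, 3.0, 6.0]
}
```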