Module simd


§SIMD Arithmetic Kernels Module - High-Performance Arithmetic

Inner SIMD-accelerated implementations built on std::simd for maximum performance on modern hardware. Prefer dispatch.rs for the general "maybe masked, maybe SIMD" case; call these inner functions (e.g., the dense_simd variants) directly when the exact case is known and dispatch overhead is unwanted.

§Overview

  • Portable SIMD: Uses std::simd for cross-platform vectorisation with compile-time lane optimisation
  • Null masks: Dense (no nulls) and masked variants for Arrow-compatible null handling. These are unified in dispatch.rs, and the dense (unmasked) path carries no masking overhead.
  • Type support: Integer and floating-point arithmetic with specialised FMA operations
  • Safety: All unsafe operations are bounds-checked or guaranteed by caller invariants
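The dense kernels follow a vector-body-plus-scalar-tail shape. A minimal sketch of that shape, assuming a fixed lane count (the real kernels use the nightly std::simd types and the build-time W32 constant; this stable stand-in uses a fixed-width inner loop that the autovectoriser handles well):

```rust
const LANES: usize = 8; // illustrative stand-in for the build-time W32 constant

fn add_dense_f32(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    assert_eq!(lhs.len(), rhs.len());
    assert_eq!(lhs.len(), out.len());
    // vector body: full lanes-wide chunks, vectorised by the compiler
    for ((a, b), o) in lhs
        .chunks_exact(LANES)
        .zip(rhs.chunks_exact(LANES))
        .zip(out.chunks_exact_mut(LANES))
    {
        for i in 0..LANES {
            o[i] = a[i] + b[i];
        }
    }
    // scalar tail: elements that do not fill a full vector
    let done = (lhs.len() / LANES) * LANES;
    for i in done..lhs.len() {
        out[i] = lhs[i] + rhs[i];
    }
}
```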

§Architecture Notes

  • Building blocks for higher-level dispatch layers, or for low-level hot loops where one wants to fully avoid abstraction overhead.
  • Parallelisation intentionally excluded to allow flexible chunking strategies
  • Power operations fall back to scalar code for integers and use logarithmic computation for floats
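Leaving parallelisation out means a caller can pick any chunk size and farm independent chunks out to its own thread pool. A hedged sketch of that pattern (function names here are illustrative, not the crate's API):

```rust
// A plain slice-in/slice-out kernel, standing in for the dense SIMD kernels.
fn mul_dense(lhs: &[f64], rhs: &[f64], out: &mut [f64]) {
    for ((o, &a), &b) in out.iter_mut().zip(lhs).zip(rhs) {
        *o = a * b;
    }
}

// Caller-chosen chunking: each chunk is independent, so these calls could
// just as well be submitted to separate threads.
fn mul_chunked(lhs: &[f64], rhs: &[f64], out: &mut [f64], chunk: usize) {
    for ((o, a), b) in out
        .chunks_mut(chunk)
        .zip(lhs.chunks(chunk))
        .zip(rhs.chunks(chunk))
    {
        mul_dense(a, b, o);
    }
}
```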

Constants§

W8
SIMD lane count for 8-bit elements (u8, i8). Auto-generated by build.rs at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W16
SIMD lane count for 16-bit elements (u16, i16). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W32
SIMD lane count for 32-bit elements (u32, i32, f32). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W64
SIMD lane count for 64-bit elements (u64, i64, f64). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
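A hedged sketch of how such build-time lane widths could be derived in a build script: read an override from the environment, else pick an architecture default. The variable handling and default values below are assumptions for illustration, not the crate's actual build.rs:

```rust
// Resolve a 32-bit-element lane count: an explicit override (e.g. the
// SIMD_LANES_OVERRIDE env var's value) wins; otherwise fall back to a
// per-architecture default. Defaults shown are illustrative guesses
// (8 x 32-bit lanes for a 256-bit AVX2 vector, 4 for 128-bit SSE/NEON).
fn lane_count_32(override_var: Option<&str>) -> usize {
    match override_var.and_then(|s| s.parse::<usize>().ok()) {
        Some(n) => n,
        None if cfg!(target_arch = "x86_64") => 8,
        None => 4,
    }
}
```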

Functions§

float_dense_body_f32_simd
SIMD f32 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
float_dense_body_f64_simd
SIMD f64 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
float_masked_body_f32_simd
SIMD f32 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
float_masked_body_f64_simd
SIMD f64 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
fma_dense_body_f32_simd
SIMD f32 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
fma_dense_body_f64_simd
SIMD f64 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
fma_masked_body_f32_simd
SIMD f32 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
fma_masked_body_f64_simd
SIMD f64 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
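f64::mul_add is the scalar form of the fused multiply-add these kernels vectorise: a * b + c computed with a single rounding step. A minimal dense sketch (the function name and signature are illustrative):

```rust
// Element-wise fused multiply-add over dense slices: out[i] = a[i] * b[i] + c[i],
// each computed with one rounding via f64::mul_add.
fn fma_dense(a: &[f64], b: &[f64], c: &[f64], out: &mut [f64]) {
    for i in 0..out.len() {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
}
```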
int_dense_body_simd
SIMD integer arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Panics on division/remainder by zero (consistent with scalar behaviour).
int_masked_body_simd
SIMD integer arithmetic kernel with null mask support. Division/remainder by zero produces null results (mask=false) rather than panicking.
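The masked integer contract above can be sketched as follows, assuming a simple boolean validity mask (the real kernels operate on Arrow-style masks; the signature here is illustrative): a zero divisor yields a null result (mask = false) instead of panicking.

```rust
// Masked integer division: invalid inputs and zero divisors produce nulls.
fn div_masked_i64(lhs: &[i64], rhs: &[i64], mask: &mut [bool], out: &mut [i64]) {
    for i in 0..out.len() {
        if mask[i] && rhs[i] != 0 {
            out[i] = lhs[i] / rhs[i];
        } else {
            mask[i] = false; // null result: input was null, or division by zero
            out[i] = 0;      // placeholder value under a false mask bit
        }
    }
}
```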