Module simd


§SIMD Arithmetic Kernels Module - High-Performance Arithmetic

Inner SIMD-accelerated implementations built on std::simd for maximum performance on modern hardware. Prefer dispatch.rs for the general "maybe masked, maybe SIMD" case; call these inner functions (e.g., the dense_simd variants) directly when the exact case is known and dispatch overhead is unwanted.

§Overview

  • Portable SIMD: Uses std::simd for cross-platform vectorisation with compile-time lane optimisation
  • Null masks: Dense (no nulls) and masked variants for Arrow-compatible null handling. These are unified in dispatch.rs, and the dense (unmasked) path carries no masking overhead.
  • Type support: Integer and floating-point arithmetic with specialised FMA operations
  • Safety: All unsafe operations are bounds-checked or guaranteed by caller invariants
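The dense kernels follow a vector-body-plus-scalar-tail shape. A minimal sketch of that shape, assuming a fixed lane count (the real kernels use the nightly std::simd types and the build-time W32 constant; this stable stand-in uses a fixed-width inner loop that the autovectoriser handles well):

```rust
const LANES: usize = 8; // illustrative stand-in for the build-time W32 constant

fn add_dense_f32(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    assert_eq!(lhs.len(), rhs.len());
    assert_eq!(lhs.len(), out.len());
    // vector body: full lanes-wide chunks, vectorised by the compiler
    for ((a, b), o) in lhs
        .chunks_exact(LANES)
        .zip(rhs.chunks_exact(LANES))
        .zip(out.chunks_exact_mut(LANES))
    {
        for i in 0..LANES {
            o[i] = a[i] + b[i];
        }
    }
    // scalar tail: elements that do not fill a full vector
    let done = (lhs.len() / LANES) * LANES;
    for i in done..lhs.len() {
        out[i] = lhs[i] + rhs[i];
    }
}
```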

§Architecture Notes

  • Building blocks for higher-level dispatch layers, or for low-level hot loops where one wants to fully avoid abstraction overhead.
  • Parallelisation intentionally excluded to allow flexible chunking strategies
  • Power operations fall back to scalar code for integers and use logarithmic computation for floats
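Leaving parallelisation out means a caller can pick any chunk size and farm independent chunks out to its own thread pool. A hedged sketch of that pattern (function names here are illustrative, not the crate's API):

```rust
// A plain slice-in/slice-out kernel, standing in for the dense SIMD kernels.
fn mul_dense(lhs: &[f64], rhs: &[f64], out: &mut [f64]) {
    for ((o, &a), &b) in out.iter_mut().zip(lhs).zip(rhs) {
        *o = a * b;
    }
}

// Caller-chosen chunking: each chunk is independent, so these calls could
// just as well be submitted to separate threads.
fn mul_chunked(lhs: &[f64], rhs: &[f64], out: &mut [f64], chunk: usize) {
    for ((o, a), b) in out
        .chunks_mut(chunk)
        .zip(lhs.chunks(chunk))
        .zip(rhs.chunks(chunk))
    {
        mul_dense(a, b, o);
    }
}
```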

Constants§

W8
SIMD lane count for 8-bit elements (u8, i8). Auto-generated by build.rs at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W16
SIMD lane count for 16-bit elements (u16, i16). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W32
SIMD lane count for 32-bit elements (u32, i32, f32). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
W64
SIMD lane count for 64-bit elements (u64, i64, f64). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
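A hedged sketch of how such build-time lane widths could be derived in a build script: read an override from the environment, else pick an architecture default. The variable handling and default values below are assumptions for illustration, not the crate's actual build.rs:

```rust
// Resolve a 32-bit-element lane count: an explicit override (e.g. the
// SIMD_LANES_OVERRIDE env var's value) wins; otherwise fall back to a
// per-architecture default. Defaults shown are illustrative guesses
// (8 x 32-bit lanes for a 256-bit AVX2 vector, 4 for 128-bit SSE/NEON).
fn lane_count_32(override_var: Option<&str>) -> usize {
    match override_var.and_then(|s| s.parse::<usize>().ok()) {
        Some(n) => n,
        None if cfg!(target_arch = "x86_64") => 8,
        None => 4,
    }
}
```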

Functions§

float_dense_body_f32_simd
SIMD f32 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
float_dense_body_f64_simd
SIMD f64 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
float_masked_body_f32_simd
SIMD f32 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
float_masked_body_f64_simd
SIMD f64 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
fma_dense_body_f32_simd
SIMD f32 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
fma_dense_body_f64_simd
SIMD f64 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
fma_masked_body_f32_simd
SIMD f32 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
fma_masked_body_f64_simd
SIMD f64 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
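f64::mul_add is the scalar form of the fused multiply-add these kernels vectorise: a * b + c computed with a single rounding step. A minimal dense sketch (the function name and signature are illustrative):

```rust
// Element-wise fused multiply-add over dense slices: out[i] = a[i] * b[i] + c[i],
// each computed with one rounding via f64::mul_add.
fn fma_dense(a: &[f64], b: &[f64], c: &[f64], out: &mut [f64]) {
    for i in 0..out.len() {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
}
```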
int_dense_body_simd
SIMD integer arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Panics on division/remainder by zero (consistent with scalar behaviour).
int_masked_body_simd
SIMD integer arithmetic kernel with null mask support. Division/remainder by zero produces null results (mask=false) rather than panicking.
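The masked integer contract above can be sketched as follows, assuming a simple boolean validity mask (the real kernels operate on Arrow-style masks; the signature here is illustrative): a zero divisor yields a null result (mask = false) instead of panicking.

```rust
// Masked integer division: invalid inputs and zero divisors produce nulls.
fn div_masked_i64(lhs: &[i64], rhs: &[i64], mask: &mut [bool], out: &mut [i64]) {
    for i in 0..out.len() {
        if mask[i] && rhs[i] != 0 {
            out[i] = lhs[i] / rhs[i];
        } else {
            mask[i] = false; // null result: input was null, or division by zero
            out[i] = 0;      // placeholder value under a false mask bit
        }
    }
}
```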