§SIMD Arithmetic Kernels Module - High-Performance Arithmetic
Inner SIMD-accelerated implementations using std::simd for maximum performance on modern hardware.
Prefer dispatch.rs, which handles the general case (“maybe masked, maybe SIMD”); use these inner functions
directly (e.g., “dense_simd”) when the variant is known up front.
§Overview
- Portable SIMD: Uses std::simd for cross-platform vectorisation with compile-time lane optimisation
- Null masks: Dense (no nulls) and masked variants for Arrow-compatible null handling. These are unified in dispatch.rs, and opting out of masking incurs no performance penalty.
- Type support: Integer and floating-point arithmetic with specialised FMA operations
- Safety: All unsafe operations are bounds-checked or guaranteed by caller invariants
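The dense kernels follow the standard SIMD shape of a vectorised main body plus a scalar tail. As a rough stable-Rust sketch of that structure (hypothetical function name; the real kernels operate on std::simd vectors with build-time lane counts rather than fixed-size chunks):

```rust
/// Illustrative sketch only: shows the body-plus-tail loop structure the
/// dense kernels use, with `chunks_exact` standing in for explicit SIMD.
fn dense_add_f64_sketch(lhs: &[f64], rhs: &[f64], out: &mut [f64]) {
    const LANES: usize = 4; // stand-in for the build-time W64 constant

    let mut l = lhs.chunks_exact(LANES);
    let mut r = rhs.chunks_exact(LANES);
    let mut o = out.chunks_exact_mut(LANES);

    // Vectorisable main body: the real kernel replaces this inner loop
    // with explicit `std::simd` lane-wise operations.
    for ((lc, rc), oc) in (&mut l).zip(&mut r).zip(&mut o) {
        for i in 0..LANES {
            oc[i] = lc[i] + rc[i];
        }
    }

    // Scalar tail for the final `len % LANES` elements.
    for ((a, b), c) in l.remainder().iter().zip(r.remainder()).zip(o.into_remainder()) {
        *c = a + b;
    }
}

fn main() {
    let lhs = [1.0, 2.0, 3.0, 4.0, 5.0];
    let rhs = [10.0, 20.0, 30.0, 40.0, 50.0];
    let mut out = [0.0; 5];
    dense_add_f64_sketch(&lhs, &rhs, &mut out);
    println!("{:?}", out); // [11.0, 22.0, 33.0, 44.0, 55.0]
}
```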
§Architecture Notes
- Building blocks for higher-level dispatch layers, or for low-level hot loops where one wants to fully avoid abstraction overhead.
- Parallelisation intentionally excluded to allow flexible chunking strategies
- Power operations fall back to scalar for integers and use logarithmic computation for floats
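The float power fallback conventionally rewrites a^b as exp(b · ln a) for positive finite bases. A minimal sketch of that identity (hypothetical helper name, not the module's actual code, which must also handle the edge cases that std's `powf` covers):

```rust
/// Sketch of logarithmic pow for positive finite bases: a^b = exp(b * ln a).
/// Zero/negative bases and non-finite inputs need separate handling.
fn pow_via_log(a: f64, b: f64) -> f64 {
    (b * a.ln()).exp()
}

fn main() {
    let approx = pow_via_log(2.0, 10.0);
    assert!((approx - 1024.0).abs() < 1e-6);
    println!("2^10 ≈ {approx}");
}
```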
Constants§
Auto-generated SIMD lane widths from build.rs.
- W8: SIMD lane count for 8-bit elements (u8, i8). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
- W16: SIMD lane count for 16-bit elements (u16, i16). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
- W32: SIMD lane count for 32-bit elements (u32, i32, f32). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
- W64: SIMD lane count for 64-bit elements (u64, i64, f64). Determined at build time based on target architecture capabilities, or overridden via SIMD_LANES_OVERRIDE.
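The generation rules are not shown here, but a build.rs for constants like these typically maps register width to lane count and lets the environment override win. A hypothetical sketch of that selection logic (the crate's actual detection and SIMD_LANES_OVERRIDE parsing may differ):

```rust
use std::env;

/// Hypothetical lane-count choice for 32-bit elements: 512-bit registers
/// hold 16 lanes, 256-bit hold 8, baseline 128-bit hold 4.
fn w32_lanes(has_avx512f: bool, has_avx2: bool) -> usize {
    if has_avx512f { 16 } else if has_avx2 { 8 } else { 4 }
}

fn main() {
    // An explicit SIMD_LANES_OVERRIDE beats architecture detection.
    let lanes = env::var("SIMD_LANES_OVERRIDE")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or_else(|| w32_lanes(false, true));
    // build.rs-style output; `cargo:rustc-env` is a real Cargo directive.
    println!("cargo:rustc-env=W32={lanes}");
}
```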
Functions§
- float_dense_body_f32_simd: SIMD f32 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
- float_dense_body_f64_simd: SIMD f64 arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Division by zero produces Inf/NaN following IEEE 754 semantics.
- float_masked_body_f32_simd: SIMD f32 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
- float_masked_body_f64_simd: SIMD f64 arithmetic kernel with null mask support. Preserves IEEE 754 semantics: division by zero produces Inf/NaN, no exceptions. Power operations use scalar fallback with logarithmic computation.
- fma_dense_body_f32_simd: SIMD f32 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
- fma_dense_body_f64_simd: SIMD f64 fused multiply-add kernel for dense arrays (no nulls). Hardware-accelerated a.mul_add(b, c) with vectorised and scalar tail processing.
- fma_masked_body_f32_simd: SIMD f32 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
- fma_masked_body_f64_simd: SIMD f64 fused multiply-add kernel with null mask support. Hardware-accelerated a.mul_add(b, c) with proper null propagation.
- int_dense_body_simd: SIMD integer arithmetic kernel for dense arrays (no nulls). Vectorised operations with scalar fallback for power operations and array tails. Panics on division/remainder by zero (consistent with scalar behaviour).
- int_masked_body_simd: SIMD integer arithmetic kernel with null mask support. Division/remainder by zero produces null results (mask=false) rather than panicking.
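As a scalar reference for the masked integer semantics above (hypothetical names and a plain `&[bool]` mask for readability; the real kernels vectorise this and take Arrow-style validity bitmaps):

```rust
/// Scalar model of masked integer division: a false mask bit means null.
/// Nulls propagate, and division by zero yields null instead of panicking.
fn masked_div_i64(lhs: &[i64], rhs: &[i64], valid: &[bool]) -> (Vec<i64>, Vec<bool>) {
    lhs.iter()
        .zip(rhs)
        .zip(valid)
        .map(|((&a, &b), &v)| {
            if v && b != 0 {
                (a / b, true)
            } else {
                (0, false) // value is arbitrary when the slot is null
            }
        })
        .unzip()
}

fn main() {
    let (vals, mask) = masked_div_i64(&[10, 7, 9], &[2, 0, 3], &[true, true, false]);
    assert_eq!(mask, vec![true, false, false]); // div-by-zero and input null both yield null
    assert_eq!(vals[0], 5);
    println!("{vals:?} {mask:?}");
}
```

The contrast with int_dense_body_simd is the key design point: the dense path keeps scalar panic-on-zero semantics, while the masked path folds error cases into the null mask.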