Module simd

§Bitmask SIMD Kernels - Vectorised High-Performance Bitmask Operations

SIMD-accelerated implementations of bitmask operations using portable vectorisation with std::simd. These kernels speed up operations on large bitmasks by processing multiple 64-bit words per SIMD instruction.

§Overview

This module contains vectorised implementations of all bitmask operations. It uses configurable SIMD lane counts to adapt to different CPU architectures whilst maintaining code portability.

SIMD alignment is not checked here: it is guaranteed by the Bitmask, which is backed by Minarrow’s Vec64.

§Architecture Principles

Portable SIMD: Uses std::simd for cross-platform vectorisation without target-specific code
Configurable lanes: Lane counts determined at build time for optimal performance per architecture
Hybrid processing: SIMD inner loops with scalar tail handling for lengths that are not a multiple of the lane count
Low-cost abstraction: Bitmask is a light-weight structure over a Vec64. See Minarrow for details and benchmarks demonstrating very low abstraction cost.
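
The hybrid-processing principle can be sketched in stable Rust as follows. This is an illustration only, not the module's implementation: `and_words` and its `LANES` parameter are hypothetical names, and a fixed-size block of plain `u64` words stands in for a `std::simd` vector, which requires a nightly toolchain.

```rust
/// Hypothetical stand-in for a SIMD binary kernel (not the module's API).
/// Processes `LANES` words per iteration, then finishes the leftover words
/// one at a time. `LANES` must be greater than zero.
fn and_words<const LANES: usize>(lhs: &[u64], rhs: &[u64], out: &mut [u64]) {
    assert_eq!(lhs.len(), rhs.len());
    assert_eq!(lhs.len(), out.len());

    // Number of words covered by whole blocks of `LANES` words.
    let full = lhs.len() / LANES * LANES;

    // "Vector" body: each block of `LANES` words plays the role of one SIMD vector.
    for ((o, a), b) in out[..full]
        .chunks_exact_mut(LANES)
        .zip(lhs[..full].chunks_exact(LANES))
        .zip(rhs[..full].chunks_exact(LANES))
    {
        for i in 0..LANES {
            o[i] = a[i] & b[i];
        }
    }

    // Scalar tail: words that do not fill a whole block.
    for i in full..lhs.len() {
        out[i] = lhs[i] & rhs[i];
    }
}
```

With five input words and `LANES = 4`, the first four words go through the block loop and the fifth through the scalar tail.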

§Memory Access Patterns

Vectorised loads process multiple words per memory operation
Sequential access patterns optimise cache utilisation
Aligned access where possible for maximum performance
Streaming patterns for large bitmask operations

§Specialised Algorithms

§Population Count (Popcount)

Counts the set bits in each lane, then sums the lanes with a SIMD reduction:

let counts = simd_vector.count_ones();
total += counts.reduce_sum() as usize;
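
A scalar sketch of the same idea, written for stable Rust, may help show where the reduction and the tail handling fit. The function name `popcount_bits`, the `len_bits` parameter, and the least-significant-bit-first layout are assumptions made for illustration; the inner loop over a block of `LANES` words stands in for the per-lane `count_ones` plus `reduce_sum` above.

```rust
/// Hypothetical sketch (not the module's API): counts set bits among the
/// first `len_bits` bits of `words`, assuming bit 0 of word 0 comes first.
/// `LANES` must be greater than zero.
fn popcount_bits<const LANES: usize>(words: &[u64], len_bits: usize) -> usize {
    let needed = (len_bits + 63) / 64;
    assert!(words.len() >= needed, "not enough words for len_bits");

    // Words that lie entirely inside the logical length.
    let full_words = len_bits / 64;
    let mut total = 0usize;

    let mut blocks = words[..full_words].chunks_exact(LANES);
    for block in &mut blocks {
        // Per-lane counts followed by a horizontal sum of the block.
        let mut block_sum = 0u32;
        for w in block {
            block_sum += w.count_ones();
        }
        total += block_sum as usize;
    }

    // Scalar tail: whole words that did not fill a block.
    for w in blocks.remainder() {
        total += w.count_ones() as usize;
    }

    // Partial final word: ignore bits beyond `len_bits`.
    let rem = len_bits % 64;
    if rem != 0 {
        let mask = (1u64 << rem) - 1;
        total += (words[full_words] & mask).count_ones() as usize;
    }
    total
}
```

Masking the final word matters whenever the bitmask length is not a multiple of 64, because any padding bits would otherwise be counted.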

§Equality Testing

Leverages SIMD comparison operations:

let eq_mask = vector_a.simd_eq(vector_b);
if !eq_mask.all() { return false; }
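
The early-termination pattern can be sketched in stable Rust like this. It is a minimal illustration under stated assumptions: `all_eq_words` is a hypothetical name, whole words are compared (bits beyond a logical length are not masked out), and a block of `LANES` plain `u64` words stands in for the SIMD vectors compared with `simd_eq` above.

```rust
/// Hypothetical sketch (not the module's API): returns true if both word
/// slices are identical, stopping at the first block that differs.
/// `LANES` must be greater than zero.
fn all_eq_words<const LANES: usize>(a: &[u64], b: &[u64]) -> bool {
    if a.len() != b.len() {
        return false;
    }

    for (x, y) in a.chunks_exact(LANES).zip(b.chunks_exact(LANES)) {
        // OR together the XOR of each lane pair; non-zero means a mismatch.
        let mut diff = 0u64;
        for i in 0..LANES {
            diff |= x[i] ^ y[i];
        }
        if diff != 0 {
            return false; // early exit: later blocks are never read
        }
    }

    // Scalar tail: words that did not fill a whole block.
    let full = a.len() / LANES * LANES;
    a[full..] == b[full..]
}
```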

Functions§

all_eq_mask_simd: Vectorised equality test across entire bitmask windows with early termination optimisation.
all_false_mask_simd: Returns true if all bits in the mask are unset (0).
all_ne_mask_simd: Tests if all corresponding bits between two bitmask windows are different.
all_true_mask_simd: Returns true if all bits in the mask are set (1).
and_masks_simd: Performs vectorised bitwise AND operation between two bitmask windows.
bitmask_binop_simd: Performs vectorised bitwise binary operations (AND/OR/XOR) with configurable lane counts.
bitmask_unop_simd: Performs vectorised bitwise unary operations (NOT) with configurable lane counts.
eq_mask_simd: Produces a bitmask where each output bit is 1 iff the corresponding bits of a and b are equal.
in_mask_simd: Bitwise “in” for boolean bitmasks: each output bit is true if lhs bit is in the set of bits in rhs.
ne_mask_simd: Performs vectorised bitwise inequality comparison between two bitmask windows.
not_in_mask_simd: Performs vectorised bitwise “not in” membership test for boolean bitmasks.
not_mask_simd: Performs vectorised bitwise NOT operation on a bitmask window.
or_masks_simd: Performs vectorised bitwise OR operation between two bitmask windows.
popcount_mask_simd: Vectorised population count (number of set bits) with SIMD reduction for optimal performance.
xor_masks_simd: Performs vectorised bitwise XOR operation between two bitmask windows.
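
For the bitwise equality and inequality kernels, the per-word logic follows directly from the descriptions above: an output bit of 1 for equal inputs is the complement of XOR, and an output bit of 1 for differing inputs is XOR itself. The helpers below are hypothetical single-word illustrations, not the signatures of `eq_mask_simd` or `ne_mask_simd`.

```rust
/// Hypothetical single-word helper: bit i is 1 iff bit i of `a` equals bit i of `b`.
fn eq_word(a: u64, b: u64) -> u64 {
    !(a ^ b)
}

/// Hypothetical single-word helper: bit i is 1 iff bit i of `a` differs from bit i of `b`.
fn ne_word(a: u64, b: u64) -> u64 {
    a ^ b
}
```

Note that `eq_word` also sets every bit position where both inputs are 0, so a kernel built on it would presumably need to account for any unused bits in the final word of a bitmask.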
