Module portable_simd

§Portable SIMD Scan Kernels (Task 6)

This module provides a family of SIMD kernels that avoid gather pathologies (slow, scattered loads) and run across diverse hardware:

  1. AVX-512: Gather or permute-based
  2. AVX2: Byte LUT via shuffle + partial sums
  3. NEON: Table lookup primitives
  4. Scalar: Universal fallback
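The tiering above can be illustrated with a small runtime-dispatch sketch. It is not this module's actual `KernelDispatcher` or `SimdLevel`: the names `Level`, `detect_level`, `select_kernel`, `best_kernel`, and `l2_squared_scalar` are hypothetical, the squared-L2 distance is an assumed example workload, and every tier falls back to the scalar routine because no specialised kernels are written here.

```rust
use std::sync::OnceLock;

/// Hypothetical SIMD tiers mirroring the list above (not the module's `SimdLevel`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Scalar,
    Neon,
    Avx2,
    Avx512,
}

/// Signature assumed for illustration: squared L2 distance between two slices.
pub type DistanceFn = fn(&[f32], &[f32]) -> f32;

/// Probe the running CPU and return the most capable tier it reports.
pub fn detect_level() -> Level {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return Level::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") {
            return Level::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Level::Neon;
        }
    }
    Level::Scalar
}

/// Scalar fallback: works on every target.
fn l2_squared_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Map a tier to a kernel. A real build would return specialised kernels
/// per arm; this sketch only has the scalar one, so every arm uses it.
pub fn select_kernel(level: Level) -> DistanceFn {
    match level {
        Level::Avx512 | Level::Avx2 | Level::Neon | Level::Scalar => l2_squared_scalar,
    }
}

/// Detect once, then reuse the chosen function pointer on every call.
pub fn best_kernel() -> DistanceFn {
    static KERNEL: OnceLock<DistanceFn> = OnceLock::new();
    *KERNEL.get_or_init(|| select_kernel(detect_level()))
}
```

Caching the function pointer keeps feature detection out of the hot loop, so callers pay for the CPU probe only once per process.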

§Design Principles

  • Prefer layouts that allow structured loads (SoA)
  • Accumulate in int16/int32 to reduce memory bandwidth
  • Minimize unpredictable memory references
  • Maximize instruction-level parallelism (ILP)
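As a rough illustration of the first three principles, the scalar sketch below scans a structure-of-arrays block of 4-bit codes against small byte lookup tables and accumulates into `u16`. The `CodeBlock` type, the `scan_block` function, and the 16-entry table shape are assumptions made for this example, not the module's actual data layout.

```rust
/// Hypothetical SoA block: `codes[m][i]` is the 4-bit code of item `i`
/// in column `m`, so each column is one contiguous run of bytes.
pub struct CodeBlock {
    pub n: usize,
    pub codes: Vec<Vec<u8>>,
}

/// Sum one LUT entry per column for every item, using 16-bit accumulators.
/// With `u8` table entries, `u16` cannot overflow for up to 257 columns.
pub fn scan_block(block: &CodeBlock, luts: &[[u8; 16]]) -> Vec<u16> {
    assert_eq!(block.codes.len(), luts.len());
    assert!(luts.len() <= 257, "u16 accumulators could overflow");
    let mut acc = vec![0u16; block.n];
    for (column, lut) in block.codes.iter().zip(luts) {
        assert_eq!(column.len(), block.n);
        // Column-major walk: codes are read sequentially, and the only
        // data-dependent access is into a 16-byte table that stays cached.
        for (sum, &code) in acc.iter_mut().zip(column) {
            *sum += u16::from(lut[usize::from(code & 0x0F)]);
        }
    }
    acc
}
```

The inner loop has the same shape that a byte-shuffle kernel would vectorize: a contiguous load of codes, a 16-entry table lookup, and a widening add into narrow accumulators.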

§Math/Algorithm

The inner loop is Θ(N_scanned), that is, linear in the number of items scanned. Performance is dominated by:

  • Memory access patterns
  • Instruction throughput

The kernel design therefore aims to minimize cache misses and maximize ILP.
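One common way to raise ILP in a linear scan is to split a reduction across several independent accumulators, so consecutive additions do not wait on each other. The function below is a minimal sketch of that idea on a plain byte sum; `sum_u8_ilp` is a hypothetical name and the choice of four accumulators is an assumption, not a tuned value from this module.

```rust
/// Sum bytes with four independent accumulators to shorten the
/// dependency chain. Assumes the total fits in `u32`.
pub fn sum_u8_ilp(data: &[u8]) -> u32 {
    let mut acc = [0u32; 4];
    let mut chunks = data.chunks_exact(4);
    for c in chunks.by_ref() {
        // The four adds touch different accumulators, so the CPU can
        // issue them in parallel instead of serializing on one register.
        acc[0] += u32::from(c[0]);
        acc[1] += u32::from(c[1]);
        acc[2] += u32::from(c[2]);
        acc[3] += u32::from(c[3]);
    }
    // Handle the 0..=3 leftover bytes that did not fill a chunk.
    let tail: u32 = chunks.remainder().iter().map(|&b| u32::from(b)).sum();
    acc.iter().sum::<u32>() + tail
}
```

The sequential, contiguous read also keeps memory access predictable, which is the other factor named above.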

Structs§

Avx2Kernel
CpuFeatures
Detected CPU features
KernelDispatcher
Global kernel dispatcher that selects best implementation
ScalarKernel
Scalar fallback implementation (works everywhere)
ScanOps
High-level scan operations using best available SIMD

Enums§

SimdLevel
SIMD level for kernel dispatch

Traits§

DistanceKernel
Trait for portable distance kernels