Module vectorized_broadcast

Vectorized broadcasting operations

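This module provides high-performance broadcasting for aligned tensor shapes. When the operand shapes are broadcast-compatible and the underlying memory is contiguous and properly aligned, SIMD instructions can process several elements per instruction, which speeds up elementwise operations considerably.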

§Optimization Strategy

  1. Detect common broadcasting patterns (scalar, 1D broadcast, etc.)
  2. Use specialized kernels for each pattern
  3. Leverage SIMD for aligned, contiguous data
  4. Fall back to standard broadcasting for complex cases
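
The strategy above can be sketched on flat `f32` buffers. This is a minimal illustration, not this module's actual API: `add_scalar`, `add_with_dispatch`, and the `LANES` width of 8 are hypothetical names and values chosen for the example, and the chunked loop relies on compiler auto-vectorization rather than explicit SIMD intrinsics.

```rust
/// Hypothetical kernel (not part of this module): add one scalar to every
/// element of a contiguous slice. Fixed-width chunks give the compiler a
/// straightforward loop shape to auto-vectorize.
pub fn add_scalar(data: &[f32], scalar: f32, out: &mut [f32]) {
    assert_eq!(data.len(), out.len());
    const LANES: usize = 8; // assumed lane count, e.g. 8 x f32 in 256 bits

    let mut src = data.chunks_exact(LANES);
    let mut dst = out.chunks_exact_mut(LANES);
    for (s, d) in (&mut src).zip(&mut dst) {
        for i in 0..LANES {
            d[i] = s[i] + scalar;
        }
    }
    // Handle the tail that does not fill a whole chunk.
    for (s, d) in src.remainder().iter().zip(dst.into_remainder()) {
        *d = *s + scalar;
    }
}

/// Hypothetical dispatcher: detect the pattern, pick a specialized kernel,
/// and report anything else so the caller can use general broadcasting.
pub fn add_with_dispatch(lhs: &[f32], rhs: &[f32], out: &mut [f32]) -> Result<(), String> {
    match (lhs.len(), rhs.len()) {
        // Steps 1 and 2: scalar pattern, handled by the specialized kernel.
        (1, _) => add_scalar(rhs, lhs[0], out),
        (_, 1) => add_scalar(lhs, rhs[0], out),
        // Step 3: same-length contiguous buffers, a plain elementwise loop.
        (a, b) if a == b => {
            assert_eq!(out.len(), a);
            for ((o, l), r) in out.iter_mut().zip(lhs).zip(rhs) {
                *o = *l + *r;
            }
        }
        // Step 4: a full implementation would fall back to standard
        // broadcasting here; this sketch only reports the case.
        (a, b) => return Err(format!("no fast path for lengths {a} and {b}")),
    }
    Ok(())
}
```

Returning an error for the unmatched case keeps the sketch short; the point is that the fast paths are tried first and the general path is the last resort.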

§Common Patterns

  • Scalar broadcast: (1,) + (N,M,K) → (N,M,K)
  • Vector broadcast: (N,) + (N,M) → (N,M)
  • Matrix broadcast: (N,M) + (N,M,K) → (N,M,K)
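
Pattern detection for the three cases above might look like the following sketch. `BroadcastPattern` and `classify` are hypothetical names, not items exported by this module. The sketch takes the shapes exactly as written in the list, where the smaller operand's dimensions match the leading dimensions of the larger one; whether the module itself aligns dimensions this way is an assumption.

```rust
/// Hypothetical classification of the patterns listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum BroadcastPattern {
    Scalar,
    Vector,
    Matrix,
    General,
}

/// Classify how `small` broadcasts against `large`, assuming the smaller
/// shape must equal the leading dimensions of the larger shape.
pub fn classify(small: &[usize], large: &[usize]) -> BroadcastPattern {
    // A single-element operand broadcasts against any shape.
    if small.iter().product::<usize>() == 1 {
        return BroadcastPattern::Scalar;
    }
    let is_leading_prefix = small.len() < large.len() && large.starts_with(small);
    match (is_leading_prefix, small.len()) {
        (true, 1) => BroadcastPattern::Vector,
        (true, 2) => BroadcastPattern::Matrix,
        // Everything else is left to the general broadcasting path.
        _ => BroadcastPattern::General,
    }
}
```

With this definition, `classify(&[4], &[4, 5])` yields `Vector` and `classify(&[4, 5], &[4, 5, 6])` yields `Matrix`, while a shape pair such as `(5,)` against `(4, 5)` is reported as `General`.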