
Module binary

Binary element-wise operations for the CPU execution backend.

These operations are memory-bandwidth-bound. Each output element needs one read from each input, a single cheap arithmetic operation, and one write, so the memory bus is the bottleneck rather than the CPU. For that reason they are not parallelized with Rayon: waking worker threads costs roughly 20µs, which outweighs any gain at all practical tensor sizes, and a single-threaded vectorized loop already saturates the memory bus on modern CPUs.
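A single-threaded, vectorization-friendly loop of the kind described above might look like the sketch below. The names `binary_map`, `add` and `mul` are hypothetical and are not this module's API. The sketch assumes contiguous `f32` buffers of equal length. Zipping iterators instead of indexing lets the compiler drop bounds checks, which typically allows LLVM to auto-vectorize the loop in release builds. Whether it actually does depends on the target and optimization level.

```rust
/// Hypothetical helper (not this crate's API): applies `op` element-wise.
/// Each iteration loads two f32 values and stores one (12 bytes moved)
/// for a single arithmetic operation, which is why such loops are
/// limited by memory bandwidth rather than by compute.
pub fn binary_map<F>(lhs: &[f32], rhs: &[f32], out: &mut [f32], op: F)
where
    F: Fn(f32, f32) -> f32,
{
    // Check lengths once up front so the hot loop carries no per-element checks.
    assert_eq!(lhs.len(), rhs.len(), "input length mismatch");
    assert_eq!(lhs.len(), out.len(), "output length mismatch");

    // Zipped iterators avoid indexing, so no bounds checks are emitted and
    // the loop body is a simple candidate for SIMD auto-vectorization.
    for ((o, &a), &b) in out.iter_mut().zip(lhs.iter()).zip(rhs.iter()) {
        *o = op(a, b);
    }
}

/// Hypothetical element-wise addition built on `binary_map`.
pub fn add(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    binary_map(lhs, rhs, out, |a, b| a + b);
}

/// Hypothetical element-wise multiplication built on `binary_map`.
pub fn mul(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    binary_map(lhs, rhs, out, |a, b| a * b);
}
```

Because `op` is a generic closure, each call site is monomorphized and the closure is usually inlined, so the generic helper costs nothing compared with a hand-written loop per operation.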

Structs

CpuBackend
The execution driver for standard host CPU memory.