Module cache_padded


CachePadded<T> — target-aware cache-line alignment.

Wrap a contended atomic in CachePadded to keep it from sharing a cache line with neighboring fields. Without it, two atomics on the same line bounce that line between cores even when the threads writing to them touch logically independent data, a pattern known as “false sharing.” Each core-to-core transfer needed to re-fetch an invalidated line costs on the order of tens of nanoseconds, which is catastrophic in a tight allocate-deallocate loop.
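The layout effect can be checked with a small sketch. It does not use forge-alloc itself: `Padded`, `Unpadded`, `Separated`, and the helper functions are hypothetical names, and the 128-byte alignment is hard-coded to match the x86_64/aarch64 case only.

```rust
use std::sync::atomic::AtomicU64;

// Hypothetical stand-in for CachePadded, fixed at 128 bytes for illustration.
#[allow(dead_code)]
#[repr(align(128))]
struct Padded<T>(T);

// Two counters laid out back to back: both fit in a single cache line.
#[repr(C)]
struct Unpadded {
    allocs: AtomicU64,
    frees: AtomicU64,
}

// The same counters, each aligned and padded to its own 128-byte slot.
#[repr(C)]
struct Separated {
    allocs: Padded<AtomicU64>,
    frees: Padded<AtomicU64>,
}

// Distance in bytes between the start addresses of two fields.
fn byte_gap<A, B>(first: &A, second: &B) -> usize {
    (second as *const B as usize) - (first as *const A as usize)
}

fn unpadded_gap() -> usize {
    let s = Unpadded { allocs: AtomicU64::new(0), frees: AtomicU64::new(0) };
    byte_gap(&s.allocs, &s.frees)
}

fn padded_gap() -> usize {
    let s = Separated {
        allocs: Padded(AtomicU64::new(0)),
        frees: Padded(AtomicU64::new(0)),
    };
    byte_gap(&s.allocs, &s.frees)
}
```

With `#[repr(C)]` the unpadded counters sit 8 bytes apart, so a write to one invalidates the line holding the other. The padded counters sit 128 bytes apart and cannot share a line.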

The alignment used is per-target:

x86_64, aarch64, powerpc64: 128 bytes. x86_64’s L1 line is 64 bytes but the adjacent-line prefetcher pulls cache lines in pairs, so a 64-byte pad still allows false sharing across the prefetched neighbor; 128 closes that gap. Apple Silicon (M-series) AArch64 uses 128-byte coherency granularity natively.
arm, mips, mips64, sparc, hexagon: 32 bytes.
m68k: 16 bytes.
s390x: 256 bytes.
Anything else: 64 bytes (the historical x86 line size).

The cfg matrix mirrors crossbeam_utils::CachePadded’s choices so benchmarks and reasoning carry across crates. We inline the definition rather than depending on crossbeam_utils to keep forge-alloc dependency-free at the runtime layer.
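The per-target table above could be written as a cfg matrix along the following lines. This is a reconstruction from the table, not forge-alloc's actual source. The constructor and `into_inner` are assumptions added so the sketch can be exercised.

```rust
// Hypothetical reconstruction of the per-target table; not the crate's source.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64", target_arch = "powerpc64"))]
pub const CACHE_LINE: usize = 128;

#[cfg(any(
    target_arch = "arm", target_arch = "mips", target_arch = "mips64",
    target_arch = "sparc", target_arch = "hexagon",
))]
pub const CACHE_LINE: usize = 32;

#[cfg(target_arch = "m68k")]
pub const CACHE_LINE: usize = 16;

#[cfg(target_arch = "s390x")]
pub const CACHE_LINE: usize = 256;

// Fallback for every architecture not listed above.
#[cfg(not(any(
    target_arch = "x86_64", target_arch = "aarch64", target_arch = "powerpc64",
    target_arch = "arm", target_arch = "mips", target_arch = "mips64",
    target_arch = "sparc", target_arch = "hexagon",
    target_arch = "m68k", target_arch = "s390x",
)))]
pub const CACHE_LINE: usize = 64;

// repr(align(N)) only accepts an integer literal, so the same matrix is
// repeated on the type instead of referring to CACHE_LINE.
#[cfg_attr(
    any(target_arch = "x86_64", target_arch = "aarch64", target_arch = "powerpc64"),
    repr(align(128))
)]
#[cfg_attr(
    any(
        target_arch = "arm", target_arch = "mips", target_arch = "mips64",
        target_arch = "sparc", target_arch = "hexagon",
    ),
    repr(align(32))
)]
#[cfg_attr(target_arch = "m68k", repr(align(16)))]
#[cfg_attr(target_arch = "s390x", repr(align(256)))]
#[cfg_attr(
    not(any(
        target_arch = "x86_64", target_arch = "aarch64", target_arch = "powerpc64",
        target_arch = "arm", target_arch = "mips", target_arch = "mips64",
        target_arch = "sparc", target_arch = "hexagon",
        target_arch = "m68k", target_arch = "s390x",
    )),
    repr(align(64))
)]
pub struct CachePadded<T> {
    value: T,
}

impl<T> CachePadded<T> {
    // Assumed constructor; the real API may differ.
    pub const fn new(value: T) -> Self {
        Self { value }
    }

    // Assumed accessor that returns the wrapped value.
    pub fn into_inner(self) -> T {
        self.value
    }
}
```

Because the alignment attribute and the constant are written separately, a layout assertion comparing `align_of::<CachePadded<u8>>()` with `CACHE_LINE` is a cheap way to keep the two from drifting apart.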

Structs§

CachePadded: Pads and aligns a value to CACHE_LINE bytes so that no neighboring struct field shares its cache line. A write to the wrapped atomic from another core then does not invalidate the line holding those neighbors.

Constants§

CACHE_LINE: The cache-line alignment used by CachePadded on this target. Surfaced so dependent crates and const _: () = assert!(...) layout pins can reference the same value the wrapper itself uses.
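A layout pin in a dependent crate might look like the sketch below. The `CACHE_LINE` and `CachePadded` definitions here are local stand-ins hard-coded to 128 so the snippet compiles alone. `FreeListHeads` and `heads_layout` are hypothetical names, not part of forge-alloc.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

// Stand-ins for the real items; 128 is hard-coded for brevity, whereas the
// real constant varies by target.
pub const CACHE_LINE: usize = 128;

#[repr(align(128))]
pub struct CachePadded<T>(pub T);

// Hypothetical dependent-crate type with two independently written counters.
#[repr(C)]
pub struct FreeListHeads {
    pub local: CachePadded<AtomicUsize>,
    pub remote: CachePadded<AtomicUsize>,
}

// Layout pins: compilation fails if either property stops holding.
const _: () = assert!(align_of::<FreeListHeads>() == CACHE_LINE);
const _: () = assert!(size_of::<FreeListHeads>() == 2 * CACHE_LINE);

// Runtime view of the same facts, handy for logging or tests.
pub fn heads_layout() -> (usize, usize) {
    (align_of::<FreeListHeads>(), size_of::<FreeListHeads>())
}
```

Referencing the shared constant rather than a literal keeps such pins correct on targets where the alignment is not 128.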