Module blend

SIMD-accelerated solid-colour fill paths.

blend_solid_rgb8: fill RGB pixels with a constant colour.
blend_solid_gray8: fill grayscale pixels with a constant value.

§Acceleration tiers (x86-64, most to least preferred for large spans)

  1. movdir64b (> 256 px): 64-byte direct stores, each written atomically, that bypass the cache hierarchy. Used for large write-only fills so that the fill does not evict the edge table, which the scanner keeps hot between page renders, from the L3 V-Cache.
  2. AVX2 (≥ 32 px): 256-bit stores; fast for medium spans where the data will be read back shortly after writing.
  3. Scalar: copy_from_slice / fill per pixel.
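The tier thresholds above can be sketched as a small selection function. This is only an illustration of the ordering: `FillTier` and `pick_tier` are hypothetical names, and the two feature flags are assumed to come from whatever runtime detection the module performs.

```rust
/// Hypothetical tier labels, named for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FillTier {
    Movdir64b,
    Avx2,
    Scalar,
}

/// Picks a fill tier for a span of `count` pixels.
/// Thresholds follow the list above: > 256 px for movdir64b, >= 32 px for AVX2.
fn pick_tier(count: usize, has_movdir64b: bool, has_avx2: bool) -> FillTier {
    if has_movdir64b && count > 256 {
        FillTier::Movdir64b
    } else if has_avx2 && count >= 32 {
        FillTier::Avx2
    } else {
        FillTier::Scalar
    }
}
```

A span of exactly 256 px falls through to the AVX2 tier in this sketch, because the movdir64b bound is strict.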

The movdir64b path requires a 64-byte-aligned destination address. A scalar preamble advances the write pointer to the next 64-byte boundary; a scalar tail handles any remaining bytes. Because movdir64b is not yet exposed in std::arch::x86_64, runtime detection goes through a std::sync::OnceLock, which caches the result of querying CPUID leaf 7, subleaf 0, ECX bit 28, so the query runs exactly once per process.
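A minimal sketch of the two mechanisms described above, under stated assumptions: `has_movdir64b`, `detect_movdir64b`, and `split_for_64b_stores` are hypothetical names, not the module's internals. The first caches the CPUID query in a `OnceLock`; the second splits a byte buffer into a scalar preamble, a 64-byte-aligned body, and a scalar tail.

```rust
use std::sync::OnceLock;

/// Returns true when CPUID.(EAX=7, ECX=0):ECX bit 28 (MOVDIR64B) is set.
/// The query runs on the first call only; later calls read the cached value.
fn has_movdir64b() -> bool {
    static CACHED: OnceLock<bool> = OnceLock::new();
    *CACHED.get_or_init(detect_movdir64b)
}

#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // harmless if the toolchain treats these intrinsics as safe
fn detect_movdir64b() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count};
    // Leaf 0 reports the highest supported basic leaf in EAX.
    let max_leaf = unsafe { __cpuid(0) }.eax;
    if max_leaf < 7 {
        return false;
    }
    let leaf7 = unsafe { __cpuid_count(7, 0) };
    (leaf7.ecx >> 28) & 1 == 1
}

#[cfg(not(target_arch = "x86_64"))]
fn detect_movdir64b() -> bool {
    false
}

/// Splits `dst` into (scalar preamble, 64-byte-aligned body, scalar tail).
/// The body length is always a multiple of 64 bytes.
fn split_for_64b_stores(dst: &mut [u8]) -> (&mut [u8], &mut [u8], &mut [u8]) {
    // `align_offset` may report that alignment is impossible; clamping to the
    // buffer length then leaves everything in the scalar preamble.
    let lead = dst.as_ptr().align_offset(64).min(dst.len());
    let (head, rest) = dst.split_at_mut(lead);
    let body_len = rest.len() / 64 * 64;
    let (body, tail) = rest.split_at_mut(body_len);
    (head, body, tail)
}
```

For RGB data a 64-byte block is not a whole number of pixels, so a real implementation would also have to track the colour phase at the start of the body; the sketch leaves that out.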

§Acceleration tiers (aarch64)

  1. NEON vst3q_u8 (≥ 16 px): 48-byte RGB stores that interleave three 16-byte channel registers, 16 pixels per iteration.
  2. NEON vst1q_u8 (≥ 16 px, gray): 16-byte stores, 16 pixels per iteration.
  3. Scalar: copy_from_slice / fill per pixel.

NEON (Advanced SIMD) is a baseline feature of the standard aarch64 targets, so no runtime detection is needed.
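The RGB tier above might look roughly like the following sketch. `fill_rgb8_sketch` is a hypothetical helper with an assumed signature, not the module's code. On aarch64 it writes 16 pixels per `vst3q_u8` store; on any other architecture the NEON block is compiled out and the scalar loop does all the work.

```rust
/// Fills the first `count` RGB pixels of `dst` with `color`.
/// Panics if `dst` holds fewer than `count * 3` bytes.
fn fill_rgb8_sketch(dst: &mut [u8], color: [u8; 3], count: usize) {
    let dst = &mut dst[..count * 3];
    #[allow(unused_mut)] // only mutated on aarch64
    let mut done = 0usize; // pixels written so far

    #[cfg(target_arch = "aarch64")]
    {
        use std::arch::aarch64::{uint8x16x3_t, vdupq_n_u8, vst3q_u8};
        // SAFETY: NEON is enabled on standard aarch64 targets, and each store
        // writes exactly 48 bytes that lie inside `dst`.
        unsafe {
            // One register per channel, each holding 16 copies of that channel.
            let planes = uint8x16x3_t(
                vdupq_n_u8(color[0]),
                vdupq_n_u8(color[1]),
                vdupq_n_u8(color[2]),
            );
            while count - done >= 16 {
                // Interleaves the three registers into R,G,B,R,G,B,... order.
                vst3q_u8(dst.as_mut_ptr().add(done * 3), planes);
                done += 16;
            }
        }
    }

    // Scalar tier: remaining pixels (or all of them off aarch64).
    for px in dst[done * 3..].chunks_exact_mut(3) {
        px.copy_from_slice(&color);
    }
}
```

Bytes in `dst` beyond `count * 3` are left untouched.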

Functions§

blend_solid_gray8
Fill count grayscale pixels in dst (starting at byte 0) with color.
blend_solid_rgb8
Fill count RGB pixels in dst (starting at byte 0) with color.
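As a reference for the behaviour described above, here is a scalar-only sketch of both fills. The signatures are assumptions (the page does not show them), and the `_ref` suffix marks these as illustrative stand-ins rather than the module's functions.

```rust
/// Scalar reference: fills the first `count` grayscale pixels of `dst`.
/// Assumed signature; panics if `dst` is shorter than `count` bytes.
fn blend_solid_gray8_ref(dst: &mut [u8], color: u8, count: usize) {
    dst[..count].fill(color);
}

/// Scalar reference: fills the first `count` RGB pixels of `dst`.
/// Assumed signature; panics if `dst` is shorter than `count * 3` bytes.
fn blend_solid_rgb8_ref(dst: &mut [u8], color: [u8; 3], count: usize) {
    for px in dst[..count * 3].chunks_exact_mut(3) {
        px.copy_from_slice(&color);
    }
}
```

Accelerated paths can be checked against these by comparing output buffers for a range of `count` values and destination offsets.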