SIMD-accelerated solid-colour fill paths.
blend_solid_rgb8 — fill RGB pixels with a constant colour.
blend_solid_gray8 — fill grayscale pixels with a constant value.
§Acceleration tiers (x86-64, most to least preferred for large spans)
- movdir64b (> 256 px): non-temporal 64-byte atomic stores that bypass all cache levels. Used for large write-only fills so that the edge table the scanner keeps hot between page renders is not evicted from the L3 V-Cache.
- AVX2 (≥ 32 px): 256-bit stores; fast for medium spans where the data will be read back shortly after writing.
- Scalar: copy_from_slice/fill per pixel.
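The tier order above can be read as a simple threshold check on the span length. The following is a minimal sketch of that selection logic, not the crate's actual dispatcher: FillTier and pick_tier are hypothetical names, and the CPU capabilities are passed in as plain flags so the example stays portable.

```rust
/// Hypothetical tier labels; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FillTier {
    Movdir64b,
    Avx2,
    Scalar,
}

/// Pick a tier for a span of `count` pixels, using the thresholds
/// documented above (> 256 px for movdir64b, >= 32 px for AVX2).
/// CPU support is supplied by the caller (an assumption of this sketch).
pub fn pick_tier(count: usize, has_movdir64b: bool, has_avx2: bool) -> FillTier {
    if has_movdir64b && count > 256 {
        FillTier::Movdir64b
    } else if has_avx2 && count >= 32 {
        FillTier::Avx2
    } else {
        FillTier::Scalar
    }
}
```

Under these assumptions a 256-pixel span on a machine with both features falls to the AVX2 tier, since the movdir64b threshold is strict.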
The movdir64b path requires a 64-byte-aligned destination address. A scalar
preamble advances the write pointer to the next alignment boundary; a scalar
tail handles any remaining bytes. Because movdir64b is not yet exposed in
std::arch::x86_64, runtime detection uses a std::sync::OnceLock that
queries CPUID leaf 7 subleaf 0 ECX bit 28 exactly once per process.
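The two mechanisms described above, one-time CPUID detection and the head/body/tail split around a 64-byte boundary, might look roughly like the sketch below. All names (HAS_MOVDIR64B, query_movdir64b, has_movdir64b, split_aligned_64) are hypothetical and chosen for illustration. The sketch only shows detection and splitting; it does not issue movdir64b itself.

```rust
use std::sync::OnceLock;

/// Cached result of the CPUID query (hypothetical name).
static HAS_MOVDIR64B: OnceLock<bool> = OnceLock::new();

#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // harmless if a toolchain treats these intrinsics as safe
fn query_movdir64b() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count};
    // Leaf 0 reports the highest supported basic leaf in EAX.
    let leaf0 = unsafe { __cpuid(0) };
    if leaf0.eax < 7 {
        return false;
    }
    // CPUID leaf 7, subleaf 0: ECX bit 28 advertises MOVDIR64B.
    let leaf7 = unsafe { __cpuid_count(7, 0) };
    (leaf7.ecx >> 28) & 1 == 1
}

#[cfg(not(target_arch = "x86_64"))]
fn query_movdir64b() -> bool {
    false
}

/// Runs the CPUID query on first use and returns the cached answer afterwards.
pub fn has_movdir64b() -> bool {
    *HAS_MOVDIR64B.get_or_init(query_movdir64b)
}

/// Split `dst` into (scalar head, 64-byte-aligned body, scalar tail).
pub fn split_aligned_64(dst: &mut [u8]) -> (&mut [u8], &mut [u8], &mut [u8]) {
    // Bytes until the next 64-byte boundary, clamped to the slice length.
    let head_len = dst.as_ptr().align_offset(64).min(dst.len());
    let (head, rest) = dst.split_at_mut(head_len);
    // The body must be a whole number of 64-byte blocks.
    let body_len = rest.len() / 64 * 64;
    let (body, tail) = rest.split_at_mut(body_len);
    (head, body, tail)
}
```

In this sketch the head and tail would be filled by the scalar path, and only the body would be handed to the 64-byte store loop.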
§Acceleration tiers (aarch64)
- NEON vst3q_u8 (≥ 16 px): 48-byte interleaved RGB stores from three per-channel registers, 16 pixels per iteration.
- NEON vst1q_u8 (≥ 16 px, gray): 16-byte stores, 16 pixels per iteration.
- Scalar: copy_from_slice/fill per pixel.
NEON is mandatory on all ARMv8-A targets; no runtime detection is needed.
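As an illustration of the RGB tier, the sketch below broadcasts each channel into its own register and lets vst3q_u8 interleave them into 16 RGB pixels per store. The function name and signature are assumptions made for this example, not the crate's real API. A scalar fallback is included so the sketch also builds on other architectures.

```rust
/// Fill `count` RGB pixels at the start of `dst` with `color`.
/// Hypothetical signature; the crate's real one is not shown in these docs.
#[cfg(target_arch = "aarch64")]
#[allow(unused_unsafe)] // harmless if a toolchain treats these intrinsics as safe
pub fn fill_rgb8_sketch(dst: &mut [u8], count: usize, color: [u8; 3]) {
    use std::arch::aarch64::{uint8x16x3_t, vdupq_n_u8, vst3q_u8};
    // 16 pixels * 3 bytes = 48 bytes per NEON iteration.
    let mut blocks = dst[..count * 3].chunks_exact_mut(48);
    // One register per channel, each holding 16 copies of that channel's value.
    let planes = unsafe {
        uint8x16x3_t(
            vdupq_n_u8(color[0]),
            vdupq_n_u8(color[1]),
            vdupq_n_u8(color[2]),
        )
    };
    for block in &mut blocks {
        // SAFETY: `block` is exactly 48 writable bytes, the amount vst3q_u8 stores.
        unsafe { vst3q_u8(block.as_mut_ptr(), planes) };
    }
    // Scalar tail: fewer than 16 pixels remain.
    for px in blocks.into_remainder().chunks_exact_mut(3) {
        px.copy_from_slice(&color);
    }
}

/// Portable fallback so the sketch compiles on non-aarch64 targets.
#[cfg(not(target_arch = "aarch64"))]
pub fn fill_rgb8_sketch(dst: &mut [u8], count: usize, color: [u8; 3]) {
    for px in dst[..count * 3].chunks_exact_mut(3) {
        px.copy_from_slice(&color);
    }
}
```

Spans shorter than 16 pixels never enter the NEON loop here, which matches the ≥ 16 px threshold without a separate branch.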
§Functions
- blend_solid_gray8: Fill count grayscale pixels in dst (starting at byte 0) with color.
- blend_solid_rgb8: Fill count RGB pixels in dst (starting at byte 0) with color.