Per-pixel AA coverage counts for AaBuf rows.
Single public entry point:
aa_coverage_span(rows, x0, shape): fills the shape buffer with per-pixel AA coverage counts for output pixels x0 .. x0+shape.len(). This is the hot path called from draw_aa_line in fill/mod.rs. Each output pixel maps to 4 bits (one nibble) in each of the 4 AaBuf rows; aa_coverage_span counts the set bits in those nibbles and sums the counts across rows for every pixel in the span in one vectorised pass.
§AaBuf nibble layout
For AA_SIZE = 4, output pixel x occupies the nibble at byte x/2 of
each row: the high nibble if x is even, the low nibble if x is
odd. Each nibble holds 0–4 set bits (one per AA sub-sample). Summing the
four rows gives a coverage count in 0..=16.
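The layout above can be checked with a minimal per-pixel sketch. The function name pixel_coverage and the representation of the rows as four byte slices are assumptions made for illustration; they are not taken from the crate.

```rust
/// Coverage count (0..=16) for output pixel `x`, computed one pixel at a time.
/// Hypothetical helper: `rows` stands in for the four AaBuf rows.
fn pixel_coverage(rows: &[&[u8]; 4], x: usize) -> u8 {
    let mut total = 0u8;
    for row in rows.iter() {
        // Two output pixels share one byte.
        let byte = row[x / 2];
        // Even pixel: high nibble. Odd pixel: low nibble.
        let nibble = if x % 2 == 0 { byte >> 4 } else { byte & 0x0F };
        // Each set bit is one covered AA sub-sample.
        total += nibble.count_ones() as u8;
    }
    total
}
```

With all four rows fully set, every pixel reports 16, which is why a u8 accumulator is sufficient.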
§Acceleration tiers for aa_coverage_span
§x86-64 (most to least preferred)
- AVX-512 BITALG (avx512bitalg + avx512bw): _mm512_popcnt_epi8 on nibble-isolated bytes, 128 output pixels per 64-byte iteration.
- AVX2 (avx2): VPSHUFB nibble lookup, 64 output pixels per 32-byte iteration.
- Scalar: byte-by-byte nibble lookup via NIBBLE_POP table.
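As a rough illustration of the scalar tier, the sketch below walks the span one byte at a time and looks up each nibble in a 16-entry popcount table. Only the table name NIBBLE_POP comes from the description above; its contents, the function name coverage_span_scalar, the row representation, and the handling of an odd first pixel or a trailing half byte are assumptions.

```rust
/// Set-bit count for every 4-bit value (assumed contents of the table).
const NIBBLE_POP: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Hypothetical scalar fallback: fills `shape[i]` for pixel `x0 + i`.
fn coverage_span_scalar(rows: &[&[u8]; 4], x0: usize, shape: &mut [u8]) {
    let end = x0 + shape.len();
    let (mut x, mut i) = (x0, 0);
    // Head: an odd first pixel uses only the low nibble of its byte.
    if x < end && x % 2 == 1 {
        shape[i] = rows.iter().map(|r| NIBBLE_POP[(r[x / 2] & 0x0F) as usize]).sum();
        x += 1;
        i += 1;
    }
    // Body: each byte yields two pixels, high nibble first.
    while x + 1 < end {
        let (mut hi, mut lo) = (0u8, 0u8);
        for r in rows.iter() {
            let b = r[x / 2];
            hi += NIBBLE_POP[(b >> 4) as usize];
            lo += NIBBLE_POP[(b & 0x0F) as usize];
        }
        shape[i] = hi;
        shape[i + 1] = lo;
        x += 2;
        i += 2;
    }
    // Tail: a final even pixel uses only the high nibble.
    if x < end {
        shape[i] = rows.iter().map(|r| NIBBLE_POP[(r[x / 2] >> 4) as usize]).sum();
    }
}
```

The vector tiers compute the same result; they differ only in how many bytes they consume per iteration.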
§aarch64 (most to least preferred)
- SVE2 (nightly-sve2 feature + sve2 target feature): nibble-isolated svcnt_u8_z, svcntb()*2 output pixels per iteration. Requires nightly Rust and sve2 CPU feature.
- NEON: nibble-isolated vcntq_u8, 32 output pixels per 16-byte iteration. High/low nibbles extracted with vshrq_n_u8 + vandq_u8; four-row accumulation in u8 (max 16 ≤ 255); interleaved into shape via vst2q_u8. NEON is mandatory on all ARMv8-A cores; no runtime detection is needed.
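The data flow of one NEON iteration can be modelled in portable Rust, with plain loops over 16 lanes standing in for the 128-bit registers. This is a sketch of the steps listed above, not the crate's code: the name neon_step_model and the fixed-size array arguments are assumptions, and the comments name the intrinsic each step is meant to mirror.

```rust
/// Portable model of one 16-byte iteration: 16 bytes per row in, 32 pixels out.
/// Hypothetical helper that mirrors the NEON steps lane by lane.
fn neon_step_model(rows: [&[u8; 16]; 4], out: &mut [u8; 32]) {
    let mut hi_acc = [0u8; 16]; // coverage of even pixels (high nibbles)
    let mut lo_acc = [0u8; 16]; // coverage of odd pixels (low nibbles)
    for row in rows {
        for lane in 0..16 {
            let hi = row[lane] >> 4; // models vshrq_n_u8 by 4
            let lo = row[lane] & 0x0F; // models vandq_u8 with a 0x0F mask
            // Models vcntq_u8 followed by a u8 add; the sum never exceeds 16.
            hi_acc[lane] += hi.count_ones() as u8;
            lo_acc[lane] += lo.count_ones() as u8;
        }
    }
    // Models vst2q_u8: interleave the two vectors as hi0, lo0, hi1, lo1, ...
    for lane in 0..16 {
        out[2 * lane] = hi_acc[lane];
        out[2 * lane + 1] = lo_acc[lane];
    }
}
```

The interleaving store is what restores pixel order: lane k of the high-nibble vector is pixel 2k and lane k of the low-nibble vector is pixel 2k+1.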
Functions§
- aa_coverage_span: Fill shape[i] with the AA coverage count (0..=16) for output pixel x0 + i.