Module aa_coverage


Per-pixel AA coverage counts for AaBuf rows.

Single public entry point:

  • aa_coverage_span(rows, x0, shape): fills shape with per-pixel AA coverage counts for output pixels x0 .. x0+shape.len() (end exclusive). This is the hot path called from draw_aa_line in fill/mod.rs. Each output pixel maps to 4 bits (one nibble) in each of the 4 AaBuf rows; aa_coverage_span sums the set bits of those nibbles across the rows for every pixel in the span, in a single pass that is vectorised wherever a SIMD tier is available.

§AaBuf nibble layout

For AA_SIZE = 4, output pixel x occupies the nibble at byte x/2 of each row: the high nibble if x is even, the low nibble if x is odd. Each nibble holds 0–4 set bits (one per AA sub-sample). Summing the four rows gives a coverage count in 0..=16.
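
The layout above can be illustrated with a scalar reference for a single pixel. This is a minimal sketch rather than the module's code: the helper name coverage_at and the representation of the rows as four byte slices are assumptions made for illustration.

```rust
/// Number of AaBuf rows for AA_SIZE = 4 (each row holds 4 sub-sample bits per pixel).
const AA_SIZE: usize = 4;

/// Hypothetical helper: coverage count (0..=16) of output pixel `x`.
/// `rows` is an assumed representation of the four AaBuf rows as byte slices.
fn coverage_at(rows: &[&[u8]; AA_SIZE], x: usize) -> u8 {
    let byte = x / 2;
    // Even pixels live in the high nibble, odd pixels in the low nibble.
    let shift = if x % 2 == 0 { 4 } else { 0 };
    rows.iter()
        .map(|row| ((row[byte] >> shift) & 0x0F).count_ones() as u8)
        .sum()
}
```

With one-byte rows 0xF0, 0xF3, 0x10 and 0x01, pixel 0 reads the high nibbles F, F, 1, 0 and pixel 1 reads the low nibbles 0, 3, 0, 1.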

§Acceleration tiers for aa_coverage_span

§x86-64 (most to least preferred)

  1. AVX-512 BITALG (avx512bitalg + avx512bw): _mm512_popcnt_epi8 on nibble-isolated bytes, 128 output pixels per 64-byte iteration.
  2. AVX2 (avx2): VPSHUFB nibble lookup, 64 output pixels per 32-byte iteration.
  3. Scalar: byte-by-byte nibble lookup via NIBBLE_POP table.
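
The tier order above could be selected at run time roughly as follows. This is a sketch under assumptions: the Tier enum, select_tier, and coverage_span_scalar are hypothetical names, and the table shown only stands in for NIBBLE_POP. The source does not say how the module actually dispatches. Only the scalar tier is written out.

```rust
/// Popcount of every 4-bit value; a stand-in for the NIBBLE_POP table.
const NIBBLE_POP: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Hypothetical tier tag, listed from most to least preferred.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Avx512Bitalg,
    Avx2,
    Scalar,
}

/// Pick the best tier the running CPU supports (assumed dispatch scheme).
fn select_tier() -> Tier {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512bitalg") && is_x86_feature_detected!("avx512bw") {
            return Tier::Avx512Bitalg;
        }
        if is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Scalar tier: one table lookup per row for each output pixel x0 + i.
fn coverage_span_scalar(rows: &[&[u8]; 4], x0: usize, shape: &mut [u8]) {
    for (i, out) in shape.iter_mut().enumerate() {
        let x = x0 + i;
        let mut sum = 0u8;
        for row in rows {
            let b = row[x / 2];
            // High nibble for even pixels, low nibble for odd pixels.
            let nib = if x % 2 == 0 { b >> 4 } else { b & 0x0F };
            sum += NIBBLE_POP[nib as usize];
        }
        *out = sum;
    }
}
```

A table lookup keeps the scalar path branch-light and matches the per-nibble semantics of the SIMD tiers, so the scalar function can double as a reference when testing them.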

§aarch64 (most to least preferred)

  1. SVE2 (nightly-sve2 feature + sve2 target feature): nibble-isolated svcnt_u8_z, svcntb()*2 output pixels per iteration. Requires nightly Rust and sve2 CPU feature.
  2. NEON: nibble-isolated vcntq_u8, 32 output pixels per 16-byte iteration. High/low nibbles extracted with vshrq_n_u8 + vandq_u8; four-row accumulation in u8 (max 16 ≤ 255); interleaved into shape via vst2q_u8. NEON is mandatory on all ARMv8-A cores; no runtime detection is needed.
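
The NEON steps listed above can be sketched for a single 16-byte iteration. The function name coverage_32px and the fixed-size array arguments are assumptions for illustration; the real kernel presumably loops over the span and handles tails. A portable fallback with the same signature is included so the sketch also builds on other architectures.

```rust
/// Hypothetical single-iteration kernel: 16 bytes per row in, 32 coverage counts out.
#[cfg(target_arch = "aarch64")]
fn coverage_32px(rows: &[[u8; 16]; 4]) -> [u8; 32] {
    use std::arch::aarch64::*;
    let mut out = [0u8; 32];
    // SAFETY: each load reads exactly 16 bytes from a [u8; 16] and the store
    // writes exactly 32 bytes into `out`; NEON is assumed available on aarch64.
    unsafe {
        let low_mask = vdupq_n_u8(0x0F);
        let mut hi = vdupq_n_u8(0); // counts for even pixels (high nibbles)
        let mut lo = vdupq_n_u8(0); // counts for odd pixels (low nibbles)
        for row in rows {
            let v = vld1q_u8(row.as_ptr());
            hi = vaddq_u8(hi, vcntq_u8(vshrq_n_u8::<4>(v)));
            lo = vaddq_u8(lo, vcntq_u8(vandq_u8(v, low_mask)));
        }
        // vst2q_u8 interleaves the two vectors: hi[0], lo[0], hi[1], lo[1], ...
        vst2q_u8(out.as_mut_ptr(), uint8x16x2_t(hi, lo));
    }
    out
}

/// Portable fallback with identical results, used when not compiling for aarch64.
#[cfg(not(target_arch = "aarch64"))]
fn coverage_32px(rows: &[[u8; 16]; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for row in rows {
        for (i, &b) in row.iter().enumerate() {
            out[2 * i] += (b >> 4).count_ones() as u8;
            out[2 * i + 1] += (b & 0x0F).count_ones() as u8;
        }
    }
    out
}
```

Accumulating in u8 is safe here because each lane receives at most 4 from each of the four rows, so no lane exceeds 16.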

Functions§

aa_coverage_span
Fill shape[i] with the AA coverage count (0..=16) for output pixel x0 + i.