Module aa_coverage

Per-pixel AA coverage counts for AaBuf rows.

Single public entry point:

aa_coverage_span(rows, x0, shape): fills shape with per-pixel AA coverage counts for output pixels x0 .. x0+shape.len(). This is the hot path, called from draw_aa_line in fill/mod.rs. Each output pixel maps to 4 bits (one nibble) in each of the 4 AaBuf rows; aa_coverage_span sums the set bits of those nibbles across the rows for every pixel in the span, in a single vectorised pass where an accelerated tier is available.

§`AaBuf` nibble layout

For AA_SIZE = 4, output pixel x occupies one nibble of byte x/2 in each row: the high nibble if x is even, the low nibble if x is odd. Each nibble holds 0–4 set bits (one per AA sub-sample), so summing the set-bit counts over the four rows gives a coverage count in 0..=16.
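
The layout rule above can be sketched for a single pixel as follows. This is a minimal illustration, not the module's code: the helper name pixel_coverage and the representation of the rows as four byte slices are assumptions made for the example.

```rust
/// Hypothetical helper (not part of the module): coverage count for one
/// output pixel `x`, assuming each row is a byte slice that packs two
/// output pixels per byte.
fn pixel_coverage(rows: &[&[u8]; 4], x: usize) -> u8 {
    let mut total = 0u8;
    for row in rows.iter() {
        let byte = row[x / 2];
        // Even x selects the high nibble, odd x the low nibble.
        let nibble = if x % 2 == 0 { byte >> 4 } else { byte & 0x0F };
        // Each set bit is one covered AA sub-sample.
        total += nibble.count_ones() as u8;
    }
    total // always in 0..=16 for four rows
}
```

For example, with first bytes 0xF0, 0xF3, 0x81 and 0x00 in the four rows, pixel 0 reads the high nibbles (4 + 4 + 1 + 0 = 9) and pixel 1 reads the low nibbles (0 + 2 + 1 + 0 = 3).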

§Acceleration tiers for `aa_coverage_span`

§x86-64 (most to least preferred)

AVX-512 BITALG (avx512bitalg + avx512bw): _mm512_popcnt_epi8 on nibble-isolated bytes, 128 output pixels per 64-byte iteration.
AVX2 (avx2): VPSHUFB nibble lookup, 64 output pixels per 32-byte iteration.
Scalar: byte-by-byte nibble lookup via NIBBLE_POP table.
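
A scalar fallback of the kind listed last might look like the sketch below. The table contents, the function name coverage_span_scalar and the row type are assumptions for illustration; only the NIBBLE_POP name and the byte-by-byte lookup idea come from the description above.

```rust
/// Population count of every 4-bit value (assumed contents of the table).
const NIBBLE_POP: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Hypothetical scalar tier: fills `shape[i]` with the coverage count for
/// output pixel `x0 + i`, walking the rows one byte at a time.
fn coverage_span_scalar(rows: [&[u8]; 4], x0: usize, shape: &mut [u8]) {
    // Table lookups for one nibble position, summed over the four rows.
    let hi = |b: usize| rows.iter().map(|r| NIBBLE_POP[(r[b] >> 4) as usize]).sum::<u8>();
    let lo = |b: usize| rows.iter().map(|r| NIBBLE_POP[(r[b] & 0x0F) as usize]).sum::<u8>();

    let mut x = x0;
    let mut i = 0;
    // Leading odd pixel: only the low nibble of its byte is in the span.
    if x % 2 == 1 && i < shape.len() {
        shape[i] = lo(x / 2);
        x += 1;
        i += 1;
    }
    // Whole bytes: two output pixels per byte.
    while i + 2 <= shape.len() {
        shape[i] = hi(x / 2);
        shape[i + 1] = lo(x / 2);
        x += 2;
        i += 2;
    }
    // Trailing even pixel: only the high nibble is in the span.
    if i < shape.len() {
        shape[i] = hi(x / 2);
    }
}
```

Handling the odd leading pixel and the even trailing pixel separately keeps the main loop working on whole bytes, which is the same unit the vectorised tiers consume in bulk.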

§aarch64 (most to least preferred)

SVE2 (nightly-sve2 feature + sve2 target feature): nibble-isolated svcnt_u8_z, svcntb()*2 output pixels per iteration. Requires nightly Rust and a CPU with the sve2 feature.
NEON: nibble-isolated vcntq_u8, 32 output pixels per 16-byte iteration. High/low nibbles extracted with vshrq_n_u8 + vandq_u8; four-row accumulation in u8 (max 16 ≤ 255); interleaved into shape via vst2q_u8. NEON is mandatory on all ARMv8-A cores; no runtime detection is needed.
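
The data flow of the NEON tier can be mirrored in portable Rust to make the steps concrete. The sketch below uses plain 16-byte arrays in place of vector registers and does not call any intrinsics; the function names cnt and coverage_block16 are hypothetical, and the comments name the intrinsic each step stands in for.

```rust
/// Per-byte population count, standing in for `vcntq_u8`.
fn cnt(v: [u8; 16]) -> [u8; 16] {
    v.map(|b| b.count_ones() as u8)
}

/// Hypothetical scalar emulation of one 16-byte iteration:
/// 16 bytes from each of the 4 rows in, 32 coverage counts out.
fn coverage_block16(rows: [[u8; 16]; 4]) -> [u8; 32] {
    let mut hi_acc = [0u8; 16];
    let mut lo_acc = [0u8; 16];
    for row in rows {
        let hi = cnt(row.map(|b| b >> 4)); // stands in for vshrq_n_u8(row, 4)
        let lo = cnt(row.map(|b| b & 0x0F)); // stands in for vandq_u8(row, 0x0F)
        for j in 0..16 {
            // u8 accumulation is safe: at most 4 per row, 16 in total.
            hi_acc[j] += hi[j];
            lo_acc[j] += lo[j];
        }
    }
    // Interleave even (high-nibble) and odd (low-nibble) pixels,
    // standing in for the vst2q_u8 store.
    let mut shape = [0u8; 32];
    for j in 0..16 {
        shape[2 * j] = hi_acc[j];
        shape[2 * j + 1] = lo_acc[j];
    }
    shape
}
```

The interleaving step is what restores pixel order: byte j of a row holds pixels 2j and 2j+1, so the high-nibble and low-nibble accumulators have to be zipped together before they are written to shape.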

Functions§

aa_coverage_span: Fill shape[i] with the AA coverage count (0..=16) for output pixel x0 + i.
