SIMD-accelerated primitives for jxl_encoder.
This crate wraps platform-specific SIMD intrinsics behind safe public functions.
The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and
calls into these safe wrappers.
Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.
§Direct variant access
Each kernel is available in three forms:
- A dispatching function (e.g. dct_8x8) that picks the best available variant at runtime
- Concrete _avx2(token, ...) / _neon(token, ...) / _scalar(...) variants
For hot loops, callers should summon a token once, then call the concrete
variant directly from an #[arcane] function so LLVM can inline across the
target-feature boundary.
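The call structure described above can be sketched without any dependencies. Everything in this block is a hypothetical stand-in: Avx2Token imitates a capability token such as X64V3Token, and scale_block is an invented kernel, not one of this crate's functions. The sketch uses no target-feature attributes or intrinsics, so it only shows where the feature check happens, not the inlining benefit itself.

```rust
/// Hypothetical stand-in for a capability token: a zero-sized value that can
/// only be obtained after a successful runtime feature check.
#[derive(Clone, Copy)]
pub struct Avx2Token(());

impl Avx2Token {
    /// Returns `Some` only when the CPU reports AVX2 and FMA support.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma") {
                return Some(Avx2Token(()));
            }
        }
        None
    }
}

/// Portable fallback: scale every element of a block in place.
pub fn scale_block_scalar(block: &mut [f32; 64], factor: f32) {
    for v in block.iter_mut() {
        *v *= factor;
    }
}

/// Token-taking variant. A real kernel would use AVX2 code here; this sketch
/// reuses the scalar body so that it compiles on every target.
pub fn scale_block_avx2(_token: Avx2Token, block: &mut [f32; 64], factor: f32) {
    scale_block_scalar(block, factor);
}

/// Dispatching form: performs the feature check on every call.
pub fn scale_block(block: &mut [f32; 64], factor: f32) {
    match Avx2Token::summon() {
        Some(token) => scale_block_avx2(token, block, factor),
        None => scale_block_scalar(block, factor),
    }
}

/// Hot-loop form: summon the token once, then call the concrete variant
/// directly for each block.
pub fn scale_all(blocks: &mut [[f32; 64]], factor: f32) {
    if let Some(token) = Avx2Token::summon() {
        for block in blocks.iter_mut() {
            scale_block_avx2(token, block, factor);
        }
    } else {
        for block in blocks.iter_mut() {
            scale_block_scalar(block, factor);
        }
    }
}
```

Because the token is a Copy value, passing it into each call costs nothing; it only records that the check already succeeded.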
Structs§
- EntropyCoeffResult: Results from vectorized entropy coefficient processing.
- X64V3Token: Proof that AVX2 + FMA + BMI1/2 + F16C + LZCNT are available (x86-64-v3 level).
Traits§
- SimdToken: Marker trait for SIMD capability tokens.
Functions§
- compute_block_l2_errors: Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
- compute_block_l2_errors_avx2
- compute_block_l2_errors_scalar
- compute_mask1x1: Compute per-pixel masking field from XYB Y channel.
- compute_mask1x1_avx2
- compute_mask1x1_scalar
- dct_8x8: Compute scaled 8x8 forward DCT with SIMD acceleration.
- dct_8x8_avx2
- dct_8x8_scalar
- dct_8x16: Compute scaled 8x16 forward DCT with SIMD acceleration.
- dct_8x16_avx2
- dct_8x16_scalar
- dct_16x8: Compute scaled 16x8 forward DCT with SIMD acceleration.
- dct_16x8_avx2
- dct_16x8_scalar
- dct_16x16: Compute 16x16 forward DCT with SIMD acceleration.
- dct_16x16_avx2: AVX2 16x16 forward DCT: process 8 rows at a time via batched 16-point DCT.
- dct_16x16_scalar
- dequant_block_dct8: Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
- dequant_dct8_avx2
- dequant_dct8_scalar
- entropy_coeffs_avx2
- entropy_coeffs_scalar
- entropy_estimate_coeffs: Vectorized entropy coefficient processing.
- epf_step1: Apply EPF Step 1 to 3-channel XYB planes.
- epf_step2: Apply EPF Step 2 to 3-channel XYB planes.
- epf_step1_avx2
- epf_step1_scalar
- epf_step2_avx2
- epf_step2_scalar
- forward_xyb_avx2
- forward_xyb_scalar
- gab_smooth_avx2: AVX2+FMA gab smooth: processes 8 pixels per iteration in interior rows. Border rows/columns use scalar fallback.
- gab_smooth_channel: Apply 3x3 weighted gaborish smooth to a single channel in-place.
- gab_smooth_scalar
- gaborish_5x5_avx2: AVX2+FMA gaborish 5x5: processes 8 pixels per iteration in interior region. Border pixels (within 2 of edge) use scalar fallback.
- gaborish_5x5_channel: Apply the 5x5 gaborish inverse kernel to a single channel.
- gaborish_5x5_scalar
- idct_8x8: Compute scaled 8x8 inverse DCT with SIMD acceleration.
- idct_8x8_avx2
- idct_8x8_scalar
- idct_8x16: Compute 8x16 inverse DCT with SIMD acceleration.
- idct_8x16_avx2
- idct_8x16_scalar
- idct_16x8: Compute 16x8 inverse DCT with SIMD acceleration.
- idct_16x8_avx2
- idct_16x8_scalar
- idct_16x16: Compute 16x16 inverse DCT with SIMD acceleration.
- idct_16x16_avx2: AVX2 16x16 IDCT: process 8 rows at a time via batched 16-point IDCT.
- idct_16x16_scalar
- inverse_xyb_avx2
- inverse_xyb_planar_avx2
- inverse_xyb_planar_scalar
- inverse_xyb_scalar
- linear_rgb_to_xyb_batch: Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
- pixel_domain_loss: Compute pixel-domain loss for one channel of a block.
- pixel_domain_loss_avx2
- pixel_domain_loss_scalar
- quantize_block_dct8: Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
- quantize_dct8_avx2
- quantize_dct8_scalar
- transpose_8x8: Transpose an 8x8 f32 matrix.
- transpose_8x8_avx2: AVX2 8x8 transpose using unpack/shuffle/permute instructions.
- xyb_to_linear_rgb_batch: Convert separate X, Y, B channel buffers to interleaved linear RGB.
- xyb_to_linear_rgb_planar: Convert separate X, Y, B channel buffers to planar linear RGB.
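The last two entries differ only in output layout: interleaved output stores one R, G, B triple per pixel in a single buffer, while planar output keeps three separate channel buffers. The difference can be sketched as follows. The helpers interleave_rgb and deinterleave_rgb are hypothetical, are not part of this crate, and perform no color conversion.

```rust
/// Pack three equally sized planes into one interleaved buffer laid out as
/// R0 G0 B0 R1 G1 B1 ...
pub fn interleave_rgb(r: &[f32], g: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(r.len() == g.len() && g.len() == b.len());
    assert_eq!(out.len(), r.len() * 3);
    for (i, px) in out.chunks_exact_mut(3).enumerate() {
        px[0] = r[i];
        px[1] = g[i];
        px[2] = b[i];
    }
}

/// Split an interleaved buffer back into three planes.
pub fn deinterleave_rgb(rgb: &[f32], r: &mut [f32], g: &mut [f32], b: &mut [f32]) {
    assert!(r.len() == g.len() && g.len() == b.len());
    assert_eq!(rgb.len(), r.len() * 3);
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        r[i] = px[0];
        g[i] = px[1];
        b[i] = px[2];
    }
}
```

Planar buffers tend to suit SIMD kernels that load several consecutive samples of one channel at a time, whereas interleaved buffers match what many image APIs expect at their boundary.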