Crate jxl_encoder_simd

SIMD-accelerated primitives for jxl_encoder.

This crate wraps platform-specific SIMD intrinsics behind safe public functions. The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and calls into these safe wrappers.

Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.

§Direct variant access

Each kernel is available in three forms:

  • A dispatching function (e.g. dct_8x8) that picks the best available implementation at runtime
  • Concrete SIMD variants, _avx2(token, ...) / _neon(token, ...), that take a capability token
  • A portable _scalar(...) fallback

For hot loops, callers should summon a token once, then call the concrete variant directly from an #[arcane] function so LLVM can inline across the target-feature boundary.
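
The token pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the archmage or jxl_encoder_simd API: Avx2Proof, sum_squares and total_energy are hypothetical names invented for this example, and a plain #[target_feature] function stands in for what #[arcane] is assumed to generate. The sketch shows the three forms of a kernel (dispatching, token-taking, scalar) and a hot loop that runs feature detection once rather than once per block.

```rust
// Hypothetical stand-ins for illustration; not the archmage API.
mod proof {
    /// Zero-sized proof that AVX2 was detected at runtime.
    /// The private field means only `summon` can create one.
    #[derive(Clone, Copy, Debug)]
    pub struct Avx2Proof(());

    impl Avx2Proof {
        /// Runs CPU feature detection; returns None when AVX2 is unavailable.
        pub fn summon() -> Option<Self> {
            #[cfg(target_arch = "x86_64")]
            {
                if std::is_x86_feature_detected!("avx2") {
                    return Some(Avx2Proof(()));
                }
            }
            None
        }
    }
}
pub use proof::Avx2Proof;

/// Scalar form: always available.
pub fn sum_squares_scalar(block: &[f32; 8]) -> f32 {
    block.iter().map(|v| v * v).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_squares_avx2_impl(block: &[f32; 8]) -> f32 {
    use std::arch::x86_64::*;
    let mut lanes = [0.0f32; 8];
    // SAFETY: both pointers cover exactly 8 f32 values; unaligned ops are used.
    unsafe {
        let v = _mm256_loadu_ps(block.as_ptr());
        _mm256_storeu_ps(lanes.as_mut_ptr(), _mm256_mul_ps(v, v));
    }
    lanes.iter().sum()
}

/// Concrete form: safe to call because the token proves AVX2 is present.
#[cfg(target_arch = "x86_64")]
pub fn sum_squares_avx2(_token: Avx2Proof, block: &[f32; 8]) -> f32 {
    // SAFETY: an Avx2Proof exists only if `summon` detected AVX2.
    unsafe { sum_squares_avx2_impl(block) }
}

/// Dispatching form: detects features on every call.
pub fn sum_squares(block: &[f32; 8]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Avx2Proof::summon() {
            return sum_squares_avx2(token, block);
        }
    }
    sum_squares_scalar(block)
}

// The loop and the kernel share the same target features, so the compiler
// is free to inline the kernel into the loop body.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn total_energy_avx2_impl(blocks: &[[f32; 8]]) -> f32 {
    let mut total = 0.0;
    for block in blocks {
        // SAFETY: the caller guarantees AVX2 is available.
        total += unsafe { sum_squares_avx2_impl(block) };
    }
    total
}

/// Hot-loop form: summon the token once, then stay on the AVX2 path.
pub fn total_energy(blocks: &[[f32; 8]]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(_token) = Avx2Proof::summon() {
            // SAFETY: the token proves AVX2 was detected.
            return unsafe { total_energy_avx2_impl(blocks) };
        }
    }
    blocks.iter().map(sum_squares_scalar).sum()
}
```

Calling sum_squares inside the loop would repeat the feature check for every block and place a call into a target-feature function on every iteration. Hoisting the check, as total_energy does, leaves a single branch outside the loop.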

Structs§

EntropyCoeffResult
Results from vectorized entropy coefficient processing.
X64V3Token
Proof that AVX2 + FMA + BMI1/2 + F16C + LZCNT are available (x86-64-v3 level).

Traits§

SimdToken
Marker trait for SIMD capability tokens.

Functions§

compute_block_l2_errors
Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
compute_block_l2_errors_avx2
compute_block_l2_errors_scalar
compute_mask1x1
Compute per-pixel masking field from XYB Y channel.
compute_mask1x1_avx2
compute_mask1x1_scalar
dct_8x8
Compute scaled 8x8 forward DCT with SIMD acceleration.
dct_8x8_avx2
dct_8x8_scalar
dct_8x16
Compute scaled 8x16 forward DCT with SIMD acceleration.
dct_8x16_avx2
dct_8x16_scalar
dct_16x8
Compute scaled 16x8 forward DCT with SIMD acceleration.
dct_16x8_avx2
dct_16x8_scalar
dct_16x16
Compute 16x16 forward DCT with SIMD acceleration.
dct_16x16_avx2
AVX2 16x16 forward DCT: process 8 rows at a time via batched 16-point DCT.
dct_16x16_scalar
dequant_block_dct8
Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
dequant_dct8_avx2
dequant_dct8_scalar
entropy_coeffs_avx2
entropy_coeffs_scalar
entropy_estimate_coeffs
Vectorized entropy coefficient processing.
epf_step1
Apply EPF Step 1 to 3-channel XYB planes.
epf_step2
Apply EPF Step 2 to 3-channel XYB planes.
epf_step1_avx2
epf_step1_scalar
epf_step2_avx2
epf_step2_scalar
forward_xyb_avx2
forward_xyb_scalar
gab_smooth_avx2
AVX2+FMA gab smooth: processes 8 pixels per iteration in interior rows. Border rows/columns use scalar fallback.
gab_smooth_channel
Apply 3x3 weighted gaborish smooth to a single channel in-place.
gab_smooth_scalar
gaborish_5x5_avx2
AVX2+FMA gaborish 5x5: processes 8 pixels per iteration in interior region. Border pixels (within 2 of edge) use scalar fallback.
gaborish_5x5_channel
Apply the 5x5 gaborish inverse kernel to a single channel.
gaborish_5x5_scalar
idct_8x8
Compute scaled 8x8 inverse DCT with SIMD acceleration.
idct_8x8_avx2
idct_8x8_scalar
idct_8x16
Compute 8x16 inverse DCT with SIMD acceleration.
idct_8x16_avx2
idct_8x16_scalar
idct_16x8
Compute 16x8 inverse DCT with SIMD acceleration.
idct_16x8_avx2
idct_16x8_scalar
idct_16x16
Compute 16x16 inverse DCT with SIMD acceleration.
idct_16x16_avx2
AVX2 16x16 IDCT: process 8 rows at a time via batched 16-point IDCT.
idct_16x16_scalar
inverse_xyb_avx2
inverse_xyb_planar_avx2
inverse_xyb_planar_scalar
inverse_xyb_scalar
linear_rgb_to_xyb_batch
Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
pixel_domain_loss
Compute pixel-domain loss for one channel of a block.
pixel_domain_loss_avx2
pixel_domain_loss_scalar
quantize_block_dct8
Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
quantize_dct8_avx2
quantize_dct8_scalar
transpose_8x8
Transpose an 8x8 f32 matrix.
transpose_8x8_avx2
AVX2 8x8 transpose using unpack/shuffle/permute instructions.
xyb_to_linear_rgb_batch
Convert separate X, Y, B channel buffers to interleaved linear RGB.
xyb_to_linear_rgb_planar
Convert separate X, Y, B channel buffers to planar linear RGB.