Crate jxl_encoder_simd


SIMD-accelerated primitives for jxl_encoder.

This crate wraps platform-specific SIMD intrinsics behind safe public functions. The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and calls into these safe wrappers.
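The general pattern, a safe public function that keeps a `#[target_feature]` kernel behind a runtime check, can be sketched with plain `std::arch`. This is a generic illustration, not this crate's code. jxl_encoder_simd routes the check through archmage tokens instead of calling `is_x86_feature_detected!` directly, and `add_f32`, `add_f32_scalar`, and `add_f32_avx2` are hypothetical names.

```rust
/// Safe public entry point: element-wise `out[i] = a[i] + b[i]`.
/// All `unsafe` is confined to this function and the AVX2 kernel.
pub fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len(), "length mismatch");
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above, and the
            // length check keeps every pointer offset in bounds.
            unsafe { add_f32_avx2(a, b, out) };
            return;
        }
    }
    add_f32_scalar(a, b, out);
}

/// Portable fallback, also used for the SIMD tail.
fn add_f32_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_f32_avx2(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let n = out.len();
    let mut i = 0;
    // Eight f32 lanes per iteration, using unaligned loads and stores.
    while i + 8 <= n {
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    // Scalar tail for the remaining n % 8 elements.
    add_f32_scalar(&a[i..], &b[i..], &mut out[i..]);
}
```

A crate compiled with `#![forbid(unsafe_code)]` can call `add_f32` freely, because every `unsafe` block lives on this side of the boundary.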

Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.

Direct variant access

Each kernel is available in the following forms:

A dispatching function (e.g. dct_8x8) that picks the best available implementation at runtime.
Concrete _avx2(token, ...), _neon(token, ...), and _scalar(...) variants that can be called directly.

For hot loops, callers should summon a token once, then call the concrete variant directly from an #[arcane] function so LLVM can inline across the target-feature boundary.
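A minimal sketch of the dispatcher/concrete-variant split, and of summoning a token once outside a hot loop, written against `std` only. `Avx2Token`, its `summon` method, and the `scale_*` functions are hypothetical stand-ins for archmage's token types and this crate's kernels. The hand-written pair of a safe token-taking wrapper plus a `#[target_feature]` inner function approximates what `#[arcane]` is meant to spare you from writing; treat that correspondence as an assumption, not a description of archmage's expansion.

```rust
/// Hypothetical capability token: a zero-sized proof that AVX2 was detected.
/// The private field prevents code outside this module from forging one.
#[derive(Clone, Copy)]
pub struct Avx2Token(());

impl Avx2Token {
    /// Returns a token only if the running CPU supports AVX2.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if std::is_x86_feature_detected!("avx2") {
                return Some(Avx2Token(()));
            }
        }
        None
    }
}

/// Concrete scalar variant; `#[inline]` lets it be pulled into SIMD-enabled callers.
#[inline]
pub fn scale_row_scalar(row: &mut [f32], k: f32) {
    for x in row.iter_mut() {
        *x *= k;
    }
}

/// Concrete AVX2 variant: the token argument is the proof of availability.
#[cfg(target_arch = "x86_64")]
pub fn scale_row_avx2(_token: Avx2Token, row: &mut [f32], k: f32) {
    // SAFETY: an Avx2Token only exists if AVX2 was detected at runtime.
    unsafe { scale_row_avx2_impl(row, k) }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn scale_row_avx2_impl(row: &mut [f32], k: f32) {
    // The scalar body is inlined here and compiled with AVX2 enabled,
    // so LLVM may auto-vectorize it with 256-bit registers.
    scale_row_scalar(row, k)
}

/// Dispatcher: convenient, but it queries CPU features on every call, and the
/// AVX2 body cannot be inlined into a caller compiled without AVX2.
pub fn scale_row(row: &mut [f32], k: f32) {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Avx2Token::summon() {
            scale_row_avx2(token, row, k);
            return;
        }
    }
    scale_row_scalar(row, k)
}

/// Hot loop: summon once, then run the entire loop inside one AVX2 region.
pub fn scale_image(rows: &mut [Vec<f32>], k: f32) {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Avx2Token::summon() {
            scale_image_avx2(token, rows, k);
            return;
        }
    }
    for row in rows.iter_mut() {
        scale_row_scalar(row, k);
    }
}

#[cfg(target_arch = "x86_64")]
fn scale_image_avx2(_token: Avx2Token, rows: &mut [Vec<f32>], k: f32) {
    // SAFETY: the token proves AVX2 is available.
    unsafe { scale_image_avx2_impl(rows, k) }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn scale_image_avx2_impl(rows: &mut [Vec<f32>], k: f32) {
    for row in rows.iter_mut() {
        // Caller and callee share the same target features, so this call can be inlined.
        unsafe { scale_row_avx2_impl(row, k) };
    }
}
```

`scale_row` suits one-off calls. `scale_image` shows the shape the paragraph above recommends for hot loops: feature detection happens once, and the per-row kernel is inlined into a single AVX2-enabled function instead of crossing a non-inlinable boundary on every row.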

Structs

EntropyCoeffResult: Results from vectorized entropy coefficient processing.
NeonToken: Proof that NEON is available.

Traits

SimdToken: Marker trait for SIMD capability tokens.

Functions

compute_block_l2_errors: Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
compute_block_l2_errors_neon
compute_block_l2_errors_scalar
compute_mask1x1: Compute per-pixel masking field from XYB Y channel.
compute_mask1x1_neon
compute_mask1x1_scalar
dct_8x8: Compute scaled 8x8 forward DCT with SIMD acceleration.
dct_8x8_neon: NEON 8x8 forward DCT: two passes (4 columns at a time) with an in-register transpose.
dct_8x8_scalar
dct_8x16: Compute scaled 8x16 forward DCT with SIMD acceleration.
dct_8x16_neon: NEON 8x16 forward DCT.
dct_8x16_scalar
dct_16x8: Compute scaled 16x8 forward DCT with SIMD acceleration.
dct_16x8_neon: NEON 16x8 forward DCT.
dct_16x8_scalar
dct_16x16: Compute 16x16 forward DCT with SIMD acceleration.
dct_16x16_neon: NEON 16x16 forward DCT: processes 4 rows at a time.
dct_16x16_scalar
dequant_block_dct8: Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
dequant_dct8_neon
dequant_dct8_scalar
entropy_coeffs_neon
entropy_coeffs_scalar
entropy_estimate_coeffs: Vectorized entropy coefficient processing.
epf_step1: Apply EPF Step 1 to 3-channel XYB planes.
epf_step2: Apply EPF Step 2 to 3-channel XYB planes.
epf_step1_neon
epf_step1_scalar
epf_step2_neon
epf_step2_scalar
forward_xyb_neon
forward_xyb_scalar
gab_smooth_channel: Apply 3x3 weighted gaborish smooth to a single channel in-place.
gab_smooth_neon: NEON gab smooth: processes 4 pixels per iteration in interior rows.
gab_smooth_scalar
gaborish_5x5_channel: Apply the 5x5 gaborish inverse kernel to a single channel.
gaborish_5x5_neon: NEON gaborish 5x5: processes 4 pixels per iteration in interior region.
gaborish_5x5_scalar
idct_8x8: Compute scaled 8x8 inverse DCT with SIMD acceleration.
idct_8x8_neon: NEON 8x8 inverse DCT.
idct_8x8_scalar
idct_8x16: Compute 8x16 inverse DCT with SIMD acceleration.
idct_8x16_neon: NEON 8x16 inverse DCT.
idct_8x16_scalar
idct_16x8: Compute 16x8 inverse DCT with SIMD acceleration.
idct_16x8_neon: NEON 16x8 inverse DCT.
idct_16x8_scalar
idct_16x16: Compute 16x16 inverse DCT with SIMD acceleration.
idct_16x16_neon: NEON 16x16 inverse DCT: processes 4 rows at a time.
idct_16x16_scalar
inverse_xyb_neon
inverse_xyb_planar_neon
inverse_xyb_planar_scalar
inverse_xyb_scalar
linear_rgb_to_xyb_batch: Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
pixel_domain_loss: Compute pixel-domain loss for one channel of a block.
pixel_domain_loss_neon
pixel_domain_loss_scalar
quantize_block_dct8: Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
quantize_dct8_neon
quantize_dct8_scalar
transpose_8x8: Transpose an 8x8 f32 matrix.
transpose_8x8_neon: NEON 8x8 transpose using four 4x4 sub-transposes.
xyb_to_linear_rgb_batch: Convert separate X, Y, B channel buffers to interleaved linear RGB.
xyb_to_linear_rgb_planar: Convert separate X, Y, B channel buffers to planar linear RGB.
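The dct_8x8_neon and transpose_8x8 entries above describe a two-pass forward DCT built around a transpose. A scalar sketch of that structure, for reference. Its normalization is an assumption: each 1-D DCT-II is scaled by 1/N, so the DC coefficient equals the block mean. This page does not document the crate's exact "scaled" convention, and `dct1d_8`, `transpose_8x8_ref`, and `dct_8x8_ref` are hypothetical names, not the crate's API.

```rust
use std::f32::consts::PI;

/// 1-D DCT-II of length 8, scaled by 1/8 so that out[0] is the mean of `x`
/// (assumed normalization).
fn dct1d_8(x: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for (k, o) in out.iter_mut().enumerate() {
        let mut acc = 0.0f32;
        for (n, &v) in x.iter().enumerate() {
            acc += v * (PI * (2 * n + 1) as f32 * k as f32 / 16.0).cos();
        }
        *o = acc / 8.0;
    }
    out
}

/// Scalar 8x8 transpose of a row-major matrix.
fn transpose_8x8_ref(m: &[[f32; 8]; 8]) -> [[f32; 8]; 8] {
    let mut t = [[0.0f32; 8]; 8];
    for r in 0..8 {
        for c in 0..8 {
            t[c][r] = m[r][c];
        }
    }
    t
}

/// Two-pass separable 8x8 forward DCT: 1-D DCT on every row, transpose,
/// 1-D DCT on every row again (the original columns), transpose back.
/// Result index [u][v]: u = vertical frequency, v = horizontal frequency.
fn dct_8x8_ref(block: &[[f32; 8]; 8]) -> [[f32; 8]; 8] {
    let mut tmp = [[0.0f32; 8]; 8];
    for r in 0..8 {
        tmp[r] = dct1d_8(&block[r]);
    }
    let mut tmp = transpose_8x8_ref(&tmp);
    for r in 0..8 {
        tmp[r] = dct1d_8(&tmp[r]);
    }
    transpose_8x8_ref(&tmp)
}
```

The direct O(N²) cosine sum is used here for readability; production kernels normally replace it with a fast factorization and run several 1-D transforms at once across vector lanes.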