Crate jxl_encoder_simd


SIMD-accelerated primitives for jxl_encoder.

This crate wraps platform-specific SIMD intrinsics behind safe public functions. The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and calls into these safe wrappers.

Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.

Direct variant access

Each kernel is available in three forms:

A dispatching function (e.g. dct_8x8) that picks the best available variant at runtime
Concrete SIMD variants, _avx2(token, ...) / _neon(token, ...), which take a capability token
A concrete _scalar(...) variant, which needs no token

For hot loops, callers should summon a token once, then call the concrete variant directly from an #[arcane] function so LLVM can inline across the target-feature boundary.
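The following is a minimal, self-contained sketch of the summon-once pattern. It deliberately does not use archmage: SimdTok, sum_sq_scalar, sum_sq_simd, sum_sq and sum_sq_rows are hypothetical stand-ins for a capability token such as NeonToken and for a kernel's scalar, concrete SIMD and dispatching forms. The "SIMD" body is portable Rust with four accumulators rather than real intrinsics, and it omits the #[arcane] attribute that the real crate relies on for cross-boundary inlining.

```rust
/// Hypothetical stand-in for a capability token (e.g. NeonToken). It can only be
/// obtained through `summon`, so holding one proves the runtime check passed.
#[derive(Clone, Copy)]
pub struct SimdTok {
    _private: (),
}

impl SimdTok {
    /// Runtime feature check; always returns None on non-aarch64 targets.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "aarch64")]
        {
            if std::arch::is_aarch64_feature_detected!("neon") {
                return Some(SimdTok { _private: () });
            }
        }
        None
    }
}

/// Scalar form: needs no token.
pub fn sum_sq_scalar(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x).sum()
}

/// Concrete "SIMD" form: the token parameter proves the check already ran.
/// Four independent accumulators stand in for 4-lane vector registers.
pub fn sum_sq_simd(_token: SimdTok, xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for c in chunks {
        for lane in 0..4 {
            acc[lane] += c[lane] * c[lane];
        }
    }
    acc.iter().sum::<f32>() + sum_sq_scalar(tail)
}

/// Dispatching form: repeats the feature query on every call.
pub fn sum_sq(xs: &[f32]) -> f32 {
    match SimdTok::summon() {
        Some(t) => sum_sq_simd(t, xs),
        None => sum_sq_scalar(xs),
    }
}

/// Hot-loop form: summon once, then call the concrete variant directly.
pub fn sum_sq_rows(rows: &[Vec<f32>]) -> Vec<f32> {
    let token = SimdTok::summon();
    rows.iter()
        .map(|r| match token {
            Some(t) => sum_sq_simd(t, r),
            None => sum_sq_scalar(r),
        })
        .collect()
}
```

In sum_sq_rows the feature query happens once and each row only branches on an already-known Option, which is the shape recommended above for hot loops.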

Structs

EntropyCoeffResult: Results from vectorized entropy coefficient processing.
NeonToken: Proof that NEON is available.

Constants

NEWTON_EPS_DEFAULT: Newton’s method constants.
NEWTON_MAX_ITERS_DEFAULT

Traits

SimdToken: Marker trait for SIMD capability tokens.

Functions

cfl_find_best_multiplier: Find the best integer CfL multiplier via regularized least-squares.
cfl_find_best_multiplier_neon
cfl_find_best_multiplier_newton: Find the best integer CfL multiplier via Newton’s method with perceptual cost.
cfl_find_best_multiplier_newton_scalar: Scalar Newton’s method for CfL multiplier.
cfl_find_best_multiplier_scalar
compute_block_l2_errors: Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
compute_block_l2_errors_neon
compute_block_l2_errors_scalar
compute_mask1x1: Compute per-pixel masking field from XYB Y channel.
compute_mask1x1_neon
compute_mask1x1_scalar
compute_pre_erosion: Compute pre-erosion map from Y channel: per-pixel stencil with 4× downsampling.
compute_pre_erosion_neon
compute_pre_erosion_scalar
dct_4x4_full: Compute the full DCT4X4 transform for an 8x8 pixel block.
dct_4x4_full_scalar
dct_4x8_full: Compute the full DCT4X8 transform for an 8x8 pixel block.
dct_4x8_full_scalar
dct_8x8: Compute scaled 8x8 forward DCT with SIMD acceleration.
dct_8x4_full: Compute the full DCT8X4 transform for an 8x8 pixel block.
dct_8x4_full_scalar
dct_8x8_neon: NEON 8x8 forward DCT: two-pass (4 columns at a time), in-register transpose.
dct_8x8_scalar
dct_8x16: Compute scaled 8x16 forward DCT with SIMD acceleration.
dct_8x16_neon: NEON 8x16 forward DCT.
dct_8x16_scalar
dct_16x8: Compute scaled 16x8 forward DCT with SIMD acceleration.
dct_16x8_neon: NEON 16x8 forward DCT.
dct_16x8_scalar
dct_16x16: Compute 16x16 forward DCT with SIMD acceleration.
dct_16x32: Compute 16×32 forward DCT with SIMD acceleration.
dct_16x16_neon: NEON 16x16 forward DCT: processes 4 rows at a time.
dct_16x16_scalar
dct_16x32_scalar: Scalar 16×32 forward DCT.
dct_32x16: Compute 32×16 forward DCT with SIMD acceleration.
dct_32x32: Compute 32×32 forward DCT with SIMD acceleration.
dct_32x64: Compute 32×64 forward DCT with SIMD acceleration.
dct_32x16_scalar: Scalar 32×16 forward DCT.
dct_32x32_scalar: Scalar 32×32 forward DCT.
dct_32x64_scalar: Scalar 32×64 forward DCT.
dct_64x32: Compute 64×32 forward DCT with SIMD acceleration.
dct_64x64: Compute 64×64 forward DCT with SIMD acceleration.
dct_64x32_scalar: Scalar 64×32 forward DCT.
dct_64x64_scalar: Scalar 64×64 forward DCT.
denoise_channel: Wiener filter for a single channel (runtime dispatch).
denoise_channel_neon
denoise_channel_scalar: Wiener filter for a single channel (scalar).
dequant_block_dct8: Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
dequant_dct8_neon
dequant_dct8_scalar
entropy_coeffs_neon
entropy_coeffs_scalar
entropy_estimate_coeffs: Vectorized entropy coefficient processing.
epf_step1: Apply EPF Step 1 to 3-channel XYB planes.
epf_step2: Apply EPF Step 2 to 3-channel XYB planes.
epf_step1_neon
epf_step1_scalar
epf_step2_neon
epf_step2_scalar
fast_log2f: Fast log2 approximation. Max relative error ~3e-7. Input must be > 0.
fast_pow2f: Fast base-2 exponentiation. Max relative error ~3e-7.
fast_powf: Fast power function: base^exponent. Max relative error ~3e-5.
forward_xyb_neon
forward_xyb_scalar
fused_dct8_entropy: Fused DCT8 + entropy estimation.
fused_dct8_entropy_fallback: Fallback: extract + separate DCT + entropy using dispatching functions.
gab_smooth_channel: Apply 3x3 weighted gaborish smooth to a single channel in-place.
gab_smooth_neon: NEON gab smooth: processes 4 pixels per iteration in interior rows.
gab_smooth_scalar
gaborish_5x5_channel: Apply the 5x5 gaborish inverse kernel to a single channel.
gaborish_5x5_neon: NEON gaborish 5x5: processes 4 pixels per iteration in interior region.
gaborish_5x5_scalar
idct_4x4_full: Compute the full inverse DCT4X4 transform for an 8x8 coefficient block.
idct_4x4_full_scalar
idct_4x8_full: Compute the full inverse DCT4X8 transform for an 8x8 coefficient block.
idct_4x8_full_scalar
idct_8x8: Compute scaled 8x8 inverse DCT with SIMD acceleration.
idct_8x4_full: Compute the full inverse DCT8X4 transform for an 8x8 coefficient block.
idct_8x4_full_scalar
idct_8x8_neon: NEON 8x8 inverse DCT.
idct_8x8_scalar
idct_8x16: Compute 8x16 inverse DCT with SIMD acceleration.
idct_8x16_neon: NEON 8x16 inverse DCT.
idct_8x16_scalar
idct_16x8: Compute 16x8 inverse DCT with SIMD acceleration.
idct_16x8_neon: NEON 16x8 inverse DCT.
idct_16x8_scalar
idct_16x16: Compute 16x16 inverse DCT with SIMD acceleration.
idct_16x32: Compute 16×32 inverse DCT with SIMD acceleration.
idct_16x16_neon: NEON 16x16 inverse DCT: processes 4 rows at a time.
idct_16x16_scalar
idct_16x32_scalar: Scalar 16×32 inverse DCT.
idct_32x16: Compute 32×16 inverse DCT with SIMD acceleration.
idct_32x32: Compute 32×32 inverse DCT with SIMD acceleration.
idct_32x64: Compute 32×64 inverse DCT with SIMD acceleration.
idct_32x16_scalar: Scalar 32×16 inverse DCT.
idct_32x32_scalar: Scalar 32×32 inverse DCT.
idct_32x64_scalar: Scalar 32×64 inverse DCT.
idct_64x32: Compute 64×32 inverse DCT with SIMD acceleration.
idct_64x64: Compute 64×64 inverse DCT with SIMD acceleration.
idct_64x32_scalar: Scalar 64×32 inverse DCT.
idct_64x64_scalar: Scalar 64×64 inverse DCT.
inverse_xyb_neon
inverse_xyb_planar_neon
inverse_xyb_planar_scalar
inverse_xyb_scalar
linear_rgb_to_xyb_batch: Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
pad_plane: Pad a single channel plane with edge replication.
per_block_modulations: Apply per-block modulations to aq_map in-place.
per_block_modulations_neon
per_block_modulations_scalar
pixel_domain_loss: Compute pixel-domain loss for one channel of a block.
pixel_domain_loss_neon
pixel_domain_loss_scalar
quantize_block_dct8: Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
quantize_block_large: Quantize AC coefficients for a large block (DCT16+) to a flat output buffer.
quantize_dct8_neon
quantize_dct8_scalar
quantize_large_neon
quantize_large_scalar
shannon_entropy_bits: Compute Shannon entropy of a histogram: -sum(count * log2(count / total)).
shannon_entropy_neon
shannon_entropy_scalar: Scalar Shannon entropy using fast_log2f.
transpose_8x8: Transpose an 8x8 f32 matrix.
transpose_8x8_neon: NEON 8x8 transpose using four 4x4 sub-transposes.
vec_f32_dirty: Allocate a zero-initialized Vec<f32> of length n (safe default path).
xyb_to_linear_rgb_batch: Convert separate X, Y, B channel buffers to interleaved linear RGB.
xyb_to_linear_rgb_planar: Convert separate X, Y, B channel buffers to planar linear RGB.
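The summary of cfl_find_best_multiplier mentions regularized least-squares. One common form of that idea is ridge regression of chroma on luma, which has a closed-form solution. The sketch below assumes that form; the regularization weight lambda, the integer scale, the clamp range and the name cfl_multiplier_ls are all hypothetical and are not taken from the crate.

```rust
/// Hypothetical sketch of an integer CfL multiplier via ridge-regularized
/// least squares:
///   minimize  sum_i (c_i - m * y_i)^2 + lambda * m^2
///   solution  m* = sum_i(c_i * y_i) / (sum_i(y_i^2) + lambda)
/// The real-valued m* is then scaled, rounded and clamped to an integer range.
pub fn cfl_multiplier_ls(
    luma: &[f32],
    chroma: &[f32],
    lambda: f32,
    scale: f32,
    min: i32,
    max: i32,
) -> i32 {
    assert_eq!(luma.len(), chroma.len(), "planes must have equal length");
    let (mut cy, mut yy) = (0.0f32, 0.0f32);
    for (&y, &c) in luma.iter().zip(chroma) {
        cy += c * y; // cross term
        yy += y * y; // luma energy
    }
    let denom = yy + lambda;
    if denom <= 0.0 {
        // No luma signal and no regularization: nothing to predict from.
        return 0;
    }
    let m = cy / denom;
    ((m * scale).round() as i32).clamp(min, max)
}
```

A larger lambda shrinks the multiplier toward zero, which favors small values when the luma/chroma correlation is weak.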
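pad_plane is described as edge replication. A sketch of that technique, assuming a row-major plane that is padded only on the right and bottom (the crate's exact signature and which sides it pads are not stated here); pad_plane_sketch is a hypothetical name:

```rust
/// Hypothetical sketch: pad a row-major `w x h` plane to `new_w x new_h` by
/// replicating the last column (to the right) and the last row (downward).
pub fn pad_plane_sketch(src: &[f32], w: usize, h: usize, new_w: usize, new_h: usize) -> Vec<f32> {
    assert!(w > 0 && h > 0, "plane must be non-empty");
    assert!(new_w >= w && new_h >= h, "padding cannot shrink the plane");
    assert_eq!(src.len(), w * h);
    let mut out = Vec::with_capacity(new_w * new_h);
    for y in 0..new_h {
        // Rows past the bottom edge reuse the last source row.
        let row = &src[y.min(h - 1) * w..][..w];
        out.extend_from_slice(row);
        // Columns past the right edge repeat the row's last pixel.
        out.extend(std::iter::repeat(row[w - 1]).take(new_w - w));
    }
    out
}
```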
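quantize_block_dct8 is described as dead-zone quantization. A minimal sketch of that technique, assuming a per-coefficient reciprocal step table and a single hypothetical dead_zone threshold; the crate's actual thresholds, rounding biases and signature may differ, and quantize_dead_zone is a hypothetical name:

```rust
/// Hypothetical dead-zone quantizer for one 64-coefficient block.
/// `inv_step[i]` is 1/step for coefficient i. Scaled values whose magnitude is
/// below `dead_zone` are forced to zero; all others round to nearest.
pub fn quantize_dead_zone(
    coeffs: &[f32; 64],
    inv_step: &[f32; 64],
    dead_zone: f32,
    out: &mut [i32; 64],
) {
    for i in 0..64 {
        let v = coeffs[i] * inv_step[i];
        out[i] = if v.abs() < dead_zone {
            0 // inside the dead zone: not worth spending bits on
        } else {
            v.round() as i32
        };
    }
}
```

With a dead zone of 0.7, a coefficient that scales to 0.6 becomes 0 even though plain rounding would give 1.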
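The shannon_entropy_bits formula above can be checked against a plain scalar reference. This sketch assumes a &[u32] histogram (the real signature may differ), uses the standard library's f32::log2 instead of fast_log2f, and skips empty bins, whose contribution is zero in the limit. shannon_entropy_bits_ref is a hypothetical name.

```rust
/// Reference Shannon cost in bits: -sum(count * log2(count / total)).
/// Empty bins are skipped; an all-zero histogram costs 0 bits.
pub fn shannon_entropy_bits_ref(histogram: &[u32]) -> f32 {
    let total: u64 = histogram.iter().map(|&c| c as u64).sum();
    if total == 0 {
        return 0.0;
    }
    let total = total as f32;
    histogram
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let c = c as f32;
            -c * (c / total).log2()
        })
        .sum()
}
```

Because each bin is weighted by its count rather than its probability, the result is the total cost in bits of coding all total symbols; dividing by total gives bits per symbol.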
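transpose_8x8_neon is described as four 4x4 sub-transposes. The block identity behind that can be sketched in scalar Rust, assuming row-major storage: transposing [[A, B], [C, D]] gives [[Aᵀ, Cᵀ], [Bᵀ, Dᵀ]], so each 4x4 quadrant is transposed and written to the mirrored quadrant. transpose_quadrant and transpose_8x8_by_quadrants are hypothetical names; the NEON version performs the same data movement with register shuffles.

```rust
/// Copy the 4x4 quadrant whose top-left corner is (row0, col0) in `src` into
/// `dst`, transposed, so it lands at (col0, row0). Row-major 8x8 storage.
fn transpose_quadrant(src: &[f32; 64], dst: &mut [f32; 64], row0: usize, col0: usize) {
    for r in 0..4 {
        for c in 0..4 {
            dst[(col0 + c) * 8 + (row0 + r)] = src[(row0 + r) * 8 + (col0 + c)];
        }
    }
}

/// 8x8 transpose built from four 4x4 sub-transposes:
/// [[A, B], [C, D]]^T = [[A^T, C^T], [B^T, D^T]].
pub fn transpose_8x8_by_quadrants(m: &[f32; 64]) -> [f32; 64] {
    let mut out = [0.0f32; 64];
    for &(row0, col0) in &[(0, 0), (0, 4), (4, 0), (4, 4)] {
        transpose_quadrant(m, &mut out, row0, col0);
    }
    out
}
```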