Crate jxl_encoder_simd

SIMD-accelerated primitives for jxl_encoder.

This crate wraps platform-specific SIMD intrinsics behind safe public functions. The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and calls into these safe wrappers.

Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.

§Direct variant access

Each kernel is available in three forms:

  • A dispatching function (e.g. dct_8x8) that picks the best available variant at runtime
  • Concrete SIMD variants that take a capability token: _avx2(token, ...) / _neon(token, ...)
  • A concrete _scalar(...) variant that takes no token

For hot loops, callers should summon a token once, then call the concrete variant directly from an #[arcane] function so LLVM can inline across the target-feature boundary.
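
The "summon once, then call the concrete variant" pattern described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use archmage, magetypes, or #[arcane], whose exact APIs are not shown on this page. Every name in it (FastPathToken, sum_sq, sum_sq_fast, sum_sq_scalar, sum_sq_blocks) is hypothetical, and the "fast" kernel is plain Rust standing in for real intrinsics.

```rust
/// Hypothetical capability token: a zero-sized proof that a CPU feature
/// was detected at runtime. It can only be built through `summon`.
#[derive(Clone, Copy, Debug)]
pub struct FastPathToken(());

impl FastPathToken {
    /// Runtime feature detection; returns `Some` only if the feature exists.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if std::arch::is_x86_feature_detected!("avx2") {
                return Some(Self(()));
            }
        }
        #[cfg(target_arch = "aarch64")]
        {
            if std::arch::is_aarch64_feature_detected!("neon") {
                return Some(Self(()));
            }
        }
        None
    }
}

/// Scalar variant: needs no token.
pub fn sum_sq_scalar(data: &[f32]) -> f32 {
    data.iter().map(|v| v * v).sum()
}

/// Token-taking variant. A real kernel would use SIMD intrinsics here;
/// this stand-in only accumulates in 8 independent lanes.
pub fn sum_sq_fast(_token: FastPathToken, data: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];
    let chunks = data.chunks_exact(8);
    let rem = chunks.remainder();
    for c in chunks {
        for (a, v) in acc.iter_mut().zip(c) {
            *a += v * v;
        }
    }
    acc.iter().sum::<f32>() + sum_sq_scalar(rem)
}

/// Dispatching form: detects the feature on every call.
pub fn sum_sq(data: &[f32]) -> f32 {
    match FastPathToken::summon() {
        Some(t) => sum_sq_fast(t, data),
        None => sum_sq_scalar(data),
    }
}

/// Hot-loop form: summon the token once, then call the concrete variant
/// directly for every block.
pub fn sum_sq_blocks(blocks: &[[f32; 64]]) -> Vec<f32> {
    match FastPathToken::summon() {
        Some(t) => blocks.iter().map(|b| sum_sq_fast(t, b)).collect(),
        None => blocks.iter().map(|b| sum_sq_scalar(b)).collect(),
    }
}
```

In this sketch the hot-loop form pays for detection once per batch rather than once per block. In the real crate the concrete variant would additionally be called from an #[arcane] function, which this sketch does not model.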

Structs§

EntropyCoeffResult
Results from vectorized entropy coefficient processing.
NeonToken
Proof that NEON is available.

Constants§

NEWTON_EPS_DEFAULT
Newton’s method constants.
NEWTON_MAX_ITERS_DEFAULT

Traits§

SimdToken
Marker trait for SIMD capability tokens.

Functions§

cfl_find_best_multiplier
Find the best integer CfL multiplier via regularized least-squares.
cfl_find_best_multiplier_neon
cfl_find_best_multiplier_newton
Find the best integer CfL multiplier via Newton’s method with perceptual cost.
cfl_find_best_multiplier_newton_scalar
Scalar Newton’s method for CfL multiplier.
cfl_find_best_multiplier_scalar
compute_block_l2_errors
Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
compute_block_l2_errors_neon
compute_block_l2_errors_scalar
compute_mask1x1
Compute per-pixel masking field from XYB Y channel.
compute_mask1x1_neon
compute_mask1x1_scalar
compute_pre_erosion
Compute pre-erosion map from Y channel: per-pixel stencil with 4× downsampling.
compute_pre_erosion_neon
compute_pre_erosion_scalar
dct_4x4_full
Compute full DCT4X4 transform for 8x8 pixel block.
dct_4x4_full_scalar
dct_4x8_full
Compute full DCT4X8 transform for 8x8 pixel block.
dct_4x8_full_scalar
dct_8x8
Compute scaled 8x8 forward DCT with SIMD acceleration.
dct_8x4_full
Compute full DCT8X4 transform for 8x8 pixel block.
dct_8x4_full_scalar
dct_8x8_neon
NEON 8x8 forward DCT: two-pass (4 columns at a time), in-register transpose.
dct_8x8_scalar
dct_8x16
Compute scaled 8x16 forward DCT with SIMD acceleration.
dct_8x16_neon
NEON 8x16 forward DCT.
dct_8x16_scalar
dct_16x8
Compute scaled 16x8 forward DCT with SIMD acceleration.
dct_16x8_neon
NEON 16x8 forward DCT.
dct_16x8_scalar
dct_16x16
Compute 16x16 forward DCT with SIMD acceleration.
dct_16x32
Compute 16×32 forward DCT with SIMD acceleration.
dct_16x16_neon
NEON 16x16 forward DCT: process 4 rows at a time.
dct_16x16_scalar
dct_16x32_scalar
Scalar 16×32 forward DCT.
dct_32x16
Compute 32×16 forward DCT with SIMD acceleration.
dct_32x32
Compute 32×32 forward DCT with SIMD acceleration.
dct_32x64
Compute 32×64 forward DCT with SIMD acceleration.
dct_32x16_scalar
Scalar 32×16 forward DCT.
dct_32x32_scalar
Scalar 32×32 forward DCT.
dct_32x64_scalar
Scalar 32×64 forward DCT.
dct_64x32
Compute 64×32 forward DCT with SIMD acceleration.
dct_64x64
Compute 64×64 forward DCT with SIMD acceleration.
dct_64x32_scalar
Scalar 64×32 forward DCT.
dct_64x64_scalar
Scalar 64×64 forward DCT.
denoise_channel
Wiener filter for a single channel (runtime dispatch).
denoise_channel_neon
denoise_channel_scalar
Wiener filter for a single channel (scalar).
dequant_block_dct8
Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
dequant_dct8_neon
dequant_dct8_scalar
entropy_coeffs_neon
entropy_coeffs_scalar
entropy_estimate_coeffs
Vectorized entropy coefficient processing.
epf_step1
Apply EPF Step 1 to 3-channel XYB planes.
epf_step2
Apply EPF Step 2 to 3-channel XYB planes.
epf_step1_neon
epf_step1_scalar
epf_step2_neon
epf_step2_scalar
fast_log2f
Fast log2 approximation. Max relative error ~3e-7. Input must be > 0.
fast_pow2f
Fast base-2 exponentiation. Max relative error ~3e-7.
fast_powf
Fast power function: base^exponent. Max relative error ~3e-5.
forward_xyb_neon
forward_xyb_scalar
fused_dct8_entropy
Fused DCT8 + entropy estimation.
fused_dct8_entropy_fallback
Fallback: extract + separate DCT + entropy using dispatching functions.
gab_smooth_channel
Apply 3x3 weighted gaborish smooth to a single channel in-place.
gab_smooth_neon
NEON gab smooth: processes 4 pixels per iteration in interior rows.
gab_smooth_scalar
gaborish_5x5_channel
Apply the 5x5 gaborish inverse kernel to a single channel.
gaborish_5x5_neon
NEON gaborish 5x5: processes 4 pixels per iteration in interior region.
gaborish_5x5_scalar
idct_4x4_full
Compute full inverse DCT4X4 transform for 8x8 coefficient block.
idct_4x4_full_scalar
idct_4x8_full
Compute full inverse DCT4X8 transform for 8x8 coefficient block.
idct_4x8_full_scalar
idct_8x8
Compute scaled 8x8 inverse DCT with SIMD acceleration.
idct_8x4_full
Compute full inverse DCT8X4 transform for 8x8 coefficient block.
idct_8x4_full_scalar
idct_8x8_neon
NEON 8x8 inverse DCT.
idct_8x8_scalar
idct_8x16
Compute 8x16 inverse DCT with SIMD acceleration.
idct_8x16_neon
NEON 8x16 inverse DCT.
idct_8x16_scalar
idct_16x8
Compute 16x8 inverse DCT with SIMD acceleration.
idct_16x8_neon
NEON 16x8 inverse DCT.
idct_16x8_scalar
idct_16x16
Compute 16x16 inverse DCT with SIMD acceleration.
idct_16x32
Compute 16×32 inverse DCT with SIMD acceleration.
idct_16x16_neon
NEON 16x16 inverse DCT: process 4 rows at a time.
idct_16x16_scalar
idct_16x32_scalar
Scalar 16×32 inverse DCT.
idct_32x16
Compute 32×16 inverse DCT with SIMD acceleration.
idct_32x32
Compute 32×32 inverse DCT with SIMD acceleration.
idct_32x64
Compute 32×64 inverse DCT with SIMD acceleration.
idct_32x16_scalar
Scalar 32×16 inverse DCT.
idct_32x32_scalar
Scalar 32×32 inverse DCT.
idct_32x64_scalar
Scalar 32×64 inverse DCT.
idct_64x32
Compute 64×32 inverse DCT with SIMD acceleration.
idct_64x64
Compute 64×64 inverse DCT with SIMD acceleration.
idct_64x32_scalar
Scalar 64×32 inverse DCT.
idct_64x64_scalar
Scalar 64×64 inverse DCT.
inverse_xyb_neon
inverse_xyb_planar_neon
inverse_xyb_planar_scalar
inverse_xyb_scalar
linear_rgb_to_xyb_batch
Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
pad_plane
Pad a single channel plane with edge replication.
per_block_modulations
Apply per-block modulations to aq_map in-place.
per_block_modulations_neon
per_block_modulations_scalar
pixel_domain_loss
Compute pixel-domain loss for one channel of a block.
pixel_domain_loss_neon
pixel_domain_loss_scalar
quantize_block_dct8
Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
quantize_block_large
Quantize AC coefficients for a large block (DCT16+) to a flat output buffer.
quantize_dct8_neon
quantize_dct8_scalar
quantize_large_neon
quantize_large_scalar
shannon_entropy_bits
Compute Shannon entropy of a histogram: -sum(count * log2(count / total)).
shannon_entropy_neon
shannon_entropy_scalar
Scalar Shannon entropy using fast_log2f.
transpose_8x8
Transpose an 8x8 f32 matrix.
transpose_8x8_neon
NEON 8x8 transpose using four 4x4 sub-transposes.
vec_f32_dirty
Allocate a zero-initialized Vec<f32> of length n (safe default path).
xyb_to_linear_rgb_batch
Convert separate X, Y, B channel buffers to interleaved linear RGB.
xyb_to_linear_rgb_planar
Convert separate X, Y, B channel buffers to planar linear RGB.
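
As a reference point for the formula quoted in the shannon_entropy_bits entry, -sum(count * log2(count / total)), here is a minimal scalar sketch. The function name, the &[u32] histogram type, and the f32 return type are assumptions made for illustration. The crate's actual signature is not shown on this page, and its scalar variant uses fast_log2f rather than the standard-library log2 used here.

```rust
/// Hypothetical reference implementation of
/// -sum(count * log2(count / total)) over a histogram.
/// Empty bins are skipped (assumed convention: 0 * log2(0) counts as 0).
pub fn shannon_entropy_reference(histogram: &[u32]) -> f32 {
    let total: u64 = histogram.iter().map(|&c| u64::from(c)).sum();
    if total == 0 {
        return 0.0;
    }
    let total = total as f64;
    let mut bits = 0.0f64;
    for &count in histogram {
        if count > 0 {
            let c = f64::from(count);
            // Each symbol in this bin costs -log2(c / total) bits.
            bits -= c * (c / total).log2();
        }
    }
    bits as f32
}
```

The result is the total ideal code length in bits for all samples in the histogram, not the per-symbol entropy. For example, four equally likely symbols seen once each cost 2 bits apiece, 8 bits in total.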