SIMD-accelerated primitives for jxl_encoder.
This crate wraps platform-specific SIMD intrinsics behind safe public functions.
The main encoder crate (jxl_encoder) maintains #![forbid(unsafe_code)] and
calls into these safe wrappers.
Uses archmage for token-based SIMD dispatch and magetypes for cross-platform vector types.
§Direct variant access
Each kernel is available in three forms:
- A dispatching function (e.g. dct_8x8) that picks the best at runtime
- Concrete _avx2(token, ...) / _neon(token, ...) / _scalar(...) variants
For hot loops, callers should summon a token once, then call the concrete
variant directly from an #[arcane] function so LLVM can inline across the
target-feature boundary.
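The summon-once pattern described above can be sketched with the standard library alone. Everything in this block is hypothetical and for illustration only: `Avx2Token`, `sum_sq`, `sum_sq_avx2`, `sum_sq_scalar` and `total_energy` are stand-in names, not the archmage API or this crate's kernels, and the "AVX2" body is plain scalar code so the sketch builds on any target.

```rust
/// Hypothetical capability token: holding one proves that AVX2 was detected.
/// This is a stand-in, NOT the archmage token type.
#[derive(Clone, Copy)]
pub struct Avx2Token(());

impl Avx2Token {
    /// Runtime detection. Returns `None` on CPUs (or targets) without AVX2.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if std::is_x86_feature_detected!("avx2") {
                return Some(Avx2Token(()));
            }
        }
        None
    }
}

/// Scalar variant: sum of squares of one block.
pub fn sum_sq_scalar(block: &[f32]) -> f32 {
    block.iter().map(|v| v * v).sum()
}

/// Concrete variant that requires a token. A real kernel would use SIMD
/// here; this sketch reuses the scalar body to stay portable.
pub fn sum_sq_avx2(_token: Avx2Token, block: &[f32]) -> f32 {
    sum_sq_scalar(block)
}

/// Dispatching form: feature detection happens on every call.
pub fn sum_sq(block: &[f32]) -> f32 {
    match Avx2Token::summon() {
        Some(token) => sum_sq_avx2(token, block),
        None => sum_sq_scalar(block),
    }
}

/// Hot loop: summon the token once, then call the concrete variant directly.
pub fn total_energy(blocks: &[[f32; 64]]) -> f32 {
    match Avx2Token::summon() {
        Some(token) => blocks.iter().map(|b| sum_sq_avx2(token, b)).sum(),
        None => blocks.iter().map(|b| sum_sq_scalar(b)).sum(),
    }
}
```

In this sketch `total_energy` pays for detection once per batch, while calling `sum_sq` in a loop would repeat it per block. The inlining benefit mentioned above additionally depends on the caller being compiled with the same target features, which this portable sketch does not model.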
Structs§
- EntropyCoeffResult - Results from vectorized entropy coefficient processing.
- NeonToken - Proof that NEON is available.
Constants§
- NEWTON_EPS_DEFAULT - Newton’s method constants.
- NEWTON_MAX_ITERS_DEFAULT
Traits§
- SimdToken - Marker trait for SIMD capability tokens.
Functions§
- cfl_find_best_multiplier - Find the best integer CfL multiplier via regularized least-squares.
- cfl_find_best_multiplier_neon
- cfl_find_best_multiplier_newton - Find the best integer CfL multiplier via Newton’s method with perceptual cost.
- cfl_find_best_multiplier_newton_scalar - Scalar Newton’s method for CfL multiplier.
- cfl_find_best_multiplier_scalar
- compute_block_l2_errors - Compute per-block masked weighted L2 error between original and reconstructed XYB planes.
- compute_block_l2_errors_neon
- compute_block_l2_errors_scalar
- compute_mask1x1 - Compute per-pixel masking field from XYB Y channel.
- compute_mask1x1_neon
- compute_mask1x1_scalar
- compute_pre_erosion - Compute pre-erosion map from Y channel: per-pixel stencil with 4× downsampling.
- compute_pre_erosion_neon
- compute_pre_erosion_scalar
- dct_4x4_full - Compute full DCT4X4 transform for 8x8 pixel block.
- dct_4x4_full_scalar
- dct_4x8_full - Compute full DCT4X8 transform for 8x8 pixel block.
- dct_4x8_full_scalar
- dct_8x8 - Compute scaled 8x8 forward DCT with SIMD acceleration.
- dct_8x4_full - Compute full DCT8X4 transform for 8x8 pixel block.
- dct_8x4_full_scalar
- dct_8x8_neon - NEON 8x8 forward DCT: two-pass (4 columns at a time), in-register transpose.
- dct_8x8_scalar
- dct_8x16 - Compute scaled 8x16 forward DCT with SIMD acceleration.
- dct_8x16_neon - NEON 8x16 forward DCT.
- dct_8x16_scalar
- dct_16x8 - Compute scaled 16x8 forward DCT with SIMD acceleration.
- dct_16x8_neon - NEON 16x8 forward DCT.
- dct_16x8_scalar
- dct_16x16 - Compute 16x16 forward DCT with SIMD acceleration.
- dct_16x32 - Compute 16×32 forward DCT with SIMD acceleration.
- dct_16x16_neon - NEON 16x16 forward DCT: process 4 rows at a time.
- dct_16x16_scalar
- dct_16x32_scalar - Scalar 16×32 forward DCT.
- dct_32x16 - Compute 32×16 forward DCT with SIMD acceleration.
- dct_32x32 - Compute 32×32 forward DCT with SIMD acceleration.
- dct_32x64 - Compute 32×64 forward DCT with SIMD acceleration.
- dct_32x16_scalar - Scalar 32×16 forward DCT.
- dct_32x32_scalar - Scalar 32×32 forward DCT.
- dct_32x64_scalar - Scalar 32×64 forward DCT.
- dct_64x32 - Compute 64×32 forward DCT with SIMD acceleration.
- dct_64x64 - Compute 64×64 forward DCT with SIMD acceleration.
- dct_64x32_scalar - Scalar 64×32 forward DCT.
- dct_64x64_scalar - Scalar 64×64 forward DCT.
- denoise_channel - Wiener filter for a single channel (runtime dispatch).
- denoise_channel_neon
- denoise_channel_scalar - Wiener filter for a single channel (scalar).
- dequant_block_dct8 - Dequantize a DCT8 block and apply CfL (chroma-from-luma) in one pass.
- dequant_dct8_neon
- dequant_dct8_scalar
- entropy_coeffs_neon
- entropy_coeffs_scalar
- entropy_estimate_coeffs - Vectorized entropy coefficient processing.
- epf_step1 - Apply EPF Step 1 to 3-channel XYB planes.
- epf_step2 - Apply EPF Step 2 to 3-channel XYB planes.
- epf_step1_neon
- epf_step1_scalar
- epf_step2_neon
- epf_step2_scalar
- fast_log2f - Fast log2 approximation. Max relative error ~3e-7. Input must be > 0.
- fast_pow2f - Fast base-2 exponentiation. Max relative error ~3e-7.
- fast_powf - Fast power function: base^exponent. Max relative error ~3e-5.
- forward_xyb_neon
- forward_xyb_scalar
- fused_dct8_entropy - Fused DCT8 + entropy estimation.
- fused_dct8_entropy_fallback - Fallback: extract + separate DCT + entropy using dispatching functions.
- gab_smooth_channel - Apply 3x3 weighted gaborish smooth to a single channel in-place.
- gab_smooth_neon - NEON gab smooth: processes 4 pixels per iteration in interior rows.
- gab_smooth_scalar
- gaborish_5x5_channel - Apply the 5x5 gaborish inverse kernel to a single channel.
- gaborish_5x5_neon - NEON gaborish 5x5: processes 4 pixels per iteration in interior region.
- gaborish_5x5_scalar
- idct_4x4_full - Compute full inverse DCT4X4 transform for 8x8 coefficient block.
- idct_4x4_full_scalar
- idct_4x8_full - Compute full inverse DCT4X8 transform for 8x8 coefficient block.
- idct_4x8_full_scalar
- idct_8x8 - Compute scaled 8x8 inverse DCT with SIMD acceleration.
- idct_8x4_full - Compute full inverse DCT8X4 transform for 8x8 coefficient block.
- idct_8x4_full_scalar
- idct_8x8_neon - NEON 8x8 inverse DCT.
- idct_8x8_scalar
- idct_8x16 - Compute 8x16 inverse DCT with SIMD acceleration.
- idct_8x16_neon - NEON 8x16 inverse DCT.
- idct_8x16_scalar
- idct_16x8 - Compute 16x8 inverse DCT with SIMD acceleration.
- idct_16x8_neon - NEON 16x8 inverse DCT.
- idct_16x8_scalar
- idct_16x16 - Compute 16x16 inverse DCT with SIMD acceleration.
- idct_16x32 - Compute 16×32 inverse DCT with SIMD acceleration.
- idct_16x16_neon - NEON 16x16 inverse DCT: process 4 rows at a time.
- idct_16x16_scalar
- idct_16x32_scalar - Scalar 16×32 inverse DCT.
- idct_32x16 - Compute 32×16 inverse DCT with SIMD acceleration.
- idct_32x32 - Compute 32×32 inverse DCT with SIMD acceleration.
- idct_32x64 - Compute 32×64 inverse DCT with SIMD acceleration.
- idct_32x16_scalar - Scalar 32×16 inverse DCT.
- idct_32x32_scalar - Scalar 32×32 inverse DCT.
- idct_32x64_scalar - Scalar 32×64 inverse DCT.
- idct_64x32 - Compute 64×32 inverse DCT with SIMD acceleration.
- idct_64x64 - Compute 64×64 inverse DCT with SIMD acceleration.
- idct_64x32_scalar - Scalar 64×32 inverse DCT.
- idct_64x64_scalar - Scalar 64×64 inverse DCT.
- inverse_xyb_neon
- inverse_xyb_planar_neon
- inverse_xyb_planar_scalar
- inverse_xyb_scalar
- linear_rgb_to_xyb_batch - Convert separate R, G, B channel buffers to separate X, Y, B channel buffers.
- pad_plane - Pad a single channel plane with edge replication.
- per_block_modulations - Apply per-block modulations to aq_map in-place.
- per_block_modulations_neon
- per_block_modulations_scalar
- pixel_domain_loss - Compute pixel-domain loss for one channel of a block.
- pixel_domain_loss_neon
- pixel_domain_loss_scalar
- quantize_block_dct8 - Quantize a DCT8 block (64 coefficients) with dead-zone thresholding.
- quantize_block_large - Quantize AC coefficients for a large block (DCT16+) to a flat output buffer.
- quantize_dct8_neon
- quantize_dct8_scalar
- quantize_large_neon
- quantize_large_scalar
- shannon_entropy_bits - Compute Shannon entropy of a histogram: -sum(count * log2(count / total)).
- shannon_entropy_neon
- shannon_entropy_scalar - Scalar Shannon entropy using fast_log2f.
- transpose_8x8 - Transpose an 8x8 f32 matrix.
- transpose_8x8_neon - NEON 8x8 transpose using four 4x4 sub-transposes.
- vec_f32_dirty - Allocate a zero-initialized Vec<f32> of length n (safe default path).
- xyb_to_linear_rgb_batch - Convert separate X, Y, B channel buffers to interleaved linear RGB.
- xyb_to_linear_rgb_planar - Convert separate X, Y, B channel buffers to planar linear RGB.
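
The formula quoted for the Shannon entropy entry in this list, -sum(count * log2(count / total)), can be read as the following scalar reference. This is a minimal sketch under assumptions: the name `shannon_entropy_bits_ref`, the `u32` histogram type and the use of the exact `f32::log2` are illustrative choices, not the crate's actual signature or its fast_log2f approximation.

```rust
/// Hypothetical reference implementation of
/// -sum(count * log2(count / total)) over a histogram of symbol counts.
/// The result is the total cost in bits, not the per-symbol average.
pub fn shannon_entropy_bits_ref(histogram: &[u32]) -> f32 {
    let total: u64 = histogram.iter().map(|&c| u64::from(c)).sum();
    if total == 0 {
        // An empty histogram carries no information.
        return 0.0;
    }
    let total = total as f32;
    let mut bits = 0.0f32;
    for &count in histogram {
        // Zero counts contribute nothing and would otherwise hit log2(0).
        if count > 0 {
            let c = count as f32;
            bits -= c * (c / total).log2();
        }
    }
    bits
}
```

For example, four symbols seen once each cost 2 bits apiece, so the histogram `[1, 1, 1, 1]` evaluates to about 8 bits, while a histogram with a single non-zero bin evaluates to 0.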