Hand-written assembly SIMD kernels for OxiMedia
This crate provides highly optimized assembly implementations of critical
performance paths in the OxiMedia video codec, including:
- DCT (Discrete Cosine Transform) in various sizes
- Interpolation kernels (bilinear, bicubic, 8-tap)
- SAD (Sum of Absolute Differences) for motion estimation
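As a point of reference for the SAD kernel listed above, the following is a minimal scalar sketch of what a block SAD computes. The names `sad_scalar_sketch` and `required_len`, and the stride-based signature, are assumptions made for illustration; they are not taken from the crate's API.

```rust
/// Smallest slice length that holds `height` rows of `width` bytes laid out
/// with the given `stride`. Returns `None` on overflow or when `height` is 0.
/// (Hypothetical helper, not part of the crate.)
fn required_len(stride: usize, width: usize, height: usize) -> Option<usize> {
    height.checked_sub(1)?.checked_mul(stride)?.checked_add(width)
}

/// Scalar Sum of Absolute Differences over a `width` x `height` block.
/// Returns `None` when a stride is smaller than the width or a buffer is
/// too short for the requested block.
/// (Hypothetical reference function, not the crate's `sad`.)
pub fn sad_scalar_sketch(
    a: &[u8],
    a_stride: usize,
    b: &[u8],
    b_stride: usize,
    width: usize,
    height: usize,
) -> Option<u32> {
    if width == 0 || height == 0 {
        return Some(0);
    }
    if a_stride < width || b_stride < width {
        return None;
    }
    if a.len() < required_len(a_stride, width, height)?
        || b.len() < required_len(b_stride, width, height)?
    {
        return None;
    }

    // A u32 accumulator is ample for typical codec block sizes
    // (a 128x128 block sums to at most 128 * 128 * 255).
    let mut total = 0u32;
    for row in 0..height {
        let row_a = &a[row * a_stride..row * a_stride + width];
        let row_b = &b[row * b_stride..row * b_stride + width];
        total += row_a
            .iter()
            .zip(row_b)
            .map(|(&x, &y)| u32::from(x.abs_diff(y)))
            .sum::<u32>();
    }
    Some(total)
}
```

A scalar version like this is mainly useful as a correctness oracle when testing a vectorized kernel against the same inputs.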
All assembly is wrapped in safe Rust APIs with proper alignment checks, buffer validation, and runtime CPU feature detection.
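To illustrate the runtime CPU feature detection mentioned above, here is a hedged sketch built only on the standard library's detection macros. `DetectedFeatures`, `Backend`, `detect_features_sketch`, and `choose_backend` are hypothetical names; the crate's own `CpuFeatures` struct and `detect_cpu_features` function may be shaped differently.

```rust
/// Hypothetical feature summary; the fields of the crate's real
/// `CpuFeatures` struct are not shown in this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DetectedFeatures {
    pub avx2: bool,
    pub avx512f: bool,
    pub neon: bool,
}

/// Hypothetical backend selector used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

// x86 / x86_64: query AVX2 and AVX-512F at runtime.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect_features_sketch() -> DetectedFeatures {
    DetectedFeatures {
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        neon: false,
    }
}

// aarch64: query NEON at runtime.
#[cfg(target_arch = "aarch64")]
pub fn detect_features_sketch() -> DetectedFeatures {
    DetectedFeatures {
        avx2: false,
        avx512f: false,
        neon: std::arch::is_aarch64_feature_detected!("neon"),
    }
}

// Any other architecture: report no SIMD features.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
pub fn detect_features_sketch() -> DetectedFeatures {
    DetectedFeatures { avx2: false, avx512f: false, neon: false }
}

/// Pick the widest available backend, falling back to scalar code.
pub fn choose_backend(features: DetectedFeatures) -> Backend {
    if features.avx512f {
        Backend::Avx512
    } else if features.avx2 {
        Backend::Avx2
    } else if features.neon {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}
```

In a design like this, detection would typically run once and the chosen backend be cached, so that per-call dispatch costs only a branch.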
Modules§
- accumulator
- Running statistics accumulator for audio/video sample streams.
- alpha_premul
- Alpha pre-multiplication and un-pre-multiplication helpers.
- audio_ops
- SIMD-optimized audio buffer operations.
- avx512
- AVX-512 SIMD implementations for high-throughput media processing.
- bitwise_ops
- Bitwise SIMD-style operations for multimedia processing.
- blend
- Alpha blending and compositing operations.
- blend_simd
- SIMD-optimized blending operations
- color_convert_simd
- BT.709 RGB ↔ YUV colour conversion using scalar SIMD-friendly code.
- color_space
- Color space conversion operations optimized for SIMD-friendly access patterns.
- convolution
- SIMD convolution operations
- deblock_filter
- SIMD-accelerated deblocking filter for codec post-processing.
- dispatch
- Runtime-dispatched SIMD wrappers for core media processing operations.
- dot_product
- Dot product operations for SIMD-accelerated signal processing.
- entropy_coding
- SIMD helpers for bit packing / unpacking used in entropy coding.
- filter
- SIMD-friendly image filtering operations.
- fixed_point
- Fixed-point arithmetic helpers for SIMD-friendly computations.
- gather_scatter
- SIMD-style gather and scatter memory operations.
- histogram
- Histogram computation for pixel data analysis.
- interleave
- Audio channel interleaving and de-interleaving.
- lookup_table
- Lookup table (LUT) primitives for fast pixel transformations.
- math_ops
- SIMD-optimized mathematical operations on f32 slices.
- matrix
- SIMD-friendly 4×4 matrix operations.
- min_max
- SIMD-optimized minimum and maximum operations for pixel/sample data.
- motion_search
- SIMD-accelerated motion search using diamond and hexagonal search patterns.
- neon
- ARM NEON SIMD backend for OxiMedia.
- pack_unpack
- Pixel packing and unpacking utilities for SIMD pipelines.
- pixel_ops
- SIMD-optimized pixel / image operations.
- portable
- Portable SIMD-friendly operations using scalar code structured for auto-vectorization by the compiler.
- prefix_sum
- Parallel prefix sum (scan) operations for multimedia processing.
- psnr
- Peak Signal-to-Noise Ratio (PSNR) computation kernel.
- reduce
- Reduction operations (sum, min, max, mean, variance, normalize) on f32 slices.
- resize
- SIMD-accelerated image scaling (resize) with bilinear and Lanczos filters.
- satd
- Sum of Absolute Transformed Differences (SATD) kernel.
- saturate
- Saturating arithmetic primitives for media pixel pipelines.
- simd_bench
- Runtime benchmark utilities for oximedia-simd.
- ssim
- Structural Similarity Index Measure (SSIM) kernel with scalar and SIMD-accelerated implementations.
- swizzle
- SIMD lane swizzle, permute, and shuffle operations.
- threshold
- SIMD threshold operations
- transpose
- Block transpose primitives for matrix and image data.
- vector_math
- Vector math types: Vec2, Vec3, Vec4.
- yuv_ops
- YUV colour-space conversion helpers.
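As an illustration of the kind of metric the psnr module describes, the sketch below computes PSNR for 8-bit samples as 10·log10(255² / MSE). The function name `psnr_u8_sketch` and its signature are assumptions for this example and are not taken from the crate.

```rust
/// Peak Signal-to-Noise Ratio in dB for 8-bit samples.
/// Returns `None` for empty or length-mismatched inputs, and
/// `Some(f64::INFINITY)` when the two buffers are identical.
/// (Hypothetical scalar reference, not the crate's kernel.)
pub fn psnr_u8_sketch(reference: &[u8], distorted: &[u8]) -> Option<f64> {
    if reference.is_empty() || reference.len() != distorted.len() {
        return None;
    }

    // Accumulate squared error in u64 so large frames cannot overflow.
    let sum_sq: u64 = reference
        .iter()
        .zip(distorted)
        .map(|(&r, &d)| {
            let diff = u64::from(r.abs_diff(d));
            diff * diff
        })
        .sum();

    if sum_sq == 0 {
        return Some(f64::INFINITY);
    }

    let mse = sum_sq as f64 / reference.len() as f64;
    let peak = 255.0_f64;
    Some(10.0 * (peak * peak / mse).log10())
}
```

Keeping the accumulation in integers and converting to floating point only once at the end avoids rounding drift across long buffers.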
Structs§
- CpuFeatures
- CPU features detected at runtime.
Enums§
- BlockSize
- Block sizes for SAD operations
- DctSize
- DCT transform sizes
- InterpolationFilter
- Interpolation filter types
- SimdError
- Error types for SIMD operations
Functions§
- detect_cpu_features
- Detect CPU features at runtime
- forward_dct
- Perform forward DCT transform
- has_neon
- Returns true when the executing CPU provides NEON (always true on aarch64).
- interpolate
- Perform interpolation for motion compensation
- inverse_dct
- Perform inverse DCT transform
- is_aligned
- Check if a pointer is properly aligned for SIMD operations
- sad
- Calculate Sum of Absolute Differences (SAD)
- validate_avx2_alignment
- Validate buffer alignment for AVX2 (32-byte alignment)
- validate_avx512_alignment
- Validate buffer alignment for AVX-512 (64-byte alignment)
- validate_neon_alignment
- Validate buffer alignment for NEON (16-byte alignment)
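
The alignment requirements listed above (16 bytes for NEON, 32 for AVX2, 64 for AVX-512) can be checked with plain pointer arithmetic. The following is a minimal sketch under assumed names; `is_aligned_sketch`, `validate_alignment_sketch`, `AlignmentError`, and `Aligned64` are hypothetical, and the crate's real `is_aligned` and `validate_*_alignment` signatures are not shown in this page.

```rust
/// Hypothetical error type for this sketch (the crate exposes `SimdError`,
/// whose variants are not listed here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AlignmentError {
    pub required: usize,
    pub address: usize,
}

/// True when `ptr` is a multiple of `align`, which must be a power of two.
pub fn is_aligned_sketch<T>(ptr: *const T, align: usize) -> bool {
    align.is_power_of_two() && (ptr as usize) & (align - 1) == 0
}

/// Validate that a byte buffer starts on an `align`-byte boundary.
pub fn validate_alignment_sketch(buf: &[u8], align: usize) -> Result<(), AlignmentError> {
    if is_aligned_sketch(buf.as_ptr(), align) {
        Ok(())
    } else {
        Err(AlignmentError { required: align, address: buf.as_ptr() as usize })
    }
}

/// Helper type that forces 64-byte alignment, which also satisfies the
/// 32-byte and 16-byte requirements.
#[repr(C, align(64))]
pub struct Aligned64(pub [u8; 128]);
```

Wrapping a buffer in a `#[repr(align(64))]` type is one way for callers to guarantee that such a check passes without relying on the allocator.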