zensim
Fast psychovisual image similarity metric combining ideas from SSIMULACRA2 and butteraugli. Multi-scale SSIM + edge + high-frequency features in XYB color space, with trained weights and AVX2/AVX-512 SIMD.
Quick start
```rust
use zensim::{Zensim, RgbSlice};

// 8-bit sRGB pixels, row-major, 3 bytes per pixel
// (constructor and accessor signatures shown here are illustrative)
let z = Zensim::new();
let source = RgbSlice::new(&src_pixels, width, height);
let distorted = RgbSlice::new(&dst_pixels, width, height);
let result = z.compute(&source, &distorted)?;
println!("score: {}", result.score());
```
Batch comparison (one reference, many distorted)
```rust
use zensim::{Zensim, RgbSlice};

let z = Zensim::new();
let source = RgbSlice::new(&src_pixels, width, height);
// Analyze the reference once, then reuse it for every distorted image
// (loop body is illustrative; see the crate docs for the exact API)
let precomputed = z.precompute_reference(&source)?;
for dst_pixels in &distorted_images {
    let distorted = RgbSlice::new(dst_pixels, width, height);
    let result = precomputed.compute(&distorted)?;
    println!("score: {}", result.score());
}
```
RGBA support
```rust
use zensim::{Zensim, RgbaSlice};

let z = Zensim::new();
// 8-bit sRGB pixels with alpha, 4 bytes per pixel
// (constructor signature shown here is illustrative)
let source = RgbaSlice::new(&src_pixels, width, height);
let distorted = RgbaSlice::new(&dst_pixels, width, height);
let result = z.compute(&source, &distorted)?;
```
Input requirements
- Color space: All inputs must be sRGB-encoded (gamma ~2.2), the standard output of JPEG, PNG, and WebP decoders. For linear-light data, use `PixelFormat::LinearF32Rgba` via [StridedBytes].
- Wide gamut: Display P3 and BT.2020 primaries are accepted via [ColorPrimaries] on [StridedBytes] and are gamut-mapped to sRGB internally. Passing wide-gamut data as sRGB will produce incorrect scores.
- Pixel formats: [RgbSlice] (sRGB u8), [RgbaSlice] (sRGB u8 + alpha), `imgref::ImgRef` (sRGB u8, stride-aware, default feature), [StridedBytes] (any of `Srgb8Rgb`, `Srgb8Rgba`, `Srgb8Bgra`, `Srgb16Rgba`, `LinearF32Rgba`), or implement [ImageSource] directly.
- Alpha: RGBA inputs are composited over a checkerboard so alpha differences produce visible distortion. Supports `Straight` and `Opaque` alpha modes.
- Dimensions: Both images must be the same width × height, minimum 8×8.
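The alpha handling can be pictured as straight-alpha compositing over a checkerboard. The sketch below illustrates the idea only; the tile size and background values are made up for this example, not the crate's actual constants:

```rust
// Illustrative straight-alpha compositing over a checkerboard background.
// Tile size (8 px) and greys (192/96) are example values, not zensim's.
fn composite_checkerboard(r: u8, g: u8, b: u8, a: u8, x: usize, y: usize) -> (u8, u8, u8) {
    let bg = if ((x / 8) + (y / 8)) % 2 == 0 { 192u16 } else { 96u16 };
    let a16 = a as u16;
    // linear interpolation between pixel and background by alpha
    let mix = |c: u8| ((c as u16 * a16 + bg * (255 - a16)) / 255) as u8;
    (mix(r), mix(g), mix(b))
}

fn main() {
    // opaque pixels pass through; transparent pixels take the background
    println!("{:?}", composite_checkerboard(10, 20, 30, 255, 0, 0)); // (10, 20, 30)
    println!("{:?}", composite_checkerboard(10, 20, 30, 0, 0, 0));   // (192, 192, 192)
}
```

Because the background alternates, a pixel whose alpha differs between the two images lands on different composited colors, so the metric sees it as a visible distortion.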
Score semantics
100 = identical, higher = more similar. Score mapping: `100 - 18 × d^0.7`, where `d` is the per-scale weighted feature distance. Calibrated over 0–100 on 344k training pairs; extreme distortions can score below 0 (uncalibrated outside the training range).
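The mapping is simple enough to restate as a plain function. This helper just transcribes the formula above for illustration; it is not an export of the crate:

```rust
/// Score mapping from the docs above: 100 - 18 * d^0.7,
/// where d >= 0 is the per-scale weighted feature distance.
fn map_score(d: f64) -> f64 {
    100.0 - 18.0 * d.powf(0.7)
}

fn main() {
    println!("{}", map_score(0.0));   // identical images map to 100
    println!("{}", map_score(1.0));   // distance 1.0 maps to 82
    println!("{}", map_score(100.0)); // extreme distortions go negative
}
```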
[ZensimResult] also provides `approx_ssim2()`, `approx_dssim()`, and `approx_butteraugli()` for direct metric approximations. The [mapping] module has bidirectional interpolation tables for score-level conversions.
Determinism
Deterministic for the same input on the same architecture. Cross-architecture results (e.g. AVX2 vs scalar vs AVX-512) may differ by a few ULPs due to differing FMA contraction behavior.
Design
- XYB color space — cube root LMS, same perceptual space as ssimulacra2/butteraugli
- Modified SSIM — ssimulacra2's variant: drops the luminance denominator (no C1) and uses `1 - (mu1 - mu2)²` directly, which is correct for perceptually-uniform spaces
- 19 features per channel per scale — 13 basic (SSIM, edge artifact/detail loss, MSE, high-frequency) + 6 peak features, all scored
- 4-scale pyramid — 1×, 2×, 4×, 8× via box downscale (ssimulacra2 uses 6)
- O(1)-per-pixel box blur — single-pass with fused SIMD kernel
- 228 trained weights — optimized on 344k synthetic pairs across 6 codecs
- AVX2/AVX-512 SIMD throughout via archmage
See the metric module source for the full feature extraction math.
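As a rough sketch of the O(1)-per-pixel box blur idea mentioned above: a running sum slides along a row, adding the entering sample and removing the leaving one, so cost per pixel is constant regardless of radius. This is only an illustration of the technique; the crate's single-pass fused SIMD kernel is more involved:

```rust
// O(1)-per-pixel box blur along one row via a running window sum.
// Edges are clamped; assumes a non-empty input row.
fn box_blur_row(src: &[f32], radius: usize) -> Vec<f32> {
    let n = src.len();
    let win = 2 * radius + 1;
    let mut out = vec![0.0f32; n];
    // clamp-to-edge sampling
    let at = |i: isize| src[i.clamp(0, n as isize - 1) as usize];
    // window sum centred on index 0
    let mut sum: f32 = (-(radius as isize)..=radius as isize).map(at).sum();
    for i in 0..n {
        out[i] = sum / win as f32;
        // slide the window right by one sample
        sum += at(i as isize + radius as isize + 1) - at(i as isize - radius as isize);
    }
    out
}

fn main() {
    let row = [0.0, 0.0, 3.0, 0.0, 0.0];
    println!("{:?}", box_blur_row(&row, 1)); // [0.0, 1.0, 1.0, 1.0, 0.0]
}
```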