fast-thumbhash 0.1.0

A fast ThumbHash encoder/decoder — 10x+ faster drop-in replacement

Repository: VectorPrivacy/fast-thumbhash
Crate owner: JSKitty

fast-thumbhash

A 12x faster drop-in replacement for the thumbhash crate.

ThumbHash is a compact image placeholder algorithm — similar to BlurHash but with better detail, color accuracy, and alpha support. This crate implements the same algorithm with aggressive low-level optimizations, producing perceptually identical results at a fraction of the cost.

Performance

Benchmarked on Apple M-series, 100x75 RGBA input, 10,000 iterations:

Operation   thumbhash   fast-thumbhash   Speedup
Encode      172.7 µs    14.4 µs          12.0x
Decode      18.3 µs     1.5 µs           12.4x
How it's faster

Encoder:

  • Separable 2D DCT — splits the O(W*H*N) transform into two 1D passes, cutting multiply-adds by ~3x
  • Chebyshev cosine recurrence — computes cosine tables with 2 cos() calls per frequency instead of one per pixel
  • Stack-allocated buffers — cos tables and partial sums live on the stack, zero heap allocation in the hot path
  • Integer averaging — computes the average color in integer space, avoiding N float divisions
  • Opaque fast path — skips alpha compositing entirely for fully-opaque images (most photos)
  • Branchless nibble packing — collects AC coefficients then packs pairs without per-nibble branching
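
To make the first two bullets concrete, here is a minimal sketch of a separable DCT whose cosine tables come from the Chebyshev recurrence. It is an illustration, not the crate's source: the function names (`cos_table`, `dct_naive`, `dct_separable`) are hypothetical, it allocates `Vec`s for brevity where the crate describes stack buffers, and it computes a full `nx` by `ny` rectangle of coefficients rather than the subset ThumbHash actually stores.

```rust
use std::f32::consts::PI;

/// Table of cos(PI * f * (i + 0.5) / n) for i in 0..n, built with two cos()
/// calls and the recurrence c[i+1] = 2*cos(step)*c[i] - c[i-1].
fn cos_table(f: usize, n: usize) -> Vec<f32> {
    let step = PI * f as f32 / n as f32;
    let two_cos_step = 2.0 * step.cos();
    let mut curr = (0.5 * step).cos();
    let mut prev = curr; // c[-1] = cos(-step/2) = cos(step/2)
    let mut table = Vec::with_capacity(n);
    for _ in 0..n {
        table.push(curr);
        let next = two_cos_step * curr - prev;
        prev = curr;
        curr = next;
    }
    table
}

/// Naive reference: every coefficient visits every pixel and calls cos().
fn dct_naive(w: usize, h: usize, px: &[f32], nx: usize, ny: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; nx * ny];
    for cy in 0..ny {
        for cx in 0..nx {
            let mut sum = 0.0f32;
            for y in 0..h {
                let fy = (PI * cy as f32 * (y as f32 + 0.5) / h as f32).cos();
                for x in 0..w {
                    let fx = (PI * cx as f32 * (x as f32 + 0.5) / w as f32).cos();
                    sum += px[x + y * w] * fx * fy;
                }
            }
            out[cx + cy * nx] = sum / (w * h) as f32;
        }
    }
    out
}

/// Separable version: pass 1 reduces each row along x, pass 2 reduces along y.
fn dct_separable(w: usize, h: usize, px: &[f32], nx: usize, ny: usize) -> Vec<f32> {
    let xt: Vec<Vec<f32>> = (0..nx).map(|cx| cos_table(cx, w)).collect();
    let yt: Vec<Vec<f32>> = (0..ny).map(|cy| cos_table(cy, h)).collect();

    // Pass 1: partial[cx + y*nx] = sum over x of px[x, y] * cos_x[cx][x]
    let mut partial = vec![0.0f32; nx * h];
    for y in 0..h {
        let row = &px[y * w..(y + 1) * w];
        for cx in 0..nx {
            partial[cx + y * nx] = row.iter().zip(&xt[cx]).map(|(p, c)| p * c).sum::<f32>();
        }
    }

    // Pass 2: combine the per-row partial sums along y.
    let mut out = vec![0.0f32; nx * ny];
    for cy in 0..ny {
        for cx in 0..nx {
            let mut sum = 0.0f32;
            for y in 0..h {
                sum += partial[cx + y * nx] * yt[cy][y];
            }
            out[cx + cy * nx] = sum / (w * h) as f32;
        }
    }
    out
}
```

The two functions agree only up to floating-point rounding, because the separable form and the recurrence change the order in which terms are evaluated and summed.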

Decoder:

  • Separable 2D IDCT — precomputes x-contributions per frequency band, then accumulates rows via SAXPY (auto-vectorizes to NEON/SSE)
  • Stack-allocated everything — cos tables, AC buffers, and scratch rows all on the stack
  • Direct nibble indexing — reads AC data by index instead of through std::io::Read
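
As a rough illustration of nibble packing and direct nibble indexing, the sketch below stores two 4-bit values per byte and reads them back by index. The low-nibble-first layout is an assumption based on the reference ThumbHash format, and the helper names are hypothetical. The packing helper is written for clarity, not tuned to be branchless.

```rust
/// Pack 4-bit values two per byte, low nibble first (assumed layout).
/// An odd trailing value is padded with a zero high nibble.
fn pack_nibbles(nibbles: &[u8]) -> Vec<u8> {
    nibbles
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}

/// Read the `index`-th nibble directly, with no reader state:
/// byte = index / 2, shift = 0 for even indices and 4 for odd ones.
fn nibble_at(bytes: &[u8], index: usize) -> u8 {
    (bytes[index >> 1] >> ((index & 1) << 2)) & 0x0F
}
```

Because the byte offset and shift are pure functions of the index, a decoder can fetch any coefficient in constant time instead of streaming through the data.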

Both paths use unsafe get_unchecked to eliminate bounds checks in inner loops.
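
The pattern looks roughly like the following SAXPY loop (`y[i] += a * x[i]`). This is a sketch of the general technique, not code taken from the crate, and `saxpy` is a hypothetical name: the loop bound is established once up front, so every unchecked access inside the loop is in range.

```rust
/// y[i] += a * x[i] over the common prefix of `x` and `y`.
fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
    let n = x.len().min(y.len());
    for i in 0..n {
        // SAFETY: i < n, and n <= x.len() and n <= y.len() by construction.
        unsafe {
            *y.get_unchecked_mut(i) += a * *x.get_unchecked(i);
        }
    }
}
```

A safe alternative is to iterate with `y.iter_mut().zip(x)`, which often lets the compiler drop the bounds checks on its own.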

Compatibility

fast-thumbhash is a perceptually identical replacement — not bit-identical. The separable DCT and Chebyshev recurrence change the floating-point evaluation order, producing sub-nibble rounding differences in some AC coefficients.

In practice, across a library of real-world images:

  • PSNR > 70 dB between decoded previews (60 dB is already indistinguishable to human eyes)
  • Maximum pixel delta of 2-3/255 in affected channels
  • Headers (dimensions, average color, structure) are always identical
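
To check these figures on your own images, one possible approach is to decode the same picture with both crates and compare the RGBA buffers. The helpers below are a hypothetical sketch using the standard 8-bit definition, PSNR = 10 * log10(255^2 / MSE). They are not part of either crate's API.

```rust
/// PSNR in dB between two equally sized 8-bit buffers.
/// Returns None on a length mismatch or empty input, and infinity when
/// the buffers are identical.
fn psnr_u8(a: &[u8], b: &[u8]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let sse: f64 = a
        .iter()
        .zip(b)
        .map(|(&p, &q)| {
            let d = p as f64 - q as f64;
            d * d
        })
        .sum();
    if sse == 0.0 {
        return Some(f64::INFINITY);
    }
    let mse = sse / a.len() as f64;
    Some(10.0 * (255.0f64 * 255.0 / mse).log10())
}

/// Largest per-byte absolute difference between the two buffers.
fn max_abs_delta(a: &[u8], b: &[u8]) -> u8 {
    a.iter().zip(b).map(|(&p, &q)| p.abs_diff(q)).max().unwrap_or(0)
}
```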

The API is the same — swap the crate name and everything works.
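
If you would rather not touch existing `use thumbhash::...` lines, Cargo's dependency renaming may be an option. The fragment below is a sketch that assumes your code currently imports the crate as `thumbhash`. It keeps that name in your source while pulling in this package.

```toml
[dependencies]
# Key = name used in your Rust code; `package` = crate actually downloaded.
thumbhash = { package = "fast-thumbhash", version = "0.1" }
```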

Usage

Add the crate to Cargo.toml:

[dependencies]
fast-thumbhash = "0.1"

Then call the encoder and decoder:

use fast_thumbhash::{rgba_to_thumb_hash, thumb_hash_to_rgba};

// Encode: RGBA pixels → compact hash
let hash = rgba_to_thumb_hash(width, height, &rgba_pixels);

// Decode: hash → RGBA preview image
let (w, h, pixels) = thumb_hash_to_rgba(&hash).unwrap();

// Utilities
let (r, g, b, a) = fast_thumbhash::thumb_hash_to_average_rgba(&hash).unwrap();
let aspect = fast_thumbhash::thumb_hash_to_approximate_aspect_ratio(&hash).unwrap();

License

MIT