Crate colorthief

Color Thief

Dominant colors with human-vocabulary names for video keyframes — MMCQ extraction + nearest-neighbor lookup against the xkcd color survey.

§Overview

colorthief extracts dominant colors from packed-RGB video keyframes and maps each to its closest entry in a 949-color human-vocabulary table sourced from the xkcd color survey. Built for video indexing and search-vocabulary pipelines: every output dominant carries both the actual MMCQ-extracted RGB (for swatch rendering) and the named Color (for search-index vocabulary), sorted descending by population.

§Crates in this workspace

| Crate | Purpose |
|---|---|
| colorthief | Dominant-color extraction (MMCQ) + naming pipeline. RgbFrame<'a> (8-bit) / Rgb48Frame<'a> (16-bit HDR) input. |
| colorthief-dataset | Static xkcd palette + nearest-neighbor lookup with three color-difference metrics (CIEDE2000, CIE94, Delta E 76). no_std + no_alloc. |
| xtask | Build-time codegen: re-runs offline to regenerate the static dataset and CIEDE2000 LUT from the upstream CSV. Not published. |

§Installation

[dependencies]
colorthief = "0.1"

# Or, if you only need the static palette + nearest-neighbor lookup
# (no MMCQ; works in no_std + no_alloc):
colorthief-dataset = "0.1"

Minimum supported Rust version: 1.95 (required for stable AVX-512F intrinsics and core::error::Error in no_std builds via thiserror 2 without its std feature).

§Examples

| Example | Crate | Run |
|---|---|---|
| extract | colorthief | cargo run --release --example extract -p colorthief |
| extract_rgb48 (HDR / 16-bit) | colorthief | cargo run --release --example extract_rgb48 -p colorthief |
| extract_no_alloc (static mut Mmcq + fixed buffer) | colorthief | cargo run --release --example extract_no_alloc -p colorthief |
| lookup (name-only, no MMCQ) | colorthief-dataset | cargo run --release --example lookup -p colorthief-dataset |

For more details, see the examples directory of each crate (colorthief and colorthief-dataset).

§Algorithms

Three nearest-neighbor metrics, behind a #[non_exhaustive] #[repr(u8)] enum:

| Algorithm | Speed (NEON) | Notes |
|---|---|---|
| Ciede2000Exact (default) | ~230 ns/query (LUT) or 71.5 µs (full scan) | Modern perceptual gold-standard. Provably exact at u8 RGB resolution when lut feature is on. |
| Cie94 | ~510 ns/query | Asymmetric (palette = reference). Mid-accuracy. |
| DeltaE76 | ~470 ns/query | Squared Euclidean LAB. Fastest, but well-known biases in the saturated blue / yellow regions. |
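
To make the difference between the two full-scan metrics concrete, here is a minimal, self-contained sketch. It does not use the crate's API: the Lab struct, the function names, and the approximate LAB values for red and mid-gray are assumptions for illustration, and the CIE94 weights are the standard graphic-arts constants (K1 = 0.045, K2 = 0.015), which may not match what the crate uses.

```rust
/// CIE LAB triple (L*, a*, b*). Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lab {
    l: f32,
    a: f32,
    b: f32,
}

/// Squared Delta E 76: plain squared Euclidean distance in LAB.
fn delta_e76_sq(x: Lab, y: Lab) -> f32 {
    let (dl, da, db) = (x.l - y.l, x.a - y.a, x.b - y.b);
    dl * dl + da * da + db * db
}

/// CIE94 with graphic-arts weights. The chroma of `reference` drives the
/// weighting terms, which is what makes the metric asymmetric.
fn cie94(reference: Lab, sample: Lab) -> f32 {
    let c1 = (reference.a * reference.a + reference.b * reference.b).sqrt();
    let c2 = (sample.a * sample.a + sample.b * sample.b).sqrt();
    let dl = reference.l - sample.l;
    let dc = c1 - c2;
    let (da, db) = (reference.a - sample.a, reference.b - sample.b);
    // Delta H squared can dip below zero through rounding, so clamp it.
    let dh_sq = (da * da + db * db - dc * dc).max(0.0);
    let sc = 1.0 + 0.045 * c1;
    let sh = 1.0 + 0.015 * c1;
    (dl * dl + (dc / sc) * (dc / sc) + dh_sq / (sh * sh)).sqrt()
}

/// Index of the palette entry with the smallest squared Delta E 76.
fn nearest_e76(palette: &[Lab], query: Lab) -> usize {
    let mut best = 0;
    for i in 1..palette.len() {
        if delta_e76_sq(query, palette[i]) < delta_e76_sq(query, palette[best]) {
            best = i;
        }
    }
    best
}

fn main() {
    // Approximate LAB values for sRGB red, green, blue, and mid-gray.
    let red = Lab { l: 53.24, a: 80.09, b: 67.20 };
    let green = Lab { l: 87.73, a: -86.18, b: 83.18 };
    let blue = Lab { l: 32.30, a: 79.19, b: -107.86 };
    let gray = Lab { l: 53.59, a: 0.0, b: 0.0 };

    let query = Lab { l: 60.0, a: 60.0, b: 60.0 };
    println!("nearest (Delta E 76): index {}", nearest_e76(&[red, green, blue], query));
    println!("CIE94, palette entry gray as reference: {:.2}", cie94(gray, red));
    println!("CIE94, palette entry red as reference:  {:.2}", cie94(red, gray));
}
```

Swapping the arguments of cie94 changes the result because only the reference's chroma enters the weights, so a lookup has to fix which side plays the reference role; the table above states that the palette entry does.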

The default Ciede2000Exact is ~310× faster than naive full-scan thanks to a pre-computed 32³ candidate-set LUT (see Architecture below).
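
The candidate-set idea can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the crate's implementation: the 4-entry palette, the bitmask cell layout, and every function name are hypothetical, and squared RGB distance stands in for CIEDE2000 to keep the block short. What it does show is why such a LUT can be exact: each of the 32³ cells records every palette entry that wins the full scan for at least one u8 RGB value inside the cell, so a query only needs the exact metric over that short list.

```rust
/// Hypothetical 4-entry palette; the real table has 949 entries.
const PALETTE: [[u8; 3]; 4] = [[0, 0, 0], [255, 255, 255], [229, 0, 0], [3, 67, 223]];

/// Stand-in metric: squared RGB distance (NOT CIEDE2000).
fn dist_sq(a: [u8; 3], b: [u8; 3]) -> u32 {
    let d = |i: usize| a[i] as i32 - b[i] as i32;
    (d(0) * d(0) + d(1) * d(1) + d(2) * d(2)) as u32
}

/// Reference full scan; the lowest index wins ties.
fn full_scan(q: [u8; 3]) -> usize {
    let (mut best, mut best_d) = (0, dist_sq(q, PALETTE[0]));
    for (i, &p) in PALETTE.iter().enumerate().skip(1) {
        let d = dist_sq(q, p);
        if d < best_d {
            best = i;
            best_d = d;
        }
    }
    best
}

/// Map a u8 RGB triple to its cell in the 32x32x32 grid (top 5 bits per channel).
fn cell_index(q: [u8; 3]) -> usize {
    ((q[0] as usize >> 3) << 10) | ((q[1] as usize >> 3) << 5) | (q[2] as usize >> 3)
}

/// Sweep every u8 RGB value once and record, per cell, which palette entries
/// ever win the full scan. Bit i set means PALETTE[i] is a candidate.
fn build_lut() -> Vec<u8> {
    let mut lut = vec![0u8; 32 * 32 * 32];
    for r in 0..=255u8 {
        for g in 0..=255u8 {
            for b in 0..=255u8 {
                let q = [r, g, b];
                lut[cell_index(q)] |= 1u8 << full_scan(q);
            }
        }
    }
    lut
}

/// Query: evaluate the exact metric only over the cell's candidates.
fn lut_query(lut: &[u8], q: [u8; 3]) -> usize {
    let mask = lut[cell_index(q)];
    let mut best: Option<(usize, u32)> = None;
    for (i, &p) in PALETTE.iter().enumerate() {
        if (mask & (1u8 << i)) == 0 {
            continue;
        }
        let d = dist_sq(q, p);
        if best.map_or(true, |(_, bd)| d < bd) {
            best = Some((i, d));
        }
    }
    best.expect("every cell has at least one candidate").0
}

fn main() {
    let lut = build_lut();
    let q = [200, 30, 40];
    println!("LUT: {}, full scan: {}", lut_query(&lut, q), full_scan(q));
}
```

Because the true winner for any query is guaranteed to be in its cell's candidate set, the LUT path returns the same index as the full scan while touching far fewer palette entries.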

§Feature flags

colorthief:

| Feature | Default | Effect |
|---|---|---|
| std | | thread_local!-cached MMCQ workspace; zero-alloc-per-call after first call per thread. Implies alloc. |
| alloc | | Heap allocator available; enables Vec<Dominant>-returning APIs and Mmcq::new_boxed(). |
| lut | | 32³ candidate-set LUT for CIEDE2000: ~256 KB binary cost, ~310× CIEDE2000 speedup. |

colorthief-dataset:

| Feature | Default | Effect |
|---|---|---|
| std | | Enables x86_64 runtime CPU-feature detection. |
| alloc | | Forward-compat hook (current API is no_alloc). |
| lut | | The 32³ CIEDE2000 LUT, propagated from colorthief/lut. |

§No-std + no-alloc support

Both crates are usable in no_std + no_alloc environments. Caller manages the MMCQ workspace (a static mut Mmcq placed in .bss) and the output buffer (a fixed-size [Option<Dominant>; N]). See the extract_no_alloc example for the full pattern.

The Buffer<T> trait abstracts the output: Vec<T> (alloc-gated), [Option<T>; N], &mut [Option<T>] ship by default; consumers can plug in arrayvec::ArrayVec / heapless::Vec / custom types with a one-line impl Buffer<T>.
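
A push-shaped output trait of this kind might look roughly like the sketch below. The trait name PushBuffer, the try_push method, and its Result<(), T> signature are assumptions made for illustration; the crate's actual Buffer<T> definition may differ.

```rust
/// Hypothetical push-shaped sink; the crate's real Buffer<T> may differ.
trait PushBuffer<T> {
    /// Stores `item`, or hands it back as Err(item) when the buffer is full.
    fn try_push(&mut self, item: T) -> Result<(), T>;
}

/// Fixed-capacity, no_alloc-friendly storage: fill the first empty slot.
impl<T, const N: usize> PushBuffer<T> for [Option<T>; N] {
    fn try_push(&mut self, item: T) -> Result<(), T> {
        match self.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(item);
                Ok(())
            }
            None => Err(item),
        }
    }
}

/// Growable storage for builds that have an allocator.
impl<T> PushBuffer<T> for Vec<T> {
    fn try_push(&mut self, item: T) -> Result<(), T> {
        self.push(item);
        Ok(())
    }
}

/// Generic producer: pushes values until the buffer refuses one.
fn fill<B: PushBuffer<u32>>(out: &mut B, values: &[u32]) -> usize {
    let mut written = 0;
    for &v in values {
        if out.try_push(v).is_err() {
            break;
        }
        written += 1;
    }
    written
}

fn main() {
    let mut fixed: [Option<u32>; 3] = [None; 3];
    println!("fixed buffer accepted {} of 4", fill(&mut fixed, &[7, 8, 9, 10]));
    let mut grow: Vec<u32> = Vec::new();
    println!("Vec accepted {} of 4", fill(&mut grow, &[7, 8, 9, 10]));
}
```

Keeping the producer generic over the sink is what lets the same extraction code serve both heap-backed and fixed-size callers.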

For zero-alloc-per-call in single-threaded no_std + alloc environments (typical wasm32-unknown-unknown / interrupt-free bare metal), place an Mmcq in static mut yourself — the unsafe then sits at your call site, not silently inside this crate.

§SIMD backends

Color::nearest_to (Delta E 76) and Color::nearest_to_cie94 dispatch to per-arch SIMD backends:

| Backend | ISA | Lanes | Detection |
|---|---|---|---|
| aarch64_neon | NEON | 4 (128-bit) | compile-time (target_feature = "neon") |
| x86_avx512 | AVX-512F | 16 (512-bit) | runtime (is_x86_feature_detected!) |
| x86_avx2 | AVX2 | 8 (256-bit) | runtime |
| x86_sse41 | SSE4.1 | 4 (128-bit) | runtime |
| wasm_simd128 | SIMD128 | 4 (128-bit) | compile-time (target_feature = "simd128") |
| scalar | | 1 | always available |
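
The detection column can be read as a tiered selection, sketched below. The function pick_backend and the idea of returning the backend name as a string are assumptions for illustration only; the crate's internal dispatch is not shown in this document. The is_x86_feature_detected! macro and the cfg predicates are standard Rust.

```rust
/// Name of the backend this sketch would select on the current target.
/// Runtime detection on x86_64, compile-time cfg elsewhere, scalar fallback.
#[allow(unreachable_code)]
fn pick_backend() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        // Highest tier first; each check queries the CPU at runtime.
        if is_x86_feature_detected!("avx512f") {
            return "x86_avx512";
        }
        if is_x86_feature_detected!("avx2") {
            return "x86_avx2";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "x86_sse41";
        }
    }
    #[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
    {
        return "aarch64_neon";
    }
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    {
        return "wasm_simd128";
    }
    "scalar"
}

fn main() {
    println!("selected backend: {}", pick_backend());
}
```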

Every backend is bit-identical to the scalar reference — plain mul + add (no FMA) — and verified against a 17³ = 4913-point inline parity grid plus an exhaustive 256³ = 16,777,216-point sweep (#[ignore]-gated; run via cargo test --release --ignored).
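
A parity check of this shape can be sketched without any intrinsics. Everything here is hypothetical: the 4-lane function is a plain-Rust stand-in for a SIMD backend, the palette chunk values are made up, and the grid spacing (i * 255 / 16 for i in 0..17) is an assumption about how a 17-point axis might be laid out. The point is the comparison by bit pattern, which passes because both paths perform the same subtractions, multiplications, and additions in the same order.

```rust
/// Scalar reference: squared distance with plain mul + add in a fixed order.
fn dist_sq_scalar(q: [f32; 3], p: [f32; 3]) -> f32 {
    let (d0, d1, d2) = (q[0] - p[0], q[1] - p[1], q[2] - p[2]);
    d0 * d0 + d1 * d1 + d2 * d2
}

/// 4-lane stand-in for a SIMD backend over a structure-of-arrays palette
/// chunk. Same operations, same order per lane, no fused multiply-add.
fn dist_sq_lanes(q: [f32; 3], p0: [f32; 4], p1: [f32; 4], p2: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for lane in 0..4 {
        let (d0, d1, d2) = (q[0] - p0[lane], q[1] - p1[lane], q[2] - p2[lane]);
        out[lane] = d0 * d0 + d1 * d1 + d2 * d2;
    }
    out
}

/// Walk a 17-point-per-axis grid (17^3 = 4913 queries) and count lanes whose
/// bit pattern differs from the scalar reference.
fn parity_grid() -> (usize, usize) {
    // Hypothetical palette chunk, one array per coordinate.
    let p0 = [12.5f32, 53.24, 87.73, 32.3];
    let p1 = [0.0f32, 80.09, -86.18, 79.19];
    let p2 = [-3.75f32, 67.2, 83.18, -107.86];
    let (mut queries, mut mismatches) = (0, 0);
    for r in 0..17u32 {
        for g in 0..17u32 {
            for b in 0..17u32 {
                let q = [r, g, b].map(|i| (i * 255 / 16) as f32);
                let lanes = dist_sq_lanes(q, p0, p1, p2);
                for lane in 0..4 {
                    let scalar = dist_sq_scalar(q, [p0[lane], p1[lane], p2[lane]]);
                    if scalar.to_bits() != lanes[lane].to_bits() {
                        mismatches += 1;
                    }
                }
                queries += 1;
            }
        }
    }
    (queries, mismatches)
}

fn main() {
    let (queries, mismatches) = parity_grid();
    println!("{queries} queries, {mismatches} mismatching lanes");
}
```

Comparing with to_bits() rather than an epsilon is what turns "close enough" into "bit-identical"; a fused multiply-add in either path would round differently and show up as a mismatch.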

CIEDE2000 is scalar-only by design — its atan2 / sin / cos / exp and branchy hue-wraparound logic don’t vectorize cleanly; an attempt regressed by ~35% vs the scalar baseline.

§Codegen pipeline

colorthief-dataset/src/generated.rs is produced offline by cargo run --release -p xtask -- codegen. The xtask:

  1. Parses colorthief-dataset/assets/color_hierarchy.csv (sourced from Stitch Fix’s colornamer, Apache-2.0).
  2. Computes CIE LAB (D65, 2°) per entry.
  3. Computes the 32³ CIEDE2000 candidate-set LUT (rayon-parallel, ~3 min on Apple Silicon — every u8 RGB swept through the full-scan reference).
  4. Emits two #[non_exhaustive] #[repr(u8)] enums (Family, Kind) covering every distinct value in the CSV.
  5. Pretty-prints + rustfmts the result so it passes cargo fmt --check.
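
Step 2 above, the sRGB to CIE LAB conversion, can be sketched as follows. The function names are hypothetical and the code is not taken from the xtask; it uses the commonly published sRGB-to-XYZ matrix and the D65 / 2° reference white (Xn = 0.95047, Yn = 1.0, Zn = 1.08883), which is an assumption about the exact constants the codegen uses.

```rust
/// Undo the sRGB transfer curve for one 8-bit channel.
fn srgb_to_linear(c: u8) -> f64 {
    let c = c as f64 / 255.0;
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// The piecewise cube-root function from the CIE LAB definition.
fn lab_f(t: f64) -> f64 {
    const DELTA: f64 = 6.0 / 29.0;
    if t > DELTA * DELTA * DELTA {
        t.cbrt()
    } else {
        t / (3.0 * DELTA * DELTA) + 4.0 / 29.0
    }
}

/// 8-bit sRGB to [L*, a*, b*] relative to the D65 / 2-degree white point.
fn rgb_to_lab(rgb: [u8; 3]) -> [f64; 3] {
    let [r, g, b] = rgb.map(srgb_to_linear);
    // Linear sRGB to CIE XYZ (D65).
    let x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b;
    let y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b;
    let z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b;
    // Normalise by the reference white, then apply the LAB formulas.
    let (fx, fy, fz) = (lab_f(x / 0.95047), lab_f(y), lab_f(z / 1.08883));
    [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]
}

fn main() {
    for rgb in [[255u8, 255, 255], [0, 0, 0], [255, 0, 0]] {
        let [l, a, b] = rgb_to_lab(rgb);
        println!("{rgb:?} -> L*={l:.2} a*={a:.2} b*={b:.2}");
    }
}
```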

CI’s codegen-up-to-date job re-runs the xtask and fails if generated.rs would change — guarantees no drift between assets/ and the committed source.

§Coverage-side cfgs

For coverage runs that need to exercise lower-tier SIMD branches on hardware that natively supports a higher tier:

  • --cfg colorthief_force_scalar — bypass every SIMD backend.
  • --cfg colorthief_disable_avx512 — drop x86_64 from AVX-512F to AVX2.
  • --cfg colorthief_disable_avx2 — drop x86_64 to SSE4.1.
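
One way such overrides can sit on top of runtime detection is sketched below. The Tier enum and both functions are hypothetical, and the sketch assumes that colorthief_disable_avx2 skips AVX2 and every tier above it (a literal reading of "drop x86_64 to SSE4.1"); the crate may combine the flags differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tier {
    Avx512,
    Avx2,
    Sse41,
    Scalar,
}

/// Pure tier selection: detected CPU features in, override flags in, tier out.
/// Keeping it free of cfg and CPU queries makes every branch unit-testable.
fn select_tier(
    has_avx512: bool,
    has_avx2: bool,
    has_sse41: bool,
    force_scalar: bool,
    disable_avx512: bool,
    disable_avx2: bool,
) -> Tier {
    if force_scalar {
        return Tier::Scalar;
    }
    // Assumption: disabling AVX2 also rules out the AVX-512 tier above it.
    if has_avx512 && !disable_avx512 && !disable_avx2 {
        return Tier::Avx512;
    }
    if has_avx2 && !disable_avx2 {
        return Tier::Avx2;
    }
    if has_sse41 {
        return Tier::Sse41;
    }
    Tier::Scalar
}

/// Feed the compile-time cfgs (set via RUSTFLAGS="--cfg ...") into the selector.
#[allow(unexpected_cfgs)]
fn select_tier_from_cfg(has_avx512: bool, has_avx2: bool, has_sse41: bool) -> Tier {
    select_tier(
        has_avx512,
        has_avx2,
        has_sse41,
        cfg!(colorthief_force_scalar),
        cfg!(colorthief_disable_avx512),
        cfg!(colorthief_disable_avx2),
    )
}

fn main() {
    // Pretend the host supports every x86_64 tier.
    println!("{:?}", select_tier_from_cfg(true, true, true));
}
```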

These flags are also exercised by the simd.yml CI workflow.

§License

colorthief is dual-licensed under MIT or Apache-2.0 at your option.

See LICENSE-APACHE, LICENSE-MIT for details.

The upstream xkcd color-survey data is public domain (Randall Munroe); Stitch Fix’s hierarchical name layers are Apache-2.0 (attribution in THIRD_PARTY_NOTICES.md).

Copyright (c) 2026 FinDIT Studio authors.

Structs§

Color
One named entry in the xkcd color hierarchy.
Dominant
One entry in an extract result: the actual MMCQ-extracted RGB, the closest xkcd-hierarchy Color for naming, and the pixel-count weight behind that color.
Mmcq
MMCQ workspace. Holds the 32K-entry histogram and the box queue inline as fixed-size arrays — no heap allocations for either.
Rgb48Frame
A validated borrow over a packed sRGB 16-bit-per-channel frame.
RgbFrame
A validated borrow over a packed sRGB 8-bit frame.
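
As a rough illustration of how a validated frame borrow and a fixed-size histogram fit together, consider the sketch below. PackedRgb, FrameError and its variants, and histogram are hypothetical names, not the crate's types, and the 5-bits-per-channel binning is an assumption inferred from the 32K-entry (2^15) histogram size mentioned for Mmcq.

```rust
#[derive(Debug, PartialEq)]
enum FrameError {
    ZeroDimension,
    SizeOverflow,
    LengthMismatch { expected: usize, actual: usize },
}

/// Hypothetical validated borrow over packed 8-bit RGB (3 bytes per pixel).
#[derive(Debug)]
struct PackedRgb<'a> {
    data: &'a [u8],
}

impl<'a> PackedRgb<'a> {
    /// Checks the dimensions once so later pixel loops need no bounds logic.
    fn try_new(data: &'a [u8], width: usize, height: usize) -> Result<Self, FrameError> {
        if width == 0 || height == 0 {
            return Err(FrameError::ZeroDimension);
        }
        let expected = width
            .checked_mul(height)
            .and_then(|n| n.checked_mul(3))
            .ok_or(FrameError::SizeOverflow)?;
        if data.len() != expected {
            return Err(FrameError::LengthMismatch { expected, actual: data.len() });
        }
        Ok(Self { data })
    }
}

/// 32 * 32 * 32 bins: the top 5 bits of each channel.
const BINS: usize = 1 << 15;

/// Count pixels into a caller-owned, fixed-size histogram (no heap use).
fn histogram(frame: &PackedRgb<'_>, hist: &mut [u32; BINS]) {
    for px in frame.data.chunks_exact(3) {
        let idx = ((px[0] as usize >> 3) << 10)
            | ((px[1] as usize >> 3) << 5)
            | (px[2] as usize >> 3);
        hist[idx] += 1;
    }
}

fn main() {
    let pixels = [255u8, 255, 255, 0, 0, 0];
    let frame = PackedRgb::try_new(&pixels, 2, 1).expect("length matches 2 x 1 x 3");
    let mut hist = [0u32; BINS];
    histogram(&frame, &mut hist);
    println!("white bin: {}, black bin: {}", hist[BINS - 1], hist[0]);
}
```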

Enums§

Algorithm
Color-difference algorithm used to map an arbitrary RGB query to its nearest Color in the xkcd palette.
Family
Color family classification sourced from the upstream color_hierarchy.csv color_family column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Family::as_str to get the original string back when you need to feed it into a search index.
Kind
Color kind / texture classification sourced from the upstream color_hierarchy.csv color_type column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Kind::as_str to get the original string back when you need to feed it into a search index.
RgbFrameError
Errors returned by RgbFrame::try_new.

Traits§

Buffer
Push-shaped output buffer.

Functions§

extract (alloc or std)
Extract up to count dominant colors from frame, each mapped to its nearest entry in the xkcd color hierarchy and weighted by the number of source pixels behind it.
extract_rgb48 (alloc or std)
16-bit-per-channel variant of extract for HDR sources. Mirrors the colconv MixedSinker::with_rgb_u16 output buffer. Each u16 channel is downscaled to u8 via >> 8 at pixel iteration before MMCQ; see Rgb48Frame for the rationale and color-space caveat.
extract_rgb48_with (alloc or std)
16-bit variant of extract_with.
extract_with (alloc or std)
Same as extract but the per-dominant naming step uses the algorithm specified by algo. See Algorithm for the variants and their speed/accuracy trade-offs.
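
The >> 8 downscale mentioned for extract_rgb48 keeps the high byte of each 16-bit channel. A minimal sketch, with a hypothetical helper name and a made-up sample buffer:

```rust
/// Reduce packed 16-bit channels to 8-bit by keeping the high byte.
fn downscale_rgb48(src: &[u16]) -> Vec<u8> {
    src.iter().map(|&c| (c >> 8) as u8).collect()
}

fn main() {
    // Two pixels of packed RGB48: channel values chosen to show the edges.
    let src = [0u16, 255, 256, 0x8000, 65535, 0x1234];
    println!("{:?}", downscale_rgb48(&src));
}
```

The shift truncates rather than rounds, so every 16-bit value below 256 maps to 0 and full scale (65535) still maps to 255.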