Color Thief
Dominant colors with human-vocabulary names for video keyframes — MMCQ extraction + nearest-neighbor lookup against the xkcd color survey.
§Overview
colorthief extracts dominant colors from packed-RGB video keyframes
and maps each to its closest entry in a 949-color human-vocabulary
table sourced from the xkcd color survey. Built for video
indexing and search-vocabulary pipelines: every output dominant
carries both the actual MMCQ-extracted RGB (for swatch rendering)
and the named Color (for search-index vocabulary), sorted
descending by population.
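As a rough illustration of the histogram stage that MMCQ starts from, the sketch below buckets packed 8-bit RGB at 5 bits per channel and ranks the buckets by population. The 5-bit quantization is an assumption inferred from the 32K-entry histogram that Mmcq is described as holding, the function names are hypothetical, and the median-cut box splitting is omitted entirely. It also allocates for brevity, so it is not the crate's implementation.

```rust
/// Build a 32^3-entry histogram from packed RGB bytes (r, g, b, r, g, b, ...).
/// Assumption: 5 bits per channel, inferred from the "32K-entry histogram".
fn histogram_5bit(packed_rgb: &[u8]) -> Vec<u32> {
    let mut hist = vec![0u32; 1 << 15];
    for px in packed_rgb.chunks_exact(3) {
        let r = usize::from(px[0]) >> 3;
        let g = usize::from(px[1]) >> 3;
        let b = usize::from(px[2]) >> 3;
        hist[(r << 10) | (g << 5) | b] += 1;
    }
    hist
}

/// Non-empty buckets as (bucket index, pixel count), largest population first.
fn buckets_by_population(hist: &[u32]) -> Vec<(usize, u32)> {
    let mut buckets: Vec<(usize, u32)> = hist
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, count)| count > 0)
        .collect();
    buckets.sort_by(|x, y| y.1.cmp(&x.1));
    buckets
}
```

Feeding three pure-red pixels and one pure-blue pixel through both functions yields the red bucket first with a count of 3, which mirrors the descending-by-population ordering described above.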
§Crates in this workspace
| Crate | Purpose |
|---|---|
| colorthief | Dominant-color extraction (MMCQ) + naming pipeline. RgbFrame<'a> (8-bit) / Rgb48Frame<'a> (16-bit HDR) input. |
| colorthief-dataset | Static xkcd palette + nearest-neighbor lookup with three color-difference metrics (CIEDE2000, CIE94, Delta E 76). no_std + no_alloc. |
| xtask | Build-time codegen: re-runs offline to regenerate the static dataset and CIEDE2000 LUT from the upstream CSV. Not published. |
§Installation
[dependencies]
colorthief = "0.1"
# Or, if you only need the static palette + nearest-neighbor lookup
# (no MMCQ; works in no_std + no_alloc):
colorthief-dataset = "0.1"

Minimum supported Rust version: 1.95 (required for stable AVX-512F
intrinsics and core::error::Error in no_std builds via
thiserror 2 without its std feature).
§Examples
| Example | Crate | Run |
|---|---|---|
| extract | colorthief | cargo run --release --example extract -p colorthief |
| extract_rgb48 (HDR / 16-bit) | colorthief | cargo run --release --example extract_rgb48 -p colorthief |
| extract_no_alloc (static mut Mmcq + fixed buffer) | colorthief | cargo run --release --example extract_no_alloc -p colorthief |
| lookup (name-only, no MMCQ) | colorthief-dataset | cargo run --release --example lookup -p colorthief-dataset |
See the examples directories of both crates for more details.
§Algorithms
Three nearest-neighbor metrics, behind a #[non_exhaustive] #[repr(u8)] enum:
| Algorithm | Speed (NEON) | Notes |
|---|---|---|
| Ciede2000Exact (default) | ~230 ns/query (LUT) or 71.5 µs/query (full scan) | Modern perceptual gold standard. Provably exact at u8 RGB resolution when the lut feature is on. |
| Cie94 | ~510 ns/query | Asymmetric (palette = reference). Mid-accuracy. |
| DeltaE76 | ~470 ns/query | Squared Euclidean LAB. Fastest of the non-LUT paths, but with well-known biases in the saturated blue / yellow regions. |
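To make the table's notes concrete, here is a minimal sketch of the two simpler metrics using the textbook formulas: squared Euclidean distance in LAB for Delta E 76, and CIE94 with the graphic-arts constants (K1 = 0.045, K2 = 0.015). The Lab struct, the function names, and the toy palette values in the usage note are illustrative assumptions, not the crate's types or data.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lab {
    l: f32,
    a: f32,
    b: f32,
}

/// Delta E 76 without the final square root (ordering is unchanged).
fn delta_e76_sq(x: Lab, y: Lab) -> f32 {
    let (dl, da, db) = (x.l - y.l, x.a - y.a, x.b - y.b);
    dl * dl + da * da + db * db
}

/// CIE94, graphic-arts weights. The chroma of `reference` scales the
/// chroma and hue terms, which is why swapping arguments changes the result.
fn cie94(reference: Lab, sample: Lab) -> f32 {
    let c1 = (reference.a * reference.a + reference.b * reference.b).sqrt();
    let c2 = (sample.a * sample.a + sample.b * sample.b).sqrt();
    let dl = reference.l - sample.l;
    let dc = c1 - c2;
    let da = reference.a - sample.a;
    let db = reference.b - sample.b;
    // Hue difference squared; clamp tiny negatives caused by rounding.
    let dh_sq = (da * da + db * db - dc * dc).max(0.0);
    let sc = 1.0 + 0.045 * c1;
    let sh = 1.0 + 0.015 * c1;
    (dl * dl + (dc / sc) * (dc / sc) + dh_sq / (sh * sh)).sqrt()
}

/// Linear scan for the palette index closest to `query` under Delta E 76.
fn nearest_e76(query: Lab, palette: &[Lab]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &entry) in palette.iter().enumerate() {
        let d = delta_e76_sq(query, entry);
        match best {
            Some((_, best_d)) if best_d <= d => {}
            _ => best = Some((i, d)),
        }
    }
    best.map(|(i, _)| i)
}
```

With a reference of chroma 60 and a sample of chroma 10 at equal lightness, cie94 returns roughly 13.5 one way and roughly 34.5 the other, while delta_e76_sq is identical in both directions. That asymmetry is what the "palette = reference" note pins down.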
The default Ciede2000Exact is ~310× faster than a naive full scan
(71.5 µs vs ~230 ns per query) thanks to a pre-computed 32³
candidate-set LUT (see Architecture below).
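One way to picture a 32³ candidate-set LUT: quantize the query to 5 bits per channel, fetch the short candidate list stored for that cell, and run the exact metric only over those candidates instead of the whole palette. The sketch below assumes that layout; the bit packing order, the u16 candidate indices, and both function names are hypothetical and not taken from the crate.

```rust
const BITS: u32 = 5;
const SIDE: usize = 1 << BITS; // 32 cells per axis
const CELLS: usize = SIDE * SIDE * SIDE; // 32^3 = 32768 cells

/// Map an 8-bit RGB triple to its LUT cell (assumed packing: r, then g, then b).
fn cell_index(r: u8, g: u8, b: u8) -> usize {
    let shift = 8 - BITS;
    let (r, g, b) = (
        usize::from(r >> shift),
        usize::from(g >> shift),
        usize::from(b >> shift),
    );
    (r << (2 * BITS)) | (g << BITS) | b
}

/// Evaluate `dist` only for the candidate palette indices of one cell and
/// return the closest one. `dist` stands in for the exact CIEDE2000 metric.
fn nearest_among(candidates: &[u16], mut dist: impl FnMut(u16) -> f32) -> Option<u16> {
    let mut best: Option<(u16, f32)> = None;
    for &c in candidates {
        let d = dist(c);
        if best.map_or(true, |(_, best_d)| d < best_d) {
            best = Some((c, d));
        }
    }
    best.map(|(c, _)| c)
}
```

The speedup then comes from how few candidates each cell holds compared with the 949-entry palette. At ~256 KB for 32768 cells, the budget works out to about 8 bytes per cell.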
§Feature flags
colorthief:
| Feature | Default | Effect |
|---|---|---|
| std | ✓ | thread_local!-cached MMCQ workspace; zero-alloc-per-call after first call per thread. Implies alloc. |
| alloc | | Heap allocator available; enables Vec<Dominant>-returning APIs and Mmcq::new_boxed(). |
| lut | ✓ | 32³ candidate-set LUT for CIEDE2000: ~256 KB binary cost, ~310× CIEDE2000 speedup. |
colorthief-dataset:
| Feature | Default | Effect |
|---|---|---|
| std | ✓ | Enables x86_64 runtime CPU-feature detection. |
| alloc | | Forward-compat hook (current API is no_alloc). |
| lut | ✓ | The 32³ CIEDE2000 LUT; propagated from colorthief/lut. |
§No-std + no-alloc support
Both crates are usable in no_std + no_alloc environments. The caller
manages the MMCQ workspace (a static mut Mmcq placed in .bss) and
the output buffer (a fixed-size [Option<Dominant>; N]). See the
extract_no_alloc example for the full pattern.
The Buffer<T> trait abstracts the output: Vec<T> (alloc-gated),
[Option<T>; N], &mut [Option<T>] ship by default; consumers can
plug in arrayvec::ArrayVec / heapless::Vec / custom types with
a one-line impl Buffer<T>.
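The idea of a push-shaped output buffer can be sketched as follows. This page does not show the methods of Buffer<T>, so the trait below is a hypothetical stand-in (named PushSink to avoid confusion), with an assumed try_push method that hands the item back when the buffer is full. It only illustrates how a fixed-size array of Option slots and a Vec can sit behind one interface.

```rust
/// Hypothetical stand-in for a push-shaped output buffer.
/// NOT the crate's actual `Buffer<T>` definition.
trait PushSink<T> {
    /// Store `item`, or return it unchanged if there is no room left.
    fn try_push(&mut self, item: T) -> Result<(), T>;
}

/// Fixed-capacity, allocation-free: fill the first empty slot.
impl<T, const N: usize> PushSink<T> for [Option<T>; N] {
    fn try_push(&mut self, item: T) -> Result<(), T> {
        match self.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(item);
                Ok(())
            }
            None => Err(item),
        }
    }
}

/// Growable: pushing never fails.
impl<T> PushSink<T> for Vec<T> {
    fn try_push(&mut self, item: T) -> Result<(), T> {
        self.push(item);
        Ok(())
    }
}

/// Generic producer: writes values until the sink refuses one.
/// Returns how many values were stored.
fn fill<B: PushSink<u32>>(out: &mut B, values: &[u32]) -> usize {
    let mut stored = 0;
    for &v in values {
        if out.try_push(v).is_err() {
            break;
        }
        stored += 1;
    }
    stored
}
```

Calling fill with a two-slot array and three values stores the first two and stops, whereas a Vec accepts all three. A producer written against the trait therefore never needs to know whether a heap is available.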
For zero-alloc-per-call in single-threaded no_std + alloc
environments (typical wasm32-unknown-unknown / interrupt-free bare
metal), place an Mmcq in static mut yourself — the unsafe
then sits at your call site, not silently inside this crate.
§SIMD backends
Color::nearest_to (Delta E 76) and Color::nearest_to_cie94
dispatch to per-arch SIMD backends:
| Backend | ISA | Lanes | Detection |
|---|---|---|---|
| aarch64_neon | NEON | 4 (128-bit) | compile-time (target_feature = "neon") |
| x86_avx512 | AVX-512F | 16 (512-bit) | runtime (is_x86_feature_detected!) |
| x86_avx2 | AVX2 | 8 (256-bit) | runtime |
| x86_sse41 | SSE4.1 | 4 (128-bit) | runtime |
| wasm_simd128 | SIMD128 | 4 (128-bit) | compile-time (target_feature = "simd128") |
| scalar | n/a | 1 | always available |
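The mix of runtime and compile-time detection in the table can be sketched like this. The function name and the string return value are illustrative assumptions, and a real dispatcher would call into a kernel rather than return a label. Only std's is_x86_feature_detected! macro and the target_feature cfg predicates are existing Rust facilities.

```rust
/// Report which backend tier this build and CPU would select.
/// Hypothetical helper for illustration only.
#[allow(unreachable_code)]
fn detect_backend() -> &'static str {
    // x86_64: pick the widest tier the running CPU reports (needs std).
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return "x86_avx512";
        }
        if std::is_x86_feature_detected!("avx2") {
            return "x86_avx2";
        }
        if std::is_x86_feature_detected!("sse4.1") {
            return "x86_sse41";
        }
    }
    // aarch64 and wasm32: decided entirely at compile time.
    #[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
    {
        return "aarch64_neon";
    }
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    {
        return "wasm_simd128";
    }
    // Fallback that is always available.
    "scalar"
}
```

Checking tiers from widest to narrowest keeps the fallback chain explicit: a CPU without AVX-512F falls through to AVX2, then SSE4.1, then scalar.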
Every backend is bit-identical to the scalar reference — plain
mul + add (no FMA) — and verified against a 17³ = 4913-point
inline parity grid plus an exhaustive 256³ = 16,777,216-point sweep
(#[ignore]-gated; run via cargo test --release --ignored).
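A parity grid of this kind could look roughly like the sketch below, which samples 17 values per channel and compares two implementations point by point. The even spacing from 0 to 255 is an assumption (the text only gives the point count), and the names are hypothetical. The closures passed in stand for a scalar reference and a SIMD backend that each return something exactly comparable, such as a palette index.

```rust
/// 17 sample values per channel, assumed evenly spread over 0..=255.
fn grid_axis() -> [u8; 17] {
    let mut points = [0u8; 17];
    for (i, p) in points.iter_mut().enumerate() {
        *p = (i * 255 / 16) as u8;
    }
    points
}

/// Compare two implementations on every grid point.
/// Ok(n) gives the number of points checked; Err(q) gives the first mismatch.
fn parity_check(
    reference: impl Fn([u8; 3]) -> u32,
    candidate: impl Fn([u8; 3]) -> u32,
) -> Result<usize, [u8; 3]> {
    let axis = grid_axis();
    let mut checked = 0;
    for &r in &axis {
        for &g in &axis {
            for &b in &axis {
                let q = [r, g, b];
                if reference(q) != candidate(q) {
                    return Err(q);
                }
                checked += 1;
            }
        }
    }
    Ok(checked)
}
```

Returning the first mismatching query rather than a bare boolean makes a failing backend much easier to debug. The exhaustive 256³ sweep has the same shape with the axis replaced by 0..=255.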
CIEDE2000 is scalar-only by design — its atan2 / sin / cos /
exp and branchy hue-wraparound logic don’t vectorize cleanly; an
attempt regressed by ~35% vs the scalar baseline.
§Codegen pipeline
colorthief-dataset/src/generated.rs is produced offline by
cargo run --release -p xtask -- codegen. The xtask:
- Parses colorthief-dataset/assets/color_hierarchy.csv (sourced from Stitch Fix’s colornamer, Apache-2.0).
- Computes CIE LAB (D65, 2°) per entry.
- Computes the 32³ CIEDE2000 candidate-set LUT (rayon-parallel, ~3 min on Apple Silicon; every u8 RGB is swept through the full-scan reference).
- Emits two #[non_exhaustive] #[repr(u8)] enums (Family, Kind) covering every distinct value in the CSV.
- Pretty-prints + rustfmts the result so it passes cargo fmt --check.
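For the LAB step in the list above, the standard sRGB to CIE LAB conversion (D65 white point, 2° observer) looks like the following sketch. The matrix and white-point constants are the commonly published ones, the function names are illustrative, and the xtask may well differ in precision or structure.

```rust
/// Undo the sRGB transfer curve for one 8-bit channel.
fn srgb_to_linear(c: u8) -> f64 {
    let v = f64::from(c) / 255.0;
    if v <= 0.04045 {
        v / 12.92
    } else {
        ((v + 0.055) / 1.055).powf(2.4)
    }
}

/// The piecewise cube-root function from the CIE LAB definition.
fn lab_f(t: f64) -> f64 {
    const DELTA: f64 = 6.0 / 29.0;
    if t > DELTA * DELTA * DELTA {
        t.cbrt()
    } else {
        t / (3.0 * DELTA * DELTA) + 4.0 / 29.0
    }
}

/// 8-bit sRGB -> [L, a, b] relative to the D65 / 2 degree white point.
fn srgb_to_lab(r: u8, g: u8, b: u8) -> [f64; 3] {
    // D65 reference white in XYZ.
    const XN: f64 = 0.95047;
    const YN: f64 = 1.0;
    const ZN: f64 = 1.08883;
    let (r, g, b) = (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b));
    // Linear sRGB -> XYZ.
    let x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b;
    let y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b;
    let z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b;
    let (fx, fy, fz) = (lab_f(x / XN), lab_f(y / YN), lab_f(z / ZN));
    [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]
}
```

As a sanity check, white (255, 255, 255) lands at L close to 100 with a and b near zero, and black lands at L equal to 0.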
CI’s codegen-up-to-date job re-runs the xtask and fails if
generated.rs would change — guarantees no drift between assets/
and the committed source.
§Coverage-side cfgs
For coverage runs that need to exercise lower-tier SIMD branches on hardware that natively supports a higher tier:
- --cfg colorthief_force_scalar: bypass every SIMD backend.
- --cfg colorthief_disable_avx512: drop x86_64 from AVX-512F to AVX2.
- --cfg colorthief_disable_avx2: drop x86_64 to SSE4.1.
These flags are also exercised by the simd.yml CI workflow.
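How such cfgs can gate tier selection is sketched below with the cfg! macro. The Tier enum and select_tier function are hypothetical, and the sketch assumes each disable flag removes only its own tier, so reaching SSE4.1 on AVX-512 hardware would need both disable flags. The crate's actual gating may differ.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tier {
    Avx512,
    Avx2,
    Sse41,
    Scalar,
}

/// Pick a tier from detected CPU capabilities, honoring the coverage cfgs.
/// The cfg names come from the list above; everything else is illustrative.
#[allow(unexpected_cfgs)]
fn select_tier(has_avx512: bool, has_avx2: bool, has_sse41: bool) -> Tier {
    if cfg!(colorthief_force_scalar) {
        return Tier::Scalar;
    }
    if has_avx512 && !cfg!(colorthief_disable_avx512) {
        return Tier::Avx512;
    }
    if has_avx2 && !cfg!(colorthief_disable_avx2) {
        return Tier::Avx2;
    }
    if has_sse41 {
        return Tier::Sse41;
    }
    Tier::Scalar
}
```

Custom cfgs like these are typically passed through RUSTFLAGS, so one machine can exercise every lower tier without changing any source.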
§License
colorthief is dual-licensed under MIT or Apache-2.0 at your
option.
See LICENSE-APACHE, LICENSE-MIT for details.
The upstream xkcd color-survey data is public domain (Randall
Munroe); Stitch Fix’s hierarchical name layers are Apache-2.0
(attribution in THIRD_PARTY_NOTICES.md).
Copyright (c) 2026 FinDIT Studio authors.
Structs§
- Color
- One named entry in the xkcd color hierarchy.
- Dominant
- One entry in an extract result: the actual MMCQ-extracted RGB, the closest xkcd-hierarchy Color for naming, and the pixel-count weight behind that color.
- Mmcq
- MMCQ workspace. Holds the 32K-entry histogram and the box queue inline as fixed-size arrays, with no heap allocations for either.
- Rgb48Frame
- A validated borrow over a packed sRGB 16-bit-per-channel frame.
- RgbFrame
- A validated borrow over a packed sRGB 8-bit frame.
Enums§
- Algorithm
- Color-difference algorithm used to map an arbitrary RGB query to its nearest Color in the xkcd palette.
- Family
- Color family classification sourced from the upstream color_hierarchy.csv color_family column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Family::as_str to get the original string back when you need to feed it into a search index.
- Kind
- Color kind / texture classification sourced from the upstream color_hierarchy.csv color_type column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Kind::as_str to get the original string back when you need to feed it into a search index.
- RgbFrameError
- Errors returned by RgbFrame::try_new.
Traits§
- Buffer
- Push-shaped output buffer.
Functions§
- extract (crate features alloc or std)
- Extract up to count dominant colors from frame, each mapped to its nearest entry in the xkcd color hierarchy and weighted by the number of source pixels behind it.
- extract_rgb48 (crate features alloc or std)
- 16-bit-per-channel variant of extract for HDR sources. Mirrors the colconv MixedSinker::with_rgb_u16 output buffer. Each u16 channel is downscaled to u8 via >> 8 at pixel iteration before MMCQ; see Rgb48Frame for the rationale and color-space caveat.
- extract_rgb48_with (crate features alloc or std)
- 16-bit variant of extract_with.
- extract_with (crate features alloc or std)
- Same as extract but the per-dominant naming step uses the algorithm specified by algo. See Algorithm for the variants and their speed/accuracy trade-offs.
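The >> 8 downscale mentioned for extract_rgb48 amounts to keeping the high byte of each 16-bit channel. A minimal sketch of that step, written as a hypothetical standalone helper (the crate is described as doing this during pixel iteration, not through a separate function):

```rust
/// Downscale packed 16-bit RGB samples to 8-bit by keeping the top byte.
/// Returns false (and writes nothing) if the lengths do not match or the
/// source is not a whole number of RGB triples.
fn downscale_rgb48(src: &[u16], dst: &mut [u8]) -> bool {
    if src.len() != dst.len() || src.len() % 3 != 0 {
        return false;
    }
    for (out, &sample) in dst.iter_mut().zip(src) {
        *out = (sample >> 8) as u8;
    }
    true
}
```

Truncating with a shift is cheap and maps 0xFFFF to 0xFF, but it discards the low byte rather than rounding, so 0x80FF and 0x8000 both become 0x80.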