Crate colorthief


Color Thief

Dominant colors with human-vocabulary names for video keyframes — MMCQ extraction + nearest-neighbor lookup against the xkcd color survey.


§Overview

colorthief extracts dominant colors from packed-RGB video keyframes and maps each one to its closest entry in a 949-color human-vocabulary table sourced from the xkcd color survey. It is built for video-indexing and search-vocabulary pipelines: every returned Dominant carries both the actual MMCQ-extracted RGB (for swatch rendering) and the named Color (for search-index vocabulary), and results are sorted by population in descending order.
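To make that output contract concrete, here is a minimal sketch. The types `NamedColor` and `Dominant` and the function `name_and_sort` are hypothetical stand-ins, not the crate's API, and plain squared RGB distance replaces the crate's LAB-based metrics to keep it short.

```rust
use std::cmp::Reverse;

/// Hypothetical stand-in for a named palette entry (not the crate's `Color`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct NamedColor {
    name: &'static str,
    rgb: [u8; 3],
}

/// Hypothetical stand-in for the crate's `Dominant`: the extracted RGB,
/// the nearest named color, and the pixel count behind it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Dominant {
    rgb: [u8; 3],
    named: NamedColor,
    population: u32,
}

/// Squared RGB distance; used only to keep the sketch short.
fn dist_sq(a: [u8; 3], b: [u8; 3]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            (d * d) as u32
        })
        .sum()
}

/// Attach the nearest palette name to each (rgb, population) pair, then sort
/// by population, largest first.
fn name_and_sort(palette: &[NamedColor], raw: &[([u8; 3], u32)]) -> Vec<Dominant> {
    let mut out: Vec<Dominant> = raw
        .iter()
        .map(|&(rgb, population)| {
            let named = *palette
                .iter()
                .min_by_key(|c| dist_sq(c.rgb, rgb))
                .expect("palette must be non-empty");
            Dominant { rgb, named, population }
        })
        .collect();
    out.sort_by_key(|d| Reverse(d.population));
    out
}
```

Keeping the extracted `rgb` next to the matched `named` entry mirrors the two uses the overview names: the swatch shows the real color, the index stores the vocabulary word.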

§Crates in this workspace

Crate	Purpose
`colorthief`	Dominant-color extraction (MMCQ) + naming pipeline. `RgbFrame<'a>` (8-bit) / `Rgb48Frame<'a>` (16-bit HDR) input.
`colorthief-dataset`	Static xkcd palette + nearest-neighbor lookup with three color-difference metrics (CIEDE2000, CIE94, Delta E 76). `no_std + no_alloc`.
`xtask`	Offline build-time codegen: regenerates the static dataset and the CIEDE2000 LUT from the upstream CSV. Not published.

§Installation

[dependencies]
colorthief = "0.1"

# Or, if you only need the static palette + nearest-neighbor lookup
# (no MMCQ; works in no_std + no_alloc):
colorthief-dataset = "0.1"

Minimum supported Rust version: 1.95 (required for stable AVX-512F intrinsics and core::error::Error in no_std builds via thiserror 2 without its std feature).

§Examples

Example	Crate	Run
`extract`	`colorthief`	`cargo run --release --example extract -p colorthief`
`extract_rgb48` (HDR / 16-bit)	`colorthief`	`cargo run --release --example extract_rgb48 -p colorthief`
`extract_no_alloc` (`static mut Mmcq` + fixed buffer)	`colorthief`	`cargo run --release --example extract_no_alloc -p colorthief`
`lookup` (name-only, no MMCQ)	`colorthief-dataset`	`cargo run --release --example lookup -p colorthief-dataset`

See the examples directories of both colorthief and colorthief-dataset for more details.

§Algorithms

Three nearest-neighbor metrics are exposed as variants of the Algorithm enum (#[non_exhaustive] #[repr(u8)]):

`Algorithm`	Speed per query (NEON)	Notes
`Ciede2000Exact` (default)	~230 ns with LUT; 71.5 µs full scan	Modern perceptual gold standard. Provably exact at u8 RGB resolution when the `lut` feature is enabled.
`Cie94`	~510 ns	Asymmetric (the palette entry is the reference). Mid-accuracy.
`DeltaE76`	~470 ns	Squared Euclidean distance in LAB. Cheapest formula to evaluate, but has well-known biases in the saturated blue and yellow regions.
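The two SIMD-friendly metrics can be sketched in a few lines. This assumes inputs are already in CIE LAB and uses the standard graphic-arts CIE94 weights (kL = 1, K1 = 0.045, K2 = 0.015); `delta_e76_sq`, `cie94`, and `nearest_cie94` are illustrative names, not the crate's functions. The sketch shows where the CIE94 asymmetry comes from: SC and SH are scaled by the reference's chroma only.

```rust
/// A CIE LAB triple (L*, a*, b*).
type Lab = [f64; 3];

/// Delta E 76 as squared Euclidean distance in LAB. Skipping the square
/// root does not change which palette entry is nearest.
fn delta_e76_sq(p: Lab, q: Lab) -> f64 {
    let (dl, da, db) = (p[0] - q[0], p[1] - q[1], p[2] - q[2]);
    dl * dl + da * da + db * db
}

/// CIE94 with graphic-arts weights (kL = 1, K1 = 0.045, K2 = 0.015).
/// `reference` supplies the chroma used in SC and SH, so swapping the
/// arguments can change the result.
fn cie94(reference: Lab, sample: Lab) -> f64 {
    let c1 = reference[1].hypot(reference[2]);
    let c2 = sample[1].hypot(sample[2]);
    let dl = reference[0] - sample[0];
    let dc = c1 - c2;
    let da = reference[1] - sample[1];
    let db = reference[2] - sample[2];
    // Delta H squared can dip slightly below zero from rounding; clamp it.
    let dh_sq = (da * da + db * db - dc * dc).max(0.0);
    let sc = 1.0 + 0.045 * c1;
    let sh = 1.0 + 0.015 * c1;
    // SL = 1 for graphic-arts weights, so the lightness term is unscaled.
    (dl * dl + (dc / sc).powi(2) + dh_sq / (sh * sh)).sqrt()
}

/// Index of the palette entry nearest to `query`, passing each palette
/// entry as the CIE94 reference.
fn nearest_cie94(palette: &[Lab], query: Lab) -> Option<usize> {
    palette
        .iter()
        .enumerate()
        .map(|(i, &p)| (i, cie94(p, query)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```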

With the `lut` feature enabled, the default Ciede2000Exact is ~310× faster than a naive full scan (~230 ns vs 71.5 µs per query), thanks to a pre-computed 32³ candidate-set LUT produced by the codegen pipeline described below.
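One plausible reading of the candidate-set design, stated as an assumption: each of the 32³ cells is addressed by the top 5 bits of each channel, and offline codegen stores, per cell, every palette index that is the exact nearest neighbour of at least one u8 RGB value inside that cell. Scanning only those candidates then returns the same answer as a full scan. `cell_index` and `CandidateLut` are hypothetical names, and a `Vec` stands in for the crate's static table.

```rust
/// Index of the 32³ LUT cell holding an 8-bit RGB value: keep the top
/// 5 bits of each channel and pack them as r:g:b (assumed layout).
fn cell_index(rgb: [u8; 3]) -> usize {
    let [r, g, b] = rgb.map(|c| (c >> 3) as usize);
    (r << 10) | (g << 5) | b
}

/// Hypothetical LUT: for every cell, the palette indices that can be the
/// exact nearest neighbour of some RGB value in that cell.
struct CandidateLut {
    cells: Vec<Vec<u16>>, // 32 * 32 * 32 = 32_768 cells
}

impl CandidateLut {
    /// Nearest palette index, scanning only the cell's candidates.
    /// `dist` is the full metric (CIEDE2000 in the crate; anything here).
    fn nearest(&self, rgb: [u8; 3], dist: impl Fn(u16, [u8; 3]) -> f64) -> Option<u16> {
        self.cells[cell_index(rgb)]
            .iter()
            .copied()
            .min_by(|&a, &b| dist(a, rgb).total_cmp(&dist(b, rgb)))
    }
}
```

The speedup then depends on how small the per-cell candidate lists are compared with the 949-entry palette.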

§Feature flags

colorthief:

Feature	Default	Effect
`std`	✓	`thread_local!`-cached MMCQ workspace; no allocation per call after the first call on each thread. Implies `alloc`.
`alloc`		Heap allocator available; enables `Vec<Dominant>`-returning APIs and `Mmcq::new_boxed()`.
`lut`	✓	32³ candidate-set LUT for CIEDE2000: adds ~256 KB to the binary for a ~310× CIEDE2000 speedup.
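The `std` row's thread-local caching can be sketched with `thread_local!` and `RefCell`. The `Workspace` type and `with_workspace` helper are hypothetical; the point is that the buffer is allocated once per thread, on first use, and only cleared on later calls.

```rust
use std::cell::RefCell;

/// Hypothetical workspace: a reusable 32³ histogram.
struct Workspace {
    histogram: Vec<u32>,
}

impl Workspace {
    fn new() -> Self {
        Workspace { histogram: vec![0; 32 * 32 * 32] }
    }
}

thread_local! {
    // One workspace per thread, allocated lazily on first access.
    static WORKSPACE: RefCell<Workspace> = RefCell::new(Workspace::new());
}

/// Runs `f` with this thread's workspace after clearing it. After the first
/// call on a thread, this performs no further allocation.
/// Re-entrant use panics, because `borrow_mut` is already held.
fn with_workspace<R>(f: impl FnOnce(&mut Workspace) -> R) -> R {
    WORKSPACE.with(|cell| {
        let mut ws = cell.borrow_mut();
        ws.histogram.fill(0);
        f(&mut *ws)
    })
}
```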

colorthief-dataset:

Feature	Default	Effect
`std`	✓	Enables x86_64 runtime CPU-feature detection.
`alloc`		Forward-compat hook (current API is `no_alloc`).
`lut`	✓	The 32³ CIEDE2000 LUT — propagated from `colorthief/lut`.

§No-std + no-alloc support

Both crates are usable in no_std + no_alloc environments. The caller manages the MMCQ workspace (a static mut Mmcq placed in .bss) and the output buffer (a fixed-size [Option<Dominant>; N]). See the extract_no_alloc example for the full pattern.

The Buffer<T> trait abstracts the output. Implementations for Vec<T> (alloc-gated), [Option<T>; N], and &mut [Option<T>] ship by default; consumers can plug in arrayvec::ArrayVec, heapless::Vec, or custom types with a one-line impl Buffer<T>.
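A push-shaped buffer abstraction along those lines might look like the sketch below. The trait's method name and `Result<(), T>` return (handing the item back when full) are assumptions for illustration; the crate's real `Buffer<T>` may differ. `push_all` shows a producer written once against the trait.

```rust
/// Hypothetical push-shaped output buffer. `push` hands the item back
/// when there is no room.
trait Buffer<T> {
    fn push(&mut self, item: T) -> Result<(), T>;
}

impl<T> Buffer<T> for Vec<T> {
    fn push(&mut self, item: T) -> Result<(), T> {
        Vec::push(self, item); // growable: never full
        Ok(())
    }
}

/// Shared logic: fill the first empty slot, or refuse.
fn push_into_slots<T>(slots: &mut [Option<T>], item: T) -> Result<(), T> {
    match slots.iter_mut().find(|slot| slot.is_none()) {
        Some(slot) => {
            *slot = Some(item);
            Ok(())
        }
        None => Err(item),
    }
}

impl<T, const N: usize> Buffer<T> for [Option<T>; N] {
    fn push(&mut self, item: T) -> Result<(), T> {
        push_into_slots(self, item)
    }
}

impl<T> Buffer<T> for &mut [Option<T>] {
    fn push(&mut self, item: T) -> Result<(), T> {
        push_into_slots(self, item)
    }
}

/// Generic producer: pushes items until the buffer refuses one and
/// returns how many were accepted.
fn push_all<T: Copy, B: Buffer<T>>(buf: &mut B, items: &[T]) -> usize {
    let mut accepted = 0;
    for &item in items {
        if buf.push(item).is_err() {
            break;
        }
        accepted += 1;
    }
    accepted
}
```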

For zero allocations per call in single-threaded no_std + alloc environments (typically wasm32-unknown-unknown or interrupt-free bare metal), place an Mmcq in a static mut yourself. The unsafe then sits at your call site instead of hiding inside this crate.
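A minimal sketch of the caller-owned `static mut` pattern, assuming a hypothetical fixed-size `Workspace` in place of the crate's `Mmcq`. The exclusivity requirement is written into the `unsafe fn` contract, so every caller has to spell out `unsafe` at its own call site.

```rust
/// Hypothetical fixed-size workspace. A zero-initialized static like this
/// is normally placed in `.bss` rather than stored in the binary image.
struct Workspace {
    histogram: [u32; 32 * 32 * 32],
}

static mut WORKSPACE: Workspace = Workspace { histogram: [0; 32 * 32 * 32] };

/// # Safety
/// The caller must guarantee exclusive access: no other thread, interrupt
/// handler, or re-entrant call may touch `WORKSPACE` while `f` runs.
unsafe fn with_static_workspace<R>(f: impl FnOnce(&mut Workspace) -> R) -> R {
    // `addr_of_mut!` yields a raw pointer without first creating a reference
    // to the `static mut`, which the `static_mut_refs` lint flags.
    let ws = unsafe { &mut *core::ptr::addr_of_mut!(WORKSPACE) };
    ws.histogram.fill(0);
    f(ws)
}
```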

§SIMD backends

Color::nearest_to (Delta E 76) and Color::nearest_to_cie94 dispatch to per-arch SIMD backends:

Backend	ISA	Lanes	Detection
`aarch64_neon`	NEON	4 (128-bit)	compile-time (`target_feature = "neon"`)
`x86_avx512`	AVX-512F	16 (512-bit)	runtime (`is_x86_feature_detected!`)
`x86_avx2`	AVX2	8 (256-bit)	runtime
`x86_sse41`	SSE4.1	4 (128-bit)	runtime
`wasm_simd128`	SIMD128	4 (128-bit)	compile-time (`target_feature = "simd128"`)
`scalar`	—	1	always available
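A sketch of how a dispatcher might walk that table from widest to narrowest backend, using a hypothetical `Backend` enum and `select_backend` function rather than the crate's internals. On x86_64 it uses std's `is_x86_feature_detected!`, which caches its CPU probe internally; NEON and SIMD128 are resolved at compile time through `target_feature` cfgs, matching the Detection column.

```rust
#[allow(dead_code)] // most variants are unused on any single target
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx512,
    Avx2,
    Sse41,
    Neon,
    Simd128,
    Scalar,
}

/// Picks the widest backend available on this machine.
#[allow(unreachable_code)] // the scalar fallback is dead on NEON/SIMD128 builds
fn select_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime detection, checked in priority order.
        if std::is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::is_x86_feature_detected!("sse4.1") {
            return Backend::Sse41;
        }
    }
    #[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
    {
        return Backend::Neon;
    }
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    {
        return Backend::Simd128;
    }
    Backend::Scalar
}
```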

Every backend is bit-identical to the scalar reference (all use plain mul + add, no FMA). Parity is verified against a 17³ = 4913-point inline grid plus an exhaustive 256³ = 16,777,216-point sweep (#[ignore]-gated; run via cargo test --release -- --ignored).
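The shape of such a parity check can be sketched as follows. A lane-by-lane "wide" function stands in for a real intrinsic backend, and the 17-point axis (0, 16, …, 240, 255) is an assumed sampling, not necessarily the crate's grid. Comparing `f32::to_bits()` rather than using a tolerance is what makes the check bit-exact. Rust does not fuse `a * b + c` into an FMA on its own, so the risk comes only from a backend explicitly using fused intrinsics or `mul_add`, which rounds once instead of twice.

```rust
/// Scalar reference: squared distance from query `q` to palette entry `p`.
fn dist_scalar(q: [f32; 3], p: [f32; 3]) -> f32 {
    let (d0, d1, d2) = (q[0] - p[0], q[1] - p[1], q[2] - p[2]);
    // Evaluated as (d0*d0 + d1*d1) + d2*d2: plain multiplies and adds.
    d0 * d0 + d1 * d1 + d2 * d2
}

/// Stand-in for a 4-lane backend, using the same operation order per lane.
fn dist_4lanes(q: [f32; 3], p: [[f32; 3]; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for lane in 0..4 {
        let d0 = q[0] - p[lane][0];
        let d1 = q[1] - p[lane][1];
        let d2 = q[2] - p[lane][2];
        out[lane] = d0 * d0 + d1 * d1 + d2 * d2;
    }
    out
}

/// Hypothetical 17-point axis: 0, 16, 32, ..., 240, then 255.
fn grid_axis() -> [u8; 17] {
    let mut axis = [0u8; 17];
    for (i, v) in axis.iter_mut().enumerate().take(16) {
        *v = (i * 16) as u8;
    }
    axis[16] = 255;
    axis
}

/// Asserts bit-identical results over the 17³ grid; returns points checked.
fn parity_check(palette: [[f32; 3]; 4]) -> usize {
    let axis = grid_axis();
    let mut checked = 0;
    for &r in &axis {
        for &g in &axis {
            for &b in &axis {
                let q = [r as f32, g as f32, b as f32];
                let wide = dist_4lanes(q, palette);
                for lane in 0..4 {
                    assert_eq!(wide[lane].to_bits(), dist_scalar(q, palette[lane]).to_bits());
                }
                checked += 1;
            }
        }
    }
    checked
}
```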

CIEDE2000 is scalar-only by design: its atan2 / sin / cos / exp calls and branchy hue-wraparound logic don’t vectorize cleanly, and a vectorized attempt was ~35% slower than the scalar baseline.
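The textbook CIEDE2000 formula (kL = kC = kH = 1, following the Sharma, Wu and Dalal formulation) makes the problem visible: every query needs `atan2`, several `cos`/`sin` calls, an `exp`, and data-dependent hue branches. This is a generic sketch on LAB inputs, not the crate's implementation, which works from u8 RGB and the LUT.

```rust
/// CIEDE2000 difference between two CIE LAB colors (kL = kC = kH = 1).
fn ciede2000(lab1: [f64; 3], lab2: [f64; 3]) -> f64 {
    let [l1, a1, b1] = lab1;
    let [l2, a2, b2] = lab2;
    let pow25_7 = 25f64.powi(7);

    // Rescale a* by the chroma-dependent factor G.
    let c_bar = (a1.hypot(b1) + a2.hypot(b2)) / 2.0;
    let g = 0.5 * (1.0 - (c_bar.powi(7) / (c_bar.powi(7) + pow25_7)).sqrt());
    let (a1p, a2p) = ((1.0 + g) * a1, (1.0 + g) * a2);
    let (c1p, c2p) = (a1p.hypot(b1), a2p.hypot(b2));
    // Hue angle in degrees, [0, 360); defined as 0 for achromatic colors.
    let hue = |b: f64, a: f64| {
        if a == 0.0 && b == 0.0 { 0.0 } else { b.atan2(a).to_degrees().rem_euclid(360.0) }
    };
    let (h1p, h2p) = (hue(b1, a1p), hue(b2, a2p));

    // Differences, with the hue wrap-around branches.
    let dlp = l2 - l1;
    let dcp = c2p - c1p;
    let dhp = if c1p * c2p == 0.0 {
        0.0
    } else {
        let d = h2p - h1p;
        if d > 180.0 { d - 360.0 } else if d < -180.0 { d + 360.0 } else { d }
    };
    let d_big_hp = 2.0 * (c1p * c2p).sqrt() * (dhp.to_radians() / 2.0).sin();

    // Means and weighting functions.
    let lbp = (l1 + l2) / 2.0;
    let cbp = (c1p + c2p) / 2.0;
    let hbp = if c1p * c2p == 0.0 {
        h1p + h2p
    } else if (h1p - h2p).abs() <= 180.0 {
        (h1p + h2p) / 2.0
    } else if h1p + h2p < 360.0 {
        (h1p + h2p + 360.0) / 2.0
    } else {
        (h1p + h2p - 360.0) / 2.0
    };
    let t = 1.0 - 0.17 * (hbp - 30.0).to_radians().cos()
        + 0.24 * (2.0 * hbp).to_radians().cos()
        + 0.32 * (3.0 * hbp + 6.0).to_radians().cos()
        - 0.20 * (4.0 * hbp - 63.0).to_radians().cos();
    let d_theta = 30.0 * (-((hbp - 275.0) / 25.0).powi(2)).exp();
    let r_c = 2.0 * (cbp.powi(7) / (cbp.powi(7) + pow25_7)).sqrt();
    let s_l = 1.0 + 0.015 * (lbp - 50.0).powi(2) / (20.0 + (lbp - 50.0).powi(2)).sqrt();
    let s_c = 1.0 + 0.045 * cbp;
    let s_h = 1.0 + 0.015 * cbp * t;
    let r_t = -(2.0 * d_theta).to_radians().sin() * r_c;

    let (tl, tc, th) = (dlp / s_l, dcp / s_c, d_big_hp / s_h);
    (tl * tl + tc * tc + th * th + r_t * tc * th).sqrt()
}
```

Because neighbouring lanes can take different branches for `dhp` and `hbp`, a SIMD version has to compute every branch and blend them, on top of vectorized transcendental functions.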

§Codegen pipeline

colorthief-dataset/src/generated.rs is produced offline by cargo run --release -p xtask -- codegen. The xtask:

Parses colorthief-dataset/assets/color_hierarchy.csv (sourced from Stitch Fix’s colornamer, Apache-2.0).
Computes CIE LAB (D65, 2°) per entry.
Computes the 32³ CIEDE2000 candidate-set LUT (rayon-parallel; ~3 min on Apple Silicon, since every u8 RGB value is swept through the full-scan reference).
Emits two #[non_exhaustive] #[repr(u8)] enums (Family, Kind) covering every distinct value in the CSV.
Pretty-prints the result and runs rustfmt on it so it passes cargo fmt --check.
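The LAB step is the standard sRGB to CIE LAB conversion under D65 with the 2° observer. A self-contained sketch of that conversion follows; the function names are illustrative, and the xtask's exact matrix constants and rounding are not known here.

```rust
/// 8-bit sRGB channel to linear light in [0, 1] (sRGB transfer curve).
fn srgb_to_linear(c: u8) -> f64 {
    let c = c as f64 / 255.0;
    if c <= 0.04045 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

/// 8-bit sRGB to CIE LAB, D65 white point, 2° observer.
fn srgb_to_lab(rgb: [u8; 3]) -> [f64; 3] {
    let [r, g, b] = rgb.map(srgb_to_linear);
    // Linear sRGB to CIE XYZ (D65).
    let x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b;
    let y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b;
    let z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b;
    // D65 reference white.
    let (xn, yn, zn) = (0.95047, 1.0, 1.08883);
    // CIE constants: epsilon = (6/29)^3, kappa = (29/3)^3.
    const EPSILON: f64 = 216.0 / 24389.0;
    const KAPPA: f64 = 24389.0 / 27.0;
    let f = |t: f64| if t > EPSILON { t.cbrt() } else { (KAPPA * t + 16.0) / 116.0 };
    let (fx, fy, fz) = (f(x / xn), f(y / yn), f(z / zn));
    [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]
}
```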

CI’s codegen-up-to-date job re-runs the xtask and fails if generated.rs would change, which guarantees no drift between assets/ and the committed source.
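A drift check like this only works if codegen is deterministic. Below is a minimal sketch of emitting one of the `#[non_exhaustive] #[repr(u8)]` enums with an `as_str` method from a CSV column. `emit_enum` and `variant_ident` are hypothetical, not the xtask's code. The `BTreeSet` gives a sorted, stable variant order, so the same input always produces byte-identical output.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// Emits a `#[non_exhaustive] #[repr(u8)]` enum plus `as_str` covering
/// every distinct value in one CSV column, in sorted order.
fn emit_enum(name: &str, column: &[&str]) -> String {
    let values: BTreeSet<&str> = column.iter().copied().collect();
    let mut out = String::new();
    writeln!(out, "#[non_exhaustive]\n#[repr(u8)]\n#[derive(Clone, Copy, Debug, PartialEq, Eq)]\npub enum {name} {{").unwrap();
    for v in &values {
        writeln!(out, "    {},", variant_ident(v)).unwrap();
    }
    writeln!(out, "}}\n\nimpl {name} {{\n    pub const fn as_str(self) -> &'static str {{\n        match self {{").unwrap();
    for v in &values {
        // `{:?}` prints the original string as a quoted Rust literal.
        writeln!(out, "            Self::{} => {:?},", variant_ident(v), v).unwrap();
    }
    writeln!(out, "        }}\n    }}\n}}").unwrap();
    out
}

/// "light blue" becomes "LightBlue" (a hypothetical naming rule).
fn variant_ident(value: &str) -> String {
    value
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| {
            let mut cs = w.chars();
            match cs.next() {
                Some(first) => first.to_ascii_uppercase().to_string() + cs.as_str(),
                None => String::new(),
            }
        })
        .collect()
}
```

A CI job can then regenerate the source and compare it with the committed file. Any difference means the CSV and the code have diverged.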

§Coverage-side cfgs

For coverage runs that need to exercise lower-tier SIMD branches on hardware that natively supports a higher tier:

--cfg colorthief_force_scalar: bypass every SIMD backend.
--cfg colorthief_disable_avx512: cap x86_64 dispatch at AVX2 (skip AVX-512F).
--cfg colorthief_disable_avx2: cap x86_64 dispatch at SSE4.1.

These flags are also exercised by the simd.yml CI workflow.
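One way such caps can combine with runtime detection is to clamp the detected tier to a build-time maximum. The sketch below uses hypothetical `Tier`, `tier_cap`, and `effective_tier` items and assumes that `colorthief_disable_avx2` also excludes AVX-512F. Custom cfgs like these are typically passed through `RUSTFLAGS`, for example `RUSTFLAGS="--cfg colorthief_force_scalar" cargo test`. In a normal build none are set, so every `cfg!` below evaluates to `false`.

```rust
/// x86_64 SIMD tiers, ordered from narrowest to widest.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Scalar,
    Sse41,
    Avx2,
    Avx512,
}

/// Widest tier the build allows, from the coverage cfgs.
#[allow(unexpected_cfgs)] // these cfgs are not declared in this standalone sketch
fn tier_cap() -> Tier {
    if cfg!(colorthief_force_scalar) {
        Tier::Scalar
    } else if cfg!(colorthief_disable_avx2) {
        Tier::Sse41
    } else if cfg!(colorthief_disable_avx512) {
        Tier::Avx2
    } else {
        Tier::Avx512
    }
}

/// Clamp whatever the CPU reports to the build-time cap.
fn effective_tier(detected: Tier) -> Tier {
    detected.min(tier_cap())
}
```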

§License

colorthief is dual-licensed under MIT or Apache-2.0 at your option.

See LICENSE-APACHE and LICENSE-MIT for details.

The upstream xkcd color-survey data is public domain (Randall Munroe); Stitch Fix’s hierarchical name layers are Apache-2.0 (attribution in THIRD_PARTY_NOTICES.md).

Structs§

Color: One named entry in the xkcd color hierarchy.
Dominant: One entry in an extract result: the actual MMCQ-extracted RGB, the closest xkcd-hierarchy Color for naming, and the pixel-count weight behind that color.
Mmcq: MMCQ workspace. Holds the 32K-entry histogram and the box queue inline as fixed-size arrays — no heap allocations for either.
Rgb48Frame: A validated borrow over a packed sRGB 16-bit-per-channel frame.
RgbFrame: A validated borrow over a packed sRGB 8-bit frame.
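The first MMCQ step, which the Mmcq workspace's 32K-entry histogram serves, can be sketched as follows. Quantize each channel to 5 bits (32 × 32 × 32 = 32,768 bins), count pixels per bin, and record the tightest per-channel box enclosing them. `histogram_and_box` and the bin layout are illustrative. A `Vec` is used for brevity, while the crate keeps the histogram inline.

```rust
/// Bits kept per channel: 5 bits gives 32 levels and 32³ = 32_768 bins.
const SIGBITS: u32 = 5;
const RSHIFT: u32 = 8 - SIGBITS;

/// Packs quantized (r, g, b) into a histogram index.
fn hist_index(r: usize, g: usize, b: usize) -> usize {
    (r << (2 * SIGBITS)) | (g << SIGBITS) | b
}

/// Bins every packed-RGB pixel and returns the histogram plus the
/// per-channel (min, max) of quantized values, the initial MMCQ box.
/// For empty input the bounds stay at (255, 0).
fn histogram_and_box(pixels: &[u8]) -> (Vec<u32>, [(u8, u8); 3]) {
    let mut hist = vec![0u32; 1 << (3 * SIGBITS)];
    let mut bounds = [(u8::MAX, 0u8); 3];
    for px in pixels.chunks_exact(3) {
        let q = [px[0] >> RSHIFT, px[1] >> RSHIFT, px[2] >> RSHIFT];
        hist[hist_index(q[0] as usize, q[1] as usize, q[2] as usize)] += 1;
        for (c, &v) in q.iter().enumerate() {
            bounds[c].0 = bounds[c].0.min(v);
            bounds[c].1 = bounds[c].1.max(v);
        }
    }
    (hist, bounds)
}
```

MMCQ then repeatedly splits boxes from the queue (by pixel count, then by count times volume) until it has the requested number of boxes.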

Enums§

Algorithm: Color-difference algorithm used to map an arbitrary RGB query to its nearest Color in the xkcd palette.
Family: Color family classification sourced from the upstream color_hierarchy.csv color_family column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Family::as_str to get the original string back when you need to feed it into a search index.
Kind: Color kind / texture classification sourced from the upstream color_hierarchy.csv color_type column. Marked #[non_exhaustive] so adding a new upstream value is a non-breaking change for downstream consumers; call Kind::as_str to get the original string back when you need to feed it into a search index.
RgbFrameError: Errors returned by RgbFrame::try_new.
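A validated-borrow constructor in the spirit of RgbFrame::try_new might look like the sketch below. It assumes tightly packed rows (3 bytes per pixel, no stride padding). The `Frame` type and `FrameError` variants are hypothetical, not the crate's actual `RgbFrameError` cases. Validating once up front lets later pixel iteration skip length checks.

```rust
/// Hypothetical error type; the crate's `RgbFrameError` variants may differ.
#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    ZeroDimension,
    SizeOverflow,
    LengthMismatch { expected: usize, actual: usize },
}

/// A validated borrow over tightly packed 8-bit RGB.
#[derive(Debug)]
struct Frame<'a> {
    data: &'a [u8],
    width: u32,
    height: u32,
}

impl<'a> Frame<'a> {
    fn try_new(data: &'a [u8], width: u32, height: u32) -> Result<Self, FrameError> {
        if width == 0 || height == 0 {
            return Err(FrameError::ZeroDimension);
        }
        // width * height * 3, guarding against overflow on 32-bit targets.
        let expected = (width as usize)
            .checked_mul(height as usize)
            .and_then(|px| px.checked_mul(3))
            .ok_or(FrameError::SizeOverflow)?;
        if data.len() != expected {
            return Err(FrameError::LengthMismatch { expected, actual: data.len() });
        }
        Ok(Frame { data, width, height })
    }

    /// Iterates pixels as `[r, g, b]`; the length was validated above.
    fn pixels(&self) -> impl Iterator<Item = [u8; 3]> + 'a {
        let data = self.data;
        data.chunks_exact(3).map(|p| [p[0], p[1], p[2]])
    }
}
```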

Traits§

Buffer: Push-shaped output buffer.

Functions§

extract (alloc or std): Extract up to count dominant colors from frame, each mapped to its nearest entry in the xkcd color hierarchy and weighted by the number of source pixels behind it.
extract_rgb48 (alloc or std): 16-bit-per-channel variant of extract for HDR sources. Mirrors the colconv MixedSinker::with_rgb_u16 output buffer. Each u16 channel is downscaled to u8 via >> 8 during pixel iteration, before MMCQ; see Rgb48Frame for the rationale and color-space caveat.
extract_rgb48_with (alloc or std): 16-bit variant of extract_with.
extract_with (alloc or std): Same as extract, but the per-dominant naming step uses the algorithm specified by algo. See Algorithm for the variants and their speed/accuracy trade-offs.