Crate kmeans_colors

Expand description

Calculate the k-means of a set of data.

Overview

This crate provides traits for implementing and calculating a k-means clustering algorithm. The original implementation of this library was created for finding k-means colors in image buffers. Applications of crate functionality can be seen on the README page.

When using the library, set default-features = false in the Cargo.toml to avoid bringing in the binary dependencies. If working with colors, implementations have been provided for the palette Lab and Rgb color types behind the palette_color feature.

The binary located in src/bin/kmeans_colors shows examples of crate usage.

The `Calculate` trait

k-means calculations can be provided for other data types by implementing the Calculate trait. Further, Hamerly can be implemented to enable use of the Hamerly optimization and get_kmeans_hamerly. See the Lab and Rgb implementations in colors/kmeans.rs for examples. These implementations can be used as groundwork for implementing with other types and should not require much modification beyond the distance calculations.

Calculating k-means with `palette_color`

The palette_color feature provides implementations of the Calculate trait for the Lab color space and Rgb color space. Each space has advantages and drawbacks due to the characteristics of the color space.

The Lab calculation produces more perceptually accurate results at a slightly slower runtime. Rgb calculation will converge faster than Lab but the results may not visually correlate as well to the original image. Overall, properly converged results should not differ that drastically except at lower k counts. At k=1, the average color of an image, results should match almost exactly.

Note: If k-means calculation is taking too long, try scaling down the image size. A full-size image is not required for calculating the color palette or dominant color.

Calculating k-means

A basic workflow consists of reading a pixel buffer in, converting it into a flat array, then using that array with the k-means functions. The following example converts an array of u8 into Lab colors then finds the k-means.

use palette::cast::{from_component_slice, into_component_slice};
use palette::{FromColor, IntoColor, Lab, Srgb};
use kmeans_colors::{get_kmeans, Calculate, Kmeans, MapColor, Sort};

// An image buffer of one black pixel and one white pixel
let img_vec = [0u8, 0, 0, 255, 255, 255];

// Convert RGB [u8] buffer to Lab for k-means
let lab: Vec<Lab> = from_component_slice::<Srgb<u8>>(&img_vec)
    .iter()
    .map(|x| x.into_format().into_color())
    .collect();

// Iterate over the runs, keep the best results
let mut result = Kmeans::new();
for i in 0..runs {
    let run_result = get_kmeans(
        k,
        max_iter,
        converge,
        verbose,
        &lab,
        seed + i as u64,
    );
    if run_result.score < result.score {
        result = run_result;
    }
}

// Convert indexed colors back to Srgb<u8> for output
let rgb = &result.centroids
    .iter()
    .map(|x| Srgb::from_color(*x).into_format())
    .collect::<Vec<Srgb<u8>>>();
let buffer = Srgb::map_indices_to_centroids(&rgb, &result.indices);

k-means++ is used for centroid initialization. Because the initialization is random, the k-means calculation may be run multiple times to assure that the best result has been found. The algorithm can find itself in a sub-optimal result due to initial centroids, however, one run may suffice if the convergence threshold has been met.

The binary uses 8 as the default k. The iteration limit is set to 20. The convergence factor defaults to 5.0 for Lab and 0.0025 for Rgb. The number of runs defaults to 3 for one of the binary subcommands. If the results do not appear correct, raise the iteration limit as convergence was probably not met.

Getting the dominant color

After k-means calculation, the dominant color can be found by sorting the results and taking the centroid of the first item. The sort_indexed_colors function sorts the colors from darkest to lightest and returns an array of CentroidData.

use kmeans_colors::Sort;

// Using the results from the previous example, process the centroid data
let mut res = Lab::sort_indexed_colors(&result.centroids, &result.indices);

// We can find the dominant color directly
let dominant_color = Lab::get_dominant_color(&res);

// Or we can manually sort the vec by percentage, and the most appearing
// color will be the first element
res.sort_unstable_by(|a, b| (b.percentage).total_cmp(&a.percentage));
let dominant_color = res.first().unwrap().centroid;

Structs

CentroidData
Struct containing a centroid, its percentage within a buffer, and the centroid’s index.
HamerlyCentroids
Struct used for caching data required to compute k-means with the Hamerly algorithm.
HamerlyPoint
Struct that holds the necessary caching information for points in the Hamerly algorithm implementation.
Kmeans
Struct result of k-means calculation with convergence score, centroids, and indexed buffer.

Traits

Calculate
A trait for enabling k-means calculation of a data type.
Hamerly
A trait for calculating k-means with the Hamerly algorithm.
MapColor
A trait for mapping colors to their corresponding centroids.
Sort
A trait for sorting indexed k-means colors.

Functions

get_kmeans
Find the k-means centroids of a buffer.
get_kmeans_hamerly
Find the k-means centroids of a buffer using the Hamerly algorithm. Takes the same arguments as get_kmeans and produces the same results.
init_plus_plus
k-means++ centroid initialization.