Crate kmeans_smid


§kmeans_smid


kmeans_smid is a small and fast library for k-means clustering calculations. It fixes a SIMD problem from the kmeans crate. Here is a small example, using k-means++ as the initialization method and Lloyd as the k-means variant:

use kmeans_smid::*;

fn main() {
    let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

    // Generate some random data
    let mut samples = vec![0.0f64;sample_cnt * sample_dims];
    samples.iter_mut().for_each(|v| *v = rand::random());

    // Calculate k-means, using k-means++ as the initialization method.
    // Generic arguments in expression position need the turbofish (`::<...>`).
    let kmean = KMeans::<f64, 8>::new(samples, sample_cnt, sample_dims);
    let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

    println!("Centroids: {:?}", result.centroids);
    println!("Cluster-Assignments: {:?}", result.assignments);
    println!("Error: {}", result.distsum);
}

§Data structures

For performance reasons, all calculations are done on bare vectors, using hand-written SIMD intrinsics from the packed_simd crate. All vectors are stored row-major, so each sample occupies one consecutive block of memory.
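
To illustrate the row-major layout, here is a minimal sketch in plain Rust of how a single sample can be located in such a flat buffer. The helper names `sample` and `samples` are hypothetical and not part of this crate's API; the sketch only assumes that sample `i` starts at offset `i * sample_dims`.

```rust
use std::slice::ChunksExact;

/// Returns sample `i` of a row-major buffer as a slice.
/// Hypothetical helper: `sample_dims` is the number of values per sample.
pub fn sample(data: &[f64], sample_dims: usize, i: usize) -> &[f64] {
    let start = i * sample_dims;
    &data[start..start + sample_dims]
}

/// Iterates over all samples of a row-major buffer, one slice per sample.
pub fn samples(data: &[f64], sample_dims: usize) -> ChunksExact<'_, f64> {
    data.chunks_exact(sample_dims)
}
```

With `sample_dims = 3`, for example, the buffer `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]` holds two samples, and `sample(&data, 3, 1)` is the slice `[4.0, 5.0, 6.0]`.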

§Supported variants / algorithms

  • lloyd (standard kmeans)
  • minibatch
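
As a rough illustration of the Lloyd variant, the following sketch performs one iteration (assignment step, then update step) on row-major data in plain scalar Rust. It is not this crate's implementation, which uses SIMD; the function `lloyd_step` and its signature are assumptions made for this example.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn dist_sq(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// One Lloyd iteration over row-major `samples` and `centroids` (hypothetical helper).
/// Returns the assignments and the sum of squared distances to the centroids
/// as they were before the update; `centroids` is updated in place.
pub fn lloyd_step(samples: &[f64], dims: usize, centroids: &mut [f64]) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let mut assignments = Vec::with_capacity(samples.len() / dims);
    let mut distsum = 0.0f64;

    // Assignment step: find the nearest centroid for each sample.
    for s in samples.chunks_exact(dims) {
        let (best, d) = centroids
            .chunks_exact(dims)
            .map(|c| dist_sq(s, c))
            .enumerate()
            .fold((0, f64::INFINITY), |acc, (i, d)| if d < acc.1 { (i, d) } else { acc });
        assignments.push(best);
        distsum += d;
    }

    // Update step: each centroid becomes the mean of its assigned samples.
    let mut sums = vec![0.0f64; k * dims];
    let mut counts = vec![0usize; k];
    for (s, &a) in samples.chunks_exact(dims).zip(&assignments) {
        counts[a] += 1;
        for (acc, v) in sums[a * dims..(a + 1) * dims].iter_mut().zip(s) {
            *acc += v;
        }
    }
    for (c, (sum, &n)) in centroids
        .chunks_exact_mut(dims)
        .zip(sums.chunks_exact(dims).zip(&counts))
    {
        // Empty clusters keep their previous centroid in this sketch.
        if n > 0 {
            for (cv, sv) in c.iter_mut().zip(sum) {
                *cv = sv / n as f64;
            }
        }
    }
    (assignments, distsum)
}
```

A full run would repeat this step until the maximum iteration count is reached or the chosen abort strategy fires. The minibatch variant differs in that each iteration works on a subset of the samples rather than on all of them.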

§Supported centroid initialization methods

  • k-means++
  • random partition
  • random sample
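
The two simpler methods in this list can be sketched as follows. This is an illustration of the general techniques, not this crate's code: both function names are hypothetical, and randomness is injected through a caller-supplied `pick_index(n)` closure (assumed to return an index in `0..n`) so that the sketch needs no external crate.

```rust
/// "Random sample": copy `k` samples, chosen by `pick_index`, as initial centroids.
/// This sketch does not prevent the same sample from being picked twice.
pub fn init_random_sample(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut pick_index: impl FnMut(usize) -> usize,
) -> Vec<f64> {
    let n = samples.len() / dims;
    let mut centroids = Vec::with_capacity(k * dims);
    for _ in 0..k {
        let i = pick_index(n);
        centroids.extend_from_slice(&samples[i * dims..(i + 1) * dims]);
    }
    centroids
}

/// "Random partition": assign every sample to a cluster chosen by `pick_index(k)`,
/// then use the mean of each cluster as its initial centroid.
pub fn init_random_partition(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut pick_index: impl FnMut(usize) -> usize,
) -> Vec<f64> {
    let mut sums = vec![0.0f64; k * dims];
    let mut counts = vec![0usize; k];
    for s in samples.chunks_exact(dims) {
        let c = pick_index(k);
        counts[c] += 1;
        for (acc, v) in sums[c * dims..(c + 1) * dims].iter_mut().zip(s) {
            *acc += v;
        }
    }
    // Turn the per-cluster sums into means; empty clusters stay at zero here.
    for (sum, &n) in sums.chunks_exact_mut(dims).zip(&counts) {
        if n > 0 {
            for v in sum.iter_mut() {
                *v /= n as f64;
            }
        }
    }
    sums
}
```

In practice the closure would wrap a random number generator. k-means++ additionally weights each pick by the squared distance to the nearest centroid already chosen, which is omitted here.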

Structs§

KMeans
Entry point of this crate’s API surface.
KMeansConfig
This is a structure holding various configuration options for k-means calculations, such as the random number generator to use, or callbacks that can be set to get status information from a running k-means calculation.
KMeansConfigBuilder
KMeansState
This is the internally used data structure, storing the current state during calculation, as well as the final result, as returned by the API. All mutations are done in this structure, making KMeans immutable, and therefore allowing it to be used in parallel without having to duplicate the input data.

Enums§

AbortStrategy
Enum with possible abort strategies. These strategies specify when a running iteration (within the k-means calculation) is aborted.

Traits§

Primitive