Module smartcore::cluster::kmeans

An iterative clustering algorithm that converges to a local minimum of the within-cluster sum of squared distances.

K-Means Clustering

K-means clustering partitions data into k clusters so that data points in the same cluster are similar and data points in different clusters are farther apart. The similarity of two points is determined by the Euclidean distance between them.
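
As an illustration of the distance measure described above, here is a minimal sketch in plain Rust. The function name `euclidean_distance` is hypothetical and is not part of the SmartCore API; points are assumed to be slices of `f64` with equal length.

```rust
/// Euclidean distance between two points of the same dimension.
/// Hypothetical helper for illustration, not a SmartCore function.
fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "points must have the same dimension");
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2)) // squared difference per coordinate
        .sum::<f64>()
        .sqrt()
}
```

For example, the distance between `[0.0, 0.0]` and `[3.0, 4.0]` is 5. A smaller distance means the two points are more similar.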

The K-means algorithm cannot determine the number of clusters; you need to choose this number yourself. One way to choose a suitable number of clusters is the Elbow Method.
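
The Elbow Method is usually based on the within-cluster sum of squares (often called inertia): the model is fitted for several values of k, the inertia is computed for each, and the k at which the curve stops dropping sharply is chosen. The sketch below shows only the inertia computation in plain Rust. The names `squared_distance` and `inertia` are hypothetical helpers, not SmartCore functions, and the centroids and labels are assumed to come from an already fitted clustering.

```rust
/// Squared Euclidean distance between two points.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Within-cluster sum of squares: for every point, the squared distance
/// to the centroid of the cluster it is labeled with.
fn inertia(data: &[Vec<f64>], centroids: &[Vec<f64>], labels: &[usize]) -> f64 {
    assert_eq!(data.len(), labels.len(), "one label per point is required");
    data.iter()
        .zip(labels)
        .map(|(p, &l)| squared_distance(p, &centroids[l]))
        .sum()
}
```

For the one-dimensional points 0, 2, 10 and 12, a single centroid at 6 gives an inertia of 104, while two centroids at 1 and 11 give an inertia of 4. A drop of this size followed by much smaller gains for larger k is the "elbow" the method looks for.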

At a high level, the K-means algorithm works as follows. K data points are randomly chosen from the given dataset as cluster centers (centroids), and all training instances are assigned to the closest cluster. After that, the centroids, representing the mean of the instances of each cluster, are recalculated, and these recalculated centroids become the new centers of their respective clusters. Next, all instances of the training set are reassigned to their closest cluster. This iterative process continues until convergence is achieved and the clusters are considered settled.
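
The assign-and-update loop described above can be sketched in plain Rust as follows. This is a simplified illustration and not SmartCore's implementation: the function names are hypothetical, the initial centroids are passed in by the caller, `data` is assumed to be non-empty, and convergence is taken to mean that no label changed between two iterations.

```rust
/// Squared Euclidean distance between two points.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Index of the centroid closest to `point`.
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_dist = f64::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = squared_distance(point, c);
        if d < best_dist {
            best_dist = d;
            best = i;
        }
    }
    best
}

/// Repeats the assignment and update steps from the given initial centroids.
/// Returns the final centroids and the cluster label of every point.
fn lloyd(
    data: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iter: usize,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    let dim = data[0].len();
    let mut labels = vec![usize::MAX; data.len()];
    for _ in 0..max_iter {
        // Assignment step: attach every point to its closest centroid.
        let new_labels: Vec<usize> = data.iter().map(|p| nearest(p, &centroids)).collect();
        if new_labels == labels {
            break; // no label changed, so the clusters have settled
        }
        labels = new_labels;

        // Update step: move each centroid to the mean of its points.
        let mut sums = vec![vec![0.0; dim]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for (p, &l) in data.iter().zip(&labels) {
            counts[l] += 1;
            for (s, v) in sums[l].iter_mut().zip(p) {
                *s += v;
            }
        }
        for (k, c) in centroids.iter_mut().enumerate() {
            if counts[k] > 0 {
                // An empty cluster keeps its previous centroid.
                for (cv, s) in c.iter_mut().zip(&sums[k]) {
                    *cv = s / counts[k] as f64;
                }
            }
        }
    }
    (centroids, labels)
}
```

Keeping an empty cluster's centroid unchanged is one simple policy chosen for this sketch; other implementations may reinitialize such a centroid instead.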

The initial choice of the K data points is very important and has a big effect on the performance of the algorithm. SmartCore uses the k-means++ algorithm to initialize cluster centers.
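
To give an idea of how k-means++ seeding works in general: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to the squared distance from a point to its nearest already chosen center. The sketch below is a dependency-free illustration under stated assumptions, not SmartCore's code. `XorShift64` is a hypothetical toy random number generator (it needs a non-zero seed), and `kmeans_plus_plus` is a hypothetical function name.

```rust
/// Toy xorshift generator, used only to keep the sketch dependency-free.
/// Hypothetical helper; the seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Next pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 53 bits and scale them into [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Squared Euclidean distance between two points.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Picks `k` initial centers from `data` using k-means++ style seeding.
fn kmeans_plus_plus(data: &[Vec<f64>], k: usize, rng: &mut XorShift64) -> Vec<Vec<f64>> {
    assert!(k >= 1 && k <= data.len(), "k must be between 1 and the number of points");

    // First center: uniform choice among all points.
    let first = ((rng.next_f64() * data.len() as f64) as usize).min(data.len() - 1);
    let mut centroids = vec![data[first].clone()];

    // Squared distance from each point to its nearest chosen center.
    let mut d2: Vec<f64> = data
        .iter()
        .map(|p| squared_distance(p, &centroids[0]))
        .collect();

    while centroids.len() < k {
        // Draw a point with probability proportional to its weight in d2.
        let total: f64 = d2.iter().sum();
        let mut target = rng.next_f64() * total;
        let mut chosen = 0;
        for (i, &w) in d2.iter().enumerate() {
            if w <= 0.0 {
                continue; // already a center, or a duplicate of one
            }
            chosen = i; // the last candidate doubles as a rounding fallback
            if target < w {
                break;
            }
            target -= w;
        }
        centroids.push(data[chosen].clone());

        // Refresh the nearest-center distances with the new center.
        for (d, p) in d2.iter_mut().zip(data) {
            *d = (*d).min(squared_distance(p, &data[chosen]));
        }
    }
    centroids
}
```

Because points far from the existing centers are more likely to be picked, the initial centers tend to be spread across the dataset, which is why this seeding usually behaves better than picking all K points uniformly at random.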

Example:

use smartcore::linalg::naive::dense_matrix::*;
use smartcore::cluster::kmeans::*;

// Iris data
let x = DenseMatrix::from_2d_array(&[
           &[5.1, 3.5, 1.4, 0.2],
           &[4.9, 3.0, 1.4, 0.2],
           &[4.7, 3.2, 1.3, 0.2],
           &[4.6, 3.1, 1.5, 0.2],
           &[5.0, 3.6, 1.4, 0.2],
           &[5.4, 3.9, 1.7, 0.4],
           &[4.6, 3.4, 1.4, 0.3],
           &[5.0, 3.4, 1.5, 0.2],
           &[4.4, 2.9, 1.4, 0.2],
           &[4.9, 3.1, 1.5, 0.1],
           &[7.0, 3.2, 4.7, 1.4],
           &[6.4, 3.2, 4.5, 1.5],
           &[6.9, 3.1, 4.9, 1.5],
           &[5.5, 2.3, 4.0, 1.3],
           &[6.5, 2.8, 4.6, 1.5],
           &[5.7, 2.8, 4.5, 1.3],
           &[6.3, 3.3, 4.7, 1.6],
           &[4.9, 2.4, 3.3, 1.0],
           &[6.6, 2.9, 4.6, 1.3],
           &[5.2, 2.7, 3.9, 1.4],
           ]);

let kmeans = KMeans::fit(&x, KMeansParameters::default().with_k(2)).unwrap(); // Fit to data, 2 clusters
let y_hat = kmeans.predict(&x).unwrap(); // use the same points for prediction

Structs

KMeans

K-Means clustering algorithm

KMeansParameters

K-Means clustering algorithm parameters