Struct linfa_clustering::KMeans

pub struct KMeans<F: Float> { /* fields omitted */ }

K-means clustering aims to partition a set of unlabeled observations into clusters, where each observation belongs to the cluster with the nearest mean.

The mean of the points within a cluster is called the centroid.

Given the set of centroids, you can assign an observation to a cluster by choosing the nearest centroid.
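
As a minimal sketch of that assignment rule, the snippet below works on plain `f64` slices with the standard library only, rather than on the ndarray types this crate uses. `squared_distance` and `nearest_centroid` are hypothetical helper names chosen for illustration; they are not part of linfa_clustering.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point`, or `None` if there are no centroids.
/// Comparing squared distances gives the same winner as comparing distances.
fn nearest_centroid(point: &[f64], centroids: &[Vec<f64>]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, squared_distance(point, c)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With centroids `[0, 1]`, `[-10, 20]` and `[-1, 10]`, the point `[-9, 20.5]` is assigned to index 1, since its squared distance to `[-10, 20]` is 1.25 and the other two are far larger.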

We provide an implementation of the standard algorithm, also known as Lloyd's algorithm or naive K-means.

More details on the algorithm can be found in the next section.

The algorithm

K-means is an iterative algorithm: it progressively refines the choice of centroids.

It's guaranteed to converge, even though it might not find the optimal set of centroids (unfortunately it can get stuck in a local minimum, and finding the global minimum is NP-hard!).

There are three steps in the standard algorithm:

initialisation step: how do we choose our initial set of centroids?
assignment step: assign each observation to the nearest cluster (minimum distance between the observation and the cluster's centroid);
update step: recompute the centroid of each cluster.

The initialisation step is a one-off, done at the very beginning. Assignment and update are repeated in a loop until convergence is reached: either the Euclidean distance between the old and the new centroids falls below tolerance, or the number of iterations exceeds max_n_iterations.
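
The loop described above can be sketched with the standard library only. This is an illustration under stated assumptions, not the code behind `KMeans::fit`: the function names (`assign`, `update`, `lloyd`) are hypothetical, the initial centroids are passed in by the caller and must be non-empty, an empty cluster simply keeps its previous centroid, and the convergence measure is the Euclidean norm of the overall centroid shift.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assignment step: index of the nearest centroid for every observation.
fn assign(observations: &[Vec<f64>], centroids: &[Vec<f64>]) -> Vec<usize> {
    observations
        .iter()
        .map(|obs| {
            let mut best = 0;
            let mut best_dist = f64::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let d = squared_distance(obs, c);
                if d < best_dist {
                    best = i;
                    best_dist = d;
                }
            }
            best
        })
        .collect()
}

/// Update step: each centroid becomes the mean of the observations assigned to it.
/// An empty cluster keeps its previous centroid (a choice made for this sketch).
fn update(observations: &[Vec<f64>], labels: &[usize], old: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_features = old[0].len();
    let mut sums = vec![vec![0.0; n_features]; old.len()];
    let mut counts = vec![0usize; old.len()];
    for (obs, &label) in observations.iter().zip(labels) {
        counts[label] += 1;
        for (s, x) in sums[label].iter_mut().zip(obs) {
            *s += x;
        }
    }
    sums.into_iter()
        .zip(counts)
        .zip(old)
        .map(|((sum, n), prev)| {
            if n == 0 {
                prev.clone()
            } else {
                sum.into_iter().map(|s| s / n as f64).collect()
            }
        })
        .collect()
}

/// Repeat assignment and update until the centroids move less than `tolerance`
/// or `max_n_iterations` is reached. Returns the centroids and the iterations run.
fn lloyd(
    observations: &[Vec<f64>],
    initial_centroids: Vec<Vec<f64>>,
    tolerance: f64,
    max_n_iterations: usize,
) -> (Vec<Vec<f64>>, usize) {
    let mut centroids = initial_centroids;
    for iteration in 1..=max_n_iterations {
        let labels = assign(observations, &centroids);
        let new_centroids = update(observations, &labels, &centroids);
        // Euclidean distance between the old and the new set of centroids.
        let shift = centroids
            .iter()
            .zip(&new_centroids)
            .map(|(a, b)| squared_distance(a, b))
            .sum::<f64>()
            .sqrt();
        centroids = new_centroids;
        if shift < tolerance {
            return (centroids, iteration);
        }
    }
    (centroids, max_n_iterations)
}
```

For four points `[0, 0]`, `[0, 2]`, `[10, 0]`, `[10, 2]` and initial centroids `[0, 0]` and `[10, 0]`, the first pass moves the centroids to `[0, 1]` and `[10, 1]`, and the second pass leaves them unchanged, so the loop stops after two iterations.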

Parallelisation

The work performed by the assignment step does not require any coordination: the closest centroid for each point can be computed independently of the closest centroid for any of the remaining points.

This makes it a good candidate for parallel execution: KMeans::fit parallelises the assignment step thanks to the rayon feature in ndarray.
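
To illustrate why no coordination is needed, here is a sketch that splits the observations into chunks and labels each chunk on its own thread. It deliberately uses `std::thread::scope` from the standard library instead of rayon and ndarray, so it shows the idea rather than how `KMeans::fit` is written. `nearest` and `parallel_assign` are hypothetical names, and the centroids are assumed to be non-empty.

```rust
use std::thread;

/// Squared Euclidean distance between two points of equal dimension.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point` (assumes `centroids` is non-empty).
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_dist = f64::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = squared_distance(point, c);
        if d < best_dist {
            best = i;
            best_dist = d;
        }
    }
    best
}

/// Label every observation, using at most `n_threads` worker threads.
/// Each thread reads the shared centroids and writes to its own slice of labels.
fn parallel_assign(
    observations: &[Vec<f64>],
    centroids: &[Vec<f64>],
    n_threads: usize,
) -> Vec<usize> {
    let mut labels = vec![0usize; observations.len()];
    let n_threads = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks` rejects a chunk size of 0.
    let chunk_size = ((observations.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|scope| {
        let chunks = observations
            .chunks(chunk_size)
            .zip(labels.chunks_mut(chunk_size));
        for (obs_chunk, label_chunk) in chunks {
            scope.spawn(move || {
                for (obs, label) in obs_chunk.iter().zip(label_chunk.iter_mut()) {
                    *label = nearest(obs, centroids);
                }
            });
        }
    });
    labels
}
```

Because every thread owns a disjoint slice of `labels`, no locks or atomics are required, and the result is the same as the sequential assignment regardless of the number of threads.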

The update step requires a bit more coordination (computing a rolling mean in parallel) but it is still parallelisable. Nonetheless, our first attempts have not improved performance (most likely because of the strategy we used to split work between threads), hence the update step is currently executed on a single thread.
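
The coordination a parallel update would need can be sketched as follows: each worker keeps a rolling mean for its own chunk of points, and the partial means are then merged, weighted by how many points each one has seen. The `RollingMean` type and its methods are hypothetical and written for this illustration only; they do not describe the crate's internals.

```rust
/// Running mean of the points seen so far, together with how many were seen.
#[derive(Debug, Clone, PartialEq)]
struct RollingMean {
    count: usize,
    mean: Vec<f64>,
}

impl RollingMean {
    /// An empty mean over points with `n_features` coordinates.
    fn new(n_features: usize) -> Self {
        Self { count: 0, mean: vec![0.0; n_features] }
    }

    /// Fold one more point into the mean: mean += (x - mean) / count.
    fn push(&mut self, point: &[f64]) {
        self.count += 1;
        let n = self.count as f64;
        for (m, x) in self.mean.iter_mut().zip(point) {
            *m += (x - *m) / n;
        }
    }

    /// Combine with a mean computed independently over another chunk.
    /// The other mean is weighted by its share of the total point count.
    fn merge(&mut self, other: &RollingMean) {
        let total = self.count + other.count;
        if total == 0 {
            return;
        }
        let weight = other.count as f64 / total as f64;
        for (m, o) in self.mean.iter_mut().zip(&other.mean) {
            *m += (o - *m) * weight;
        }
        self.count = total;
    }
}
```

In a parallel update, each thread would hold one `RollingMean` per cluster and the per-thread results would be merged at the end; that merge is the extra coordination that the assignment step does not need.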

Tutorial

Let's walk through a train-and-predict example.

use linfa::DatasetBase;
use linfa::traits::{Fit, Predict};
use linfa_clustering::{KMeans, generate_blobs};
use ndarray::{Axis, array};
use ndarray_rand::rand::SeedableRng;
use rand_isaac::Isaac64Rng;

// Our random number generator, seeded for reproducibility
let seed = 42;
let mut rng = Isaac64Rng::seed_from_u64(seed);

// `expected_centroids` has shape `(n_centroids, n_features)`
// i.e. three points in the 2-dimensional plane
let expected_centroids = array![[0., 1.], [-10., 20.], [-1., 10.]];
// Let's generate a synthetic dataset: three blobs of observations
// (100 points each) centered around our `expected_centroids`
let observations = DatasetBase::from(generate_blobs(100, &expected_centroids, &mut rng));

// Let's configure and run our K-means algorithm
// We use the builder pattern to specify the hyperparameters
// `n_clusters` is the only mandatory parameter.
// If you don't specify the others (e.g. `n_runs`, `tolerance`, `max_n_iterations`)
// default values will be used.
let n_clusters = expected_centroids.len_of(Axis(0));
let model = KMeans::params(n_clusters)
    .tolerance(1e-2)
    .fit(&observations)
    .expect("KMeans fitted");

// Once we have found our set of centroids, we can also assign new points to the nearest cluster
let new_observation = DatasetBase::from(array![[-9., 20.5]]);
// `predict` returns the **index** of the nearest cluster
let dataset = model.predict(new_observation);
// We can retrieve the actual centroid of the closest cluster using `.centroids()`
let closest_centroid = &model.centroids().index_axis(Axis(0), dataset.targets()[0]);

Implementations

`impl<F: Float> KMeans<F>`

`pub fn params(nclusters: usize) -> KMeansHyperParamsBuilder<F, Isaac64Rng>`

`pub fn params_with_rng<R: Rng + Clone>(nclusters: usize, rng: R) -> KMeansHyperParamsBuilder<F, R>`

`pub fn centroids(&self) -> &Array2<F>`

Return the set of centroids as a 2-dimensional matrix with shape (n_centroids, n_features).

Trait Implementations

`impl<F: Clone + Float> Clone for KMeans<F>`

`pub fn clone(&self) -> KMeans<F>`

`pub fn clone_from(&mut self, source: &Self)` (since 1.0.0)

`impl<F: Debug + Float> Debug for KMeans<F>`

`pub fn fmt(&self, f: &mut Formatter<'_>) -> Result`

`impl<F: PartialEq + Float> PartialEq<KMeans<F>> for KMeans<F>`

`pub fn eq(&self, other: &KMeans<F>) -> bool`

`pub fn ne(&self, other: &KMeans<F>) -> bool`

`impl<F: Float, D: Data<Elem = F>> Predict<&'_ ArrayBase<D, Dim<[usize; 2]>>, ArrayBase<OwnedRepr<usize>, Dim<[usize; 1]>>> for KMeans<F>`

`pub fn predict(&self, observations: &ArrayBase<D, Ix2>) -> Array1<usize>`

Given an input matrix observations, with shape (n_observations, n_features), predict returns, for each observation, the index of the closest cluster/centroid.

You can retrieve the centroid associated with an index using the centroids method.

`impl<F: Float, D: Data<Elem = F>, T: Targets> Predict<DatasetBase<ArrayBase<D, Dim<[usize; 2]>>, T>, DatasetBase<ArrayBase<D, Dim<[usize; 2]>>, ArrayBase<OwnedRepr<usize>, Dim<[usize; 1]>>>> for KMeans<F>`

`pub fn predict(&self, dataset: DatasetBase<ArrayBase<D, Ix2>, T>) -> DatasetBase<ArrayBase<D, Ix2>, Array1<usize>>`

`impl<F: Float> StructuralPartialEq for KMeans<F>`

Auto Trait Implementations

`impl<F> RefUnwindSafe for KMeans<F> where F: RefUnwindSafe,`

`impl<F> Send for KMeans<F>`

`impl<F> Sync for KMeans<F>`

`impl<F> Unpin for KMeans<F>`

`impl<F> UnwindSafe for KMeans<F> where F: RefUnwindSafe,`

Blanket Implementations

`impl<T> Any for T where T: 'static + ?Sized,`

`pub fn type_id(&self) -> TypeId`

`impl<T> Borrow<T> for T where T: ?Sized,`

`pub fn borrow(&self) -> &T`

`impl<T> BorrowMut<T> for T where T: ?Sized,`

`pub fn borrow_mut(&mut self) -> &mut T`

`impl<T> From<T> for T`

`pub fn from(t: T) -> T`

`impl<T, U> Into<U> for T where U: From<T>,`

`pub fn into(self) -> U`

`impl<T> Pointable for T`

`pub const ALIGN: usize`

`type Init = T`

The type for initializers.

`pub unsafe fn init(init: <T as Pointable>::Init) -> usize`

`pub unsafe fn deref<'a>(ptr: usize) -> &'a T`

`pub unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T`

`pub unsafe fn drop(ptr: usize)`

`impl<SS, SP> SupersetOf<SS> for SP where SS: SubsetOf<SP>,`

`pub fn to_subset(&self) -> Option<SS>`

`pub fn is_in_subset(&self) -> bool`

`pub unsafe fn to_subset_unchecked(&self) -> SS`

`pub fn from_subset(element: &SS) -> SP`

`impl<T> ToOwned for T where T: Clone,`

`type Owned = T`

The resulting type after obtaining ownership.

`pub fn to_owned(&self) -> T`

`pub fn clone_into(&self, target: &mut T)`

`impl<T, U> TryFrom<U> for T where U: Into<T>,`

`type Error = Infallible`

The type returned in the event of a conversion error.

`pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>`

`impl<T, U> TryInto<U> for T where U: TryFrom<T>,`

`type Error = <U as TryFrom<T>>::Error`

The type returned in the event of a conversion error.

`pub fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>`

`impl<V, T> VZip<V> for T where V: MultiLane<T>,`

`pub fn vzip(self) -> V`