pub struct KMeans<T>{ /* private fields */ }
Entry point of this crate’s API surface.
Create an instance of this struct, passing the samples you want to operate on. The primitive type
of the samples array is the type used internally for all calculations, as well as for the result
stored in the returned KMeansState structure.
Supported variants
- k-Means clustering (Lloyd): KMeans::kmeans_lloyd
- Mini-Batch k-Means clustering: KMeans::kmeans_minibatch
Supported initialization methods
- K-Means++: KMeans::init_kmeanplusplus
- Random-Sample: KMeans::init_random_sample
- Random-Partition: KMeans::init_random_partition
Implementations
impl<T> KMeans<T>
pub fn kmeans_lloyd<'a, F>(
    &self,
    k: usize,
    max_iter: usize,
    init: F,
    config: &KMeansConfig<'a, T>
) -> KMeansState<T>
Normal K-Means algorithm implementation. This is the same algorithm as implemented in Matlab (one-phase). (see: https://uk.mathworks.com/help/stats/kmeans.html#bueq7aj-5 Section: More About)
Arguments
- k: Number of clusters to search for
- max_iter: Maximum number of iterations (pass a very high number to effectively remove the limit)
- init: Initialization method used to place the k initial centroids
- config: KMeansConfig instance, containing several configuration options for the calculation.
Returns
Instance of KMeansState, containing the final state (result).
Example
use kmeans::*;

fn main() {
    let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

    // Generate some random data (rand::random comes from the rand crate)
    let mut samples = vec![0.0f64; sample_cnt * sample_dims];
    samples.iter_mut().for_each(|v| *v = rand::random());

    // Calculate k-means, using k-means++ as the initialization method
    let kmean = KMeans::new(samples, sample_cnt, sample_dims);
    let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

    println!("Centroids: {:?}", result.centroids);
    println!("Cluster-Assignments: {:?}", result.assignments);
    println!("Error: {}", result.distsum);
}
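To make the algorithm behind kmeans_lloyd more concrete, the following is a minimal, dependency-free sketch of Lloyd iterations. It is not this crate's implementation: the function names `lloyd`, `nearest` and `sq_dist` are hypothetical, and the sketch assumes that samples and centroids are stored as flat, row-major `f64` slices (one sample after another) and that squared Euclidean distance is used.

```rust
/// Squared Euclidean distance between two equally long slices.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `sample`, plus the squared distance to it.
fn nearest(sample: &[f64], centroids: &[f64], dims: usize) -> (usize, f64) {
    let mut best = (0, f64::INFINITY);
    for (c, centroid) in centroids.chunks(dims).enumerate() {
        let d = sq_dist(sample, centroid);
        if d < best.1 {
            best = (c, d);
        }
    }
    best
}

/// Hypothetical helper (not part of the kmeans crate): runs Lloyd iterations on
/// row-major `samples`, refining `centroids` in place.
/// Returns the cluster assignments and the summed squared distance that was
/// measured during the last assignment step.
fn lloyd(samples: &[f64], dims: usize, centroids: &mut [f64], max_iter: usize) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let n = samples.len() / dims;
    let mut assignments = vec![usize::MAX; n];
    let mut distsum = 0.0;
    for _ in 0..max_iter {
        // Assignment step: attach every sample to its nearest centroid.
        let mut changed = false;
        distsum = 0.0;
        for (i, s) in samples.chunks(dims).enumerate() {
            let (c, d) = nearest(s, centroids, dims);
            changed |= assignments[i] != c;
            assignments[i] = c;
            distsum += d;
        }
        // Converged: no sample switched clusters.
        if !changed {
            break;
        }
        // Update step: move every centroid to the mean of its samples.
        let mut sums = vec![0.0; k * dims];
        let mut counts = vec![0usize; k];
        for (s, &c) in samples.chunks(dims).zip(&assignments) {
            counts[c] += 1;
            for (acc, v) in sums[c * dims..(c + 1) * dims].iter_mut().zip(s) {
                *acc += v;
            }
        }
        for c in 0..k {
            // Empty clusters keep their previous centroid in this sketch.
            if counts[c] > 0 {
                for d in 0..dims {
                    centroids[c * dims + d] = sums[c * dims + d] / counts[c] as f64;
                }
            }
        }
    }
    (assignments, distsum)
}
```

Calling `lloyd(&samples, sample_dims, &mut centroids, max_iter)` with centroids produced by any initialization method corresponds roughly to what the example above delegates to the crate. The sketch stops early once no assignment changes, which is one common convergence criterion; the crate may use a different one.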
pub fn kmeans_minibatch<'a, F>(
    &self,
    batch_size: usize,
    k: usize,
    max_iter: usize,
    init: F,
    config: &KMeansConfig<'a, T>
) -> KMeansState<T>
Mini-Batch k-Means implementation. (see: https://dl.acm.org/citation.cfm?id=1772862)
Arguments
- batch_size: Number of samples to use per iteration (a higher value gives a better approximation but is slower)
- k: Number of clusters to search for
- max_iter: Maximum number of iterations (pass a very high number to effectively remove the limit)
- init: Initialization method used to place the k initial centroids
- config: KMeansConfig instance, containing several configuration options for the calculation.
Returns
Instance of KMeansState, containing the final state (result).
Example
use kmeans::*;

fn main() {
    let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

    // Generate some random data (rand::random comes from the rand crate)
    let mut samples = vec![0.0f64; sample_cnt * sample_dims];
    samples.iter_mut().for_each(|v| *v = rand::random());

    // Calculate mini-batch k-means with a batch size of 4,
    // using random-sample as the initialization method
    let kmean = KMeans::new(samples, sample_cnt, sample_dims);
    let result = kmean.kmeans_minibatch(4, k, max_iter, KMeans::init_random_sample, &KMeansConfig::default());

    println!("Centroids: {:?}", result.centroids);
    println!("Cluster-Assignments: {:?}", result.assignments);
    println!("Error: {}", result.distsum);
}
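As a rough illustration of the mini-batch idea behind kmeans_minibatch, the sketch below follows the commonly described scheme of drawing a small random batch per iteration and nudging each affected centroid towards its batch members with a per-centroid learning rate of 1/count. This is an assumption about the general technique, not a description of this crate's internals. The names `minibatch`, `nearest` and `Lcg` are hypothetical, the data layout is assumed to be flat and row-major, and the tiny generator only stands in for a real random number source.

```rust
/// Tiny linear congruential generator so the sketch needs no external crate.
/// Not suitable for serious use; a real program would use a proper RNG.
struct Lcg(u64);

impl Lcg {
    /// Pseudo-random index in `0..n`.
    fn below(&mut self, n: usize) -> usize {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 33) % n as u64) as usize
    }
}

/// Index of the centroid closest (squared Euclidean distance) to `sample`.
fn nearest(sample: &[f64], centroids: &[f64], dims: usize) -> usize {
    let mut best = (0, f64::INFINITY);
    for (c, centroid) in centroids.chunks(dims).enumerate() {
        let d: f64 = sample.iter().zip(centroid).map(|(x, y)| (x - y) * (x - y)).sum();
        if d < best.1 {
            best = (c, d);
        }
    }
    best.0
}

/// Hypothetical helper (not part of the kmeans crate): refines `centroids`
/// in place using `max_iter` mini-batches of `batch_size` samples each.
fn minibatch(
    samples: &[f64],
    dims: usize,
    centroids: &mut [f64],
    batch_size: usize,
    max_iter: usize,
    rng: &mut Lcg,
) {
    let n = samples.len() / dims;
    // How many samples each centroid has absorbed so far.
    let mut counts = vec![0usize; centroids.len() / dims];
    for _ in 0..max_iter {
        // Draw a batch (with replacement) and cache each member's nearest
        // centroid before any centroid is moved.
        let mut batch = Vec::with_capacity(batch_size);
        for _ in 0..batch_size {
            let i = rng.below(n);
            batch.push((i, nearest(&samples[i * dims..(i + 1) * dims], centroids, dims)));
        }
        // Move each affected centroid towards its batch members.
        for (i, c) in batch {
            counts[c] += 1;
            let eta = 1.0 / counts[c] as f64; // per-centroid learning rate
            for d in 0..dims {
                let v = &mut centroids[c * dims + d];
                *v = (1.0 - eta) * *v + eta * samples[i * dims + d];
            }
        }
    }
}
```

Because only `batch_size` samples are touched per iteration, the cost of one iteration is independent of the total sample count, which is the trade-off the batch_size argument above describes.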
pub fn init_kmeanplusplus<'a>(
    kmean: &KMeans<T>,
    state: &mut KMeansState<T>,
    config: &KMeansConfig<'a, T>
)
K-Means++ initialization method, as implemented in Matlab
Description
This initialization method starts by selecting one sample as the first centroid. From there, it selects one new centroid per iteration: each sample is given a probability of becoming the next centroid that grows with its distance from its currently assigned (nearest) centroid, and one sample is then drawn at random according to these probabilities. As a result, new centroids tend to lie far away from the centroids selected so far. (see: https://uk.mathworks.com/help/stats/kmeans.html#bueq7aj-5 Section: More About)
Note
This method is not meant for direct invocation. Instead, pass a reference to it to an instance method of KMeans (KMeans::kmeans_lloyd or KMeans::kmeans_minibatch).
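The selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not this crate's code: the names `kmeans_pp_init`, `row` and `Lcg` are hypothetical, the samples are assumed to be a flat row-major `f64` slice, and the selection weight is assumed to be the squared distance to the nearest already-selected centroid, as in the usual formulation of k-means++.

```rust
/// Tiny linear congruential generator so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    /// Pseudo-random float in `[0, 1)`.
    fn unit(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 33) as f64 / (1u64 << 31) as f64
    }
}

/// The `i`-th sample of a flat, row-major sample buffer.
fn row(samples: &[f64], dims: usize, i: usize) -> &[f64] {
    &samples[i * dims..(i + 1) * dims]
}

/// Hypothetical helper (not part of the kmeans crate): returns `k` initial
/// centroids as a flat buffer of `k * dims` values.
fn kmeans_pp_init(samples: &[f64], dims: usize, k: usize, rng: &mut Lcg) -> Vec<f64> {
    let n = samples.len() / dims;
    assert!(k >= 1 && k <= n);
    // The first centroid is a uniformly chosen sample.
    let first = (rng.unit() * n as f64) as usize;
    let mut centroids = row(samples, dims, first).to_vec();
    // weights[i] = squared distance from sample i to its nearest chosen centroid.
    let mut weights = vec![f64::INFINITY; n];
    while centroids.len() < k * dims {
        // Only the newest centroid can shrink a sample's nearest distance.
        let newest = &centroids[centroids.len() - dims..];
        for (i, w) in weights.iter_mut().enumerate() {
            let d: f64 = row(samples, dims, i)
                .iter()
                .zip(newest)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            *w = (*w).min(d);
        }
        // Roulette-wheel selection: sample i is picked with probability
        // weights[i] / total, so far-away samples are more likely.
        let total: f64 = weights.iter().sum();
        let mut target = rng.unit() * total;
        let mut chosen = n - 1; // fallback in case of rounding or total == 0
        for (i, w) in weights.iter().enumerate() {
            if target < *w {
                chosen = i;
                break;
            }
            target -= *w;
        }
        centroids.extend_from_slice(row(samples, dims, chosen));
    }
    centroids
}
```

Samples that already serve as centroids have weight zero and are therefore not drawn again, except through the rounding fallback.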
pub fn init_random_partition<'a>(
    kmean: &KMeans<T>,
    state: &mut KMeansState<T>,
    config: &KMeansConfig<'a, T>
)
Random-Partition initialization method
Description
This initialization method randomly partitions the samples into k partitions and then calculates each partition’s mean. These means are then used as the initial centroids.
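A minimal sketch of this partition-then-average idea is shown below. It is illustrative only: the names `random_partition_init` and `Lcg` are hypothetical, the samples are assumed to be a flat row-major `f64` slice, and leaving an empty partition's centroid at the origin is a choice made for the sketch, not something the crate is known to do.

```rust
/// Tiny linear congruential generator so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    /// Pseudo-random index in `0..n`.
    fn below(&mut self, n: usize) -> usize {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 33) % n as u64) as usize
    }
}

/// Hypothetical helper (not part of the kmeans crate): assigns every sample to
/// one of `k` partitions at random and returns the partition means as a flat
/// buffer of `k * dims` values.
fn random_partition_init(samples: &[f64], dims: usize, k: usize, rng: &mut Lcg) -> Vec<f64> {
    let mut sums = vec![0.0; k * dims];
    let mut counts = vec![0usize; k];
    // Accumulate each sample into the running sum of a random partition.
    for s in samples.chunks(dims) {
        let p = rng.below(k);
        counts[p] += 1;
        for (acc, v) in sums[p * dims..(p + 1) * dims].iter_mut().zip(s) {
            *acc += v;
        }
    }
    // Turn the sums into means; an empty partition stays at the origin here.
    for (p, mean) in sums.chunks_mut(dims).enumerate() {
        if counts[p] > 0 {
            for v in mean {
                *v /= counts[p] as f64;
            }
        }
    }
    sums
}
```

Since every partition is a random subset of the whole data set, the resulting means tend to lie close to the overall data mean, so the initial centroids start out near each other.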
pub fn init_random_sample<'a>(
    kmean: &KMeans<T>,
    state: &mut KMeansState<T>,
    config: &KMeansConfig<'a, T>
)
Random sample initialization method (a.k.a. Forgy)
Description
This initialization method randomly selects k of the samples and uses them as the initial centroids.
Note
This method is not meant for direct invocation. Instead, pass a reference to it to an instance method of KMeans (KMeans::kmeans_lloyd or KMeans::kmeans_minibatch).
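For illustration, random-sample (Forgy) initialization can be sketched with a partial Fisher-Yates shuffle over the sample indices. The names `forgy_init` and `Lcg` are hypothetical, the samples are assumed to be a flat row-major `f64` slice, and drawing k distinct samples (without replacement) is an assumption of this sketch; whether the crate guarantees distinct picks is not stated above.

```rust
/// Tiny linear congruential generator so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    /// Pseudo-random index in `0..n`.
    fn below(&mut self, n: usize) -> usize {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 33) % n as u64) as usize
    }
}

/// Hypothetical helper (not part of the kmeans crate): copies `k` distinct,
/// randomly chosen samples into a flat buffer of `k * dims` centroid values.
fn forgy_init(samples: &[f64], dims: usize, k: usize, rng: &mut Lcg) -> Vec<f64> {
    let n = samples.len() / dims;
    assert!(k <= n, "cannot pick more distinct centroids than there are samples");
    let mut idx: Vec<usize> = (0..n).collect();
    let mut centroids = Vec::with_capacity(k * dims);
    for j in 0..k {
        // Partial Fisher-Yates shuffle: swap a random not-yet-used index into slot j.
        let pick = j + rng.below(n - j);
        idx.swap(j, pick);
        let i = idx[j];
        centroids.extend_from_slice(&samples[i * dims..(i + 1) * dims]);
    }
    centroids
}
```

Unlike random partitioning, every initial centroid here coincides with an actual sample, so the starting centroids are spread out as widely as the data itself.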