Struct KMeans 

pub struct KMeans<T, const LANES: usize, D: DistanceFunction<T, LANES>>
where T: Primitive, LaneCount<LANES>: SupportedLaneCount, Simd<T, LANES>: SupportedSimdArray<T, LANES>,
{ /* private fields */ }

Entry point of this crate’s API surface.

Create an instance of this struct, passing the samples you want to operate on. The primitive type of the passed samples slice is the type used internally for all calculations, and for the result stored in the returned KMeansState structure.

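§Supported variants

  • k-Means clustering (Lloyd): KMeans::kmeans_lloyd
  • Mini-Batch k-Means clustering: KMeans::kmeans_minibatch

§Supported initialization methods

  • K-Means++: KMeans::init_kmeanplusplus
  • Random partition: KMeans::init_random_partition
  • Random sample (Forgy): KMeans::init_random_sample
  • Precomputed centroids: KMeans::init_precomputed
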
§Generics

  • T: The primitive type to work with (e.g. f32 or f64)
  • LANES: The number of SIMD lanes (values in one SIMD vector) to limit the generated code to. Note that the generated code selects the appropriate instructions for every platform.
  • D: The distance function to use. Default is Euclidean distance.
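To illustrate what the LANES parameter means, the following is a minimal scalar sketch of a lane-wise squared Euclidean distance. It is not the crate's implementation: `squared_distance_lanes` is a hypothetical helper that emulates LANES independent accumulators in plain Rust and handles the leftover values separately. How the crate itself treats dimensions that are not a multiple of LANES is not stated here.

```rust
/// Squared Euclidean distance, accumulated in LANES independent partial sums.
/// Hypothetical helper for illustration; it emulates SIMD lanes with an array.
/// LANES must be greater than zero (chunks_exact panics on a chunk size of 0).
fn squared_distance_lanes<const LANES: usize>(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "both points need the same dimensionality");
    let mut acc = [0.0f64; LANES];
    let (a_chunks, b_chunks) = (a.chunks_exact(LANES), b.chunks_exact(LANES));
    // Values that do not fill a whole vector are processed one by one below.
    let (a_rem, b_rem) = (a_chunks.remainder(), b_chunks.remainder());
    for (ca, cb) in a_chunks.zip(b_chunks) {
        // One "vector" operation: every lane updates its own accumulator.
        for lane in 0..LANES {
            let d = ca[lane] - cb[lane];
            acc[lane] += d * d;
        }
    }
    let tail: f64 = a_rem.iter().zip(b_rem).map(|(x, y)| (x - y) * (x - y)).sum();
    // Horizontal reduction of the lanes, plus the scalar tail.
    acc.iter().sum::<f64>() + tail
}
```

Calling `squared_distance_lanes::<4>(&a, &b)` processes four values per step; a larger LANES value means fewer steps per sample but more leftover values when the dimensionality is not a multiple of it.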

Implementations§

impl<T, const LANES: usize, D: DistanceFunction<T, LANES>> KMeans<T, LANES, D>
where T: Primitive, LaneCount<LANES>: SupportedLaneCount, Simd<T, LANES>: SupportedSimdArray<T, LANES>,

pub fn new( samples: &[T], sample_cnt: usize, sample_dims: usize, distance_fn: D, ) -> Self

Create a new instance of the KMeans structure.

§Arguments
  • samples: Slice of samples in row-major order, i.e. all values of the first sample, followed by all values of the second sample, and so on
  • sample_cnt: Number of samples contained in the passed samples slice
  • sample_dims: Number of dimensions of each sample in the samples slice
  • distance_fn: Distance function to use for the calculation
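The row-major layout expected for samples can be sketched as follows. Both helpers are hypothetical and not part of the crate: `flatten_rows` builds the flat buffer together with the sample_cnt and sample_dims values, and `sample_at` shows where a single sample lives inside that buffer.

```rust
/// Flattens equally sized rows into one row-major buffer.
/// Returns (flat buffer, sample_cnt, sample_dims). Hypothetical helper.
fn flatten_rows(rows: &[Vec<f64>]) -> (Vec<f64>, usize, usize) {
    let sample_dims = rows.first().map_or(0, |r| r.len());
    assert!(
        rows.iter().all(|r| r.len() == sample_dims),
        "all samples need the same number of dimensions"
    );
    let flat: Vec<f64> = rows.iter().flatten().copied().collect();
    (flat, rows.len(), sample_dims)
}

/// Returns the i-th sample of a row-major buffer as a sub-slice.
/// Hypothetical helper; panics if the index is out of range.
fn sample_at(samples: &[f64], sample_dims: usize, i: usize) -> &[f64] {
    &samples[i * sample_dims..(i + 1) * sample_dims]
}
```

With this layout, samples.len() equals sample_cnt * sample_dims, which is the relationship the arguments above describe.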

pub fn kmeans_lloyd<F>( &self, k: usize, max_iter: usize, init: F, config: &KMeansConfig<'_, T>, ) -> KMeansState<T>
where for<'c> F: FnOnce(&KMeans<T, LANES, D>, &mut KMeansState<T>, &KMeansConfig<'c, T>),

Normal K-Means algorithm implementation. This is the same algorithm as implemented in Matlab (one-phase). (see: https://uk.mathworks.com/help/stats/kmeans.html#bueq7aj-5 Section: More About)

§Arguments
  • k: Number of clusters to search for
  • max_iter: Maximum number of iterations (pass a large number to effectively remove the limit)
  • init: Initialization method to use for the k centroids
  • config: KMeansConfig instance, containing several configuration options for the calculation.
§Returns

Instance of KMeansState, containing the final state (result).

§Example
use kmeans::*;

let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

// Generate some random data
let mut samples = vec![0.0f64;sample_cnt * sample_dims];
samples.iter_mut().for_each(|v| *v = rand::random());

// Calculate kmeans, using kmean++ as initialization-method
// KMeans<_, 8> specifies to use f64 SIMD vectors with 8 lanes (e.g. AVX512)
let kmean: KMeans<_, 8, _> = KMeans::new(&samples, sample_cnt, sample_dims, EuclideanDistance);
let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

println!("Centroids: {:?}", result.centroids);
println!("Cluster-Assignments: {:?}", result.assignments);
println!("Error: {}", result.distsum);
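For readers unfamiliar with the algorithm, the following is a minimal scalar sketch of Lloyd iterations on row-major data. It is not the crate's SIMD implementation, and the names `dist2` and `lloyd` are hypothetical. It assumes that the initial centroids are supplied by the caller and that an empty cluster simply keeps its previous centroid, which may differ from what the crate does.

```rust
/// Squared Euclidean distance between two equally long points.
fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Runs Lloyd iterations on row-major `samples`, starting from `centroids`
/// (k * dims values, updated in place). Returns the assignments and the sum
/// of squared distances measured in the last assignment step.
fn lloyd(samples: &[f64], dims: usize, centroids: &mut [f64], max_iter: usize) -> (Vec<usize>, f64) {
    let (n, k) = (samples.len() / dims, centroids.len() / dims);
    let mut assignments = vec![usize::MAX; n];
    let mut distsum = 0.0;
    for _ in 0..max_iter {
        // Assignment step: attach every sample to its nearest centroid.
        let mut changed = false;
        distsum = 0.0;
        for (i, s) in samples.chunks_exact(dims).enumerate() {
            let (best, d) = centroids
                .chunks_exact(dims)
                .map(|c| dist2(s, c))
                .enumerate()
                .fold((0, f64::INFINITY), |acc, (j, d)| if d < acc.1 { (j, d) } else { acc });
            changed |= assignments[i] != best;
            assignments[i] = best;
            distsum += d;
        }
        if !changed {
            break; // converged: no sample switched clusters
        }
        // Update step: move every centroid to the mean of its samples.
        let mut sums = vec![0.0; k * dims];
        let mut counts = vec![0usize; k];
        for (s, &a) in samples.chunks_exact(dims).zip(&assignments) {
            counts[a] += 1;
            for (acc, v) in sums[a * dims..(a + 1) * dims].iter_mut().zip(s) {
                *acc += v;
            }
        }
        for j in 0..k {
            if counts[j] > 0 {
                for d in 0..dims {
                    centroids[j * dims + d] = sums[j * dims + d] / counts[j] as f64;
                }
            }
        }
    }
    (assignments, distsum)
}
```

The loop stops either when no assignment changes or when max_iter is reached, which mirrors the role of the max_iter argument described above.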

pub fn kmeans_minibatch<F>( &self, batch_size: usize, k: usize, max_iter: usize, init: F, config: &KMeansConfig<'_, T>, ) -> KMeansState<T>
where for<'c> F: FnOnce(&KMeans<T, LANES, D>, &mut KMeansState<T>, &KMeansConfig<'c, T>), T: Primitive, LaneCount<LANES>: SupportedLaneCount, Simd<T, LANES>: SupportedSimdArray<T, LANES>,

Mini-Batch k-Means implementation. (see: https://dl.acm.org/citation.cfm?id=1772862)

§Arguments
  • batch_size: Number of samples to use per iteration (higher values give a better approximation but are slower)
  • k: Number of clusters to search for
  • max_iter: Maximum number of iterations (pass a large number to effectively remove the limit)
  • init: Initialization method to use for the k centroids
  • config: KMeansConfig instance, containing several configuration options for the calculation.
§Returns

Instance of KMeansState, containing the final state (result).

§Example
use kmeans::*;

let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

// Generate some random data
let mut samples = vec![0.0f64;sample_cnt * sample_dims];
samples.iter_mut().for_each(|v| *v = rand::random());

// Calculate mini-batch kmeans, using random-sample (Forgy) as initialization-method
// KMeans<_, 8> specifies to use f64 SIMD vectors with 8 lanes (e.g. AVX512)
let kmean: KMeans<_, 8, _> = KMeans::new(&samples, sample_cnt, sample_dims, EuclideanDistance);
let result = kmean.kmeans_minibatch(4, k, max_iter, KMeans::init_random_sample, &KMeansConfig::default());

println!("Centroids: {:?}", result.centroids);
println!("Cluster-Assignments: {:?}", result.assignments);
println!("Error: {}", result.distsum);
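A single mini-batch iteration, following the update rule from the referenced paper (per-centroid learning rate 1 / count), can be sketched like this. The names `nearest` and `minibatch_step` are hypothetical and this is not the crate's code. To keep the sketch deterministic, the random choice of batch members is left to the caller, who passes the sample indices in `batch`.

```rust
/// Index of the centroid nearest to `x` (squared Euclidean distance).
fn nearest(centroids: &[f64], dims: usize, x: &[f64]) -> usize {
    let mut best = (0, f64::INFINITY);
    for (j, c) in centroids.chunks_exact(dims).enumerate() {
        let d: f64 = c.iter().zip(x).map(|(a, b)| (a - b) * (a - b)).sum();
        if d < best.1 {
            best = (j, d);
        }
    }
    best.0
}

/// One mini-batch iteration. `batch` holds the indices of the drawn samples,
/// `counts[j]` is the number of samples centroid j has absorbed so far.
fn minibatch_step(
    samples: &[f64],
    dims: usize,
    batch: &[usize],
    centroids: &mut [f64],
    counts: &mut [usize],
) {
    // Cache the nearest centroid of every batch member before any update.
    let current: &[f64] = &*centroids;
    let nearest_idx: Vec<usize> = batch
        .iter()
        .map(|&i| nearest(current, dims, &samples[i * dims..(i + 1) * dims]))
        .collect();
    for (&i, &j) in batch.iter().zip(&nearest_idx) {
        counts[j] += 1;
        let eta = 1.0 / counts[j] as f64; // per-centroid learning rate
        let x = &samples[i * dims..(i + 1) * dims];
        // Gradient step: pull the centroid towards the sample.
        for (c, v) in centroids[j * dims..(j + 1) * dims].iter_mut().zip(x) {
            *c = (1.0 - eta) * *c + eta * v;
        }
    }
}
```

Because each iteration only touches batch_size samples, a larger batch costs more per iteration but moves the centroids on the basis of more data, which is the trade-off noted for the batch_size argument.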

pub fn init_kmeanplusplus( kmean: &KMeans<T, LANES, D>, state: &mut KMeansState<T>, config: &KMeansConfig<'_, T>, )

K-Means++ initialization method, as implemented in Matlab

§Description

This initialization method starts by selecting one sample as the first centroid. From there, it selects one new centroid per iteration by calculating each sample’s probability of “being a centroid”. The farther a sample is from the centroid of its currently assigned cluster, the higher this probability. One sample is then drawn at random, weighted by these probabilities. As a result, the method tends to select centroids that are far away from the centroids chosen so far. (see: https://uk.mathworks.com/help/stats/kmeans.html#bueq7aj-5 Section: More About)

§Note

This method is not meant for direct invocation. Pass a reference to it as the init argument of one of the clustering methods of KMeans (e.g. kmeans_lloyd).
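The weighted selection described above can be sketched as follows. This is not the crate's implementation, and it assumes the usual k-means++ formulation in which a sample's weight is its squared distance to the nearest centroid chosen so far. The names `nearest_dist2`, `pick_weighted` and `next_centroid_index` are hypothetical, and the uniform random number `u` in [0, 1) is supplied by the caller to keep the sketch deterministic.

```rust
/// Squared distance from `x` to the nearest of the already chosen centroids.
fn nearest_dist2(x: &[f64], dims: usize, chosen: &[f64]) -> f64 {
    chosen
        .chunks_exact(dims)
        .map(|c| c.iter().zip(x).map(|(a, b)| (a - b) * (a - b)).sum::<f64>())
        .fold(f64::INFINITY, f64::min)
}

/// Picks an index with probability proportional to its weight.
/// `u` is a uniform random number in [0, 1); `weights` must not be empty.
fn pick_weighted(weights: &[f64], u: f64) -> usize {
    let total: f64 = weights.iter().sum();
    let target = u * total;
    let mut cumulative = 0.0;
    for (i, w) in weights.iter().enumerate() {
        cumulative += w;
        if target < cumulative {
            return i;
        }
    }
    weights.len() - 1 // fallback for rounding errors or all-zero weights
}

/// Index of the sample to use as the next centroid.
fn next_centroid_index(samples: &[f64], dims: usize, chosen: &[f64], u: f64) -> usize {
    let weights: Vec<f64> = samples
        .chunks_exact(dims)
        .map(|s| nearest_dist2(s, dims, chosen))
        .collect();
    pick_weighted(&weights, u)
}
```

A sample that already is a centroid has weight zero and is therefore never drawn again as long as any other sample has a positive weight.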

pub fn init_random_partition( kmean: &KMeans<T, LANES, D>, state: &mut KMeansState<T>, config: &KMeansConfig<'_, T>, )

Random-Partition initialization method

§Description

This initialization method randomly partitions the samples into k partitions, and then calculates the mean of each partition. These means are then used as initial centroids.
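The second half of this procedure, computing one mean per partition, can be sketched as below. `partition_means` is a hypothetical helper, not the crate's code. The random assignment of samples to partitions is assumed to have happened already and is passed in as `partition`, and an empty partition is assumed to yield an all-zero mean.

```rust
/// Computes the mean of every partition of row-major `samples`.
/// `partition[i]` is the partition (0..k) that sample i was assigned to.
/// Returns k * dims values, one mean per partition.
fn partition_means(samples: &[f64], dims: usize, k: usize, partition: &[usize]) -> Vec<f64> {
    let mut means = vec![0.0; k * dims];
    let mut counts = vec![0usize; k];
    // Sum up the samples of each partition.
    for (s, &p) in samples.chunks_exact(dims).zip(partition) {
        counts[p] += 1;
        for (m, v) in means[p * dims..(p + 1) * dims].iter_mut().zip(s) {
            *m += v;
        }
    }
    // Turn the sums into means; empty partitions stay at zero.
    for (mean, &count) in means.chunks_exact_mut(dims).zip(&counts) {
        if count > 0 {
            mean.iter_mut().for_each(|m| *m /= count as f64);
        }
    }
    means
}
```

Since every mean averages over a random subset of all samples, the resulting initial centroids tend to lie close to the center of the data set.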

pub fn init_random_sample( kmean: &KMeans<T, LANES, D>, state: &mut KMeansState<T>, config: &KMeansConfig<'_, T>, )

Random sample initialization method (a.k.a. Forgy)

§Description

This initialization method randomly selects k centroids from the samples as initial centroids.

§Note

This method is not meant for direct invocation. Pass a reference to it as the init argument of one of the clustering methods of KMeans (e.g. kmeans_minibatch).
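As an illustration of the idea, the sketch below draws k distinct samples with a partial Fisher-Yates shuffle. It is not the crate's implementation. `forgy_init` is a hypothetical helper, and the random source is passed in as a closure (`next_u64`) so that no particular random number crate is assumed. The modulo reduction introduces a slight bias, which is acceptable for a sketch.

```rust
/// Selects k distinct samples from row-major `samples` as initial centroids.
/// `next_u64` is any source of random 64-bit numbers supplied by the caller.
fn forgy_init(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut next_u64: impl FnMut() -> u64,
) -> Vec<f64> {
    let n = samples.len() / dims;
    assert!(k <= n, "cannot pick more centroids than there are samples");
    let mut indices: Vec<usize> = (0..n).collect();
    let mut centroids = Vec::with_capacity(k * dims);
    for i in 0..k {
        // Partial Fisher-Yates: swap a random not-yet-used index into slot i.
        let j = i + (next_u64() % (n - i) as u64) as usize;
        indices.swap(i, j);
        let s = indices[i];
        centroids.extend_from_slice(&samples[s * dims..(s + 1) * dims]);
    }
    centroids
}
```

Drawing without replacement guarantees that no sample index is used twice, although duplicate sample values in the data can still produce identical centroids.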

pub fn init_precomputed( centroids: Vec<T>, ) -> impl Fn(&KMeans<T, LANES, D>, &mut KMeansState<T>, &KMeansConfig<'_, T>)

Precomputed centroids initialization method

§Description

This initialization method requires a precomputed list of k centroids to use as initial centroids.

§Note

This method must be invoked with a precomputed list of centroids. It returns a closure that can be passed as the init argument to one of the clustering methods of KMeans.
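The closure-returning pattern used here can be sketched in isolation. Everything in the block is hypothetical and simplified: `State` stands in for KMeansState, `init_from` for init_precomputed, and `run` for a clustering method that calls its init argument exactly once. The crate's real types carry more fields and parameters.

```rust
/// Simplified stand-in for the clustering state (hypothetical).
struct State {
    centroids: Vec<f64>,
}

/// Captures precomputed centroids and returns an initializer closure.
fn init_from(centroids: Vec<f64>) -> impl Fn(&mut State) {
    move |state: &mut State| {
        assert_eq!(state.centroids.len(), centroids.len(), "expected k * dims values");
        state.centroids.copy_from_slice(&centroids);
    }
}

/// Stand-in for a clustering method: builds the state, then runs `init` once.
fn run<F: FnOnce(&mut State)>(k: usize, dims: usize, init: F) -> State {
    let mut state = State { centroids: vec![0.0; k * dims] };
    init(&mut state);
    state
}
```

The other initialization methods are plain functions with the initializer signature, so they can be passed by name. Precomputed centroids need extra data, which is why this method is called first and its returned closure is passed instead, e.g. `run(2, 1, init_from(vec![1.0, 5.0]))` in the sketch.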

Auto Trait Implementations§

impl<T, const LANES: usize, D> Freeze for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized, D: Freeze,

impl<T, const LANES: usize, D> RefUnwindSafe for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized, D: RefUnwindSafe, T: RefUnwindSafe,

impl<T, const LANES: usize, D> Send for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized,

impl<T, const LANES: usize, D> Sync for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized,

impl<T, const LANES: usize, D> Unpin for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized, D: Unpin, T: Unpin,

impl<T, const LANES: usize, D> UnwindSafe for KMeans<T, LANES, D>
where Simd<T, LANES>: Sized, D: UnwindSafe, T: RefUnwindSafe + UnwindSafe,
