§kmeans_smid
kmeans_smid is a small and fast library for k-means clustering calculations. It fixes a SIMD problem in the kmeans crate. Here is a small example, using k-means++ as the initialization method and Lloyd as the k-means variant:
    use kmeans_smid::*;

    fn main() {
        let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

        // Generate some random data (row-major: sample_cnt rows of sample_dims values)
        let mut samples = vec![0.0f64; sample_cnt * sample_dims];
        samples.iter_mut().for_each(|v| *v = rand::random());

        // Calculate k-means, using k-means++ as the initialization method.
        // Generic arguments in expression position need the turbofish (`::<...>`).
        let kmean = KMeans::<f64, 8>::new(samples, sample_cnt, sample_dims);
        let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

        println!("Centroids: {:?}", result.centroids);
        println!("Cluster-Assignments: {:?}", result.assignments);
        println!("Error: {}", result.distsum);
    }
§Datastructures
For performance reasons, all calculations are done on bare vectors, using hand-written SIMD intrinsics from the packed_simd crate. All vectors are stored row-major, so each sample occupies a consecutive block of memory.
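To make the row-major layout concrete, here is a minimal sketch in plain Rust. It does not use this crate or SIMD; `sample_row` and `squared_distance` are hypothetical helper names chosen for illustration only.

```rust
/// Returns the slice holding sample `i` in a row-major buffer,
/// where every sample consists of `sample_dims` consecutive values.
fn sample_row(samples: &[f64], sample_dims: usize, i: usize) -> &[f64] {
    &samples[i * sample_dims..(i + 1) * sample_dims]
}

/// Squared Euclidean distance between two vectors of equal length.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Because each sample is one contiguous slice, a distance computation walks memory linearly, which is the access pattern that SIMD loads favor.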
§Supported variants / algorithms
- lloyd (standard kmeans)
- minibatch
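As a rough illustration of the Lloyd variant, the sketch below performs a single iteration on row-major data in plain Rust. It is not the crate's implementation; `lloyd_step` is a hypothetical function, and it assumes at least one centroid and buffers whose lengths are multiples of `dims`.

```rust
/// One Lloyd iteration over row-major data: assigns every sample to its
/// nearest centroid, then moves each centroid to the mean of its samples.
/// Returns the assignments and the summed squared distance, measured
/// against the centroids as they were before the update.
fn lloyd_step(samples: &[f64], dims: usize, centroids: &mut [f64]) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let mut assignments = Vec::with_capacity(samples.len() / dims);
    let mut distsum = 0.0f64;

    // Assignment step: nearest centroid by squared Euclidean distance.
    for sample in samples.chunks_exact(dims) {
        let (best, dist) = centroids
            .chunks_exact(dims)
            .map(|c| sample.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum::<f64>())
            .enumerate()
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .expect("at least one centroid");
        assignments.push(best);
        distsum += dist;
    }

    // Update step: each centroid becomes the mean of its assigned samples.
    let mut sums = vec![0.0f64; k * dims];
    let mut counts = vec![0usize; k];
    for (sample, &c) in samples.chunks_exact(dims).zip(&assignments) {
        counts[c] += 1;
        for (s, x) in sums[c * dims..(c + 1) * dims].iter_mut().zip(sample) {
            *s += *x;
        }
    }
    for c in 0..k {
        // Centroids without assigned samples are left where they are.
        if counts[c] > 0 {
            for d in 0..dims {
                centroids[c * dims + d] = sums[c * dims + d] / counts[c] as f64;
            }
        }
    }
    (assignments, distsum)
}
```

Calling this repeatedly until the assignments stop changing, or until an iteration limit such as `max_iter` is reached, gives the standard algorithm. A minibatch variant would typically run the same two steps on a random subset of the samples per iteration.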
§Supported centroid initialization methods
- k-means++
- random partition
- random sample
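For readers unfamiliar with k-means++ seeding, the following plain-Rust sketch shows the usual idea: the first centroid is a uniformly chosen sample, and each further centroid is a sample drawn with probability proportional to its squared distance from the nearest centroid chosen so far. This is an illustration under stated assumptions, not the crate's `init_kmeanplusplus`. The names `kmeanspp_init` and `sq_dist` are hypothetical, the `uniform` parameter is assumed to return values in [0, 1), and the data is assumed to contain at least one sample with `k >= 1`.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// k-means++ seeding over row-major data. `uniform` must return values in
/// [0, 1); it is injected so the sketch does not depend on an RNG crate.
fn kmeanspp_init(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut uniform: impl FnMut() -> f64,
) -> Vec<f64> {
    let n = samples.len() / dims;

    // First centroid: a uniformly chosen sample.
    let first = ((uniform() * n as f64) as usize).min(n - 1);
    let mut centroids = samples[first * dims..(first + 1) * dims].to_vec();

    // Squared distance of every sample to its closest chosen centroid.
    let mut closest: Vec<f64> = samples
        .chunks_exact(dims)
        .map(|s| sq_dist(s, &centroids[..dims]))
        .collect();

    while centroids.len() < k * dims {
        // Draw the next centroid with probability proportional to `closest`.
        let total: f64 = closest.iter().sum();
        let mut target = uniform() * total;
        let mut chosen = n - 1;
        for (i, &d) in closest.iter().enumerate() {
            if target < d {
                chosen = i;
                break;
            }
            target -= d;
        }
        let new_centroid = &samples[chosen * dims..(chosen + 1) * dims];
        centroids.extend_from_slice(new_centroid);

        // Refresh the closest-centroid distances with the new centroid.
        for (c, s) in closest.iter_mut().zip(samples.chunks_exact(dims)) {
            *c = c.min(sq_dist(s, new_centroid));
        }
    }
    centroids
}
```

By contrast, a "random sample" initialization would pick all `k` centroids uniformly from the samples, and a "random partition" initialization would assign every sample to a random cluster and use the cluster means as the starting centroids.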
Structs§
- KMeans: Entrypoint of this crate's API surface.
- KMeansConfig: A structure holding various configuration options for a k-means calculation, such as the random number generator to use, or a couple of callbacks that can be set to get status information from a running k-means calculation.
- KMeansConfigBuilder
- KMeansState: The internally used data structure, storing the current state during calculation as well as the final result returned by the API. All mutations are done in this structure, making KMeans immutable and therefore allowing it to be used in parallel without having to duplicate the input data.
Enums§
- AbortStrategy: Enum with possible abort strategies. These strategies specify when a running iteration (within the k-means calculation) is aborted.