linfa-clustering aims to provide pure Rust implementations of popular clustering algorithms.
§The big picture
linfa-clustering is a crate in the linfa ecosystem, a wider effort to bootstrap a toolkit for classical Machine Learning implemented in pure Rust, kin in spirit to Python’s scikit-learn.
You can find a roadmap (and a selection of good first issues) here - contributors are more than welcome!
§Current state
Right now linfa-clustering provides the following clustering algorithms:
- K-Means
- DBSCAN
- Approximated DBSCAN (currently an alias for DBSCAN, because of DBSCAN's superior performance)
- Gaussian-Mixture-Model
- OPTICS
Implementation choices, algorithmic details and tutorials can be found on the pages dedicated to each algorithm.
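As a rough illustration of what the K-Means entry above computes, here is a minimal sketch of one iteration of the classic assign-then-update loop, written against the standard library only. It is not the implementation used by linfa-clustering; the function names `sq_dist`, `assign` and `update` are hypothetical, and the sketch assumes a non-empty centroid list and points of matching dimension.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assignment step: for each observation, the index of its nearest centroid.
fn assign(points: &[Vec<f64>], centroids: &[Vec<f64>]) -> Vec<usize> {
    points
        .iter()
        .map(|p| {
            let mut best = 0;
            let mut best_d = f64::INFINITY;
            for (k, c) in centroids.iter().enumerate() {
                let d = sq_dist(p, c);
                if d < best_d {
                    best_d = d;
                    best = k;
                }
            }
            best
        })
        .collect()
}

/// Update step: each centroid becomes the mean of the points assigned to it.
/// A centroid with no assigned points is left unchanged (an assumption of
/// this sketch; real implementations may handle empty clusters differently).
fn update(points: &[Vec<f64>], labels: &[usize], centroids: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let dim = centroids[0].len();
    let mut sums = vec![vec![0.0; dim]; centroids.len()];
    let mut counts = vec![0usize; centroids.len()];
    for (p, &k) in points.iter().zip(labels) {
        counts[k] += 1;
        for (s, x) in sums[k].iter_mut().zip(p) {
            *s += x;
        }
    }
    sums.into_iter()
        .zip(counts)
        .zip(centroids)
        .map(|((sum, n), old)| {
            if n == 0 {
                old.clone()
            } else {
                sum.into_iter().map(|s| s / n as f64).collect()
            }
        })
        .collect()
}
```

Calling `assign` and then `update` repeatedly until the centroids stop moving gives the basic algorithm; initialization strategy and convergence tolerance are left out of this sketch.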
Structs§
- Dbscan
- DBSCAN (Density-based Spatial Clustering of Applications with Noise) clusters together points that are close to each other and have enough neighbors, and labels sparsely neighbored points as noise. Because a point may be part of a cluster or noise, the predict method returns Array1<Option<usize>>.
- DbscanParams
- Helper struct for building a set of DBSCAN hyperparameters
- DbscanValidParams
- The set of hyperparameters that can be specified for the execution of the DBSCAN algorithm.
- GaussianMixtureModel
- Gaussian Mixture Model (GMM) aims at clustering a dataset by finding normally distributed sub datasets (hence the Gaussian Mixture name).
- GmmParams
- The set of hyperparameters that can be specified for the execution of the GMM algorithm.
- GmmValidParams
- The set of hyperparameters that can be specified for the execution of the GMM algorithm.
- KMeans
- K-means clustering aims to partition a set of unlabeled observations into clusters, where each observation belongs to the cluster with the nearest mean.
- KMeansParams
- A helper struct used to construct a set of valid hyperparameters for the K-means algorithm (using the builder pattern).
- KMeansValidParams
- The set of hyperparameters that can be specified for the execution of the K-means algorithm.
- Optics
- OPTICS (Ordering Points To Identify Clustering Structure) is a clustering algorithm that doesn’t explicitly cluster the data but instead creates an “augmented ordering” of the dataset representing its density-based clustering structure. This ordering contains information which is equivalent to the density-based clusterings and can then be used for automatic and interactive cluster analysis.
- OpticsAnalysis
- The analysis produced by running OPTICS on a dataset. It lets you iterate over the data points and access their core and reachability distances. The points are not in dataset order; they are ordered according to the clustering structure worked out during analysis.
- OpticsParams
- OpticsValidParams
- The set of hyperparameters that can be specified for the execution of the OPTICS algorithm.
- Sample
- This struct represents a data point in the dataset with its associated distances obtained from the OPTICS analysis
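The Dbscan entry above notes that each point ends up either in a cluster or marked as noise, which is why an optional label is a natural return type. The following is a naive O(n²) sketch of that idea using only the standard library and `Vec<Option<usize>>` in place of `Array1<Option<usize>>`. It is not the crate's implementation; the names `neighbors`, `dbscan`, `eps` and `min_points` are hypothetical, and the sketch assumes that a point counts as its own neighbor.

```rust
/// Indices of all points within `eps` of point `i` (including `i` itself).
fn neighbors(points: &[Vec<f64>], i: usize, eps: f64) -> Vec<usize> {
    (0..points.len())
        .filter(|&j| {
            let d2: f64 = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
            d2 <= eps * eps
        })
        .collect()
}

/// Naive DBSCAN: `Some(cluster_index)` for clustered points, `None` for noise.
fn dbscan(points: &[Vec<f64>], eps: f64, min_points: usize) -> Vec<Option<usize>> {
    let n = points.len();
    let mut labels = vec![None; n];
    let mut visited = vec![false; n];
    let mut next_cluster = 0;
    for i in 0..n {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbors(points, i, eps);
        if seeds.len() < min_points {
            // Not a core point: stays `None` unless a later cluster reaches it.
            continue;
        }
        labels[i] = Some(next_cluster);
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                // Border or core point reached from the current cluster.
                labels[j] = Some(next_cluster);
            }
            if visited[j] {
                continue;
            }
            visited[j] = true;
            let reachable = neighbors(points, j, eps);
            if reachable.len() >= min_points {
                // `j` is itself a core point, so the cluster keeps growing.
                queue.extend(reachable);
            }
        }
        next_cluster += 1;
    }
    labels
}
```

Matching on each returned label (`Some(k)` versus `None`) then separates cluster members from noise without reserving a sentinel value such as `-1`.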
Enums§
- DbscanParamsError
- GmmCovarType
- A specifier for the type of the relation between components’ covariances.
- GmmError
- An error when modeling a GMM algorithm
- GmmInitMethod
- A specifier for the method used for the initialization of the fitting algorithm of GMM
- IncrKMeansError
- KMeansError
- An error when modeling a KMeans algorithm
- KMeansInit
- Specifies centroid initialization algorithm for KMeans.
- KMeansParamsError
- An error when fitting with an invalid hyperparameter
- OpticsError
- An error when performing OPTICS Analysis
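The listings above pair each algorithm with a `...Params` builder, a `...ValidParams` set of checked hyperparameters, and a `...ParamsError` enum. The sketch below shows one way such a split can work in plain Rust. All names here (`ToyParams`, `ToyValidParams`, `ToyParamsError`, `check`) and the validation rules are hypothetical and chosen for illustration only; they are not taken from the crate.

```rust
use std::fmt;

/// Hyperparameters that have passed validation.
#[derive(Debug, Clone, PartialEq)]
struct ToyValidParams {
    tolerance: f64,
    min_points: usize,
}

/// Reasons a set of hyperparameters can be rejected.
#[derive(Debug, Clone, PartialEq)]
enum ToyParamsError {
    MinPoints,
    Tolerance,
}

impl fmt::Display for ToyParamsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyParamsError::MinPoints => write!(f, "min_points must be greater than 1"),
            ToyParamsError::Tolerance => write!(f, "tolerance must be greater than 0"),
        }
    }
}

impl std::error::Error for ToyParamsError {}

/// Unchecked builder wrapping the values that will become `ToyValidParams`.
#[derive(Debug, Clone, PartialEq)]
struct ToyParams(ToyValidParams);

impl ToyParams {
    /// Start from defaults (the default tolerance is an arbitrary choice).
    fn new(min_points: usize) -> Self {
        ToyParams(ToyValidParams { tolerance: 1e-4, min_points })
    }

    /// Builder-style setter: consumes and returns the builder.
    fn tolerance(mut self, tolerance: f64) -> Self {
        self.0.tolerance = tolerance;
        self
    }

    /// Validate once, so code holding `ToyValidParams` never re-checks.
    fn check(self) -> Result<ToyValidParams, ToyParamsError> {
        if self.0.min_points <= 1 {
            return Err(ToyParamsError::MinPoints);
        }
        // Written as a negation so that NaN is rejected as well.
        if !(self.0.tolerance > 0.0) {
            return Err(ToyParamsError::Tolerance);
        }
        Ok(self.0)
    }
}
```

With this shape, `ToyParams::new(3).tolerance(0.5).check()` yields a validated value, while an out-of-range setting surfaces as a typed error before any fitting starts.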