# Crate linfa_clustering


`linfa-clustering` aims to provide pure Rust implementations of popular clustering algorithms.

### The big picture

`linfa-clustering` is a crate in the `linfa` ecosystem, a wider effort to bootstrap a toolkit for classical Machine Learning implemented in pure Rust, kin in spirit to Python’s `scikit-learn`.

You can find a roadmap (and a selection of good first issues) here - contributors are more than welcome!

### Current state

Right now `linfa-clustering` provides the following clustering algorithms: K-means, DBSCAN, Approximated DBSCAN, Gaussian Mixture Model (GMM) and OPTICS.

Implementation choices, algorithmic details and tutorials can be found in the page dedicated to the specific algorithms.

## Structs

DBSCAN (Density-based Spatial Clustering of Applications with Noise) clusters together neighbouring points, while points in sparse regions are labelled as noise. Since points may be part of a cluster or noise, the transform method returns `Array1<Option<usize>>`. Note that some “border” points may technically belong to more than one cluster; since the transform function returns only one label per point (if any), one cluster is chosen arbitrarily for those points.

Helper struct for building a set of Approximated DBSCAN hyperparameters.

The set of hyperparameters that can be specified for the execution of
the Approximated DBSCAN algorithm.

DBSCAN (Density-based Spatial Clustering of Applications with Noise) clusters together points that are close to each other and have enough neighbors, and labels sparsely neighbored points as noise. As points may be part of a cluster or noise, the predict method returns `Array1<Option<usize>>`.
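To illustrate why each label is optional, here is a minimal std-only sketch of textbook DBSCAN on 2-D points. It is not this crate's implementation or API: `dbscan_sketch`, `neighbours`, `tolerance` and `min_points` are names chosen for the example, and a plain `Vec<Option<usize>>` stands in for `Array1<Option<usize>>`.

```rust
/// Squared Euclidean distance between two 2-D points.
fn dist2(a: [f64; 2], b: [f64; 2]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    dx * dx + dy * dy
}

/// Indices of all points within `tolerance` of point `i` (including `i` itself).
fn neighbours(points: &[[f64; 2]], i: usize, tolerance: f64) -> Vec<usize> {
    (0..points.len())
        .filter(|&j| dist2(points[i], points[j]) <= tolerance * tolerance)
        .collect()
}

/// Textbook DBSCAN: `Some(cluster)` for clustered points, `None` for noise.
fn dbscan_sketch(points: &[[f64; 2]], tolerance: f64, min_points: usize) -> Vec<Option<usize>> {
    let mut labels: Vec<Option<usize>> = vec![None; points.len()];
    let mut visited = vec![false; points.len()];
    let mut next_cluster: usize = 0;

    for start in 0..points.len() {
        if visited[start] {
            continue;
        }
        visited[start] = true;
        let seeds = neighbours(points, start, tolerance);
        if seeds.len() < min_points {
            // Not a core point: stays noise unless a later cluster claims it as a border point.
            continue;
        }
        labels[start] = Some(next_cluster);
        let mut queue = seeds;
        while let Some(p) = queue.pop() {
            if labels[p].is_none() {
                // The first cluster to reach a border point keeps it.
                labels[p] = Some(next_cluster);
            }
            if visited[p] {
                continue;
            }
            visited[p] = true;
            let reachable = neighbours(points, p, tolerance);
            if reachable.len() >= min_points {
                // `p` is itself a core point, so the cluster keeps growing through it.
                queue.extend(reachable);
            }
        }
        next_cluster += 1;
    }
    labels
}
```

In this sketch an isolated point keeps the label `None`, while a border point that is reachable from two clusters receives whichever cluster label reaches it first.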

Helper struct for building a set of DBSCAN hyperparameters

The set of hyperparameters that can be specified for the execution of
the DBSCAN algorithm.

Gaussian Mixture Model (GMM) aims at clustering a dataset by finding normally
distributed sub-datasets (hence the Gaussian Mixture name).

The set of hyperparameters that can be specified for the execution of
the GMM algorithm.

The set of hyperparameters that can be specified for the execution of
the GMM algorithm.

K-means clustering aims to partition a set of unlabeled observations into clusters,
where each observation belongs to the cluster with the nearest mean.
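The "nearest mean" rule can be sketched as a single iteration of Lloyd's algorithm, the classic way of fitting K-means. This is a std-only illustration, not the crate's API: `nearest_centroid` and `lloyd_step` are hypothetical helpers, observations are plain `Vec<f64>` rows, and at least one centroid is assumed.

```rust
/// Index of the centroid closest to `x` (squared Euclidean distance).
fn nearest_centroid(x: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_dist = f64::INFINITY;
    for (k, c) in centroids.iter().enumerate() {
        let d: f64 = x.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
        if d < best_dist {
            best = k;
            best_dist = d;
        }
    }
    best
}

/// One Lloyd iteration: assign every observation to its nearest centroid,
/// then move each centroid to the mean of the observations assigned to it.
fn lloyd_step(observations: &[Vec<f64>], centroids: &[Vec<f64>]) -> (Vec<usize>, Vec<Vec<f64>>) {
    let dim = centroids[0].len();
    let assignments: Vec<usize> = observations
        .iter()
        .map(|x| nearest_centroid(x, centroids))
        .collect();

    // Accumulate per-cluster coordinate sums and member counts.
    let mut sums = vec![vec![0.0; dim]; centroids.len()];
    let mut counts = vec![0usize; centroids.len()];
    for (x, &k) in observations.iter().zip(&assignments) {
        counts[k] += 1;
        for (s, v) in sums[k].iter_mut().zip(x) {
            *s += v;
        }
    }

    // An empty cluster keeps its previous centroid in this sketch.
    let updated: Vec<Vec<f64>> = sums
        .into_iter()
        .zip(counts)
        .zip(centroids)
        .map(|((sum, n), old)| {
            if n == 0 {
                old.clone()
            } else {
                sum.into_iter().map(|s| s / n as f64).collect()
            }
        })
        .collect();

    (assignments, updated)
}
```

Repeating `lloyd_step` until the assignments stop changing gives the usual fitting loop; how the initial centroids are chosen is a separate concern that this sketch leaves to the caller.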

A helper struct used to construct a set of valid hyperparameters for
the K-means algorithm (using the builder pattern).

The set of hyperparameters that can be specified for the execution of
the K-means algorithm.

OPTICS (Ordering Points To Identify Clustering Structure) is a clustering algorithm that
doesn’t explicitly cluster the data but instead creates an “augmented ordering” of the dataset
representing its density-based clustering structure. This ordering contains information that
is equivalent to the density-based clusterings and can then be used for automatic and
interactive cluster analysis.

The analysis produced by running OPTICS on a dataset. It allows you to iterate over the data points and
access their core and reachability distances. The points are ordered according to the clustering
structure worked out during the analysis, not according to their order in the dataset.
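For readers unfamiliar with the two distances, the following std-only sketch shows the textbook definitions on 2-D points. It assumes the convention that a point counts as its own neighbour, which may differ from the crate's; `core_distance`, `reachability_distance`, `tolerance` and `min_points` are illustrative names, not the crate's API.

```rust
/// Euclidean distance between two 2-D points.
fn distance(a: [f64; 2], b: [f64; 2]) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Core distance of point `p`: the distance to its `min_points`-th nearest
/// neighbour (counting `p` itself) within `tolerance`.
/// Returns `None` when `p` has too few neighbours to be a core point.
fn core_distance(
    points: &[[f64; 2]],
    p: usize,
    tolerance: f64,
    min_points: usize,
) -> Option<f64> {
    let mut dists: Vec<f64> = points
        .iter()
        .map(|&q| distance(points[p], q))
        .filter(|&d| d <= tolerance)
        .collect();
    dists.sort_by(|a, b| a.total_cmp(b));
    // `checked_sub` turns `min_points == 0` into `None` instead of underflowing.
    dists.get(min_points.checked_sub(1)?).copied()
}

/// Reachability distance of `o` from `p`: the larger of `p`'s core distance
/// and the actual distance between the two points.
/// Undefined (`None`) when `p` is not a core point.
fn reachability_distance(
    points: &[[f64; 2]],
    o: usize,
    p: usize,
    tolerance: f64,
    min_points: usize,
) -> Option<f64> {
    core_distance(points, p, tolerance, min_points)
        .map(|core| core.max(distance(points[p], points[o])))
}
```

Points in dense regions end up with small reachability distances, so valleys in the ordered sequence of reachability values correspond to clusters.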

The set of hyperparameters that can be specified for the execution of
the OPTICS algorithm.

This struct represents a data point in the dataset with its associated distances obtained from
the OPTICS analysis.

## Enums

A specifier for the type of the relation between components’ covariances.

An error when modeling a GMM algorithm

A specifier for the method used for the initialization of the fitting algorithm of GMM

An error when modeling a KMeans algorithm

Specifies centroid initialization algorithm for KMeans.

An error when fitting with an invalid hyperparameter

An error when performing OPTICS Analysis