ferrolearn-cluster 0.4.0

Clustering algorithms for the ferrolearn ML framework
Documentation

ferrolearn-cluster

Clustering algorithms for the ferrolearn machine learning framework. Validated against scikit-learn 1.8.0 with exact ARI parity on every measured estimator — see the workspace BENCHMARKS.md.

Algorithms

Model Description
KMeans K-Means with Greedy KMeans++ initialisation (matches sklearn's _kmeans_plusplus)
MiniBatchKMeans Mini-batch K-Means; sklearn 1.4+ defaults (batch_size=1024, max_iter=100, tol=0)
BisectingKMeans Hierarchical bisecting K-Means
DBSCAN Density-based clustering — discovers clusters of arbitrary shape
OPTICS Ordering Points To Identify the Clustering Structure
HDBSCAN Hierarchical density-based clustering
AgglomerativeClustering Ward / complete / average / single linkage
Birch Memory-efficient hierarchical clustering with CF-Tree
MeanShift Non-parametric mode-seeking clustering
SpectralClustering Graph-Laplacian eigenmap clustering
AffinityPropagation Message-passing clustering
FeatureAgglomeration Hierarchical clustering of features (transformer)
GaussianMixture Gaussian Mixture Model via EM (full / tied / diag / spherical covariance) with Greedy KMeans++ init + reg_covar=1e-6 M-step regularisation
BayesianGaussianMixture Variational-Bayes GMM
LabelPropagation / LabelSpreading Semi-supervised graph propagation

Example

use ferrolearn_cluster::{KMeans, FittedKMeans};
use ferrolearn_core::{Fit, Predict};
use ndarray::array;

let x = array![
    [1.0_f64, 2.0], [1.5, 1.8], [1.2, 2.2],
    [5.0, 6.0], [5.5, 5.8], [5.2, 6.2],
];

let model = KMeans::<f64>::new(2).with_max_iter(100);
let fitted = model.fit(&x, &()).unwrap();

let labels = fitted.predict(&x).unwrap();
let distances = fitted.transform(&x).unwrap();

sklearn parity highlights (0.3.0)

  • KMeans, MiniBatchKMeans, GaussianMixture all upgraded to Greedy KMeans++ initialisation (Arthur & Vassilvitskii 2007 with 2 + log(k) trial selection — matches sklearn's _kmeans_plusplus).
  • MiniBatchKMeans defaults switched to sklearn 1.4+ values (batch_size 100 → 1024, max_iter 300 → 100, tol 1e-4 → 0.0).
  • GaussianMixture M-step now adds reg_covar = 1e-6 to component covariance diagonals, matching scikit-learn.
  • Result: mean Δ ARI = 0.0000 across all 15 paired bench runs.

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.