ferrolearn-cluster
Clustering algorithms for the ferrolearn machine learning framework. Validated against scikit-learn 1.8.0 with exact Adjusted Rand Index (ARI) parity on every measured estimator; see the workspace BENCHMARKS.md.
Algorithms
| Model | Description |
|---|---|
| KMeans | K-Means with Greedy KMeans++ initialisation (matches sklearn's _kmeans_plusplus) |
| MiniBatchKMeans | Mini-batch K-Means; sklearn 1.4+ defaults (batch_size=1024, max_iter=100, tol=0) |
| BisectingKMeans | Hierarchical bisecting K-Means |
| DBSCAN | Density-based clustering; discovers clusters of arbitrary shape |
| OPTICS | Ordering Points To Identify the Clustering Structure |
| HDBSCAN | Hierarchical density-based clustering |
| AgglomerativeClustering | Ward / complete / average / single linkage |
| Birch | Memory-efficient hierarchical clustering with CF-Tree |
| MeanShift | Non-parametric mode-seeking clustering |
| SpectralClustering | Graph-Laplacian eigenmap clustering |
| AffinityPropagation | Message-passing clustering |
| FeatureAgglomeration | Hierarchical clustering of features (transformer) |
| GaussianMixture | Gaussian Mixture Model via EM (full / tied / diag / spherical covariance) with Greedy KMeans++ init + reg_covar=1e-6 M-step regularisation |
| BayesianGaussianMixture | Variational-Bayes GMM |
| LabelPropagation / LabelSpreading | Semi-supervised graph propagation |
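
Several estimators in the table rely on Greedy KMeans++ initialisation. The following is a minimal, dependency-free sketch of that seeding idea, not ferrolearn's actual implementation. It assumes the scikit-learn convention of `2 + floor(ln k)` candidate draws per step. The names `greedy_kmeans_pp`, `sq_dist` and the small `SplitMix64` generator are hypothetical and exist only for this illustration.

```rust
/// SplitMix64-style PRNG so the sketch needs no external crates (illustrative only).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Greedy k-means++ seeding (hypothetical helper): returns the indices of
/// the `k` points chosen as initial centres.
fn greedy_kmeans_pp(points: &[Vec<f64>], k: usize, seed: u64) -> Vec<usize> {
    assert!(k >= 1 && k <= points.len());
    let mut rng = SplitMix64(seed);
    // Assumption: number of candidates per step is 2 + floor(ln k).
    let n_trials = 2 + (k as f64).ln() as usize;

    // First centre: uniform at random.
    let first = (rng.next_f64() * points.len() as f64) as usize;
    let mut chosen = vec![first];
    // closest[i] = squared distance from point i to its nearest chosen centre.
    let mut closest: Vec<f64> = points.iter().map(|p| sq_dist(p, &points[first])).collect();

    while chosen.len() < k {
        let total: f64 = closest.iter().sum();
        let mut best: Option<(usize, f64, Vec<f64>)> = None;
        for _ in 0..n_trials {
            // Sample a candidate with probability proportional to closest[i].
            let target = rng.next_f64() * total;
            let mut acc = 0.0;
            let mut cand = points.len() - 1; // fallback if rounding skips every point
            for (i, d) in closest.iter().enumerate() {
                acc += d;
                if acc > target {
                    cand = i;
                    break;
                }
            }
            // Potential (sum of squared distances) if `cand` became a centre.
            let new_closest: Vec<f64> = points
                .iter()
                .zip(&closest)
                .map(|(p, &d)| d.min(sq_dist(p, &points[cand])))
                .collect();
            let pot: f64 = new_closest.iter().sum();
            // Greedy step: keep the candidate with the lowest potential.
            if best.as_ref().map_or(true, |b| pot < b.1) {
                best = Some((cand, pot, new_closest));
            }
        }
        let (idx, _, new_closest) = best.expect("n_trials is at least 2");
        chosen.push(idx);
        closest = new_closest;
    }
    chosen
}
```

Calling `greedy_kmeans_pp(&points, 3, 42)` on a slice of `Vec<f64>` rows returns three row indices to use as starting centres. Compared with plain KMeans++, which accepts the first sampled point, the greedy variant spends a few extra distance passes per step to pick the candidate that most reduces the total squared distance.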
Example
use ;
use ;
use array;
let x = array!;
let model = new.with_max_iter;
let fitted = model.fit.unwrap;
let labels = fitted.predict.unwrap;
let distances = fitted.transform.unwrap;
sklearn parity highlights (0.3.0)
- KMeans, MiniBatchKMeans, GaussianMixture all upgraded to Greedy KMeans++ initialisation (Arthur & Vassilvitskii 2007 with 2 + log(k) trial selection; matches sklearn's _kmeans_plusplus).
- MiniBatchKMeans defaults switched to sklearn 1.4+ values (batch_size 100 → 1024, max_iter 300 → 100, tol 1e-4 → 0.0).
- GaussianMixture M-step now adds reg_covar = 1e-6 to component covariance diagonals, matching scikit-learn.
- Result: mean Δ ARI = 0.0000 across all 15 paired bench runs.
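
For readers unfamiliar with the metric, ARI (Adjusted Rand Index) scores the agreement between two labelings and ignores how the cluster ids are numbered. That is what makes it suitable for comparing two implementations. The sketch below shows the standard contingency-table formula. It is not the benchmark harness itself, and the function names `adjusted_rand_index` and `comb2` are hypothetical.

```rust
use std::collections::HashMap;

/// Number of unordered pairs that can be drawn from `n` items: n choose 2.
fn comb2(n: u64) -> f64 {
    (n * n.saturating_sub(1)) as f64 / 2.0
}

/// Adjusted Rand Index between two labelings of the same samples
/// (hypothetical helper, standard contingency-table formula).
fn adjusted_rand_index(a: &[i64], b: &[i64]) -> f64 {
    assert_eq!(a.len(), b.len());
    // Contingency counts: joint (a, b) cells plus row and column marginals.
    let mut pair: HashMap<(i64, i64), u64> = HashMap::new();
    let mut rows: HashMap<i64, u64> = HashMap::new();
    let mut cols: HashMap<i64, u64> = HashMap::new();
    for (&x, &y) in a.iter().zip(b) {
        *pair.entry((x, y)).or_insert(0) += 1;
        *rows.entry(x).or_insert(0) += 1;
        *cols.entry(y).or_insert(0) += 1;
    }
    let sum_pair: f64 = pair.values().map(|&c| comb2(c)).sum();
    let sum_rows: f64 = rows.values().map(|&c| comb2(c)).sum();
    let sum_cols: f64 = cols.values().map(|&c| comb2(c)).sum();
    let total = comb2(a.len() as u64);
    if total == 0.0 {
        return 1.0; // fewer than two samples: treat as trivially identical
    }
    // Agreement expected by chance, and the maximum attainable agreement.
    let expected = sum_rows * sum_cols / total;
    let max_index = 0.5 * (sum_rows + sum_cols);
    if max_index == expected {
        return 1.0; // degenerate case, e.g. both labelings are one single cluster
    }
    (sum_pair - expected) / (max_index - expected)
}
```

A score of 1.0 means the two partitions are identical up to relabeling, so a Δ ARI of zero between two libraries means both scored the same against the reference labels. In this sketch any label value, including a noise marker such as -1, is simply treated as its own cluster.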
License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.