scirs2-cluster
Comprehensive clustering algorithms for unsupervised learning in Rust, part of the SciRS2 scientific computing ecosystem.
Overview
scirs2-cluster provides production-ready implementations of classical and modern clustering algorithms with SciPy- and scikit-learn-compatible APIs. Version 0.3.1 extends the core algorithms with Gaussian Mixture Models, Self-Organizing Maps, topological clustering, streaming/online methods, fuzzy clustering, deep clustering, Bayesian nonparametric methods, and advanced validation tools.
Features
Partitional Clustering (Vector Quantization)
- K-means with multiple initialization strategies
- K-means++ smart initialization (faster convergence)
- Mini-batch K-means for large-scale datasets
- Parallel K-means using Rayon
- `kmeans2` with SciPy-compatible interface
- Data whitening / normalization utilities
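For readers new to the method, the Lloyd iteration at the heart of K-means can be sketched with the standard library alone. This is an illustration of the technique, not the crate's implementation; the names `sq_dist` and `lloyd_kmeans` and their signatures are hypothetical, and the initial centroids are supplied by the caller rather than chosen by K-means++.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical helper: run Lloyd's algorithm from caller-supplied centroids.
/// Returns the final centroids and one label per point.
/// Assumes `max_iter >= 1` and at least one centroid.
fn lloyd_kmeans(
    data: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iter: usize,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    let k = centroids.len();
    let dim = centroids[0].len();
    // usize::MAX marks "not yet assigned", so the first pass always counts as a change.
    let mut labels = vec![usize::MAX; data.len()];

    for _ in 0..max_iter {
        // Assignment step: attach every point to its nearest centroid.
        let mut changed = false;
        for (i, p) in data.iter().enumerate() {
            let mut best = 0;
            let mut best_d = f64::INFINITY;
            for (j, c) in centroids.iter().enumerate() {
                let d = sq_dist(p, c);
                if d < best_d {
                    best_d = d;
                    best = j;
                }
            }
            if labels[i] != best {
                labels[i] = best;
                changed = true;
            }
        }
        // No assignment changed: centroids already equal the cluster means.
        if !changed {
            break;
        }
        // Update step: move each centroid to the mean of its assigned points.
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in data.iter().zip(&labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += x;
            }
        }
        for j in 0..k {
            // An empty cluster keeps its previous centroid.
            if counts[j] > 0 {
                for d in 0..dim {
                    centroids[j][d] = sums[j][d] / counts[j] as f64;
                }
            }
        }
    }
    (centroids, labels)
}
```

Calling `lloyd_kmeans(&data, seeds, 10)` on two well-separated groups of points returns one centroid per group. Leaving an empty cluster's centroid untouched is one simple policy among several; re-seeding the centroid is a common alternative.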
Hierarchical Clustering
- Agglomerative clustering with full linkage method suite: single, complete, average, Ward, centroid, median, weighted
- Optimized Ward's method: O(n² log n) vs. naive O(n³)
- Dendrogram utilities and flat cluster extraction (`fcluster`)
- Dendrogram export (Newick, JSON)
Density-Based Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- OPTICS (Ordering Points To Identify the Clustering Structure)
- HDBSCAN (Hierarchical DBSCAN)
- Density peaks algorithm
- Density ratio estimation clustering
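As a rough illustration of how DBSCAN separates core points, border points, and noise, the following standard-library sketch uses a brute-force neighbor search. It is not the crate's implementation; `dbscan_sketch`, `region_query`, and the `NOISE`/`UNVISITED` constants are hypothetical names, and the neighbor count is assumed to include the point itself.

```rust
/// Label for points that belong to no cluster.
const NOISE: i32 = -1;
/// Internal marker for points not yet processed.
const UNVISITED: i32 = -2;

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// Indices of all points within `eps` of point `i` (including `i` itself).
fn region_query(data: &[Vec<f64>], i: usize, eps: f64) -> Vec<usize> {
    (0..data.len())
        .filter(|&j| euclidean(&data[i], &data[j]) <= eps)
        .collect()
}

/// Hypothetical brute-force DBSCAN: returns a cluster id per point, or NOISE.
fn dbscan_sketch(data: &[Vec<f64>], eps: f64, min_samples: usize) -> Vec<i32> {
    let mut labels = vec![UNVISITED; data.len()];
    let mut cluster = 0;

    for i in 0..data.len() {
        if labels[i] != UNVISITED {
            continue;
        }
        let neighbors = region_query(data, i, eps);
        if neighbors.len() < min_samples {
            // Not a core point; may later be claimed as a border point.
            labels[i] = NOISE;
            continue;
        }
        // `i` is a core point: start a new cluster and expand it.
        labels[i] = cluster;
        let mut queue = neighbors;
        while let Some(j) = queue.pop() {
            if labels[j] == NOISE {
                // Previously marked noise, now reachable: border point.
                labels[j] = cluster;
            }
            if labels[j] != UNVISITED {
                continue;
            }
            labels[j] = cluster;
            let reachable = region_query(data, j, eps);
            if reachable.len() >= min_samples {
                // `j` is itself a core point, so its neighbors join the cluster.
                queue.extend(reachable);
            }
        }
        cluster += 1;
    }
    labels
}
```

The brute-force `region_query` makes this O(n²) overall; a spatial index such as a KD-tree is the usual way to speed up the neighbor search.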
Probabilistic and Mixture Models
- Gaussian Mixture Models (GMM) with full EM algorithm
- Bayesian GMM with variational inference
- Dirichlet Process mixture models (nonparametric Bayesian)
- Probabilistic soft assignments
Prototype-Based and Competitive Learning
- Self-Organizing Maps (SOM) with hexagonal and rectangular topologies
- Competitive learning networks
- Prototype-enhanced clustering with medoid refinement
- Leader algorithm (single-pass with hierarchical tree)
Spectral and Graph-Based
- Spectral clustering with multiple Laplacian variants
- Affinity propagation (exemplar-based)
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
- Mean-shift clustering
Subspace Clustering
- Subspace clustering for high-dimensional data
- Projected clustering and axis-aligned subspace search
- Advanced subspace methods (`subspace_advanced/`)
Fuzzy and Soft Clustering
- Fuzzy c-means (FCM) with membership degree outputs
- Soft clustering with probabilistic assignments
- Possibilistic c-means
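To make "membership degree" concrete: in standard fuzzy c-means, the membership of a point in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is the distance to center i and m > 1 is the fuzzifier. The sketch below evaluates that formula for a single point. The function name `fcm_memberships` is hypothetical and is not taken from the crate's API.

```rust
/// Hypothetical helper: fuzzy c-means membership degrees of one point,
/// given its distances to every cluster center and the fuzzifier `m` (m > 1).
/// The returned degrees sum to 1.
fn fcm_memberships(dists: &[f64], m: f64) -> Vec<f64> {
    // A point lying exactly on one or more centers splits its membership
    // evenly among those centers, which avoids dividing by zero.
    let zeros = dists.iter().filter(|&&d| d == 0.0).count();
    if zeros > 0 {
        return dists
            .iter()
            .map(|&d| if d == 0.0 { 1.0 / zeros as f64 } else { 0.0 })
            .collect();
    }
    let exponent = 2.0 / (m - 1.0);
    dists
        .iter()
        .map(|&di| {
            let denom: f64 = dists.iter().map(|&dk| (di / dk).powf(exponent)).sum();
            1.0 / denom
        })
        .collect()
}
```

With `m = 2.0`, a point at distances 1 and 3 from two centers receives memberships 0.9 and 0.1. As `m` approaches 1 the assignment approaches a hard one, and larger values of `m` flatten the memberships toward uniform.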
Topological Clustering
- Topological data analysis applied to clustering
- Persistent homology-based cluster boundary detection
- Mapper algorithm integration
Streaming and Online Clustering
- Online k-means (incremental updates)
- ADWIN-based streaming cluster detection
- CluStream and DenStream for data streams
- Reservoir sampling for large data streams
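The incremental update behind online k-means can be sketched as a running mean per centroid: each arriving sample pulls its nearest centroid toward it with step size 1/count. The `OnlineKMeans` struct below is a hypothetical standard-library illustration, not the crate's type. It assumes the seeds carry zero weight, so the first sample assigned to a cluster replaces that cluster's seed.

```rust
/// Hypothetical streaming state: one centroid and one sample count per cluster.
struct OnlineKMeans {
    centroids: Vec<Vec<f64>>,
    counts: Vec<u64>,
}

impl OnlineKMeans {
    /// Start from caller-supplied seed centroids, each with a count of zero.
    fn new(seeds: Vec<Vec<f64>>) -> Self {
        let counts = vec![0; seeds.len()];
        OnlineKMeans { centroids: seeds, counts }
    }

    /// Absorb one sample and return the index of the cluster it joined.
    fn update(&mut self, x: &[f64]) -> usize {
        // Find the nearest centroid by squared Euclidean distance.
        let mut best = 0;
        let mut best_d = f64::INFINITY;
        for (j, c) in self.centroids.iter().enumerate() {
            let d: f64 = c.iter().zip(x).map(|(a, b)| (a - b) * (a - b)).sum();
            if d < best_d {
                best_d = d;
                best = j;
            }
        }
        // Running-mean update: c <- c + (x - c) / n.
        self.counts[best] += 1;
        let rate = 1.0 / self.counts[best] as f64;
        for (c, xi) in self.centroids[best].iter_mut().zip(x) {
            *c += rate * (xi - *c);
        }
        best
    }
}
```

Because the step size shrinks as 1/n, each centroid equals the exact mean of the samples assigned to it so far. A constant step size is a common alternative when the stream drifts and older samples should be forgotten.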
Time Series Clustering
- DTW-based distance for time series k-means
- Temporal pattern clustering
- Phase-space clustering
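For context on the DTW-based distance, here is a minimal dynamic-programming sketch of Dynamic Time Warping between two univariate series, using absolute difference as the local cost and no warping-window constraint. The function name `dtw_distance` is hypothetical, and the crate's own implementation may differ in cost function and constraints.

```rust
/// Hypothetical helper: unconstrained DTW distance between two univariate
/// series, with |a_i - b_j| as the local cost.
/// Returns 0 for two empty series and infinity if exactly one is empty.
fn dtw_distance(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    // cost[i][j] = minimal cumulative cost of aligning a[..i] with b[..j].
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;

    for i in 1..=n {
        for j in 1..=m {
            let local = (a[i - 1] - b[j - 1]).abs();
            // Extend the cheapest of the three allowed predecessor alignments.
            let best_prev = cost[i - 1][j]
                .min(cost[i][j - 1])
                .min(cost[i - 1][j - 1]);
            cost[i][j] = local + best_prev;
        }
    }
    cost[n][m]
}
```

Unlike Euclidean distance, DTW tolerates local stretching: `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0. The full table costs O(nm) time and memory; keeping only two rows reduces the memory to O(m).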
Ensemble and Consensus
- Consensus clustering via co-association matrices
- Evidence Accumulation Clustering (EAC)
- Bagging-based and weighted voting ensembles
- Stability-based cluster selection
Deep Clustering
- Deep embedding via autoencoder
- DEC (Deep Embedded Clustering)
- Deep adversarial clustering
- Transformer-based cluster embeddings
Biclustering and Co-clustering
- Biclustering for simultaneous row/column clustering
- Co-clustering (information-theoretic)
Evaluation Metrics
- Silhouette coefficient (individual and average)
- Davies-Bouldin index
- Calinski-Harabasz index
- Gap statistic for optimal k selection
- Adjusted Rand Index (ARI)
- Normalized Mutual Information (NMI)
- Homogeneity, Completeness, V-measure
- Stability analysis across bootstrap samples
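As an example of how an external validation metric is computed, the sketch below evaluates the Adjusted Rand Index from the contingency table of two labelings, using only the standard library. The function name `adjusted_rand_index` and its signature are hypothetical, not the crate's API. Degenerate inputs, where the denominator vanishes, are assumed to score 1.0.

```rust
use std::collections::HashMap;

/// Number of unordered pairs that can be drawn from `n` items.
fn comb2(n: usize) -> f64 {
    let n = n as f64;
    n * (n - 1.0) / 2.0
}

/// Hypothetical helper: Adjusted Rand Index between two labelings of the same points.
fn adjusted_rand_index(truth: &[usize], pred: &[usize]) -> f64 {
    assert_eq!(truth.len(), pred.len(), "labelings must have equal length");
    let n = truth.len();
    if n < 2 {
        return 1.0; // no pairs to compare
    }
    // Contingency counts n_ij plus row sums a_i and column sums b_j.
    let mut cells: HashMap<(usize, usize), usize> = HashMap::new();
    let mut rows: HashMap<usize, usize> = HashMap::new();
    let mut cols: HashMap<usize, usize> = HashMap::new();
    for (&t, &p) in truth.iter().zip(pred) {
        *cells.entry((t, p)).or_insert(0) += 1;
        *rows.entry(t).or_insert(0) += 1;
        *cols.entry(p).or_insert(0) += 1;
    }
    let sum_cells: f64 = cells.values().map(|&c| comb2(c)).sum();
    let sum_rows: f64 = rows.values().map(|&c| comb2(c)).sum();
    let sum_cols: f64 = cols.values().map(|&c| comb2(c)).sum();

    // Expected pair agreement under random labeling, and its maximum.
    let expected = sum_rows * sum_cols / comb2(n);
    let max_index = 0.5 * (sum_rows + sum_cols);
    if (max_index - expected).abs() < f64::EPSILON {
        return 1.0; // degenerate case, e.g. both labelings have a single cluster
    }
    (sum_cells - expected) / (max_index - expected)
}
```

The index is invariant to how cluster ids are numbered, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0, while labelings that agree no better than chance score near 0 and can go negative.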
Quick Start
Add to your Cargo.toml:
[dependencies]
scirs2-cluster = "0.3.1"
With parallel processing:
[dependencies]
scirs2-cluster = { version = "0.3.1", features = ["parallel"] }
K-means Clustering
use kmeans;
use Array2;
Hierarchical Clustering
use ;
use Array2;
DBSCAN
use dbscan;
use Array2;
Gaussian Mixture Model
use GaussianMixtureModel;
use Array2;
Cluster Validation
use ;
use ;
Feature Flags
| Flag | Description |
|---|---|
| `parallel` | Enable Rayon-based multi-threaded distance computation and fitting |
| `simd` | SIMD-accelerated distance computations |
Related Crates
- scirs2-stats: Statistical distributions and tests
- scirs2-transform: Dimensionality reduction and preprocessing
- scirs2-spatial: Spatial indexing (KD-tree, Ball-tree)
- SciRS2 project
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.