scirs2-cluster 0.3.0

Clustering algorithms module for SciRS2 (scirs2-cluster)
Documentation
# scirs2-cluster

[![crates.io](https://img.shields.io/crates/v/scirs2-cluster.svg)](https://crates.io/crates/scirs2-cluster)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](../LICENSE)
[![Documentation](https://img.shields.io/docsrs/scirs2-cluster)](https://docs.rs/scirs2-cluster)

Comprehensive clustering algorithms for unsupervised learning in Rust, part of the [SciRS2](https://github.com/cool-japan/scirs) scientific computing ecosystem.

## Overview

`scirs2-cluster` provides production-ready implementations of classical and modern clustering algorithms with SciPy/scikit-learn compatible APIs. v0.3.0 significantly expands beyond the core algorithms with Gaussian Mixture Models, Self-Organizing Maps, topological clustering, streaming/online methods, fuzzy clustering, deep clustering, Bayesian nonparametric methods, and advanced validation tools.

## Features

### Partitional Clustering (Vector Quantization)
- K-means with multiple initialization strategies
- K-means++ smart initialization (faster convergence)
- Mini-batch K-means for large-scale datasets
- Parallel K-means using Rayon
- `kmeans2` with SciPy-compatible interface
- Data whitening / normalization utilities

### Hierarchical Clustering
- Agglomerative clustering with full linkage method suite: single, complete, average, Ward, centroid, median, weighted
- Optimized Ward's method: O(n^2 log n) vs naive O(n^3)
- Dendrogram utilities and flat cluster extraction (`fcluster`)
- Dendrogram export (Newick, JSON)

### Density-Based Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- OPTICS (Ordering Points To Identify the Clustering Structure)
- HDBSCAN (Hierarchical DBSCAN)
- Density peaks algorithm
- Density ratio estimation clustering

### Probabilistic and Mixture Models
- Gaussian Mixture Models (GMM) with full EM algorithm
- Bayesian GMM with variational inference
- Dirichlet Process mixture models (nonparametric Bayesian)
- Probabilistic soft assignments

### Prototype-Based and Competitive Learning
- Self-Organizing Maps (SOM) with hexagonal and rectangular topologies
- Competitive learning networks
- Prototype-enhanced clustering with medoid refinement
- Leader algorithm (single-pass with hierarchical tree)

### Spectral and Graph-Based
- Spectral clustering with multiple Laplacian variants
- Affinity propagation (exemplar-based)
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
- Mean-shift clustering

### Subspace Clustering
- Subspace clustering for high-dimensional data
- Projected clustering and axis-aligned subspace search
- Advanced subspace methods (`subspace_advanced/`)

### Fuzzy and Soft Clustering
- Fuzzy c-means (FCM) with membership degree outputs
- Soft clustering with probabilistic assignments
- Possibilistic c-means

### Topological Clustering
- Topological data analysis applied to clustering
- Persistent homology-based cluster boundary detection
- Mapper algorithm integration

### Streaming and Online Clustering
- Online k-means (incremental updates)
- ADWIN-based streaming cluster detection
- CluStream and DenStream for data streams
- Reservoir sampling for large data streams

### Time Series Clustering
- DTW-based distance for time series k-means
- Temporal pattern clustering
- Phase-space clustering

### Ensemble and Consensus
- Consensus clustering via co-association matrices
- Evidence Accumulation Clustering (EAC)
- Bagging-based and weighted voting ensembles
- Stability-based cluster selection

### Deep Clustering
- Deep embedding via autoencoder
- DEC (Deep Embedded Clustering)
- Deep adversarial clustering
- Transformer-based cluster embeddings

### Biclustering and Co-clustering
- Biclustering for simultaneous row/column clustering
- Co-clustering (information-theoretic)

### Evaluation Metrics
- Silhouette coefficient (individual and average)
- Davies-Bouldin index
- Calinski-Harabasz index
- Gap statistic for optimal k selection
- Adjusted Rand Index (ARI)
- Normalized Mutual Information (NMI)
- Homogeneity, Completeness, V-measure
- Stability analysis across bootstrap samples

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
scirs2-cluster = "0.3.0"
```

With parallel processing:

```toml
[dependencies]
scirs2-cluster = { version = "0.3.0", features = ["parallel"] }
```

### K-means Clustering

```rust
use scirs2_cluster::vq::kmeans;
use scirs2_core::ndarray::Array2;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Array2::from_shape_vec((6, 2), vec![
        1.0, 2.0,  1.2, 1.8,  0.8, 1.9,
        3.7, 4.2,  3.9, 3.9,  4.2, 4.1,
    ])?;

    let (centroids, labels) = kmeans(data.view(), 2, None, None, None, None)?;

    println!("Centroids: {:?}", centroids);
    println!("Labels: {:?}", labels);
    Ok(())
}
```

### Hierarchical Clustering

```rust
use scirs2_cluster::hierarchy::{linkage, fcluster, LinkageMethod};
use scirs2_core::ndarray::Array2;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Array2::from_shape_vec((6, 2), vec![
        1.0, 2.0,  1.2, 1.8,  0.8, 1.9,
        3.7, 4.2,  3.9, 3.9,  4.2, 4.1,
    ])?;

    let z = linkage(data.view(), LinkageMethod::Ward, None)?;
    let labels = fcluster(&z, 2, None)?;

    println!("Cluster assignments: {:?}", labels);
    Ok(())
}
```

### DBSCAN

```rust
use scirs2_cluster::density::dbscan;
use scirs2_core::ndarray::Array2;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Array2::from_shape_vec((8, 2), vec![
        1.0, 2.0,  1.5, 1.8,  1.3, 1.9,
        5.0, 7.0,  5.1, 6.8,  5.2, 7.1,
        0.0, 10.0, 10.0, 0.0,
    ])?;

    // eps=0.8, min_samples=2
    let labels = dbscan(data.view(), 0.8, 2, None)?;
    println!("Labels (-1 = noise): {:?}", labels);
    Ok(())
}
```

### Gaussian Mixture Model

```rust
use scirs2_cluster::probabilistic::GaussianMixtureModel;
use scirs2_core::ndarray::Array2;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Array2::<f64>::zeros((100, 2)); // replace with real data

    let mut gmm = GaussianMixtureModel::new(3, 100, 1e-6, 42)?;
    gmm.fit(data.view())?;

    let labels = gmm.predict(data.view())?;
    let responsibilities = gmm.predict_proba(data.view())?;
    println!("Soft assignments shape: {:?}", responsibilities.shape());
    Ok(())
}
```

### Cluster Validation

```rust
use scirs2_cluster::metrics::{
    silhouette_score, davies_bouldin_score, calinski_harabasz_score,
};
use scirs2_core::ndarray::{Array2, Array1};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Array2::<f64>::zeros((100, 5));
    let labels = Array1::<usize>::zeros(100);

    let sil = silhouette_score(data.view(), labels.view())?;
    let db  = davies_bouldin_score(data.view(), labels.view())?;
    let ch  = calinski_harabasz_score(data.view(), labels.view())?;

    println!("Silhouette: {:.4}", sil);
    println!("Davies-Bouldin: {:.4}", db);
    println!("Calinski-Harabasz: {:.4}", ch);
    Ok(())
}
```

## Feature Flags

| Flag | Description |
|------|-------------|
| `parallel` | Enable Rayon-based multi-threaded distance computation and fitting |
| `simd` | SIMD-accelerated distance computations |

## Related Crates

- [`scirs2-stats`]../scirs2-stats - Statistical distributions and tests
- [`scirs2-transform`]../scirs2-transform - Dimensionality reduction and preprocessing
- [`scirs2-spatial`]../scirs2-spatial - Spatial indexing (KD-tree, Ball-tree)
- [SciRS2 project]https://github.com/cool-japan/scirs

## License

Licensed under the Apache License, Version 2.0. See [LICENSE](../LICENSE) for details.