//! Clustering algorithms for grouping similar items.
//!
//! This module provides batch and streaming clustering algorithms for dense vectors.
//!
//! ## Hard vs Soft Clustering
//!
//! **Hard clustering** assigns each item to exactly one cluster. Simple, but
//! loses information when items genuinely span multiple groups.
//!
//! **Soft clustering** gives each item a probability distribution over clusters.
//! A text chunk might be 60% about "machine learning", 30% about "statistics",
//! 10% about "software". This reflects reality better than forcing a choice.
//!
//! Soft clustering (e.g., GMM) is not implemented in this crate yet.
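//!
//! The distinction can be sketched in a few lines. This is an illustration only,
//! not part of this crate's API: `sq_dist`, `hard_assign`, `soft_assign`, and the
//! `temperature` parameter are hypothetical names. A softmax over negative squared
//! distances is just one simple way to obtain a distribution over clusters; a GMM
//! would use fitted component densities instead.
//!
//! ```rust
//! /// Squared Euclidean distance between two points of equal dimension.
//! fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
//!     a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
//! }
//!
//! /// Hard assignment: index of the nearest centroid (hypothetical helper).
//! fn hard_assign(point: &[f64], centroids: &[Vec<f64>]) -> usize {
//!     let mut best = 0;
//!     let mut best_d = f64::INFINITY;
//!     for (i, c) in centroids.iter().enumerate() {
//!         let d = sq_dist(point, c);
//!         if d < best_d {
//!             best_d = d;
//!             best = i;
//!         }
//!     }
//!     best
//! }
//!
//! /// Soft assignment: softmax over negative squared distances (hypothetical helper).
//! /// `temperature` must be positive; larger values give flatter distributions.
//! fn soft_assign(point: &[f64], centroids: &[Vec<f64>], temperature: f64) -> Vec<f64> {
//!     let d: Vec<f64> = centroids.iter().map(|c| sq_dist(point, c)).collect();
//!     // Subtract the minimum distance before exponentiating for numerical stability.
//!     let min_d = d.iter().cloned().fold(f64::INFINITY, f64::min);
//!     let w: Vec<f64> = d.iter().map(|x| (-(x - min_d) / temperature).exp()).collect();
//!     let total: f64 = w.iter().sum();
//!     w.iter().map(|x| x / total).collect()
//! }
//! ```
//!
//! For a point at `(4, 4)` with centroids at `(0, 0)` and `(10, 10)`, `hard_assign`
//! picks centroid 0, while `soft_assign` with a temperature of 50 gives weights of
//! roughly 0.69 and 0.31, keeping the information that the point sits between groups.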
//!
//! ## Algorithms
//!
//! **Batch**
//! - [`Kmeans`]: Lloyd's algorithm with k-means++ seeding, generic over distance metrics.
//! - [`Dbscan`]: density-based clustering with noise detection (Ester et al. 1996).
//! - [`Hdbscan`]: hierarchical density clustering without a global epsilon (Campello et al. 2013).
//! - [`EVoC`]: multi-granularity hierarchy via MST on random projections.
//! - [`CopKmeans`]: constrained k-means with must-link / cannot-link (Wagstaff et al. 2001).
//! - [`CorrelationClustering`]: PIVOT + local search on signed graphs (Bansal et al. 2004).
//!
//! **Streaming**
//! - [`MiniBatchKmeans`]: online k-means with decaying learning rate (Sculley 2010).
//! - [`DenStream`]: streaming density-based clustering with decay (Cao et al. 2006).
//!
//! ## Usage
//!
//! ```rust
//! use clump::cluster::{Dbscan, EVoC, EVoCParams, Kmeans};
//!
//! let data = vec![
//!     vec![0.0, 0.0],
//!     vec![0.1, 0.1],
//!     vec![10.0, 10.0],
//!     vec![10.1, 10.1],
//! ];
//!
//! // Hard clustering with K-means
//! let labels = Kmeans::new(2).fit_predict(&data).unwrap();
//! assert_eq!(labels[0], labels[1]); // First two together
//! assert_ne!(labels[0], labels[2]); // Separate from last two
//!
//! // Density-based clustering with DBSCAN
//! let labels = Dbscan::new(0.5, 2).fit_predict(&data).unwrap();
//! assert_eq!(labels.len(), data.len());
//!
//! // Hierarchical clustering with EVōC (noise as `None`)
//! let mut evoc = EVoC::new(EVoCParams {
//!     intermediate_dim: 1,
//!     min_cluster_size: 2,
//!     seed: Some(42),
//!     ..Default::default()
//! });
//! let labels = evoc.fit_predict(&data).unwrap();
//! assert_eq!(labels.len(), data.len());
//! ```
/// Adapters for ndarray and flat-slice input conversion.
pub
/// Cluster evaluation metrics (silhouette, Calinski-Harabasz, Davies-Bouldin).
pub use ;
pub use ;
pub use ;
pub use DenStream;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use MiniBatchKmeans;