Clustering algorithms for sklears
This crate provides implementations of clustering algorithms including:
- K-Means clustering with various initialization methods
- X-Means for automatic cluster number selection
- G-Means for Gaussian cluster detection with automatic number selection
- Mini-batch K-Means for large datasets
- Fuzzy C-Means clustering with membership degrees
- DBSCAN (Density-Based Spatial Clustering)
- Incremental DBSCAN for streaming data and large datasets
- HDBSCAN (Hierarchical Density-Based Spatial Clustering)
- OPTICS (Ordering Points To Identify Clustering Structure)
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
- Hierarchical clustering
- Mean Shift with adaptive bandwidth estimation
- Density Peaks clustering for automatic cluster center detection
- KDE Clustering using kernel density estimation for density-based clustering
- Spectral Clustering
- Gaussian Mixture Models with model selection criteria (AIC, BIC, ICL)
- Dirichlet Process Mixture Models for infinite mixture modeling
- Local Outlier Factor (LOF) for density-based outlier detection
- CURE (Clustering Using REpresentatives) for large datasets with irregular shapes
- ROCK (RObust Clustering using linKs) for categorical data clustering
- Streaming clustering algorithms (Online K-Means, CluStream, Sliding Window K-Means)
- Graph clustering algorithms (Modularity-based, Louvain, Label Propagation, Spectral)
- Evolutionary and bio-inspired clustering algorithms (PSO, GA, ACO, ABC, Differential Evolution)
- Comprehensive validation metrics for clustering evaluation including stability analysis
These implementations leverage scirs2’s cluster module for efficient computation.
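As a point of reference for the K-Means family listed above, the following is a minimal, standalone sketch of Lloyd's iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its points). It does not call this crate or scirs2: the names `kmeans_sketch`, `nearest`, and `sq_dist` are hypothetical, and seeding the centroids with the first `k` points is a simplification chosen for determinism, not one of the crate's initialization methods.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point`.
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_d = f64::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = sq_dist(point, c);
        if d < best_d {
            best_d = d;
            best = i;
        }
    }
    best
}

/// Hypothetical helper: Lloyd's algorithm with the first `k` points as
/// initial centroids (an assumption made for this sketch only).
/// Returns the final centroids and one cluster label per input point.
fn kmeans_sketch(data: &[Vec<f64>], k: usize, max_iter: usize) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(k > 0 && k <= data.len(), "need 1 <= k <= number of points");
    let dim = data[0].len();
    let mut centroids: Vec<Vec<f64>> = data[..k].to_vec();
    // usize::MAX marks "not yet assigned", so the first pass never counts as converged.
    let mut labels = vec![usize::MAX; data.len()];

    for _ in 0..max_iter {
        // Assignment step: label each point with its nearest centroid.
        let new_labels: Vec<usize> = data.iter().map(|p| nearest(p, &centroids)).collect();

        // Update step: accumulate per-cluster sums and counts.
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in data.iter().zip(&new_labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += x;
            }
        }
        // Move each non-empty cluster's centroid to the mean of its points;
        // an empty cluster keeps its previous centroid.
        for j in 0..k {
            if counts[j] > 0 {
                for (c, s) in centroids[j].iter_mut().zip(&sums[j]) {
                    *c = s / counts[j] as f64;
                }
            }
        }

        // Stop once the assignment no longer changes.
        let converged = new_labels == labels;
        labels = new_labels;
        if converged {
            break;
        }
    }
    (centroids, labels)
}
```

Calling `kmeans_sketch(&data, 2, 100)` on a `Vec<Vec<f64>>` of points returns the centroids together with a label for each point. Variants in the list, such as Mini-batch K-Means or X-Means, change how the update step samples data or how `k` is chosen; the assign-then-update loop stays the same.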
Re-exports
pub use birch::BIRCHConfig;
pub use birch::ClusteringFeature;
pub use birch::BIRCH;
pub use cure::CUREConfig;
pub use cure::CUREDistanceMetric;
pub use cure::CUREFitted;
pub use cure::CURE;
pub use dbscan::DBSCANConfig;
pub use dbscan::DBSCAN;
pub use dbscan::NOISE;
pub use density_peaks::DensityPeaks;
pub use density_peaks::DensityPeaksConfig;
pub use density_peaks::DistanceMetric as DensityPeaksDistanceMetric;
pub use dirichlet_process::DirichletProcessConfig;
pub use dirichlet_process::DirichletProcessMixture;
pub use dirichlet_process::PredictProbaDP;
pub use ensemble::BaggingClustering;
pub use ensemble::EnsembleConfig;
pub use ensemble::EnsembleConfigBuilder;
pub use ensemble::EnsembleMethod;
pub use ensemble::EnsembleResult;
pub use ensemble::EvidenceAccumulationClustering;
pub use ensemble::VotingEnsemble;
pub use evolutionary::PSOClustering;
pub use evolutionary::PSOClusteringBuilder;
pub use evolutionary::PSOClusteringFitted;
pub use feature_selection::FeatureSelectionConfig;
pub use feature_selection::FeatureSelectionConfigBuilder;
pub use feature_selection::FeatureSelectionMethod;
pub use feature_selection::FeatureSelectionResult;
pub use feature_selection::FeatureSelector;
pub use fuzzy_cmeans::FuzzyCMeans;
pub use fuzzy_cmeans::FuzzyCMeansConfig;
pub use fuzzy_cmeans::PredictMembership;
pub use gmm::BayesianGaussianMixture;
pub use gmm::CovarianceType;
pub use gmm::GaussianMixture;
pub use gmm::GaussianMixtureConfig;
pub use gmm::ModelSelectionCriterion;
pub use gmm::ModelSelectionResult;
pub use gmm::PredictProba;
pub use gmm::WeightInit;
pub use graph_clustering::Graph;
pub use graph_clustering::GraphClusteringResult;
pub use graph_clustering::LabelPropagationClustering;
pub use graph_clustering::LabelPropagationConfig as GraphLabelPropagationConfig;
pub use graph_clustering::LouvainClustering;
pub use graph_clustering::LouvainConfig;
pub use graph_clustering::LouvainResult;
pub use graph_clustering::ModularityClustering;
pub use graph_clustering::ModularityClusteringConfig;
pub use graph_clustering::SpectralGraphClustering;
pub use graph_clustering::SpectralGraphConfig;
pub use hdbscan::ClusterStat;
pub use hdbscan::HDBSCANConfig;
pub use hdbscan::HDBSCAN;
pub use hierarchical::AgglomerativeClustering;
pub use hierarchical::AgglomerativeClusteringConfig;
pub use hierarchical::Constraint;
pub use hierarchical::ConstraintSet;
pub use hierarchical::Dendrogram;
pub use hierarchical::DendrogramExport;
pub use hierarchical::DendrogramLinkExport;
pub use hierarchical::DendrogramNode;
pub use hierarchical::DendrogramNodeExport;
pub use hierarchical::MemoryStrategy;
pub use incremental_dbscan::DistanceMetric as IncrementalDistanceMetric;
pub use incremental_dbscan::IncrementalDBSCAN;
pub use incremental_dbscan::IncrementalDBSCANConfig;
pub use kde_clustering::BandwidthMethod;
pub use kde_clustering::KDEClustering;
pub use kde_clustering::KDEClusteringConfig;
pub use kde_clustering::KernelType;
pub use kmeans::GMeans;
pub use kmeans::GMeansConfig;
pub use kmeans::InformationCriterion;
pub use kmeans::KMeans;
pub use kmeans::KMeansConfig;
pub use kmeans::KMeansInit;
pub use kmeans::MiniBatchKMeans;
pub use kmeans::MiniBatchKMeansConfig;
pub use kmeans::XMeans;
pub use kmeans::XMeansConfig;
pub use locality_sensitive_hashing::LSHConfig;
pub use locality_sensitive_hashing::LSHFamily;
pub use locality_sensitive_hashing::LSHIndex;
pub use locality_sensitive_hashing::LSHIndexStats;
pub use locality_sensitive_hashing::MemoryUsage;
pub use locality_sensitive_hashing::TableStats;
pub use lof::DistanceMetric as LOFDistanceMetric;
pub use lof::LOFConfig;
pub use lof::LOF;
pub use mean_shift::MeanShift;
pub use mean_shift::MeanShiftConfig;
pub use memory_mapped::MemoryMappedConfig;
pub use memory_mapped::MemoryMappedDistanceMatrix;
pub use memory_mapped::MemoryStats;
pub use multi_view::ConsensusClustering;
pub use multi_view::ConsensusClusteringConfig;
pub use multi_view::ConsensusClusteringFitted;
pub use multi_view::ConsensusMethod;
pub use multi_view::MultiViewData;
pub use multi_view::MultiViewKMeans;
pub use multi_view::MultiViewKMeansConfig;
pub use multi_view::MultiViewKMeansFitted;
pub use multi_view::ViewWeighting;
pub use multi_view::WeightLearning;
pub use optics::Algorithm;
pub use optics::ClusterMethod;
pub use optics::DistanceMetric as OpticsDistanceMetric;
pub use optics::Optics;
pub use optics::OpticsConfig;
pub use optics::OpticsOrdering;
pub use out_of_core::ClusterSummary;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreDataLoader;
pub use out_of_core::OutOfCoreKMeans;
pub use rock::ROCKConfig;
pub use rock::ROCKFitted;
pub use rock::ROCKSimilarity;
pub use rock::ROCK;
pub use semi_supervised::ConstrainedKMeans;
pub use semi_supervised::ConstrainedKMeansConfig;
pub use semi_supervised::ConstrainedKMeansFitted;
pub use semi_supervised::ConstraintHandling;
pub use semi_supervised::ConstraintType;
pub use semi_supervised::LabelPropagation;
pub use semi_supervised::LabelPropagationConfig;
pub use semi_supervised::LabelPropagationFitted;
pub use simd_distances::simd_distance;
pub use simd_distances::simd_distance_batch;
pub use simd_distances::simd_k_nearest_neighbors;
pub use simd_distances::DistanceMetric;
pub use simd_distances::OptimizedDistanceComputer;
pub use simd_distances::SimdDistanceMetric;
pub use sparse_matrix::GraphStats;
pub use sparse_matrix::SparseDistanceMatrix;
pub use sparse_matrix::SparseEntry;
pub use sparse_matrix::SparseMatrixConfig;
pub use sparse_matrix::SparseMatrixStats;
pub use sparse_matrix::SparseNeighborhoodGraph;
pub use spectral::Affinity;
pub use spectral::EigenSolver;
pub use spectral::NormalizationMethod;
pub use spectral::SpectralClustering;
pub use spectral::SpectralClusteringConfig;
pub use streaming::CluStream;
pub use streaming::MicroCluster;
pub use streaming::OnlineKMeans;
pub use streaming::SlidingWindowKMeans;
pub use streaming::StreamingConfig;
pub use text_clustering::DocumentClustering;
pub use text_clustering::DocumentClusteringConfig;
pub use text_clustering::DocumentClusteringResult;
pub use text_clustering::SphericalInit;
pub use text_clustering::SphericalKMeans;
pub use text_clustering::SphericalKMeansConfig;
pub use text_clustering::SphericalKMeansFitted;
pub use time_series::CentroidAveraging;
pub use time_series::ChangeDetectionTest;
pub use time_series::DTWKMeans;
pub use time_series::DTWKMeansConfig;
pub use time_series::DTWKMeansFitted;
pub use time_series::RegimeChangeConfig;
pub use time_series::RegimeChangeDetector;
pub use time_series::RegimeChangeResult;
pub use time_series::ShapeClustering;
pub use time_series::ShapeClusteringConfig;
pub use time_series::ShapeClusteringFitted;
pub use time_series::ShapeDistanceMetric;
pub use time_series::TemporalSegmentationClustering;
pub use time_series::TemporalSegmentationConfig;
pub use time_series::TemporalSegmentationResult;
pub use validation::ClusteringValidator;
pub use validation::GapStatisticResult;
pub use validation::SilhouetteResult;
pub use validation::ValidationMetric;
Modules
- birch
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) implementation
- cure
- CURE (Clustering Using REpresentatives) Algorithm
- dbscan
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise) implementation using scirs2
- density_peaks
- Density Peaks Clustering algorithm
- dirichlet_process
- Dirichlet Process Mixture Models
- ensemble
- Ensemble Clustering Algorithms
- evolutionary
- Evolutionary and bio-inspired clustering algorithms.
- feature_selection
- Feature Selection for Clustering
- fuzzy_cmeans
- Fuzzy C-Means clustering implementation
- gmm
- Gaussian Mixture Models (GMM)
- graph_clustering
- Graph Clustering Algorithms
- hdbscan
- HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) implementation using scirs2
- hierarchical
- Hierarchical clustering implementation using scirs2
- incremental_dbscan
- Incremental DBSCAN for Streaming Data
- kde_clustering
- Kernel Density Estimation (KDE) Clustering
- kmeans
- K-Means clustering implementations
- locality_sensitive_hashing
- Locality-Sensitive Hashing (LSH) for approximate distance computations
- lof
- Local Outlier Factor (LOF) implementation for density-based outlier detection
- mean_shift
- Mean Shift Clustering
- memory_mapped
- Memory-mapped distance matrix computation for large datasets
- multi_view
- Multi-View Clustering Algorithms
- optics
- OPTICS (Ordering Points To Identify Clustering Structure) implementation
- out_of_core
- Out-of-Core Clustering Algorithms
- performance
- Performance optimizations for clustering algorithms
- prelude
- Prelude module for convenient imports
- rock
- ROCK (RObust Clustering using linKs) Algorithm
- semi_supervised
- Semi-Supervised Clustering Algorithms
- simd_distances
- SIMD-optimized distance computations for clustering algorithms
- sparse_matrix
- Sparse matrix representations for large-scale clustering
- spectral
- Spectral Clustering
- streaming
- Streaming Clustering Algorithms
- text_clustering
- Text and High-Dimensional Clustering Algorithms
- time_series
- Time Series Clustering Algorithms
- validation
- Comprehensive Clustering Validation Framework
Enums
- DensityDistanceMetric
- Distance metric enumeration for clustering algorithms
- LinkageMethod
- Linkage methods for hierarchical clustering
- Metric
- Distance metrics for hierarchical clustering
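To illustrate what a linkage method decides in hierarchical clustering, here is a small standalone sketch of three textbook linkage rules (single, complete, average) computed over Euclidean point distances. The enum `LinkageSketch` and the function `cluster_distance` are hypothetical names for this example; the variants of the crate's own linkage and metric enums are not listed on this page and may differ.

```rust
/// Hypothetical enum for this sketch; not the crate's linkage type.
#[derive(Clone, Copy, Debug)]
enum LinkageSketch {
    /// Distance between the two closest members of the clusters.
    Single,
    /// Distance between the two farthest members of the clusters.
    Complete,
    /// Mean of all pairwise distances between the clusters.
    Average,
}

/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// Distance between clusters `a` and `b` under the chosen linkage rule.
fn cluster_distance(a: &[Vec<f64>], b: &[Vec<f64>], linkage: LinkageSketch) -> f64 {
    assert!(!a.is_empty() && !b.is_empty(), "clusters must be non-empty");
    // Lazily enumerate every cross-cluster pair of points.
    let pairwise = a
        .iter()
        .flat_map(|p| b.iter().map(move |q| euclidean(p, q)));
    match linkage {
        LinkageSketch::Single => pairwise.fold(f64::INFINITY, f64::min),
        LinkageSketch::Complete => pairwise.fold(f64::NEG_INFINITY, f64::max),
        LinkageSketch::Average => pairwise.sum::<f64>() / (a.len() * b.len()) as f64,
    }
}
```

An agglomerative procedure repeatedly merges the pair of clusters with the smallest such distance, so the choice of rule shapes the resulting dendrogram: single linkage tends to chain elongated clusters together, while complete linkage favors compact ones.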