§SciRS2 Cluster - Clustering Algorithms
scirs2-cluster provides comprehensive clustering algorithms for unsupervised learning, offering k-means, hierarchical clustering, DBSCAN, spectral clustering, and advanced methods with parallel processing, SIMD acceleration, and evaluation metrics.
§Key Features
- SciPy/scikit-learn Compatibility: Similar APIs to scipy.cluster and sklearn.cluster
- Partitional Clustering: K-means, K-means++, mini-batch K-means
- Hierarchical Clustering: Agglomerative with various linkage methods
- Density-based: DBSCAN, OPTICS, HDBSCAN for arbitrary-shaped clusters
- Graph-based: Spectral clustering, affinity propagation
- Evaluation Metrics: Silhouette, Davies-Bouldin, Calinski-Harabasz
- Performance: Parallel execution, SIMD distance computation
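To make the partitional entry in the list above concrete, here is a minimal sketch of Lloyd's algorithm, the classic k-means iteration, in plain Rust with no dependencies. It is not the code behind `scirs2_cluster::vq::kmeans`: the names `sq_dist`, `nearest`, and `lloyd_kmeans`, and the caller-supplied `init` centroids, are assumptions made for illustration. The sketch also assumes non-empty data whose rows all have the same length.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point`.
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_d = f64::INFINITY;
    for (j, c) in centroids.iter().enumerate() {
        let d = sq_dist(point, c);
        if d < best_d {
            best_d = d;
            best = j;
        }
    }
    best
}

/// Lloyd's algorithm: alternate the assignment and update steps until the
/// labels stop changing or `max_iter` is reached.
/// `init` is a hypothetical parameter of this sketch holding the starting
/// centroids; real implementations usually seed them randomly (e.g. k-means++).
fn lloyd_kmeans(
    data: &[Vec<f64>],
    init: &[Vec<f64>],
    max_iter: usize,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    let dim = data[0].len();
    let mut centroids = init.to_vec();
    let mut labels = vec![usize::MAX; data.len()];
    for _ in 0..max_iter {
        // Assignment step: each point joins its nearest centroid.
        let new_labels: Vec<usize> = data.iter().map(|p| nearest(p, &centroids)).collect();
        if new_labels == labels {
            break; // converged: no point changed cluster
        }
        labels = new_labels;
        // Update step: each centroid moves to the mean of its members.
        let mut sums = vec![vec![0.0; dim]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for (p, &l) in data.iter().zip(&labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += x;
            }
        }
        for (j, c) in centroids.iter_mut().enumerate() {
            // An empty cluster keeps its previous centroid.
            if counts[j] > 0 {
                for (cv, s) in c.iter_mut().zip(&sums[j]) {
                    *cv = s / counts[j] as f64;
                }
            }
        }
    }
    (centroids, labels)
}
```

Called on six 2-D points, three near (1, 2) and three near (4, 4), with the first two rows as `init`, the sketch settles on labels `[0, 0, 0, 1, 1, 1]` after two update steps. Passing the starting centroids explicitly keeps the example deterministic.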
§Module Overview
| SciRS2 Module | Python Equivalent | Description |
|---|---|---|
| vq | scipy.cluster.vq | K-means and vector quantization |
| hierarchy | scipy.cluster.hierarchy | Hierarchical/agglomerative clustering |
| dbscan | sklearn.cluster.DBSCAN | Density-based spatial clustering |
| spectral | sklearn.cluster.SpectralClustering | Graph-based spectral clustering |
| metrics | sklearn.metrics | Clustering evaluation metrics |
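As an illustration of what the evaluation metrics in the last row measure, the following is a minimal sketch of the mean silhouette coefficient in plain Rust. For each point it computes `a`, the mean distance to the other members of its own cluster, and `b`, the smallest mean distance to any other cluster, and then averages `(b - a) / max(a, b)`. The function names and signature are assumptions for this sketch and do not reflect the crate's `silhouette_score` API. Labels are assumed to be in `0..k` and the data non-empty.

```rust
/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Mean silhouette coefficient over all points (hypothetical helper).
/// Points in singleton clusters contribute 0, following the usual convention.
fn silhouette(data: &[Vec<f64>], labels: &[usize]) -> f64 {
    let n = data.len();
    let k = labels.iter().max().map_or(0, |m| m + 1);
    let mut total = 0.0;
    for i in 0..n {
        // Per-cluster distance sums and member counts, excluding point i itself.
        let mut dist_sum = vec![0.0; k];
        let mut count = vec![0usize; k];
        for j in 0..n {
            if i != j {
                dist_sum[labels[j]] += euclidean(&data[i], &data[j]);
                count[labels[j]] += 1;
            }
        }
        let own = labels[i];
        if count[own] == 0 {
            continue; // singleton cluster: silhouette is 0
        }
        // a: mean distance to the rest of the point's own cluster.
        let a = dist_sum[own] / count[own] as f64;
        // b: smallest mean distance to any other non-empty cluster.
        let b = (0..k)
            .filter(|&c| c != own && count[c] > 0)
            .map(|c| dist_sum[c] / count[c] as f64)
            .fold(f64::INFINITY, f64::min);
        let m = a.max(b);
        if b.is_finite() && m > 0.0 {
            total += (b - a) / m;
        }
    }
    total / n as f64
}
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping or misassigned points. This direct version costs O(n²) distance evaluations.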
§Quick Start
[dependencies]
scirs2-cluster = "0.1.0-rc.2"

use scirs2_cluster::vq::kmeans;
use scirs2_core::ndarray::Array2;
// K-means clustering
let data = Array2::from_shape_vec((6, 2), vec![
1.0, 2.0, 1.2, 1.8, 0.8, 1.9,
3.7, 4.2, 3.9, 3.9, 4.2, 4.1,
]).unwrap();
let (centroids, labels) = kmeans(data.view(), 2, None, None, None, None).unwrap();

§Version: 0.1.0-rc.2 (October 03, 2025)
§Features
- Vector Quantization: K-means and K-means++ for partitioning data
- Hierarchical Clustering: Agglomerative clustering with various linkage methods
- Density-based Clustering: DBSCAN and OPTICS for finding clusters of arbitrary shape
- Mean Shift: Non-parametric clustering based on density estimation
- Spectral Clustering: Graph-based clustering using eigenvectors of the graph Laplacian
- Affinity Propagation: Message-passing based clustering that identifies exemplars
- Evaluation Metrics: Silhouette coefficient, Davies-Bouldin index, and other measures to evaluate clustering quality
- Data Preprocessing: Utilities for normalizing, standardizing, and whitening data before clustering
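The preprocessing entry matters because distance-based methods such as k-means are sensitive to feature scale. Below is a minimal sketch of column-wise standardization (zero mean, unit variance) in plain Rust. It is not the crate's `preprocess::standardize`: the name `standardize_columns`, the use of the population variance (dividing by n), and mapping constant columns to 0 are all assumptions of this sketch, as is non-empty input with rows of equal length.

```rust
/// Rescale every column to zero mean and unit variance (hypothetical helper).
/// Uses the population variance; constant columns are mapped to 0.
fn standardize_columns(data: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = data.len() as f64;
    let dim = data[0].len();

    // Column means.
    let mut mean = vec![0.0; dim];
    for row in data {
        for (m, x) in mean.iter_mut().zip(row) {
            *m += x / n;
        }
    }

    // Column variances around those means.
    let mut var = vec![0.0; dim];
    for row in data {
        for ((v, x), m) in var.iter_mut().zip(row).zip(&mean) {
            *v += (x - m) * (x - m) / n;
        }
    }

    // z-score: subtract the mean, divide by the standard deviation.
    data.iter()
        .map(|row| {
            row.iter()
                .zip(&mean)
                .zip(&var)
                .map(|((x, m), v)| {
                    let sd = v.sqrt();
                    if sd > 0.0 { (x - m) / sd } else { 0.0 }
                })
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

After this transform a feature measured in the thousands no longer dominates one measured in fractions when Euclidean distances are computed.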
§Examples
use scirs2_core::ndarray::Array2;
use scirs2_cluster::vq::kmeans;
use scirs2_cluster::preprocess::standardize;
// Example data with two clusters
let data = Array2::from_shape_vec((6, 2), vec![
1.0, 2.0,
1.2, 1.8,
0.8, 1.9,
3.7, 4.2,
3.9, 3.9,
4.2, 4.1,
]).unwrap();
// Standardize the data
let standardized = standardize(data.view(), true).unwrap();
// Run k-means with k=2
let (centroids, labels) = kmeans(standardized.view(), 2, None, None, None, None).unwrap();
// Print the results
println!("Centroids: {:?}", centroids);
println!("Cluster assignments: {:?}", labels);

Re-exports§
pub use advanced::adaptive_online_clustering;
pub use advanced::deep_embedded_clustering;
pub use advanced::qaoa_clustering;
pub use advanced::quantum_kmeans;
pub use advanced::rl_clustering;
pub use advanced::transfer_learning_clustering;
pub use advanced::variational_deep_embedding;
pub use advanced::vqe_clustering;
pub use advanced::AdaptiveOnlineClustering;
pub use advanced::AdaptiveOnlineConfig;
pub use advanced::DeepClusteringConfig;
pub use advanced::DeepEmbeddedClustering;
pub use advanced::FeatureAlignment;
pub use advanced::QAOAClustering;
pub use advanced::QAOAConfig;
pub use advanced::QAOACostFunction;
pub use advanced::QuantumConfig;
pub use advanced::QuantumKMeans;
pub use advanced::RLClustering;
pub use advanced::RLClusteringConfig;
pub use advanced::RewardFunction;
pub use advanced::TransferLearningClustering;
pub use advanced::TransferLearningConfig;
pub use advanced::VQEAnsatz;
pub use advanced::VQEClustering;
pub use advanced::VQEConfig;
pub use advanced::VariationalDeepEmbedding;
pub use quantum_clustering::quantum_annealing_clustering;
pub use quantum_clustering::CoolingSchedule;
pub use quantum_clustering::QuantumAnnealingClustering;
pub use quantum_clustering::QuantumAnnealingConfig;
pub use advanced_clustering::AdvancedClusterer;
pub use advanced_clustering::AdvancedClusteringResult;
pub use advanced_clustering::AdvancedConfig;
pub use advanced_clustering::AdvancedPerformanceMetrics;
pub use advanced_visualization::create_advanced_visualization_report;
pub use advanced_visualization::visualize_advanced_results;
pub use advanced_visualization::AISelectionPlot;
pub use advanced_visualization::AdvancedVisualizationConfig;
pub use advanced_visualization::AdvancedVisualizationOutput;
pub use advanced_visualization::AdvancedVisualizer;
pub use advanced_visualization::ClusterPlot;
pub use advanced_visualization::NeuromorphicAdaptationPlot;
pub use advanced_visualization::PerformanceDashboard;
pub use advanced_visualization::QuantumCoherencePlot;
pub use advanced_visualization::QuantumColorScheme;
pub use advanced_visualization::VisualizationExportFormat;
pub use enhanced_clustering_features::DeepAdvancedClusterer;
pub use enhanced_clustering_features::DeepAdvancedResult;
pub use enhanced_clustering_features::DeepEnsembleCoordinator;
pub use enhanced_clustering_features::EnsembleConsensus;
pub use enhanced_clustering_features::GraphNeuralNetworkProcessor;
pub use enhanced_clustering_features::GraphStructureInsights;
pub use enhanced_clustering_features::NeuralArchitectureSearchEngine;
pub use enhanced_clustering_features::OptimalArchitecture;
pub use enhanced_clustering_features::ReinforcementLearningAgent;
pub use enhanced_clustering_features::SpectralProperties;
pub use enhanced_clustering_features::TransformerClusterEmbedder;
pub use gpu_distributed_clustering::CommunicationOverhead;
pub use gpu_distributed_clustering::CoordinationStrategy;
pub use gpu_distributed_clustering::DistributedAdvancedClusterer;
pub use gpu_distributed_clustering::DistributedAdvancedResult;
pub use gpu_distributed_clustering::DistributedProcessingMetrics;
pub use gpu_distributed_clustering::GpuAccelerationConfig;
pub use gpu_distributed_clustering::GpuAccelerationMetrics;
pub use gpu_distributed_clustering::GpuAdvancedClusterer;
pub use gpu_distributed_clustering::GpuAdvancedResult;
pub use gpu_distributed_clustering::GpuDeviceSelection;
pub use gpu_distributed_clustering::GpuMemoryStrategy;
pub use gpu_distributed_clustering::GpuOptimizationLevel;
pub use gpu_distributed_clustering::HybridGpuDistributedClusterer;
pub use gpu_distributed_clustering::HybridGpuDistributedResult;
pub use gpu_distributed_clustering::LoadBalancingStats;
pub use gpu_distributed_clustering::WorkerNodeConfig;
pub use gpu_distributed_clustering::WorkerPerformanceStats;
pub use advanced_benchmarking::create_comprehensive_report;
pub use advanced_benchmarking::AdvancedBenchmark;
pub use advanced_benchmarking::AlgorithmBenchmark;
pub use advanced_benchmarking::AlgorithmComparison;
pub use advanced_benchmarking::BenchmarkConfig;
pub use advanced_benchmarking::BenchmarkResults;
pub use advanced_benchmarking::ComplexityClass;
pub use advanced_benchmarking::GpuVsCpuComparison;
pub use advanced_benchmarking::MemoryProfile;
pub use advanced_benchmarking::OptimizationCategory;
pub use advanced_benchmarking::OptimizationPriority;
pub use advanced_benchmarking::OptimizationSuggestion;
pub use advanced_benchmarking::PerformanceStatistics;
pub use advanced_benchmarking::QualityMetrics;
pub use advanced_benchmarking::RegressionAlert;
pub use advanced_benchmarking::RegressionSeverity;
pub use advanced_benchmarking::ScalabilityAnalysis;
pub use advanced_benchmarking::SystemInfo;
pub use affinity::affinity_propagation;
pub use affinity::AffinityPropagationOptions;
pub use birch::birch;
pub use birch::Birch;
pub use birch::BirchOptions;
pub use birch::BirchStatistics;
pub use density::hdbscan::dbscan_clustering;
pub use density::hdbscan::hdbscan;
pub use density::hdbscan::ClusterSelectionMethod;
pub use density::hdbscan::HDBSCANOptions;
pub use density::hdbscan::HDBSCANResult;
pub use density::hdbscan::StoreCenter;
pub use density::optics::extract_dbscan_clustering;
pub use density::optics::extract_xi_clusters;
pub use density::optics::OPTICSResult;
pub use ensemble::convenience::bootstrap_ensemble;
pub use ensemble::convenience::ensemble_clustering;
pub use ensemble::convenience::multi_algorithm_ensemble;
pub use ensemble::ClusteringAlgorithm;
pub use ensemble::ClusteringResult;
pub use ensemble::ConsensusMethod;
pub use ensemble::ConsensusStatistics;
pub use ensemble::DiversityMetrics;
pub use ensemble::DiversityStrategy;
pub use ensemble::EnsembleClusterer;
pub use ensemble::EnsembleConfig;
pub use ensemble::EnsembleResult;
pub use ensemble::NoiseType;
pub use ensemble::ParameterRange;
pub use ensemble::SamplingStrategy;
pub use gmm::gaussian_mixture;
pub use gmm::CovarianceType;
pub use gmm::GMMInit;
pub use gmm::GMMOptions;
pub use gmm::GaussianMixture;
pub use graph::girvan_newman;
pub use graph::graph_clustering;
pub use graph::label_propagation;
pub use graph::louvain;
pub use graph::Graph;
pub use graph::GraphClusteringAlgorithm;
pub use graph::GraphClusteringConfig;
pub use input_validation::check_duplicate_points;
pub use input_validation::suggest_clustering_algorithm;
pub use input_validation::validate_clustering_data;
pub use input_validation::validate_convergence_parameters;
pub use input_validation::validate_distance_parameter;
pub use input_validation::validate_integer_parameter;
pub use input_validation::validate_n_clusters;
pub use input_validation::validate_sample_weights;
pub use input_validation::ValidationConfig;
pub use leader::euclidean_distance;
pub use leader::leader_clustering;
pub use leader::manhattan_distance;
pub use leader::LeaderClustering;
pub use leader::LeaderNode;
pub use leader::LeaderTree;
pub use meanshift::estimate_bandwidth;
pub use meanshift::get_bin_seeds;
pub use meanshift::mean_shift;
pub use meanshift::MeanShift;
pub use meanshift::MeanShiftOptions;
pub use metrics::adjusted_rand_index;
pub use metrics::calinski_harabasz_score;
pub use metrics::davies_bouldin_score;
pub use metrics::homogeneity_completeness_v_measure;
pub use metrics::normalized_mutual_info;
pub use metrics::silhouette_samples;
pub use metrics::silhouette_score;
pub use metrics::bootstrap_confidence_interval;
pub use metrics::information_theoretic::normalized_variation_of_information;
pub use metrics::jensen_shannon_divergence;
pub use metrics::advanced::bic_score;
pub use metrics::advanced::dunn_index;
pub use neighbor_search::create_neighbor_searcher;
pub use neighbor_search::BallTree;
pub use neighbor_search::BruteForceSearch;
pub use neighbor_search::KDTree;
pub use neighbor_search::NeighborResult;
pub use neighbor_search::NeighborSearchAlgorithm;
pub use neighbor_search::NeighborSearchConfig;
pub use neighbor_search::NeighborSearcher;
pub use preprocess::min_max_scale;
pub use preprocess::normalize;
pub use preprocess::standardize;
pub use preprocess::whiten;
pub use preprocess::NormType;
pub use serialization::affinity_propagation_to_model;
pub use serialization::birch_to_model;
pub use serialization::compatibility;
pub use serialization::dbscan_to_model;
pub use serialization::gmm_to_model;
pub use serialization::hierarchy_to_model;
pub use serialization::kmeans_to_model;
pub use serialization::leader_to_model;
pub use serialization::leadertree_to_model;
pub use serialization::meanshift_to_model;
pub use serialization::save_affinity_propagation;
pub use serialization::save_birch;
pub use serialization::save_gmm;
pub use serialization::save_hierarchy;
pub use serialization::save_kmeans;
pub use serialization::save_leader;
pub use serialization::save_leadertree;
pub use serialization::save_spectral_clustering;
pub use serialization::spectral_clustering_to_model;
pub use serialization::AdvancedExport;
pub use serialization::AffinityPropagationModel;
pub use serialization::AlgorithmState;
pub use serialization::AutoSaveConfig;
pub use serialization::BirchModel;
pub use serialization::ClusteringWorkflow;
pub use serialization::ClusteringWorkflowManager;
pub use serialization::DBSCANModel;
pub use serialization::DataCharacteristics;
pub use serialization::EnhancedModel;
pub use serialization::EnhancedModelMetadata;
pub use serialization::ExportFormat;
pub use serialization::GMMModel;
pub use serialization::HierarchicalModel;
pub use serialization::KMeansModel;
pub use serialization::LeaderModel;
pub use serialization::LeaderTreeModel;
pub use serialization::MeanShiftModel;
pub use serialization::ModelMetadata;
pub use serialization::PlatformInfo;
pub use serialization::SerializableModel;
pub use serialization::SpectralClusteringModel;
pub use serialization::TrainingMetrics;
pub use serialization::TrainingStep;
pub use serialization::WorkflowConfig;
pub use serialization::compatibility::create_sklearn_param_grid;
pub use serialization::compatibility::from_joblib_format;
pub use serialization::compatibility::from_numpy_format;
pub use serialization::compatibility::from_sklearn_format;
pub use serialization::compatibility::generate_sklearn_model_summary;
pub use serialization::compatibility::to_arrow_schema;
pub use serialization::compatibility::to_huggingface_card;
pub use serialization::compatibility::to_joblib_format;
pub use serialization::compatibility::to_mlflow_format;
pub use serialization::compatibility::to_numpy_format;
pub use serialization::compatibility::to_onnx_metadata;
pub use serialization::compatibility::to_pandas_clustering_report;
pub use serialization::compatibility::to_pandas_format;
pub use serialization::compatibility::to_pickle_like_format;
pub use serialization::compatibility::to_pytorch_checkpoint;
pub use serialization::compatibility::to_r_format;
pub use serialization::compatibility::to_scipy_dendrogram_format;
pub use serialization::compatibility::to_scipy_linkage_format;
pub use serialization::compatibility::to_sklearn_clustering_result;
pub use serialization::compatibility::to_sklearn_format;
pub use sparse::sparse_epsilon_graph;
pub use sparse::sparse_knn_graph;
pub use sparse::SparseDistanceMatrix;
pub use sparse::SparseHierarchicalClustering;
pub use spectral::spectral_bipartition;
pub use spectral::spectral_clustering;
pub use spectral::AffinityMode;
pub use spectral::SpectralClusteringOptions;
pub use stability::BootstrapValidator;
pub use stability::ConsensusClusterer;
pub use stability::OptimalKSelector;
pub use stability::StabilityConfig;
pub use stability::StabilityResult;
pub use streaming::ChunkedDistanceMatrix;
pub use streaming::ProgressiveHierarchical;
pub use streaming::StreamingConfig;
pub use streaming::StreamingKMeans;
pub use text_clustering::semantic_hierarchical;
pub use text_clustering::semantic_kmeans;
pub use text_clustering::topic_clustering;
pub use text_clustering::SemanticClusteringConfig;
pub use text_clustering::SemanticHierarchical;
pub use text_clustering::SemanticKMeans;
pub use text_clustering::SemanticSimilarity;
pub use text_clustering::TextPreprocessing;
pub use text_clustering::TextRepresentation;
pub use text_clustering::TopicBasedClustering;
pub use time_series::dtw_barycenter_averaging;
pub use time_series::dtw_distance;
pub use time_series::dtw_distance_custom;
pub use time_series::dtw_hierarchical_clustering;
pub use time_series::dtw_k_means;
pub use time_series::dtw_k_medoids;
pub use time_series::soft_dtw_distance;
pub use time_series::time_series_clustering;
pub use time_series::TimeSeriesAlgorithm;
pub use time_series::TimeSeriesClusteringConfig;
pub use tuning::AcquisitionFunction;
pub use tuning::AutoTuner;
pub use tuning::BayesianState;
pub use tuning::CVStrategy;
pub use tuning::ConvergenceInfo;
pub use tuning::CrossValidationConfig;
pub use tuning::EarlyStoppingConfig;
pub use tuning::EnsembleResults;
pub use tuning::EvaluationMetric;
pub use tuning::EvaluationResult;
pub use tuning::ExplorationStats;
pub use tuning::HyperParameter;
pub use tuning::KernelType;
pub use tuning::LoadBalancingStrategy;
pub use tuning::ParallelConfig;
pub use tuning::ResourceConstraints;
pub use tuning::SearchSpace;
pub use tuning::SearchStrategy;
pub use tuning::StandardSearchSpaces;
pub use tuning::StoppingReason;
pub use tuning::SurrogateModel;
pub use tuning::TuningConfig;
pub use tuning::TuningResult;
pub use visualization::create_scatter_plot_2d;
pub use visualization::create_scatter_plot_3d;
pub use visualization::AnimationConfig;
pub use visualization::BoundaryType;
pub use visualization::ClusterBoundary;
pub use visualization::ColorScheme;
pub use visualization::DimensionalityReduction;
pub use visualization::EasingFunction;
pub use visualization::LegendEntry;
pub use visualization::ScatterPlot2D;
pub use visualization::ScatterPlot3D;
pub use visualization::VisualizationConfig;
pub use visualization::animation::AnimationFrame;
pub use visualization::animation::IterativeAnimationConfig;
pub use visualization::animation::IterativeAnimationRecorder;
pub use visualization::animation::StreamingVisualizer;
pub use visualization::interactive::ClusterStats;
pub use visualization::interactive::InteractiveConfig;
pub use visualization::interactive::InteractiveState;
pub use visualization::interactive::InteractiveVisualizer;
pub use visualization::export::export_scatter_2d_to_html;
pub use visualization::export::export_scatter_2d_to_json;
pub use visualization::export::export_scatter_3d_to_html;
pub use visualization::export::export_scatter_3d_to_json;
pub use visualization::export::save_visualization_to_file;
pub use distributed::DataPartition;
pub use distributed::DistributedKMeans;
pub use distributed::DistributedKMeansConfig;
pub use distributed::PartitioningStrategy;
pub use distributed::WorkerStatus;
pub use density::*;
pub use hierarchy::*;
pub use vq::*;
Modules§
- advanced: Cutting-edge clustering algorithms including quantum-inspired methods and advanced online learning.
- advanced_benchmarking: Advanced benchmarking and performance profiling system.
- advanced_clustering: Advanced Clustering - AI-Driven Quantum-Neuromorphic Clustering
- advanced_visualization: Enhanced visualization specifically for advanced clustering results.
- affinity: Affinity Propagation clustering implementation
- birch: BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clustering algorithm
- density: Density-based clustering algorithms
- distributed: Distributed clustering algorithms for large-scale datasets.
- enhanced_clustering_features: Enhanced Advanced Features - Advanced AI-Driven Clustering Extensions
- ensemble: Ensemble clustering methods for improved robustness.
- error: Error types for the clustering module
- gmm: Gaussian Mixture Models (GMM) for clustering
- gpu_distributed_clustering: Advanced GPU and Distributed Computing Extensions
- graph: Graph clustering and community detection algorithms.
- hierarchy: Hierarchical clustering functions
- input_validation: Enhanced input validation utilities
- leader: Leader algorithm implementation for clustering
- meanshift: Mean Shift clustering implementation.
- metrics: Clustering evaluation metrics
- neighbor_search: Efficient neighbor search algorithms for clustering
- preprocess: Data preprocessing utilities for clustering algorithms
- quantum_clustering: Quantum-Inspired Clustering Algorithms
- serialization: Model serialization and deserialization
- sparse: Sparse distance matrix support for large datasets
- spectral: Spectral clustering implementation
- stability: Cluster stability assessment tools
- streaming: Streaming and memory-efficient clustering algorithms
- text_clustering: Text clustering algorithms with semantic similarity support.
- time_series: Time series clustering algorithms with specialized distance metrics.
- tuning: Automatic hyperparameter tuning for clustering algorithms.
- utils: Utility modules for clustering algorithms
- visualization: Enhanced visualization capabilities for clustering results.
- vq: Vector quantization functions
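
The time_series entry in the module list refers to specialized distance metrics, of which dynamic time warping (DTW) is the best known. The following minimal sketch shows the textbook dynamic-programming recurrence for DTW between two 1-D sequences, using absolute difference as the local cost. It is an illustration only: the function name `dtw` and its signature are assumptions and are not the crate's `dtw_distance` API, which may support windows, custom costs, or multivariate series.

```rust
/// Classic DTW distance between two 1-D sequences (hypothetical helper).
/// cost[i][j] is the cheapest alignment of a[..i] with b[..j].
fn dtw(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    for i in 1..=n {
        for j in 1..=m {
            // Local cost of matching a[i-1] with b[j-1].
            let d = (a[i - 1] - b[j - 1]).abs();
            // Extend the cheapest of: step in a, step in b, or step in both.
            let best = cost[i - 1][j].min(cost[i][j - 1]).min(cost[i - 1][j - 1]);
            cost[i][j] = d + best;
        }
    }
    cost[n][m]
}
```

Because one sample may align with several samples of the other series, `dtw(&[1.0, 2.0, 3.0], &[1.0, 2.0, 2.0, 3.0])` is 0 even though the lengths differ. This is the property that makes DTW useful for clustering series that vary in speed. The table costs O(n·m) time and memory.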