Crate scirs2_cluster


SciRS2 Cluster - Clustering Algorithms

scirs2-cluster provides clustering algorithms for unsupervised learning: k-means, hierarchical clustering, DBSCAN, spectral clustering, and more advanced methods, together with parallel processing, SIMD acceleration, and evaluation metrics.

🎯 Key Features

  • SciPy/scikit-learn Compatibility: Similar APIs to scipy.cluster and sklearn.cluster
  • Partitional Clustering: K-means, K-means++, mini-batch K-means
  • Hierarchical Clustering: Agglomerative with various linkage methods
  • Density-based: DBSCAN, OPTICS, HDBSCAN for arbitrary-shaped clusters
  • Graph-based: Spectral clustering, affinity propagation
  • Evaluation Metrics: Silhouette, Davies-Bouldin, Calinski-Harabasz
  • Performance: Parallel execution, SIMD distance computation
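
To make the evaluation-metrics entry concrete, here is a minimal, std-only sketch of the mean silhouette coefficient. It is not the crate's implementation: the names `euclidean` and `silhouette_sketch`, the `Vec<Vec<f64>>` data layout, and the `Option` return type are assumptions made for illustration.

```rust
/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Mean silhouette coefficient over all samples (hypothetical helper).
/// Returns None when the inputs are empty, differ in length, or contain
/// fewer than two non-empty clusters.
fn silhouette_sketch(points: &[Vec<f64>], labels: &[usize]) -> Option<f64> {
    if points.is_empty() || points.len() != labels.len() {
        return None;
    }
    let n_clusters = labels.iter().max()? + 1;
    let mut sizes = vec![0usize; n_clusters];
    for &l in labels {
        sizes[l] += 1;
    }
    if sizes.iter().filter(|&&s| s > 0).count() < 2 {
        return None;
    }

    let mut total = 0.0_f64;
    for (i, p) in points.iter().enumerate() {
        // Sum of distances from sample i to the members of each cluster.
        let mut dist_sum = vec![0.0_f64; n_clusters];
        for (j, q) in points.iter().enumerate() {
            if i != j {
                dist_sum[labels[j]] += euclidean(p, q);
            }
        }
        let own = labels[i];
        if sizes[own] == 1 {
            continue; // a singleton cluster contributes s = 0
        }
        // a: mean distance to the other members of the sample's own cluster.
        let a = dist_sum[own] / (sizes[own] - 1) as f64;
        // b: smallest mean distance to the members of any other cluster.
        let b = (0..n_clusters)
            .filter(|&c| c != own && sizes[c] > 0)
            .map(|c| dist_sum[c] / sizes[c] as f64)
            .fold(f64::INFINITY, f64::min);
        let denom = a.max(b);
        if denom > 0.0 {
            total += (b - a) / denom;
        }
    }
    Some(total / points.len() as f64)
}
```

Values near 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones. The sketch evaluates all pairwise distances, so it is only suitable for small inputs.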

📦 Module Overview

SciRS2 Module   Python Equivalent                    Description
vq              scipy.cluster.vq                     K-means and vector quantization
hierarchy       scipy.cluster.hierarchy              Hierarchical/agglomerative clustering
dbscan          sklearn.cluster.DBSCAN               Density-based spatial clustering
spectral        sklearn.cluster.SpectralClustering   Graph-based spectral clustering
metrics         sklearn.metrics                      Clustering evaluation metrics

🚀 Quick Start

[dependencies]
scirs2-cluster = "0.1.0-rc.2"

use scirs2_cluster::vq::kmeans;
use scirs2_core::ndarray::Array2;

// K-means clustering
let data = Array2::from_shape_vec((6, 2), vec![
    1.0, 2.0, 1.2, 1.8, 0.8, 1.9,
    3.7, 4.2, 3.9, 3.9, 4.2, 4.1,
]).unwrap();

let (centroids, labels) = kmeans(data.view(), 2, None, None, None, None).unwrap();
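
For readers who want to see what a k-means call computes, the following std-only sketch runs Lloyd's algorithm on plain vectors. It is an illustration under stated assumptions, not the crate's code: `sq_dist` and `lloyd_kmeans` are hypothetical names, and the sketch takes the initial centroids explicitly instead of modeling the optional arguments passed as `None` above.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Lloyd's algorithm started from caller-supplied centroids (hypothetical
/// helper). Returns (centroids, labels). Panics if `centroids` is empty.
fn lloyd_kmeans(
    points: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iter: usize,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    let k = centroids.len();
    let dim = centroids[0].len();
    let mut labels: Vec<usize> = Vec::new();
    for _ in 0..max_iter {
        // Assignment step: attach each point to its nearest centroid.
        let new_labels: Vec<usize> = points
            .iter()
            .map(|p| {
                let mut best = 0;
                for c in 1..k {
                    if sq_dist(p, &centroids[c]) < sq_dist(p, &centroids[best]) {
                        best = c;
                    }
                }
                best
            })
            .collect();
        // Converged: the assignment did not change since the last update.
        if new_labels == labels {
            break;
        }
        labels = new_labels;

        // Update step: move each centroid to the mean of its members.
        let mut sums = vec![vec![0.0_f64; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in points.iter().zip(&labels) {
            counts[l] += 1;
            for d in 0..dim {
                sums[l][d] += p[d];
            }
        }
        for (c, centroid) in centroids.iter_mut().enumerate() {
            if counts[c] > 0 {
                // An empty cluster keeps its previous centroid.
                for d in 0..dim {
                    centroid[d] = sums[c][d] / counts[c] as f64;
                }
            }
        }
    }
    (centroids, labels)
}
```

Seeding the sketch with the first and fourth rows of the quick-start data separates the two visible groups after a couple of iterations. In practice the result depends on the initialization, which is why seeding schemes such as K-means++ exist.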

🔒 Version: 0.1.0-rc.2 (October 03, 2025)

Features

  • Vector Quantization: K-means and K-means++ for partitioning data
  • Hierarchical Clustering: Agglomerative clustering with various linkage methods
  • Density-based Clustering: DBSCAN and OPTICS for finding clusters of arbitrary shape
  • Mean Shift: Non-parametric clustering based on density estimation
  • Spectral Clustering: Graph-based clustering using eigenvectors of the graph Laplacian
  • Affinity Propagation: Message-passing based clustering that identifies exemplars
  • Evaluation Metrics: Silhouette coefficient, Davies-Bouldin index, and other measures to evaluate clustering quality
  • Data Preprocessing: Utilities for normalizing, standardizing, and whitening data before clustering
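
The density-based entry in the list above can be illustrated with a brute-force DBSCAN sketch. This is a textbook version written for this page, not the crate's implementation: the function names are hypothetical, the parameter names `eps` and `min_samples` are borrowed from scikit-learn, and the `-1` noise label is an assumption of the sketch.

```rust
/// Label given to points that belong to no cluster (assumed convention).
const NOISE: i32 = -1;

/// Euclidean distance between two points of equal dimension.
fn point_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Indices of all points within `eps` of point `i`, including `i` itself.
fn region_query(points: &[Vec<f64>], i: usize, eps: f64) -> Vec<usize> {
    (0..points.len())
        .filter(|&j| point_distance(&points[i], &points[j]) <= eps)
        .collect()
}

/// Brute-force DBSCAN with O(n^2) distance evaluations (hypothetical helper).
/// Returns one label per point: 0, 1, ... for clusters, NOISE otherwise.
fn dbscan_sketch(points: &[Vec<f64>], eps: f64, min_samples: usize) -> Vec<i32> {
    const UNVISITED: i32 = -2;
    let mut labels = vec![UNVISITED; points.len()];
    let mut cluster = 0;
    for i in 0..points.len() {
        if labels[i] != UNVISITED {
            continue;
        }
        let neighbors = region_query(points, i, eps);
        if neighbors.len() < min_samples {
            labels[i] = NOISE; // may later be claimed as a border point
            continue;
        }
        // Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster;
        let mut queue = neighbors;
        while let Some(j) = queue.pop() {
            if labels[j] == NOISE {
                labels[j] = cluster; // border point: joins but does not expand
            }
            if labels[j] != UNVISITED {
                continue;
            }
            labels[j] = cluster;
            let reach = region_query(points, j, eps);
            if reach.len() >= min_samples {
                queue.extend(reach); // j is also a core point: keep expanding
            }
        }
        cluster += 1;
    }
    labels
}
```

Unlike k-means, this procedure needs no cluster count up front; the number of clusters follows from `eps` and `min_samples`, and isolated points are reported as noise. A spatial index (for example a k-d tree) would replace `region_query` in any serious implementation.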

Examples

use scirs2_core::ndarray::Array2;
use scirs2_cluster::vq::kmeans;
use scirs2_cluster::preprocess::standardize;

// Example data with two clusters
let data = Array2::from_shape_vec((6, 2), vec![
    1.0, 2.0,
    1.2, 1.8,
    0.8, 1.9,
    3.7, 4.2,
    3.9, 3.9,
    4.2, 4.1,
]).unwrap();

// Standardize the data
let standardized = standardize(data.view(), true).unwrap();

// Run k-means with k=2
let (centroids, labels) = kmeans(standardized.view(), 2, None, None, None, None).unwrap();

// Print the results
println!("Centroids: {:?}", centroids);
println!("Cluster assignments: {:?}", labels);
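
The standardization step in this example can be pictured as a column-wise z-score. The sketch below is a std-only illustration, not the crate's `standardize`: `zscore_columns` is a hypothetical name, it uses the population standard deviation, and it does not model the boolean argument passed above, whose meaning this page does not state.

```rust
/// Column-wise z-score (hypothetical helper): subtract each column's mean
/// and divide by its population standard deviation. Columns with zero
/// variance are only centered, to avoid dividing by zero.
fn zscore_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    if rows.is_empty() {
        return Vec::new();
    }
    let n = rows.len() as f64;
    let dim = rows[0].len();

    // Per-column mean.
    let mut mean = vec![0.0_f64; dim];
    for r in rows {
        for d in 0..dim {
            mean[d] += r[d] / n;
        }
    }

    // Per-column population standard deviation.
    let mut std = vec![0.0_f64; dim];
    for r in rows {
        for d in 0..dim {
            std[d] += (r[d] - mean[d]).powi(2) / n;
        }
    }
    for s in std.iter_mut() {
        *s = s.sqrt();
    }

    rows.iter()
        .map(|r| {
            (0..dim)
                .map(|d| {
                    let centered = r[d] - mean[d];
                    if std[d] > 0.0 { centered / std[d] } else { centered }
                })
                .collect()
        })
        .collect()
}
```

Scaling each feature to zero mean and unit variance keeps a feature with a large numeric range from dominating the Euclidean distances that k-means relies on.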

Re-exports

pub use advanced::adaptive_online_clustering;
pub use advanced::deep_embedded_clustering;
pub use advanced::qaoa_clustering;
pub use advanced::quantum_kmeans;
pub use advanced::rl_clustering;
pub use advanced::transfer_learning_clustering;
pub use advanced::variational_deep_embedding;
pub use advanced::vqe_clustering;
pub use advanced::AdaptiveOnlineClustering;
pub use advanced::AdaptiveOnlineConfig;
pub use advanced::DeepClusteringConfig;
pub use advanced::DeepEmbeddedClustering;
pub use advanced::FeatureAlignment;
pub use advanced::QAOAClustering;
pub use advanced::QAOAConfig;
pub use advanced::QAOACostFunction;
pub use advanced::QuantumConfig;
pub use advanced::QuantumKMeans;
pub use advanced::RLClustering;
pub use advanced::RLClusteringConfig;
pub use advanced::RewardFunction;
pub use advanced::TransferLearningClustering;
pub use advanced::TransferLearningConfig;
pub use advanced::VQEAnsatz;
pub use advanced::VQEClustering;
pub use advanced::VQEConfig;
pub use advanced::VariationalDeepEmbedding;
pub use quantum_clustering::quantum_annealing_clustering;
pub use quantum_clustering::CoolingSchedule;
pub use quantum_clustering::QuantumAnnealingClustering;
pub use quantum_clustering::QuantumAnnealingConfig;
pub use advanced_clustering::AdvancedClusterer;
pub use advanced_clustering::AdvancedClusteringResult;
pub use advanced_clustering::AdvancedConfig;
pub use advanced_clustering::AdvancedPerformanceMetrics;
pub use advanced_visualization::create_advanced_visualization_report;
pub use advanced_visualization::visualize_advanced_results;
pub use advanced_visualization::AISelectionPlot;
pub use advanced_visualization::AdvancedVisualizationConfig;
pub use advanced_visualization::AdvancedVisualizationOutput;
pub use advanced_visualization::AdvancedVisualizer;
pub use advanced_visualization::ClusterPlot;
pub use advanced_visualization::NeuromorphicAdaptationPlot;
pub use advanced_visualization::PerformanceDashboard;
pub use advanced_visualization::QuantumCoherencePlot;
pub use advanced_visualization::QuantumColorScheme;
pub use advanced_visualization::VisualizationExportFormat;
pub use enhanced_clustering_features::DeepAdvancedClusterer;
pub use enhanced_clustering_features::DeepAdvancedResult;
pub use enhanced_clustering_features::DeepEnsembleCoordinator;
pub use enhanced_clustering_features::EnsembleConsensus;
pub use enhanced_clustering_features::GraphNeuralNetworkProcessor;
pub use enhanced_clustering_features::GraphStructureInsights;
pub use enhanced_clustering_features::NeuralArchitectureSearchEngine;
pub use enhanced_clustering_features::OptimalArchitecture;
pub use enhanced_clustering_features::ReinforcementLearningAgent;
pub use enhanced_clustering_features::SpectralProperties;
pub use enhanced_clustering_features::TransformerClusterEmbedder;
pub use gpu_distributed_clustering::CommunicationOverhead;
pub use gpu_distributed_clustering::CoordinationStrategy;
pub use gpu_distributed_clustering::DistributedAdvancedClusterer;
pub use gpu_distributed_clustering::DistributedAdvancedResult;
pub use gpu_distributed_clustering::DistributedProcessingMetrics;
pub use gpu_distributed_clustering::GpuAccelerationConfig;
pub use gpu_distributed_clustering::GpuAccelerationMetrics;
pub use gpu_distributed_clustering::GpuAdvancedClusterer;
pub use gpu_distributed_clustering::GpuAdvancedResult;
pub use gpu_distributed_clustering::GpuDeviceSelection;
pub use gpu_distributed_clustering::GpuMemoryStrategy;
pub use gpu_distributed_clustering::GpuOptimizationLevel;
pub use gpu_distributed_clustering::HybridGpuDistributedClusterer;
pub use gpu_distributed_clustering::HybridGpuDistributedResult;
pub use gpu_distributed_clustering::LoadBalancingStats;
pub use gpu_distributed_clustering::WorkerNodeConfig;
pub use gpu_distributed_clustering::WorkerPerformanceStats;
pub use advanced_benchmarking::create_comprehensive_report;
pub use advanced_benchmarking::AdvancedBenchmark;
pub use advanced_benchmarking::AlgorithmBenchmark;
pub use advanced_benchmarking::AlgorithmComparison;
pub use advanced_benchmarking::BenchmarkConfig;
pub use advanced_benchmarking::BenchmarkResults;
pub use advanced_benchmarking::ComplexityClass;
pub use advanced_benchmarking::GpuVsCpuComparison;
pub use advanced_benchmarking::MemoryProfile;
pub use advanced_benchmarking::OptimizationCategory;
pub use advanced_benchmarking::OptimizationPriority;
pub use advanced_benchmarking::OptimizationSuggestion;
pub use advanced_benchmarking::PerformanceStatistics;
pub use advanced_benchmarking::QualityMetrics;
pub use advanced_benchmarking::RegressionAlert;
pub use advanced_benchmarking::RegressionSeverity;
pub use advanced_benchmarking::ScalabilityAnalysis;
pub use advanced_benchmarking::SystemInfo;
pub use affinity::affinity_propagation;
pub use affinity::AffinityPropagationOptions;
pub use birch::birch;
pub use birch::Birch;
pub use birch::BirchOptions;
pub use birch::BirchStatistics;
pub use density::hdbscan::dbscan_clustering;
pub use density::hdbscan::hdbscan;
pub use density::hdbscan::ClusterSelectionMethod;
pub use density::hdbscan::HDBSCANOptions;
pub use density::hdbscan::HDBSCANResult;
pub use density::hdbscan::StoreCenter;
pub use density::optics::extract_dbscan_clustering;
pub use density::optics::extract_xi_clusters;
pub use density::optics::OPTICSResult;
pub use ensemble::convenience::bootstrap_ensemble;
pub use ensemble::convenience::ensemble_clustering;
pub use ensemble::convenience::multi_algorithm_ensemble;
pub use ensemble::ClusteringAlgorithm;
pub use ensemble::ClusteringResult;
pub use ensemble::ConsensusMethod;
pub use ensemble::ConsensusStatistics;
pub use ensemble::DiversityMetrics;
pub use ensemble::DiversityStrategy;
pub use ensemble::EnsembleClusterer;
pub use ensemble::EnsembleConfig;
pub use ensemble::EnsembleResult;
pub use ensemble::NoiseType;
pub use ensemble::ParameterRange;
pub use ensemble::SamplingStrategy;
pub use gmm::gaussian_mixture;
pub use gmm::CovarianceType;
pub use gmm::GMMInit;
pub use gmm::GMMOptions;
pub use gmm::GaussianMixture;
pub use graph::girvan_newman;
pub use graph::graph_clustering;
pub use graph::label_propagation;
pub use graph::louvain;
pub use graph::Graph;
pub use graph::GraphClusteringAlgorithm;
pub use graph::GraphClusteringConfig;
pub use input_validation::check_duplicate_points;
pub use input_validation::suggest_clustering_algorithm;
pub use input_validation::validate_clustering_data;
pub use input_validation::validate_convergence_parameters;
pub use input_validation::validate_distance_parameter;
pub use input_validation::validate_integer_parameter;
pub use input_validation::validate_n_clusters;
pub use input_validation::validate_sample_weights;
pub use input_validation::ValidationConfig;
pub use leader::euclidean_distance;
pub use leader::leader_clustering;
pub use leader::manhattan_distance;
pub use leader::LeaderClustering;
pub use leader::LeaderNode;
pub use leader::LeaderTree;
pub use meanshift::estimate_bandwidth;
pub use meanshift::get_bin_seeds;
pub use meanshift::mean_shift;
pub use meanshift::MeanShift;
pub use meanshift::MeanShiftOptions;
pub use metrics::adjusted_rand_index;
pub use metrics::calinski_harabasz_score;
pub use metrics::davies_bouldin_score;
pub use metrics::homogeneity_completeness_v_measure;
pub use metrics::normalized_mutual_info;
pub use metrics::silhouette_samples;
pub use metrics::silhouette_score;
pub use metrics::bootstrap_confidence_interval;
pub use metrics::information_theoretic::normalized_variation_of_information;
pub use metrics::jensen_shannon_divergence;
pub use metrics::advanced::bic_score;
pub use metrics::advanced::dunn_index;
pub use neighbor_search::create_neighbor_searcher;
pub use neighbor_search::BallTree;
pub use neighbor_search::BruteForceSearch;
pub use neighbor_search::KDTree;
pub use neighbor_search::NeighborResult;
pub use neighbor_search::NeighborSearchAlgorithm;
pub use neighbor_search::NeighborSearchConfig;
pub use neighbor_search::NeighborSearcher;
pub use preprocess::min_max_scale;
pub use preprocess::normalize;
pub use preprocess::standardize;
pub use preprocess::whiten;
pub use preprocess::NormType;
pub use serialization::affinity_propagation_to_model;
pub use serialization::birch_to_model;
pub use serialization::compatibility;
pub use serialization::dbscan_to_model;
pub use serialization::gmm_to_model;
pub use serialization::hierarchy_to_model;
pub use serialization::kmeans_to_model;
pub use serialization::leader_to_model;
pub use serialization::leadertree_to_model;
pub use serialization::meanshift_to_model;
pub use serialization::save_affinity_propagation;
pub use serialization::save_birch;
pub use serialization::save_gmm;
pub use serialization::save_hierarchy;
pub use serialization::save_kmeans;
pub use serialization::save_leader;
pub use serialization::save_leadertree;
pub use serialization::save_spectral_clustering;
pub use serialization::spectral_clustering_to_model;
pub use serialization::AdvancedExport;
pub use serialization::AffinityPropagationModel;
pub use serialization::AlgorithmState;
pub use serialization::AutoSaveConfig;
pub use serialization::BirchModel;
pub use serialization::ClusteringWorkflow;
pub use serialization::ClusteringWorkflowManager;
pub use serialization::DBSCANModel;
pub use serialization::DataCharacteristics;
pub use serialization::EnhancedModel;
pub use serialization::EnhancedModelMetadata;
pub use serialization::ExportFormat;
pub use serialization::GMMModel;
pub use serialization::HierarchicalModel;
pub use serialization::KMeansModel;
pub use serialization::LeaderModel;
pub use serialization::LeaderTreeModel;
pub use serialization::MeanShiftModel;
pub use serialization::ModelMetadata;
pub use serialization::PlatformInfo;
pub use serialization::SerializableModel;
pub use serialization::SpectralClusteringModel;
pub use serialization::TrainingMetrics;
pub use serialization::TrainingStep;
pub use serialization::WorkflowConfig;
pub use serialization::compatibility::create_sklearn_param_grid;
pub use serialization::compatibility::from_joblib_format;
pub use serialization::compatibility::from_numpy_format;
pub use serialization::compatibility::from_sklearn_format;
pub use serialization::compatibility::generate_sklearn_model_summary;
pub use serialization::compatibility::to_arrow_schema;
pub use serialization::compatibility::to_huggingface_card;
pub use serialization::compatibility::to_joblib_format;
pub use serialization::compatibility::to_mlflow_format;
pub use serialization::compatibility::to_numpy_format;
pub use serialization::compatibility::to_onnx_metadata;
pub use serialization::compatibility::to_pandas_clustering_report;
pub use serialization::compatibility::to_pandas_format;
pub use serialization::compatibility::to_pickle_like_format;
pub use serialization::compatibility::to_pytorch_checkpoint;
pub use serialization::compatibility::to_r_format;
pub use serialization::compatibility::to_scipy_dendrogram_format;
pub use serialization::compatibility::to_scipy_linkage_format;
pub use serialization::compatibility::to_sklearn_clustering_result;
pub use serialization::compatibility::to_sklearn_format;
pub use sparse::sparse_epsilon_graph;
pub use sparse::sparse_knn_graph;
pub use sparse::SparseDistanceMatrix;
pub use sparse::SparseHierarchicalClustering;
pub use spectral::spectral_bipartition;
pub use spectral::spectral_clustering;
pub use spectral::AffinityMode;
pub use spectral::SpectralClusteringOptions;
pub use stability::BootstrapValidator;
pub use stability::ConsensusClusterer;
pub use stability::OptimalKSelector;
pub use stability::StabilityConfig;
pub use stability::StabilityResult;
pub use streaming::ChunkedDistanceMatrix;
pub use streaming::ProgressiveHierarchical;
pub use streaming::StreamingConfig;
pub use streaming::StreamingKMeans;
pub use text_clustering::semantic_hierarchical;
pub use text_clustering::semantic_kmeans;
pub use text_clustering::topic_clustering;
pub use text_clustering::SemanticClusteringConfig;
pub use text_clustering::SemanticHierarchical;
pub use text_clustering::SemanticKMeans;
pub use text_clustering::SemanticSimilarity;
pub use text_clustering::TextPreprocessing;
pub use text_clustering::TextRepresentation;
pub use text_clustering::TopicBasedClustering;
pub use time_series::dtw_barycenter_averaging;
pub use time_series::dtw_distance;
pub use time_series::dtw_distance_custom;
pub use time_series::dtw_hierarchical_clustering;
pub use time_series::dtw_k_means;
pub use time_series::dtw_k_medoids;
pub use time_series::soft_dtw_distance;
pub use time_series::time_series_clustering;
pub use time_series::TimeSeriesAlgorithm;
pub use time_series::TimeSeriesClusteringConfig;
pub use tuning::AcquisitionFunction;
pub use tuning::AutoTuner;
pub use tuning::BayesianState;
pub use tuning::CVStrategy;
pub use tuning::ConvergenceInfo;
pub use tuning::CrossValidationConfig;
pub use tuning::EarlyStoppingConfig;
pub use tuning::EnsembleResults;
pub use tuning::EvaluationMetric;
pub use tuning::EvaluationResult;
pub use tuning::ExplorationStats;
pub use tuning::HyperParameter;
pub use tuning::KernelType;
pub use tuning::LoadBalancingStrategy;
pub use tuning::ParallelConfig;
pub use tuning::ResourceConstraints;
pub use tuning::SearchSpace;
pub use tuning::SearchStrategy;
pub use tuning::StandardSearchSpaces;
pub use tuning::StoppingReason;
pub use tuning::SurrogateModel;
pub use tuning::TuningConfig;
pub use tuning::TuningResult;
pub use visualization::create_scatter_plot_2d;
pub use visualization::create_scatter_plot_3d;
pub use visualization::AnimationConfig;
pub use visualization::BoundaryType;
pub use visualization::ClusterBoundary;
pub use visualization::ColorScheme;
pub use visualization::DimensionalityReduction;
pub use visualization::EasingFunction;
pub use visualization::LegendEntry;
pub use visualization::ScatterPlot2D;
pub use visualization::ScatterPlot3D;
pub use visualization::VisualizationConfig;
pub use visualization::animation::AnimationFrame;
pub use visualization::animation::IterativeAnimationConfig;
pub use visualization::animation::IterativeAnimationRecorder;
pub use visualization::animation::StreamingVisualizer;
pub use visualization::interactive::ClusterStats;
pub use visualization::interactive::InteractiveConfig;
pub use visualization::interactive::InteractiveState;
pub use visualization::interactive::InteractiveVisualizer;
pub use visualization::export::export_scatter_2d_to_html;
pub use visualization::export::export_scatter_2d_to_json;
pub use visualization::export::export_scatter_3d_to_html;
pub use visualization::export::export_scatter_3d_to_json;
pub use visualization::export::save_visualization_to_file;
pub use distributed::DataPartition;
pub use distributed::DistributedKMeans;
pub use distributed::DistributedKMeansConfig;
pub use distributed::PartitioningStrategy;
pub use distributed::WorkerStatus;
pub use density::*;
pub use hierarchy::*;
pub use vq::*;
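
Several of the re-exports above come from `time_series` and are built around dynamic time warping (DTW). As a rough guide to what such a distance computes, here is a std-only sketch of the classic dynamic-programming recurrence. The name `dtw_sketch`, the absolute-difference local cost, and the absence of a window constraint are assumptions of this sketch; the signature of the crate's `dtw_distance` is not shown on this page.

```rust
/// Dynamic time warping distance between two 1-D series (hypothetical
/// helper), using |x - y| as the local cost and no window constraint.
/// Two empty series have distance 0; if exactly one series is empty the
/// result is infinity, since no alignment exists.
fn dtw_sketch(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    // cost[i][j] = cheapest alignment of a[..i] with b[..j].
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    for i in 1..=n {
        for j in 1..=m {
            let local = (a[i - 1] - b[j - 1]).abs();
            // Extend the cheapest of: repeat b[j-1], repeat a[i-1], or advance both.
            let best_prev = cost[i - 1][j]
                .min(cost[i][j - 1])
                .min(cost[i - 1][j - 1]);
            cost[i][j] = local + best_prev;
        }
    }
    cost[n][m]
}
```

Because DTW may repeat samples to align the series, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0 even though their lengths differ, which is what makes the measure useful for clustering sequences recorded at different speeds.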

Modules

advanced
Cutting-edge clustering algorithms including quantum-inspired methods and advanced online learning.
advanced_benchmarking
Advanced benchmarking and performance profiling system.
advanced_clustering
Advanced Clustering - AI-Driven Quantum-Neuromorphic Clustering
advanced_visualization
Enhanced visualization specifically for advanced clustering results.
affinity
Affinity Propagation clustering implementation
birch
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clustering algorithm
density
Density-based clustering algorithms
distributed
Distributed clustering algorithms for large-scale datasets.
enhanced_clustering_features
Enhanced Advanced Features - Advanced AI-Driven Clustering Extensions
ensemble
Ensemble clustering methods for improved robustness.
error
Error types for the clustering module
gmm
Gaussian Mixture Models (GMM) for clustering
gpu_distributed_clustering
Advanced GPU and Distributed Computing Extensions
graph
Graph clustering and community detection algorithms.
hierarchy
Hierarchical clustering functions
input_validation
Enhanced input validation utilities
leader
Leader algorithm implementation for clustering
meanshift
Mean Shift clustering implementation.
metrics
Clustering evaluation metrics
neighbor_search
Efficient neighbor search algorithms for clustering
preprocess
Data preprocessing utilities for clustering algorithms
quantum_clustering
Quantum-Inspired Clustering Algorithms
serialization
Model serialization and deserialization
sparse
Sparse distance matrix support for large datasets
spectral
Spectral clustering implementation
stability
Cluster stability assessment tools
streaming
Streaming and memory-efficient clustering algorithms
text_clustering
Text clustering algorithms with semantic similarity support.
time_series
Time series clustering algorithms with specialized distance metrics.
tuning
Automatic hyperparameter tuning for clustering algorithms.
utils
Utility modules for clustering algorithms
visualization
Enhanced visualization capabilities for clustering results.
vq
Vector quantization functions