A high-performance Rust implementation of K-Means clustering algorithms. Kentro provides both standard and advanced K-Means variants with parallel processing support.
π Features
- Standard K-Means: Classic Lloyd's algorithm implementation
- Spherical K-Means: Uses cosine similarity instead of Euclidean distance
- Balanced K-Means: Ensures clusters have similar sizes using efficient balancing algorithms
- Parallel Processing: Multi-threaded execution using Rayon
- Flexible API: Builder pattern for easy configuration
- Memory Efficient: Optimized for large datasets
- Comprehensive Error Handling: Detailed error types and messages
π¦ Installation
Add this to your Cargo.toml:
[]
= "0.1.0"
= "0.15"
π§ Quick Start
use KMeans;
use Array2;
// Create sample data (100 points, 2 dimensions)
let data = from_shape_vec.unwrap;
// Create and configure K-Means
let mut kmeans = new
.with_iterations
.with_verbose;
// Train the model
let clusters = kmeans.train.unwrap;
println!;
for in clusters.iter.enumerate
π API Reference
Creating a K-Means Instance
use KMeans;
let kmeans = new // 3 clusters
.with_iterations // 50 iterations (default: 25)
.with_euclidean // Use Euclidean distance (default: false)
.with_balanced // Enable balanced clustering (default: false)
.with_max_balance_diff // Max cluster size difference (default: 16)
.with_verbose; // Enable verbose output (default: false)
Training
// Train on your data
let clusters = kmeans.train?; // Use 4 threads
// Returns Vec<Vec<usize>> where each inner vector contains
// the indices of points assigned to that cluster
Assignment
// Assign new points to existing clusters
let assignments = kmeans.assign?; // k=1 (nearest cluster)
// Multi-assignment (assign to k nearest clusters)
let multi_assignments = kmeans.assign?; // k=2
Accessing Results
// Get centroids
if let Some = kmeans.centroids
// Check model state
println!;
println!;
println!;
println!;
π― Algorithm Variants
Standard K-Means
Uses cosine similarity (inner product) as the distance metric. Suitable for high-dimensional data and text clustering.
let mut kmeans = new;
Euclidean K-Means
Uses Euclidean distance as the distance metric. Better for geometric data.
let mut kmeans = new.with_euclidean;
Balanced K-Means
Ensures clusters have similar sizes using the algorithm from:
Rieke de Maeyer, Sami Sieranoja, and Pasi FrΓ€nti. "Balanced k-means revisited." Applied Computing and Intelligence, 3(2):145β179, 2023.
let mut kmeans = new
.with_balanced
.with_max_balance_diff;
β‘ Performance Features
Parallel Processing
Kentro automatically uses all available CPU cores for cluster assignment. You can control the number of threads:
// Use 4 threads
let clusters = kmeans.train?;
// Use all available cores (default)
let clusters = kmeans.train?;
Memory Efficiency
- Uses
ndarrayfor efficient matrix operations - Minimal memory allocations during iterations
- Supports large datasets through optimized algorithms
π Examples
Basic Clustering
use KMeans;
use Array2;
Balanced Clustering
Text Clustering (Spherical K-Means)
π οΈ Error Handling
Kentro provides comprehensive error handling:
use ;
match kmeans.train
π§ͺ Testing
Run the test suite:
Run with verbose output:
π Running Examples
# Run the main example
# Run with release optimizations
π Benchmarks
Kentro is designed for high performance:
- Parallel Processing: Scales with CPU cores
- Memory Efficient: Minimal allocations during clustering
- Optimized Algorithms: Based on proven efficient implementations
π License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.