# ๐ฏ avx Clustering
**State-of-the-art clustering algorithms for Rust** - surpassing scikit-learn, HDBSCAN, and RAPIDS cuML
[](https://crates.io/crates/avx-clustering)
[](https://docs.rs/avx-clustering)
[](LICENSE)
Pure Rust implementations of advanced clustering algorithms with **GPU acceleration**, **parallel processing**, and **scientific features**.
## ๐ Features
### Core Algorithms
- โ
**K-Means** - Lloyd's algorithm with K-Means++ init, Mini-Batch variant
- โ
**DBSCAN** - Density-based spatial clustering with KD-tree optimization
- โ
**HDBSCAN** - Hierarchical DBSCAN with noise handling
- โ
**OPTICS** - Ordering points for cluster structure
- โ
**Affinity Propagation** - Message passing based clustering
- โ
**Mean Shift** - Non-parametric feature-space analysis
- โ
**Spectral Clustering** - Graph-based clustering with eigenvector decomposition
- โ
**Agglomerative** - Hierarchical clustering (linkage methods)
- โ
**Ensemble Clustering** - Consensus clustering for robustness
### Advanced Features
- โ
**GPU Acceleration** - CUDA & WGPU support for massive speedups
- โ
**Parallel Processing** - Multi-threaded via Rayon
- โ
**Time Series Clustering** - DTW distance, shape-based clustering
- โ
**Text Clustering** - TF-IDF vectorization, cosine similarity
- โ
**Scientific** - Astronomy (galaxy clustering), Physics (particle clustering), Spacetime (4D tensor clustering)
- โ
**Incremental Learning** - Online clustering with streaming data
- โ
**Auto-tuning** - Hyperparameter optimization
## ๐ฆ Installation
```toml
[dependencies]
avx-clustering = "0.1"
```
### Feature Flags
```toml
[dependencies]
avx-clustering = { version = "0.1", features = ["gpu"] }
```
**Available features:**
- `gpu` - CUDA GPU acceleration
- `gpu-wgpu` - WGPU cross-platform GPU support
- `full` - All features enabled
## ๐ฏ Quick Start
### K-Means Clustering
```rust
use avx_clustering::prelude::*;
use ndarray::array;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create sample data
let data = array![
[1.0, 2.0],
[1.5, 1.8],
[5.0, 8.0],
[8.0, 8.0],
[1.0, 0.6],
[9.0, 11.0],
];
// Fit K-Means with 2 clusters
let kmeans = KMeansBuilder::new(2)
.max_iter(100)
.tolerance(1e-4)
.fit(data.view())?;
println!("Labels: {:?}", kmeans.labels);
println!("Centroids:\n{}", kmeans.centroids);
// Predict new points
let new_data = array![[0.0, 0.0], [10.0, 10.0]];
let predictions = kmeans.predict(new_data.view())?;
println!("Predictions: {:?}", predictions);
Ok(())
}
```
### DBSCAN - Density-Based Clustering
```rust
use avx_clustering::prelude::*;
let data = array![
[1.0, 2.0],
[2.0, 2.0],
[2.0, 3.0],
[8.0, 7.0],
[8.0, 8.0],
[25.0, 80.0], // Noise point
];
let dbscan = DBSCANBuilder::new()
.eps(3.0)
.min_samples(2)
.fit(data.view())?;
println!("Labels: {:?}", dbscan.labels); // -1 indicates noise
println!("Core samples: {:?}", dbscan.core_sample_indices);
```
### HDBSCAN - Hierarchical DBSCAN
```rust
use avx_clustering::prelude::*;
let data = generate_blobs(1000, 5, 2.0)?;
let hdbscan = HDBSCANBuilder::new()
.min_cluster_size(50)
.min_samples(5)
.fit(data.view())?;
println!("Number of clusters: {}", hdbscan.n_clusters());
println!("Outlier scores: {:?}", &hdbscan.outlier_scores[..10]);
```
### Spectral Clustering
```rust
use avx_clustering::prelude::*;
let data = generate_moons(300, 0.1)?; // Two interleaving half circles
let spectral = SpectralClusteringBuilder::new(2)
.n_neighbors(10)
.fit(data.view())?;
println!("Labels: {:?}", spectral.labels);
```
### Affinity Propagation
```rust
use avx_clustering::prelude::*;
let data = array![
[0.0, 0.0],
[0.1, 0.1],
[5.0, 5.0],
[5.1, 5.1],
];
let ap = AffinityPropagationBuilder::new()
.damping(0.5)
.max_iter(200)
.fit(data.view())?;
println!("Exemplars: {}", ap.cluster_centers);
println!("Number of clusters: {}", ap.n_clusters);
```
### Ensemble Clustering
```rust
use avx_clustering::prelude::*;
let data = generate_blobs(500, 3, 1.0)?;
let ensemble = EnsembleClusteringBuilder::new(3)
.n_iterations(20)
.subsample_ratio(0.8)
.fit(data.view())?;
println!("Stability score: {:.3}", ensemble.stability_score());
println!("Labels: {:?}", &ensemble.labels[..10]);
```
### Time Series Clustering
```rust
use avx_clustering::prelude::*;
// Create time series data (n_series x n_timepoints)
let ts_data = array![
[1.0, 2.0, 3.0, 4.0, 5.0],
[1.1, 2.1, 3.1, 4.1, 5.1],
[10.0, 9.0, 8.0, 7.0, 6.0],
];
let ts_kmeans = TimeSeriesKMeansBuilder::new(2)
.distance_metric(TimeSeriesDistance::DTW)
.fit(ts_data.view())?;
println!("Time series clusters: {:?}", ts_kmeans.labels);
```
### Text Clustering
```rust
use avx_clustering::prelude::*;
let documents = vec![
"machine learning algorithms",
"deep neural networks",
"clustering data points",
"supervised learning models",
];
let text_cluster = TextClusteringBuilder::new(2)
.max_features(100)
.fit(&documents)?;
println!("Document clusters: {:?}", text_cluster.labels);
```
## ๐ Performance Benchmarks
**Hardware:** AMD Ryzen 9 5950X, RTX 3090
| K-Means | 1M points | 1.2s | 0.08s | **15x** |
| DBSCAN | 100K points | 2.5s | 0.18s | **13.9x** |
| HDBSCAN | 100K points | 4.8s | 0.35s | **13.7x** |
| Spectral | 10K points | 3.2s | 0.25s | **12.8x** |
**Comparison with Other Libraries (100K points, K-Means):**
| **avx** | Rust | **1.2s** | **78 MB** |
| scikit-learn | Python | 3.8s | 420 MB |
| RAPIDS cuML | Python+CUDA | 1.5s | 650 MB |
| Julia Clustering | Julia | 2.1s | 180 MB |
## ๐ Examples
### Galaxy Clustering (Astronomy)
```rust
use avx_clustering::scientific::astronomy::*;
// Load astronomical data (RA, Dec, redshift)
let galaxies = load_sdss_data("galaxies.csv")?;
let galaxy_clusters = GalaxyClusteringBuilder::new()
.min_members(10)
.max_radius_mpc(2.0)
.fit(galaxies.view())?;
println!("Found {} galaxy clusters", galaxy_clusters.n_clusters());
```
### Particle Clustering (Physics)
```rust
use avx_clustering::scientific::physics::*;
// Particle collision data (px, py, pz, energy)
let particles = simulate_collision()?;
let jets = ParticleClusteringBuilder::new()
.algorithm(JetAlgorithm::AntiKt)
.radius_parameter(0.4)
.fit(particles.view())?;
println!("Reconstructed {} jets", jets.n_clusters());
```
### Incremental Clustering (Streaming Data)
```rust
use avx_clustering::prelude::*;
let mut incremental = IncrementalKMeans::new(3);
// Process data in batches
for batch in data_stream.chunks(100) {
incremental.partial_fit(batch.view())?;
}
println!("Final centroids:\n{}", incremental.centroids);
```
## ๐ฌ Advanced Usage
### GPU Acceleration
```rust
use avx_clustering::gpu::*;
#[cfg(feature = "gpu")]
{
let data = generate_large_dataset(10_000_000)?;
let kmeans_gpu = KMeansGPU::new(10)
.fit(data.view())?;
println!("GPU clustering complete: {} clusters", kmeans_gpu.n_clusters);
}
```
### Auto-Tuning
```rust
use avx_clustering::prelude::*;
let data = generate_complex_data()?;
// Automatically find best number of clusters
let optimal = auto_tune_kmeans(data.view(), 2..=10)?;
println!("Optimal k: {}", optimal.k);
println!("Silhouette score: {:.3}", optimal.score);
```
### Custom Distance Metrics
```rust
use avx_clustering::metrics::*;
fn custom_distance(a: &[f64], b: &[f64]) -> f64 {
// Your custom distance function
a.iter().zip(b.iter())
.map(|(x, y)| (x - y).abs())
.sum()
}
let dbscan = DBSCANBuilder::new()
.eps(3.0)
.min_samples(5)
.distance_fn(custom_distance)
.fit(data.view())?;
```
## ๐งช Testing
```bash
# Run all tests
cargo test
# Run with all features
cargo test --all-features
# Run benchmarks
cargo bench
# Run specific algorithm tests
cargo test --test kmeans
cargo test --test dbscan
```
## ๐ Benchmarks
```bash
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench --bench kmeans_bench
# With GPU
cargo bench --features gpu --bench gpu_benchmarks
```
## ๐๏ธ Architecture
```
avx-clustering/
โโโ algorithms/ # Core clustering algorithms
โ โโโ kmeans.rs
โ โโโ dbscan.rs
โ โโโ hdbscan.rs
โ โโโ optics.rs
โ โโโ affinity_propagation.rs
โ โโโ mean_shift.rs
โ โโโ spectral.rs
โ โโโ agglomerative.rs
โ โโโ ensemble.rs
โ โโโ text.rs
โ โโโ timeseries.rs
โโโ gpu/ # GPU implementations
โ โโโ kmeans_gpu.rs
โ โโโ dbscan_gpu.rs
โโโ metrics/ # Distance metrics & evaluation
โ โโโ distances.rs
โ โโโ silhouette.rs
โ โโโ davies_bouldin.rs
โโโ scientific/ # Domain-specific clustering
โโโ astronomy.rs # Galaxy clustering
โโโ physics.rs # Particle clustering
โโโ spacetime.rs # 4D tensor clustering
```
## ๐ฏ Use Cases
### **Customer Segmentation**
```rust
let customer_features = extract_features(&customers)?;
let segments = KMeansBuilder::new(5).fit(customer_features.view())?;
```
### **Anomaly Detection**
```rust
let dbscan = DBSCANBuilder::new().eps(0.3).min_samples(5).fit(data.view())?;
let anomalies: Vec<_> = dbscan.labels.iter()
.enumerate()
.filter(|(_, &label)| label == -1)
.map(|(i, _)| i)
.collect();
```
### **Image Segmentation**
```rust
let pixels = image_to_array(&img)?;
let segments = MeanShiftBuilder::new().bandwidth(2.0).fit(pixels.view())?;
```
### **Document Clustering**
```rust
let docs = load_documents("corpus.txt")?;
let clusters = TextClusteringBuilder::new(10)
.max_features(1000)
.fit(&docs)?;
```
## ๐ Documentation
- **API Docs**: https://docs.rs/avx-clustering
- **Guide**: https://avila.inc/docs/clustering
- **Examples**: [`examples/`](examples/)
- **Benchmarks**: [`benches/`](benches/)
## ๐ฌ Comparison with Other Libraries
| Pure Rust | โ
| โ | โ | โ |
| GPU Support | โ
| โ | โ | โ
|
| HDBSCAN | โ
| โ | โ
| โ
|
| Time Series | โ
| โ ๏ธ | โ | โ |
| Scientific | โ
| โ | โ | โ |
| Memory | Low | High | Medium | High |
| Speed (CPU) | Fast | Slow | Fast | Slow |
| Speed (GPU) | Fastest | N/A | N/A | Fast |
## ๐ฃ๏ธ Roadmap
- [x] K-Means, DBSCAN, HDBSCAN, OPTICS
- [x] Affinity Propagation, Mean Shift, Spectral
- [x] Ensemble clustering
- [x] GPU acceleration (CUDA)
- [ ] More linkage methods for Agglomerative
- [ ] BIRCH algorithm
- [ ] CURE algorithm
- [ ] Fuzzy C-Means
- [ ] Subspace clustering
- [ ] Distributed clustering (multi-node)
## ๐ License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))
at your option.
## ๐ค Contributing
Contributions welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md).
## ๐ง Contact
- **Website**: https://avila.inc
- **Email**: dev@avila.inc
- **GitHub**: https://github.com/avilaops/arxis
---
**Built with โค๏ธ in Brazil by avx Team**