clump
Clustering algorithms for dense f32 vectors in Rust. 9 algorithms, SIMD-accelerated, with optional GPU and parallel support.
Algorithms
| Algorithm | Kind | Discovers k | Noise handling | Input |
|---|---|---|---|---|
| K-means | Centroid | No (k required) | None | &impl DataRef |
| Mini-Batch K-means | Centroid (streaming) | No (k required) | None | &impl DataRef |
| DBSCAN | Density | Yes | Labels noise (NOISE sentinel) | &impl DataRef |
| HDBSCAN | Density (hierarchical) | Yes | Labels noise | &impl DataRef |
| DenStream | Density (streaming) | Yes | Decays outliers | &impl DataRef |
| EVoC | Hierarchical | Yes | Near-duplicate detection | &impl DataRef |
| COP-Kmeans | Constrained centroid | No (k required) | None | &impl DataRef + constraints |
| OPTICS | Density (reachability) | Yes | Reachability plot | &impl DataRef |
| Correlation Clustering | Graph-based | Yes | None | SignedEdge list |
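To make the "Density" rows and the "Labels noise" column concrete, here is a minimal brute-force DBSCAN sketch. It illustrates the idea under stated assumptions and is not clump's implementation: neighbor search is O(n²), distance is Euclidean, and labels are Option<usize> with None marking noise. The names dbscan, region_query, eps and min_pts are hypothetical.

```rust
/// Euclidean distance between two equal-length vectors.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Indices of all points within `eps` of point `i` (including `i` itself).
fn region_query(data: &[Vec<f32>], i: usize, eps: f32) -> Vec<usize> {
    (0..data.len())
        .filter(|&j| euclidean(&data[i], &data[j]) <= eps)
        .collect()
}

/// Brute-force DBSCAN (hypothetical helper): one label per point, `None` for noise.
/// The number of clusters is discovered from the data, not supplied.
fn dbscan(data: &[Vec<f32>], eps: f32, min_pts: usize) -> Vec<Option<usize>> {
    let n = data.len();
    let mut labels: Vec<Option<usize>> = vec![None; n];
    let mut visited = vec![false; n];
    let mut next_cluster = 0;

    for i in 0..n {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let neighbors = region_query(data, i, eps);
        if neighbors.len() < min_pts {
            continue; // not a core point; stays noise unless a cluster reaches it later
        }
        // `i` is a core point: start a new cluster and expand it.
        let cluster = next_cluster;
        next_cluster += 1;
        labels[i] = Some(cluster);
        let mut queue = neighbors;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                labels[j] = Some(cluster); // core or border point joins the cluster
            }
            if !visited[j] {
                visited[j] = true;
                let nbrs = region_query(data, j, eps);
                if nbrs.len() >= min_pts {
                    queue.extend(nbrs); // only core points propagate the cluster
                }
            }
        }
    }
    labels
}
```

Nothing in the call fixes k: two dense groups yield two clusters, and an isolated point stays None.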
Quickstart
[dependencies]
clump = "0.5.2"
use clump::{Dbscan, Kmeans};
let data = vec![…];
// K-means: returns labels (default: squared Euclidean)
let labels = Kmeans::new(…).with_seed(…).fit_predict(&data).unwrap();
assert_eq!(…);
assert_ne!(…);
// DBSCAN: discovers clusters from density (default: Euclidean)
let labels = Dbscan::new(…).fit_predict(&data).unwrap();
Kmeans::fit returns a KmeansFit that holds the learned centroids and can predict labels for new points. Dbscan::fit_predict marks noise points with the clump::NOISE sentinel label; use fit_predict_with_noise to get Option labels instead.
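Predicting with a fitted centroid model amounts to assigning each new point to its nearest centroid. The sketch below shows that idea on plain slices with squared Euclidean distance, plus a conversion from sentinel-coded labels to Option labels. predict_one and noise_to_option are hypothetical helpers, not clump's API, and the sentinel is passed in as a parameter rather than assuming the value of clump::NOISE.

```rust
/// Squared Euclidean distance (the README's default metric for K-means).
fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point` (hypothetical helper).
fn predict_one(centroids: &[Vec<f32>], point: &[f32]) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (k, c) in centroids.iter().enumerate() {
        let d = squared_euclidean(c, point);
        if d < best_dist {
            best = k;
            best_dist = d;
        }
    }
    best
}

/// Convert sentinel-coded labels to `Option` labels, mapping `noise` to `None`
/// (hypothetical helper; the sentinel value is supplied by the caller).
fn noise_to_option(labels: &[usize], noise: usize) -> Vec<Option<usize>> {
    labels.iter().map(|&l| if l == noise { None } else { Some(l) }).collect()
}
```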
Zero-copy flat input
All vector-based algorithms accept &impl DataRef (Correlation Clustering takes a SignedEdge list instead). Pass a Vec<Vec<f32>>, or wrap a flat buffer in FlatRef to avoid copying:
use clump::{FlatRef, Kmeans};
let flat = vec![…];
let data = FlatRef::new(…);
let labels = Kmeans::new(…).with_seed(…).fit_predict(&data).unwrap();
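Conceptually, zero-copy flat input means reading an n × dim buffer as n row slices that borrow from the original storage. The sketch below assumes a row-major layout, which is an assumption about FlatRef rather than something this README states; row and rows are hypothetical helpers.

```rust
/// Borrow row `i` of a row-major buffer whose rows have length `dim`.
/// No data is copied: the returned slice points into `flat`.
fn row(flat: &[f32], dim: usize, i: usize) -> &[f32] {
    &flat[i * dim..(i + 1) * dim]
}

/// Iterate over all rows of a row-major buffer as borrowed slices.
fn rows(flat: &[f32], dim: usize) -> impl Iterator<Item = &[f32]> {
    assert!(dim > 0 && flat.len() % dim == 0, "buffer length must be a multiple of dim");
    flat.chunks_exact(dim)
}
```

The same buffer can be reinterpreted with a different dim at no cost, which is why a wrapper like this needs the dimension alongside the slice.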
Streaming clustering
use clump::MiniBatchKmeans;
let mut mbk = MiniBatchKmeans::new(…).with_seed(…);
mbk.update_batch(…).unwrap();
mbk.update_batch(…).unwrap();
// Centroids available via mbk.centroids()
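Mini-batch k-means is commonly described with a per-centroid learning rate: each centroid counts the points it has absorbed and moves toward a new point by 1/count of the gap, so it tracks the running mean of its assigned points. The sketch implements that textbook rule with squared Euclidean assignment; it is not necessarily what update_batch does internally, and MiniBatchState and process_batch are hypothetical names.

```rust
/// Running state for a textbook mini-batch k-means (hypothetical type).
struct MiniBatchState {
    centroids: Vec<Vec<f32>>,
    counts: Vec<u64>, // how many points each centroid has absorbed so far
}

impl MiniBatchState {
    fn new(initial: Vec<Vec<f32>>) -> Self {
        let k = initial.len();
        MiniBatchState { centroids: initial, counts: vec![0; k] }
    }

    /// Index of the nearest centroid under squared Euclidean distance.
    fn nearest(&self, x: &[f32]) -> usize {
        let dist = |c: &Vec<f32>| -> f32 { c.iter().zip(x).map(|(a, b)| (a - b) * (a - b)).sum() };
        (0..self.centroids.len())
            .min_by(|&i, &j| dist(&self.centroids[i]).total_cmp(&dist(&self.centroids[j])))
            .expect("at least one centroid")
    }

    /// One batch: assign every point first, then move each assigned centroid
    /// toward its point with step size eta = 1 / count.
    fn process_batch(&mut self, batch: &[Vec<f32>]) {
        let assignments: Vec<usize> = batch.iter().map(|x| self.nearest(x)).collect();
        for (x, &k) in batch.iter().zip(&assignments) {
            self.counts[k] += 1;
            let eta = 1.0 / self.counts[k] as f32;
            for (c, v) in self.centroids[k].iter_mut().zip(x) {
                *c = (1.0 - eta) * *c + eta * v;
            }
        }
    }
}
```

Because the step size shrinks as counts grow, later batches perturb well-established centroids less, which is what makes the method usable on a stream.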
Constrained clustering
use clump::{…};
let constraints = vec![…];
let labels = …::new(…)
    .with_seed(…)
    .fit_predict_constrained(…)
    .unwrap();
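In COP-Kmeans, each point goes to the nearest centroid that breaks no must-link or cannot-link constraint with points already placed; if no centroid is feasible, the algorithm reports failure. The sketch shows that assignment step with squared Euclidean distance. Link, violates and assign are hypothetical: they do not reproduce clump's Constraint type or its error handling, and the centroid update between passes is omitted.

```rust
/// A pairwise constraint between two point indices (hypothetical stand-in).
enum Link {
    MustLink(usize, usize),
    CannotLink(usize, usize),
}

/// Would putting `point` into `cluster` break a constraint, given the
/// assignments made so far (`None` = not yet assigned)?
fn violates(point: usize, cluster: usize, assigned: &[Option<usize>], links: &[Link]) -> bool {
    links.iter().any(|link| match *link {
        Link::MustLink(a, b) => {
            // Partner already placed in a different cluster: violation.
            let other = if a == point { b } else if b == point { a } else { return false };
            matches!(assigned[other], Some(c) if c != cluster)
        }
        Link::CannotLink(a, b) => {
            // Partner already placed in the same cluster: violation.
            let other = if a == point { b } else if b == point { a } else { return false };
            assigned[other] == Some(cluster)
        }
    })
}

/// Greedy assignment pass: try centroids from nearest to farthest and take the
/// first feasible one. Returns `None` if some point has no feasible cluster.
fn assign(data: &[Vec<f32>], centroids: &[Vec<f32>], links: &[Link]) -> Option<Vec<usize>> {
    let mut assigned: Vec<Option<usize>> = vec![None; data.len()];
    for (i, x) in data.iter().enumerate() {
        let d = |k: usize| -> f32 { centroids[k].iter().zip(x).map(|(a, b)| (a - b) * (a - b)).sum() };
        let mut order: Vec<usize> = (0..centroids.len()).collect();
        order.sort_by(|&p, &q| d(p).total_cmp(&d(q)));
        let k = order.into_iter().find(|&k| !violates(i, k, &assigned, links))?;
        assigned[i] = Some(k);
    }
    Some(assigned.into_iter().map(|c| c.unwrap()).collect())
}
```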
Correlation clustering
use clump::{…};
let edges = vec![…];
let result = …::new(…).fit(…).unwrap();
let labels = result.labels;
Also see edges_from_distances to build signed edges from a distance matrix.
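This README does not say how edges_from_distances assigns signs; one natural rule, assumed here, is a distance threshold: pairs closer than the threshold get a positive ("together") edge and all others a negative one. The sketch pairs that rule with the classic pivot heuristic for correlation clustering, taking pivots in index order instead of random order so the output is deterministic. Edge, edges_from_threshold and pivot_cluster are hypothetical names, and clump's solver may use a different method.

```rust
/// A signed edge between two items (hypothetical stand-in for clump's SignedEdge).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Edge {
    i: usize,
    j: usize,
    positive: bool, // true = "should be together", false = "should be apart"
}

/// Turn a symmetric distance matrix into signed edges: distance < threshold means positive.
fn edges_from_threshold(dist: &[Vec<f32>], threshold: f32) -> Vec<Edge> {
    let mut edges = Vec::new();
    for i in 0..dist.len() {
        for j in (i + 1)..dist.len() {
            edges.push(Edge { i, j, positive: dist[i][j] < threshold });
        }
    }
    edges
}

/// Pivot heuristic: the lowest-index unclustered item becomes a pivot and absorbs
/// every unclustered item it shares a positive edge with.
fn pivot_cluster(n: usize, edges: &[Edge]) -> Vec<usize> {
    // Adjacency of positive edges; missing pairs are treated as negative.
    let mut pos = vec![vec![false; n]; n];
    for e in edges.iter().filter(|e| e.positive) {
        pos[e.i][e.j] = true;
        pos[e.j][e.i] = true;
    }
    let mut labels: Vec<Option<usize>> = vec![None; n];
    let mut next = 0;
    for p in 0..n {
        if labels[p].is_some() {
            continue;
        }
        labels[p] = Some(next);
        for q in (p + 1)..n {
            if labels[q].is_none() && pos[p][q] {
                labels[q] = Some(next);
            }
        }
        next += 1;
    }
    labels.into_iter().map(|l| l.unwrap()).collect()
}
```

As the table notes, the number of clusters is not an input here: it falls out of the edge signs.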
Distance metrics
All algorithms are generic over DistanceMetric. Built-in metrics:
| Metric | Formula |
|---|---|
| SquaredEuclidean | sum((a_i - b_i)^2) |
| Euclidean | sqrt(sum((a_i - b_i)^2)) |
| CosineDistance | 1 - cos_sim(a, b) |
| InnerProductDistance | -dot(a, b) |
| CompositeDistance | Weighted sum of metrics |
Use with_metric on any algorithm to swap the metric:
use clump::{…};
let labels = …::with_metric(…)
    .with_seed(…)
    .fit_predict(…)
    .unwrap();
Custom metrics: implement DistanceMetric (one method: fn distance(&self, a: &[f32], b: &[f32]) -> f32).
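Because the algorithms are generic over the metric, a custom metric only needs the single distance method. The sketch declares a local trait with the same method signature as clump's DistanceMetric so that it compiles without the crate (in real code you would implement clump's trait instead), implements Manhattan distance, and shows a function generic over the metric. Manhattan and nearest_index are illustrative names, not part of clump.

```rust
/// Local stand-in with the same method signature as clump's DistanceMetric.
trait DistanceMetric {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
}

/// L1 (Manhattan) distance: sum(|a_i - b_i|). Not one of the built-in metrics.
struct Manhattan;

impl DistanceMetric for Manhattan {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
    }
}

/// Any code generic over `M: DistanceMetric` accepts built-in or custom metrics;
/// here, finding the reference point nearest to a query (hypothetical helper).
fn nearest_index<M: DistanceMetric>(metric: &M, refs: &[Vec<f32>], query: &[f32]) -> Option<usize> {
    refs.iter()
        .enumerate()
        .map(|(i, r)| (i, metric.distance(r, query)))
        .min_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(i, _)| i)
}
```

Static dispatch through a generic parameter lets the compiler inline the distance call in hot loops, which matters more for clustering than the convenience of a trait object.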
Features
| Feature | Default | Effect |
|---|---|---|
| parallel | off | Enables Rayon parallelism for k-means and batch operations |
| gpu | off | Metal GPU acceleration for k-means assignment (macOS only) |
| serde | off | Serialize/deserialize for KmeansFit, SignedEdge, Constraint, etc. |
| ndarray | off | Conversion helpers between Array2<f32> and clump input format |
| simd | off | SIMD-accelerated distance via innr (NEON/AVX2/AVX-512) |
License
MIT OR Apache-2.0