flow-clustering
Clustering algorithms for flow cytometry: K-means, DBSCAN, and Gaussian Mixture Models.
Overview
flow-clustering provides unsupervised clustering algorithms commonly used in automated flow cytometry gating. Each algorithm wraps the linfa ecosystem with flow-cytometry-specific configuration defaults and result types.
Features
| Feature | Description |
|---|---|
| `kmeans` (default) | K-means clustering |
| `dbscan` (default) | Density-based spatial clustering (DBSCAN) |
| `gmm` (default) | Gaussian Mixture Model fitting |
Public API
K-Means
use ;
let config = KMeansConfig ;
let result = fit?;
// result.labels: Vec<usize>, result.centroids: Array2<f64>
DBSCAN
use ;
let config = DbscanConfig ;
let result = fit?;
// result.labels: Vec<Option<usize>> (None = noise)
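Because noise events come back as `None`, downstream code usually needs to separate them from clustered events before reporting population sizes. The following is a minimal sketch of that step using only the standard library. The helper name `summarize_labels` is hypothetical and not part of this crate; it only assumes the `Vec<Option<usize>>` label shape described above.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of flow-clustering): counts noise events
/// and the number of events assigned to each cluster id.
fn summarize_labels(labels: &[Option<usize>]) -> (usize, BTreeMap<usize, usize>) {
    let mut noise = 0;
    let mut sizes = BTreeMap::new();
    for label in labels {
        match label {
            // Event belongs to a cluster: bump that cluster's count.
            Some(id) => *sizes.entry(*id).or_insert(0) += 1,
            // `None` marks a noise event.
            None => noise += 1,
        }
    }
    (noise, sizes)
}
```

A `BTreeMap` keeps cluster ids in ascending order, which makes the summary stable when printed or compared across runs.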
Gaussian Mixture Model
use ;
let config = GmmConfig ;
let result = fit?;
// result.labels: Vec<usize>, result.means: Array2<f64>
Cluster Validation
use ;
let scores = silhouette_scores?;
// scores.mean_score: f64, scores.per_sample: Vec<f64>
// For large datasets, use sampling:
let scores = silhouette_scores_sampled?;
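To make the metric concrete, here is a standalone sketch of the full (all-pairs) silhouette computation on plain `Vec<f64>` rows. It is an illustration of the standard definition, s(i) = (b − a) / max(a, b), and not the crate's implementation. The function names `silhouette_per_sample` and `mean` are hypothetical, and Euclidean distance is an assumption.

```rust
/// Euclidean distance between two events.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Per-sample silhouette values (hypothetical name, illustrative only).
/// `a` is the mean distance to the other members of the sample's own cluster;
/// `b` is the smallest mean distance to any other cluster.
fn silhouette_per_sample(points: &[Vec<f64>], labels: &[usize]) -> Vec<f64> {
    let k = labels.iter().copied().max().map_or(0, |m| m + 1);
    let mut counts = vec![0usize; k];
    for &l in labels {
        counts[l] += 1;
    }

    (0..points.len())
        .map(|i| {
            // Sum of distances from event i to the events of each cluster.
            let mut sums = vec![0.0_f64; k];
            for j in 0..points.len() {
                if i != j {
                    sums[labels[j]] += euclidean(&points[i], &points[j]);
                }
            }
            let own = labels[i];
            // Common convention: a singleton cluster scores 0.
            if counts[own] <= 1 {
                return 0.0;
            }
            let a = sums[own] / (counts[own] - 1) as f64;
            let b = (0..k)
                .filter(|&c| c != own && counts[c] > 0)
                .map(|c| sums[c] / counts[c] as f64)
                .fold(f64::INFINITY, f64::min);
            // With a single cluster there is no "other" cluster to compare to.
            if b.is_infinite() {
                return 0.0;
            }
            let denom = a.max(b);
            if denom == 0.0 { 0.0 } else { (b - a) / denom }
        })
        .collect()
}

/// Arithmetic mean of the per-sample values (0.0 for empty input).
fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        0.0
    } else {
        values.iter().sum::<f64>() / values.len() as f64
    }
}
```

Every event is compared against every other event, which is the source of the quadratic cost and the reason a sampled variant is useful on large event files.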
Algorithms
- K-Means: Lloyd's algorithm via `linfa-clustering`. Supports both row-major `Array2` input and a convenience `fit_from_rows` for pre-separated channel vectors.
- DBSCAN: Density-based clustering that identifies noise points. Useful for scatter gating where populations have irregular shapes.
- GMM: Expectation-maximization for Gaussian mixtures. Models multi-modal populations common in fluorescence channels.
- Silhouette scores: Cluster quality metric (−1 to +1). Full O(n²) and sampled O(n·k) variants.
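For readers unfamiliar with Lloyd's algorithm, the sketch below shows its two alternating steps (assign each event to the nearest centroid, then move each centroid to the mean of its events) in dependency-free Rust. It is a teaching sketch only: the crate delegates to `linfa-clustering`, and the names `lloyd`, `nearest`, and `sq_dist`, along with the caller-supplied initial centroids, are assumptions made for this example.

```rust
/// Squared Euclidean distance between two events.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Index of the centroid closest to `point`.
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    for c in 1..centroids.len() {
        if sq_dist(point, &centroids[c]) < sq_dist(point, &centroids[best]) {
            best = c;
        }
    }
    best
}

/// Illustrative Lloyd iteration (hypothetical function, not the crate API).
/// Expects at least one initial centroid. Returns final labels and centroids;
/// labels are empty if `max_iter` is 0.
fn lloyd(
    points: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iter: usize,
) -> (Vec<usize>, Vec<Vec<f64>>) {
    let dim = centroids[0].len();
    let mut labels: Vec<usize> = Vec::new();
    for _ in 0..max_iter {
        // Assignment step: each event goes to its nearest centroid.
        let new_labels: Vec<usize> =
            points.iter().map(|p| nearest(p, &centroids)).collect();

        // Update step: each centroid moves to the mean of its events.
        let mut sums = vec![vec![0.0_f64; dim]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for (p, &l) in points.iter().zip(&new_labels) {
            counts[l] += 1;
            for d in 0..dim {
                sums[l][d] += p[d];
            }
        }
        for (c, centroid) in centroids.iter_mut().enumerate() {
            // Empty clusters keep their previous centroid.
            if counts[c] > 0 {
                for d in 0..dim {
                    centroid[d] = sums[c][d] / counts[c] as f64;
                }
            }
        }

        // Stop once assignments no longer change between iterations.
        let converged = new_labels == labels;
        labels = new_labels;
        if converged {
            break;
        }
    }
    (labels, centroids)
}
```

Taking the initial centroids as an argument keeps the sketch deterministic. Production implementations typically pick them with a seeded scheme such as k-means++.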
Scope
This crate owns:
- Unsupervised clustering algorithms for cytometry event data
- Cluster quality/validation metrics
- (Future) FlowSOM-style self-organizing maps
- (Future) Hierarchical clustering / dendrograms
- (Future) Cluster merging heuristics for automated gating
It does not own: gate geometry, density estimation (see flow-density), FCS parsing, or visualization.
Tests
Four unit tests cover silhouette score correctness for well-separated and overlapping clusters.
License
MIT