flow-clustering 0.1.1

Clustering algorithms for flow cytometry: K-means, DBSCAN, GMM

flow-clustering

Clustering algorithms for flow cytometry: K-means, DBSCAN, and Gaussian Mixture Models.

Overview

flow-clustering provides unsupervised clustering algorithms commonly used in automated flow cytometry gating. Each algorithm wraps the linfa ecosystem with flow-cytometry-specific configuration defaults and result types.

Features

Feature            Description
kmeans (default)   K-means clustering
dbscan (default)   Density-based spatial clustering (DBSCAN)
gmm (default)      Gaussian Mixture Model fitting

Public API

K-Means

use flow_clustering::{KMeans, KMeansConfig};

// `data` is the event matrix: a row-major Array2 (see Algorithms).
let config = KMeansConfig { n_clusters: 3, max_iterations: 300, ..Default::default() };
let result = KMeans::fit(&data, &config)?;
// result.labels: Vec<usize>, result.centroids: Array2<f64>

DBSCAN

use flow_clustering::{Dbscan, DbscanConfig};

let config = DbscanConfig { eps: 0.5, min_samples: 5 };
let result = Dbscan::fit(&data, &config)?;
// result.labels: Vec<Option<usize>> (None = noise)
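
Because noise is encoded as None, downstream code has to handle both cases when tallying populations. The following is a minimal sketch in plain Rust that assumes only the documented Vec<Option<usize>> label shape. The helper summarize_labels and the sample labels are hypothetical and not part of the crate.

```rust
use std::collections::BTreeMap;

/// Counts events per cluster and the number of noise events.
/// `labels` mirrors the documented `Vec<Option<usize>>` shape (None = noise).
/// Hypothetical helper, not part of flow-clustering.
fn summarize_labels(labels: &[Option<usize>]) -> (BTreeMap<usize, usize>, usize) {
    let mut cluster_sizes = BTreeMap::new();
    let mut noise = 0;
    for label in labels {
        match label {
            Some(id) => *cluster_sizes.entry(*id).or_insert(0) += 1,
            None => noise += 1,
        }
    }
    (cluster_sizes, noise)
}

fn main() {
    // Made-up labels standing in for `result.labels`.
    let labels = vec![Some(0), Some(0), None, Some(1), Some(0), None];
    let (sizes, noise) = summarize_labels(&labels);
    for (id, count) in &sizes {
        println!("cluster {id}: {count} events");
    }
    println!("noise: {noise} of {} events", labels.len());
}
```

A BTreeMap keeps the clusters in ascending ID order, which makes the printed summary stable between runs.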

Gaussian Mixture Model

use flow_clustering::{Gmm, GmmConfig};

let config = GmmConfig { n_clusters: 2, max_iterations: 100, ..Default::default() };
let result = Gmm::fit(&data, &config)?;
// result.labels: Vec<usize>, result.means: Array2<f64>
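
One common way to turn a fitted mixture into hard labels is to assign each event to the component with the highest posterior probability (its E-step responsibility). The sketch below illustrates this for a one-dimensional mixture in plain Rust. It is not the crate's implementation: normal_pdf, responsibilities, hard_label, and the component parameters are hypothetical, and whether flow-clustering derives result.labels this way is an assumption.

```rust
use std::f64::consts::PI;

/// Density of a 1-D normal distribution at `x`.
fn normal_pdf(x: f64, mean: f64, std_dev: f64) -> f64 {
    let z = (x - mean) / std_dev;
    (-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt())
}

/// E-step for one event: posterior probability of each component.
/// `components` holds hypothetical (weight, mean, std_dev) triples.
/// Underflow for events far from every component is not handled here.
fn responsibilities(x: f64, components: &[(f64, f64, f64)]) -> Vec<f64> {
    let weighted: Vec<f64> = components
        .iter()
        .map(|&(w, m, s)| w * normal_pdf(x, m, s))
        .collect();
    let total: f64 = weighted.iter().sum();
    weighted.iter().map(|p| p / total).collect()
}

/// Hard label: index of the component with the largest responsibility.
fn hard_label(x: f64, components: &[(f64, f64, f64)]) -> usize {
    let r = responsibilities(x, components);
    let mut best = 0;
    for k in 1..r.len() {
        if r[k] > r[best] {
            best = k;
        }
    }
    best
}

fn main() {
    // Two made-up populations, e.g. a dim and a bright peak in one channel.
    let components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)];
    for x in [1.0, 5.0, 9.0] {
        let r = responsibilities(x, &components);
        println!("x = {x}: responsibilities = {r:?}, label = {}", hard_label(x, &components));
    }
}
```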

Cluster Validation

use flow_clustering::{silhouette_scores, silhouette_scores_sampled};

let scores = silhouette_scores(&data, &labels)?;
// scores.mean_score: f64, scores.per_sample: Vec<f64>

// For large datasets, use sampling:
let scores = silhouette_scores_sampled(&data, &labels, 1000)?;
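
For reference, the silhouette of an event is (b − a) / max(a, b), where a is its mean distance to the other events in its own cluster and b is the smallest mean distance to any other cluster. The sketch below computes the full O(n²) variant in plain Rust with Euclidean distance. It illustrates the standard definition and is not the crate's code: silhouette_sketch, the Vec<Vec<f64>> input, and the choice of distance are assumptions.

```rust
/// Euclidean distance between two events.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Full O(n²) silhouette: one score per event (hypothetical helper).
fn silhouette_sketch(data: &[Vec<f64>], labels: &[usize]) -> Vec<f64> {
    assert_eq!(data.len(), labels.len(), "one label per event");
    let n_clusters = labels.iter().max().map_or(0, |m| m + 1);
    (0..data.len())
        .map(|i| {
            // Distance sums and member counts from event i to each cluster.
            let mut sums = vec![0.0_f64; n_clusters];
            let mut counts = vec![0_usize; n_clusters];
            for j in (0..data.len()).filter(|&j| j != i) {
                sums[labels[j]] += euclidean(&data[i], &data[j]);
                counts[labels[j]] += 1;
            }
            let own = labels[i];
            if counts[own] == 0 {
                return 0.0; // singleton cluster: score is 0 by convention
            }
            let a = sums[own] / counts[own] as f64;
            let b = (0..n_clusters)
                .filter(|&c| c != own && counts[c] > 0)
                .map(|c| sums[c] / counts[c] as f64)
                .fold(f64::INFINITY, f64::min);
            let denom = a.max(b);
            // No other cluster, or all events identical: score is 0.
            if b.is_infinite() || denom == 0.0 { 0.0 } else { (b - a) / denom }
        })
        .collect()
}

fn main() {
    // Two well-separated made-up clusters on one channel.
    let data: Vec<Vec<f64>> = vec![vec![0.0], vec![1.0], vec![10.0], vec![11.0]];
    let scores = silhouette_sketch(&data, &[0, 0, 1, 1]);
    let mean = scores.iter().sum::<f64>() / scores.len() as f64;
    println!("per-sample: {scores:?}");
    println!("mean: {mean:.3}");
}
```

The nested loop over all pairs is what makes the full variant quadratic, which is why sampling helps on large event files.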

Algorithms

  • K-Means: Lloyd's algorithm via linfa-clustering. Accepts row-major Array2 input and also provides a convenience method, fit_from_rows, for pre-separated channel vectors.
  • DBSCAN: Density-based clustering that identifies noise points. Useful for scatter gating where populations have irregular shapes.
  • GMM: Expectation-maximization for Gaussian mixtures. Models multi-modal populations common in fluorescence channels.
  • Silhouette scores: Cluster quality metric ranging from −1 (likely assigned to the wrong cluster) to +1 (well separated). Available in a full O(n²) variant and a sampled O(n·k) variant.
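
To make the K-Means entry concrete, the sketch below shows the two alternating steps of Lloyd's algorithm (assign each event to its nearest centroid, then move each centroid to the mean of its members) in plain Rust. It is an illustration only: the crate delegates to linfa-clustering, and lloyd_sketch, the Vec<Vec<f64>> input, and the caller-supplied initial centroids are assumptions made to keep the example dependency-free.

```rust
/// Squared Euclidean distance between two events.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Index of the centroid nearest to `point` (ties go to the lower index).
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    for c in 1..centroids.len() {
        if sq_dist(point, &centroids[c]) < sq_dist(point, &centroids[best]) {
            best = c;
        }
    }
    best
}

/// Assignment step: label every event with its nearest centroid.
fn assign(data: &[Vec<f64>], centroids: &[Vec<f64>]) -> Vec<usize> {
    data.iter().map(|p| nearest(p, centroids)).collect()
}

/// Lloyd's algorithm, assuming at least one initial centroid is supplied.
fn lloyd_sketch(
    data: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iterations: usize,
) -> (Vec<usize>, Vec<Vec<f64>>) {
    let dims = centroids.first().map_or(0, |c| c.len());
    let mut labels = assign(data, &centroids);
    for _ in 0..max_iterations {
        // Update step: accumulate per-cluster sums, then take the mean.
        let mut sums = vec![vec![0.0_f64; dims]; centroids.len()];
        let mut counts = vec![0_usize; centroids.len()];
        for (point, &label) in data.iter().zip(&labels) {
            counts[label] += 1;
            for (sum, x) in sums[label].iter_mut().zip(point) {
                *sum += x;
            }
        }
        for (c, centroid) in centroids.iter_mut().enumerate() {
            if counts[c] > 0 {
                // Empty clusters keep their previous centroid.
                for (value, sum) in centroid.iter_mut().zip(&sums[c]) {
                    *value = sum / counts[c] as f64;
                }
            }
        }
        // Assignment step; stop once no label changes.
        let new_labels = assign(data, &centroids);
        if new_labels == labels {
            break;
        }
        labels = new_labels;
    }
    (labels, centroids)
}

fn main() {
    // Made-up single-channel events forming two populations.
    let data: Vec<Vec<f64>> =
        vec![vec![1.0], vec![1.2], vec![0.8], vec![9.0], vec![9.5], vec![8.5]];
    let (labels, centroids) = lloyd_sketch(&data, vec![vec![0.0], vec![5.0]], 300);
    println!("labels: {labels:?}");
    println!("centroids: {centroids:?}");
}
```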

Scope

This crate owns:

  • Unsupervised clustering algorithms for cytometry event data
  • Cluster quality/validation metrics
  • (Future) FlowSOM-style self-organizing maps
  • (Future) Hierarchical clustering / dendrograms
  • (Future) Cluster merging heuristics for automated gating

It does not own: gate geometry, density estimation (see flow-density), FCS parsing, or visualization.

Tests

cargo test -p flow-clustering

Four unit tests cover silhouette score correctness for well-separated and overlapping clusters.

License

MIT