rsomics-sc-combat
ComBat empirical-Bayes batch-effect correction of a single-cell expression
matrix, numerically matching scanpy's sc.pp.combat.
ComBat (Johnson, Li & Rabinovic 2007) removes batch effects by standardizing
each gene across cells, estimating an additive (γ) and a multiplicative (δ)
effect per gene and batch, then shrinking those estimates toward batch-level
priors fitted across genes in an empirical-Bayes framework before
de-standardizing. The shrinkage borrows strength across genes, which is what
makes ComBat robust on small batches.
This crate implements the parametric EB adjustment with batch as the only model term (no extra covariates), as scanpy invokes it by default.
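
As a rough illustration of the shrinkage step, the sketch below iterates the posterior updates from the published method for one batch: γ* = (n·τ²·γ̂ + δ*²·γ̄) / (n·τ² + δ*²) and δ*² = (b + ½·Σ(z − γ*)²) / (n/2 + a − 1). The function name `eb_posterior`, the `Vec<Vec<f64>>` layout, the iteration cap and the absolute value in the relative-change test are assumptions of this sketch, not this crate's actual code.

```rust
/// Posterior (gamma*, delta*^2) per gene for ONE batch (illustrative sketch).
/// `z[i]` holds the standardized values of gene `i` over the cells of this batch.
/// `gamma_bar`/`tau2` are the prior mean/variance of gamma, `a`/`b` the
/// inverse-gamma prior parameters of delta^2, `conv` the convergence threshold.
pub fn eb_posterior(
    z: &[Vec<f64>],
    gamma_hat: &[f64],
    delta_hat: &[f64],
    gamma_bar: f64,
    tau2: f64,
    a: f64,
    b: f64,
    conv: f64,
) -> (Vec<f64>, Vec<f64>) {
    let mut g_old = gamma_hat.to_vec();
    let mut d_old = delta_hat.to_vec();
    // Hypothetical safety cap so the sketch always terminates.
    for _ in 0..1000 {
        let mut change = 0.0_f64;
        let mut g_new = Vec::with_capacity(z.len());
        let mut d_new = Vec::with_capacity(z.len());
        for (i, cells) in z.iter().enumerate() {
            let n = cells.len() as f64;
            // Posterior mean: precision-weighted blend of gamma_hat and gamma_bar.
            let g = (tau2 * n * gamma_hat[i] + d_old[i] * gamma_bar) / (tau2 * n + d_old[i]);
            // Posterior variance from the residual sum of squares around g.
            let ss: f64 = cells.iter().map(|x| (x - g).powi(2)).sum();
            let d = (0.5 * ss + b) / (n / 2.0 + a - 1.0);
            // Largest relative change over all genes (assumed form of the test).
            change = change
                .max((g - g_old[i]).abs() / g_old[i].abs())
                .max((d - d_old[i]).abs() / d_old[i].abs());
            g_new.push(g);
            d_new.push(d);
        }
        g_old = g_new;
        d_old = d_new;
        if change <= conv {
            break;
        }
    }
    (g_old, d_old)
}
```

Because each γ* is a weighted average of γ̂ and γ̄ with positive weights, the result always lies between the raw estimate and the prior mean, which is the "borrowing strength" effect in numeric form.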
Usage
# pick the label column by name when the TSV has a header
Input is a 10x MTX directory (matrix.mtx[.gz] + barcodes.tsv[.gz],
genes × cells) and a barcode → batch-label TSV: column 1 is the barcode,
column 2 the label (or the --key column when a header is present). At least
two batches are required.
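
The label-file rules above could be checked along these lines. This is a minimal sketch: `parse_batch_labels` is a hypothetical helper, and it assumes that a header line is present exactly when a key is given, which may differ from how the crate detects headers.

```rust
use std::collections::BTreeSet;

/// Parse a barcode -> batch-label TSV (hypothetical helper, not the crate's API).
/// With `key = Some(name)` the first line is treated as a header and the label
/// is read from the column called `name`; otherwise column 2 is used.
pub fn parse_batch_labels(
    tsv: &str,
    key: Option<&str>,
) -> Result<Vec<(String, String)>, String> {
    let mut lines = tsv.lines().filter(|l| !l.trim().is_empty());
    let label_col = match key {
        Some(name) => {
            let header = lines.next().ok_or("empty label file")?;
            header
                .split('\t')
                .position(|c| c == name)
                .ok_or_else(|| format!("no column named {name:?} in header"))?
        }
        None => 1, // zero-based index of column 2
    };
    let mut pairs = Vec::new();
    for line in lines {
        let fields: Vec<&str> = line.split('\t').collect();
        let label = fields
            .get(label_col)
            .ok_or_else(|| format!("line {line:?} has no column {}", label_col + 1))?;
        // Column 1 is always the barcode.
        pairs.push((fields[0].to_string(), label.to_string()));
    }
    // ComBat needs at least two distinct batches to correct between.
    let distinct: BTreeSet<&str> = pairs.iter().map(|(_, b)| b.as_str()).collect();
    if distinct.len() < 2 {
        return Err(format!("need at least two batches, found {}", distinct.len()));
    }
    Ok(pairs)
}
```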
Output is a dense MatrixMarket matrix (array real general) in genes × cells
layout, written one value per line in column-major (cell-major) order, so all
genes of the first cell come first. The output is dense because the additive
correction shifts the implicit zeros of the sparse input to non-zero values.
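
To make the output layout concrete, here is a small sketch of a writer for the MatrixMarket dense array format. The name `write_mm_array` and the row-major input buffer are assumptions for illustration; the crate's internal representation may differ.

```rust
use std::io::{self, Write};

/// Write a genes x cells matrix as a dense MatrixMarket array (sketch).
/// `values` is assumed to be row-major: values[gene * n_cells + cell].
pub fn write_mm_array<W: Write>(
    mut out: W,
    n_genes: usize,
    n_cells: usize,
    values: &[f64],
) -> io::Result<()> {
    assert_eq!(values.len(), n_genes * n_cells, "buffer size mismatch");
    // Banner line, then the dimensions as "rows cols".
    writeln!(out, "%%MatrixMarket matrix array real general")?;
    writeln!(out, "{} {}", n_genes, n_cells)?;
    // The array format is column-major: every gene of cell 0, then cell 1, ...
    for cell in 0..n_cells {
        for gene in 0..n_genes {
            writeln!(out, "{}", values[gene * n_cells + cell])?;
        }
    }
    Ok(())
}
```

Any `Write` target works, for example a `Vec<u8>` in tests or a buffered file handle in a pipeline.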
Covariates beyond the batch variable (scanpy's covariates= / the sva mod
design) are not implemented. The batch-only model is scanpy's default and the
case most pipelines need; covariate-aware correction is a distinct, rarely
used mode.
Origin
This crate is an independent Rust reimplementation of scanpy's sc.pp.combat
based on:
- The published method (Johnson, Li & Rabinovic, "Adjusting batch effects in microarray expression data using empirical Bayes methods", Biostatistics 2007, doi:10.1093/biostatistics/kxj037).
- The public MatrixMarket and 10x Genomics matrix file-format specs.
- Reading scanpy's preprocessing/_combat.py (BSD-3-Clause) to match the exact numeric conventions: the population (ddof=0) pooled variance, the ddof=1 per-batch δ̂ variances, the ddof=0 prior variance t2, the parametric a/b priors, and the iterative _it_sol posterior with conv = 1e-4.
- Black-box value-level testing against the scanpy Python package.
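
The ddof distinctions in the list above matter for value-level agreement, so a short sketch may help. `variance` and `inv_gamma_priors` are hypothetical names; the prior formulas are the method-of-moments fit of an inverse-gamma distribution with mean m and variance s2, and which ddof feeds s2 is left to the caller here rather than asserted.

```rust
/// Variance with a NumPy-style `ddof`: the divisor is (n - ddof).
/// ddof = 0 gives the population variance, ddof = 1 the sample variance.
pub fn variance(xs: &[f64], ddof: usize) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let ss: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    ss / (n - ddof as f64)
}

/// Method-of-moments inverse-gamma parameters (a, b) from the mean `m` and
/// variance `s2` of the per-gene delta estimates of one batch.
/// Solving m = b / (a - 1) and s2 = b^2 / ((a - 1)^2 (a - 2)) gives:
pub fn inv_gamma_priors(m: f64, s2: f64) -> (f64, f64) {
    let a = (2.0 * s2 + m * m) / s2;
    let b = (m * s2 + m * m * m) / s2;
    (a, b)
}
```

For `[1.0, 2.0, 3.0, 4.0]` the two conventions give 1.25 (ddof=0) and about 1.667 (ddof=1), a difference large enough to break numeric parity if the wrong one is used at any step.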
License: MIT OR Apache-2.0.
Upstream credit: scanpy https://github.com/scverse/scanpy (BSD-3-Clause),
which itself follows the combat.py port by Pedersen.