# rsomics-sc-combat
ComBat empirical-Bayes batch-effect correction of a single-cell expression
matrix, numerically matching scanpy's `sc.pp.combat`.
ComBat (Johnson, Li & Rabinovic 2007) removes batch effects by standardizing
each gene across cells and fitting, for every gene, a per-batch additive shift
(`γ`) and multiplicative scale (`δ`). It then shrinks those estimates toward
per-batch priors fitted across all genes in an empirical-Bayes framework, and
finally de-standardizes. The shrinkage borrows strength across genes, which is
what makes ComBat robust on small batches.
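The standardize and de-standardize steps for one gene can be sketched as follows. This is a minimal Rust illustration that assumes a batch-only model, where the least-squares batch coefficients reduce to per-batch means and the grand mean is their size-weighted average. `standardize_gene` and `adjust_gene` are hypothetical names, not this crate's API, and the EB estimates `gamma_star`/`delta_star` are taken as given.

```rust
/// Standardize one gene under a batch-only ComBat model (sketch).
/// `x` holds the gene's values across all cells; `batch[c]` is the
/// 0-based batch index of cell `c`.
/// Returns (standardized values, grand mean, pooled variance).
pub fn standardize_gene(x: &[f64], batch: &[usize], n_batch: usize) -> (Vec<f64>, f64, f64) {
    let n = x.len() as f64;
    // Per-batch means: with a one-hot batch design these are exactly the
    // least-squares batch coefficients.
    let mut sum = vec![0.0; n_batch];
    let mut cnt = vec![0usize; n_batch];
    for (&v, &b) in x.iter().zip(batch) {
        sum[b] += v;
        cnt[b] += 1;
    }
    let mean: Vec<f64> = sum.iter().zip(&cnt).map(|(s, &c)| s / c as f64).collect();
    // Grand mean: batch means weighted by batch size (the plain mean of x).
    let grand = mean.iter().zip(&cnt).map(|(m, &c)| m * c as f64).sum::<f64>() / n;
    // Pooled variance: mean squared residual from the batch means (ddof = 0).
    let var_pooled = x
        .iter()
        .zip(batch)
        .map(|(&v, &b)| (v - mean[b]).powi(2))
        .sum::<f64>()
        / n;
    // Zero-variance genes are set to 0 instead of dividing by 0, as scanpy does.
    let z: Vec<f64> = x
        .iter()
        .map(|&v| if var_pooled == 0.0 { 0.0 } else { (v - grand) / var_pooled.sqrt() })
        .collect();
    (z, grand, var_pooled)
}

/// Remove the EB batch effects and undo the standardization:
/// x* = sqrt(var_pooled) * (z - gamma*_b) / sqrt(delta*_b) + grand_mean.
pub fn adjust_gene(
    z: &[f64],
    batch: &[usize],
    gamma_star: &[f64],
    delta_star: &[f64],
    grand: f64,
    var_pooled: f64,
) -> Vec<f64> {
    z.iter()
        .zip(batch)
        .map(|(&v, &b)| var_pooled.sqrt() * (v - gamma_star[b]) / delta_star[b].sqrt() + grand)
        .collect()
}
```

For instance, values `[1, 3, 5, 7]` split into two batches of two cells standardize to `[-3, -1, 1, 3]` (grand mean 4, pooled variance 1). Removing effects `γ* = [-2, 2]` with `δ* = [1, 1]` then gives `[3, 5, 3, 5]`, so both batches are centered on the gene mean again.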
This crate implements the **parametric** EB adjustment with batch as the only
model term (no extra covariates), as scanpy invokes it by default.
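Under that batch-only model, the per-batch estimates and their parametric priors can be sketched as below. This minimal illustration assumes the standardized data are held gene-major in memory. `fit_batch` and `BatchFit` are hypothetical names. The `a`/`b` expressions are the method-of-moments inverse-gamma hyperparameters from Johnson et al. Using the ddof=1 variance of `delta_hat` to compute them is an assumption of this sketch.

```rust
/// Batch-effect estimates and parametric priors for one batch (sketch).
pub struct BatchFit {
    pub gamma_hat: Vec<f64>, // per-gene additive effect: batch mean of z
    pub delta_hat: Vec<f64>, // per-gene multiplicative effect: ddof = 1 variance
    pub gamma_bar: f64,      // prior mean of gamma, across genes
    pub t2: f64,             // prior variance of gamma, across genes (ddof = 0)
    pub a: f64,              // inverse-gamma shape prior for delta
    pub b: f64,              // inverse-gamma scale prior for delta
}

fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

fn var(v: &[f64], ddof: usize) -> f64 {
    let m = mean(v);
    v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (v.len() - ddof) as f64
}

/// `z[g]` holds gene g's standardized values across all cells; `cells`
/// lists the indices of the cells that belong to this batch.
pub fn fit_batch(z: &[Vec<f64>], cells: &[usize]) -> BatchFit {
    let mut gamma_hat = Vec::with_capacity(z.len());
    let mut delta_hat = Vec::with_capacity(z.len());
    for gene in z {
        let vals: Vec<f64> = cells.iter().map(|&c| gene[c]).collect();
        gamma_hat.push(mean(&vals));
        delta_hat.push(var(&vals, 1));
    }
    // Priors are per batch, estimated across genes.
    let gamma_bar = mean(&gamma_hat);
    let t2 = var(&gamma_hat, 0);
    // Method-of-moments fit of an inverse gamma to delta_hat
    // (ddof = 1 for s2 is an assumption of this sketch).
    let (m, s2) = (mean(&delta_hat), var(&delta_hat, 1));
    BatchFit {
        gamma_hat,
        delta_hat,
        gamma_bar,
        t2,
        a: (2.0 * s2 + m * m) / s2,
        b: (m * s2 + m.powi(3)) / s2,
    }
}
```

The `a`/`b` formulas follow from matching the inverse-gamma mean `b / (a - 1)` and variance `b² / ((a - 1)² (a - 2))` to the mean `m` and variance `s2` of the per-gene `δ̂` values.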
## Usage
```bash
rsomics-sc-combat filtered_feature_bc_matrix/ -b batches.tsv -o corrected.mtx
# pick the label column by name when the TSV has a header
rsomics-sc-combat mtx_dir/ -b meta.tsv --key sample -o corrected.mtx
```
Input is a 10x MTX directory (`matrix.mtx[.gz]` + `barcodes.tsv[.gz]`,
genes × cells) and a barcode → batch-label TSV: column 1 is the barcode,
column 2 the label (or the `--key` column when a header is present). At least
two batches are required.
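A hypothetical parser for the batch TSV is sketched below. It assumes tab separators, a header line only when `--key` is given, and barcodes always in column 1. How the crate actually detects a header is not specified here, so treat this logic as illustrative only.

```rust
use std::collections::HashMap;

/// Parse a barcode -> batch-label TSV (hypothetical helper, not the crate's API).
/// Without `key`: no header; column 1 is the barcode, column 2 the label.
/// With `key`: the first line is a header, and the label comes from the
/// column whose header equals `key`.
pub fn parse_batches(tsv: &str, key: Option<&str>) -> Result<HashMap<String, String>, String> {
    let mut lines = tsv.lines().filter(|l| !l.trim().is_empty());
    let col = match key {
        None => 1,
        Some(k) => {
            let header = lines.next().ok_or("empty batch file")?;
            header
                .split('\t')
                .position(|h| h == k)
                .ok_or_else(|| format!("no column named {k:?}"))?
        }
    };
    let mut map = HashMap::new();
    for line in lines {
        let fields: Vec<&str> = line.split('\t').collect();
        let label = fields
            .get(col)
            .ok_or_else(|| format!("missing column {} in {line:?}", col + 1))?;
        map.insert(fields[0].to_string(), label.to_string());
    }
    // ComBat is undefined with a single batch, so reject that up front.
    let mut distinct: Vec<&String> = map.values().collect();
    distinct.sort();
    distinct.dedup();
    if distinct.len() < 2 {
        return Err("ComBat needs at least two batches".into());
    }
    Ok(map)
}
```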
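Output is a dense MatrixMarket `array real general` matrix in genes × cells
layout, one value per line in column-major (cell-major) order: all genes of the
first cell, then all genes of the second, and so on. The output is dense because
ComBat shifts every entry, including the implicit zeros of the sparse input, so
they no longer stay zero after correction.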
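A sketch of such a writer is shown below. It assumes the corrected values are already held in cell-major order. `write_dense_mtx` is a hypothetical name, and the number formatting (Rust's shortest round-trip `Display` for `f64`) is an assumption, not necessarily what the crate emits.

```rust
use std::io::{self, Write};

/// Write a dense genes x cells matrix as MatrixMarket `array real general`.
/// `data[c * n_genes + g]` is gene g in cell c: cell-major storage is
/// exactly the column-major order the array format expects.
pub fn write_dense_mtx<W: Write>(
    mut w: W,
    data: &[f64],
    n_genes: usize,
    n_cells: usize,
) -> io::Result<()> {
    assert_eq!(data.len(), n_genes * n_cells, "data length must be n_genes * n_cells");
    writeln!(w, "%%MatrixMarket matrix array real general")?;
    // The array format's size line holds rows and columns only (no nnz count).
    writeln!(w, "{n_genes} {n_cells}")?;
    for v in data {
        writeln!(w, "{v}")?;
    }
    Ok(())
}
```

Writing a 3 × 2 matrix produces the header, the size line `3 2`, and then six values: genes 1 to 3 of cell 1, followed by genes 1 to 3 of cell 2.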
Covariates beyond the batch variable (scanpy's `covariates=` argument, or the
`mod` design matrix in sva) are not implemented. The batch-only model is
scanpy's default and the mode pipelines typically use; covariate-aware
correction is a separate, rarely used mode.
## Origin
This crate is an independent Rust reimplementation of scanpy's `sc.pp.combat`
based on:
- The published method (Johnson, Li & Rabinovic, "Adjusting batch effects in
microarray expression data using empirical Bayes methods", *Biostatistics*
2007, doi:10.1093/biostatistics/kxj037).
- The public MatrixMarket and 10x Genomics matrix file-format specs.
- Reading scanpy's `preprocessing/_combat.py` (BSD-3-Clause) to match the exact
numeric conventions: the population (ddof=0) pooled variance, the ddof=1
per-batch δ̂ variances, the ddof=0 prior-variance `t2`, the parametric
`a`/`b` priors, and the iterative `_it_sol` posterior with `conv = 1e-4`.
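The `_it_sol` step alternates two closed-form updates until both stop moving by more than `conv` in relative terms. The first is the normal-prior posterior mean of `γ` given the current `δ`. The second is the inverse-gamma posterior mean of `δ` given the new `γ`. Below is a minimal Rust sketch written from the published method. `it_sol` is a hypothetical function, and its stopping rule (largest relative change, divided by the signed previous value) reflects our reading of the reference code, not this crate's source.

```rust
/// Iterative EB posterior for one batch (sketch, not the crate's code).
/// `z[g]` holds gene g's standardized values for the cells of this batch;
/// `g_hat`/`d_hat` are the per-gene estimates, the rest are batch priors.
pub fn it_sol(
    z: &[Vec<f64>],
    g_hat: &[f64],
    d_hat: &[f64],
    g_bar: f64,
    t2: f64,
    a: f64,
    b: f64,
    conv: f64,
) -> (Vec<f64>, Vec<f64>) {
    let mut g_old = g_hat.to_vec();
    let mut d_old = d_hat.to_vec();
    loop {
        let mut change = 0.0f64;
        let mut g_new = Vec::with_capacity(z.len());
        let mut d_new = Vec::with_capacity(z.len());
        for (i, gene) in z.iter().enumerate() {
            let n = gene.len() as f64;
            // Posterior mean of gamma: precision-weighted blend of the
            // per-gene estimate g_hat and the prior mean g_bar.
            let g = (t2 * n * g_hat[i] + d_old[i] * g_bar) / (t2 * n + d_old[i]);
            // Posterior mean of delta under the inverse-gamma prior.
            let ss: f64 = gene.iter().map(|x| (x - g).powi(2)).sum();
            let d = (0.5 * ss + b) / (n / 2.0 + a - 1.0);
            // Relative change, divided by the signed previous value.
            change = change
                .max((g - g_old[i]).abs() / g_old[i])
                .max((d - d_old[i]).abs() / d_old[i]);
            g_new.push(g);
            d_new.push(d);
        }
        g_old = g_new;
        d_old = d_new;
        if change <= conv {
            return (g_old, d_old);
        }
    }
}
```

Because the `γ` update is a convex combination, each `γ*` lies between the gene's own batch mean and the batch prior mean `γ̄`; this is the shrinkage that stabilizes small batches. With a signed denominator, genes whose previous estimate is negative can never hold the loop open; the sketch keeps that quirk because matching scanpy numerically includes matching when the loop stops.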
- Black-box value-level testing against the scanpy Python package.
License: MIT OR Apache-2.0.
Upstream credit: scanpy <https://github.com/scverse/scanpy> (BSD-3-Clause),
which itself follows the `combat.py` port by Pedersen.