rsomics-sc-combat 0.1.0

ComBat empirical-Bayes batch-effect correction of a single-cell matrix; matches scanpy's sc.pp.combat (parametric EB, same ddof conventions)

rsomics-sc-combat

ComBat empirical-Bayes batch-effect correction of a single-cell expression matrix, numerically matching scanpy's sc.pp.combat.

ComBat (Johnson, Li & Rabinovic 2007) removes batch effects by standardizing each gene across cells, fitting per-batch additive (γ) and multiplicative (δ) shifts, then shrinking those estimates toward gene-wise priors in an empirical-Bayes framework before de-standardizing. The shrinkage borrows strength across genes, which is what makes ComBat robust on small batches.
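The steps in the paragraph above can be written out for the batch-only case. The notation below follows the Johnson, Li & Rabinovic paper with the covariate term dropped. It is a summary of the published model under that assumption, not a transcription of this crate's code. Here i indexes batches, j cells within a batch, g genes, and n_i is the number of cells in batch i.

```latex
\begin{align*}
% 1. standardize each gene (overall mean \hat\alpha_g, pooled s.d. \hat\sigma_g)
Z_{ijg} &= \frac{Y_{ijg} - \hat\alpha_g}{\hat\sigma_g} \\
% 2. raw per-batch additive and multiplicative estimates
\hat\gamma_{ig} &= \frac{1}{n_i}\sum_j Z_{ijg}, \qquad
\hat\delta^2_{ig} = \frac{1}{n_i - 1}\sum_j \bigl(Z_{ijg} - \hat\gamma_{ig}\bigr)^2 \\
% 3. priors shared by all genes within a batch
\gamma_{ig} &\sim \mathcal{N}\bigl(\bar\gamma_i, \bar\tau_i^2\bigr), \qquad
\delta^2_{ig} \sim \text{Inverse-Gamma}\bigl(\bar\lambda_i, \bar\theta_i\bigr) \\
% 4. shrunken (posterior) estimates, solved iteratively
\gamma^*_{ig} &= \frac{n_i \bar\tau_i^2\, \hat\gamma_{ig} + \delta^{2*}_{ig}\, \bar\gamma_i}
                      {n_i \bar\tau_i^2 + \delta^{2*}_{ig}}, \qquad
\delta^{2*}_{ig} = \frac{\bar\theta_i + \tfrac12 \sum_j \bigl(Z_{ijg} - \gamma^*_{ig}\bigr)^2}
                        {\tfrac{n_i}{2} + \bar\lambda_i - 1} \\
% 5. de-standardize
Y^*_{ijg} &= \frac{\hat\sigma_g}{\delta^*_{ig}}\bigl(Z_{ijg} - \gamma^*_{ig}\bigr) + \hat\alpha_g
\end{align*}
```

The posterior mean γ* is a weighted average of the gene's own estimate and the batch-wide prior mean. The weight on the prior grows as n_i shrinks, which is the "borrowing strength" effect on small batches.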

This crate implements the parametric EB adjustment with batch as the only model term (no extra covariates), as scanpy invokes it by default.
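As an illustration of the parametric EB adjustment for one batch, here is a minimal sketch of the fixed-point iteration described in the paper. The function name, argument layout, and the use of an absolute relative change as the stopping rule are assumptions made for this example; the crate's internals may differ.

```rust
/// Hypothetical helper: iterative parametric empirical-Bayes solve for ONE batch.
/// `z[g]` holds the standardized values of gene g for the cells of this batch.
/// `gamma_hat` / `delta2_hat` are the raw per-gene estimates, `gamma_bar` and
/// `tau2` the prior mean and variance of gamma, `a` and `b` the inverse-gamma
/// hyperparameters. Returns the shrunken (gamma_star, delta2_star) per gene.
pub fn eb_posterior(
    z: &[Vec<f64>],
    gamma_hat: &[f64],
    delta2_hat: &[f64],
    gamma_bar: f64,
    tau2: f64,
    a: f64,
    b: f64,
    conv: f64,
) -> (Vec<f64>, Vec<f64>) {
    // Number of cells in this batch (assumes every row has the same length).
    let n = z.first().map_or(0, |row| row.len()) as f64;
    let mut g_old = gamma_hat.to_vec();
    let mut d_old = delta2_hat.to_vec();
    loop {
        let mut change = 0.0_f64;
        let mut g_new = Vec::with_capacity(z.len());
        let mut d_new = Vec::with_capacity(z.len());
        for (gene, row) in z.iter().enumerate() {
            // Posterior mean: weighted average of the raw estimate and the prior mean.
            let g = (tau2 * n * gamma_hat[gene] + d_old[gene] * gamma_bar)
                / (tau2 * n + d_old[gene]);
            // Posterior variance: inverse-gamma update using the new mean.
            let sum2: f64 = row.iter().map(|v| (v - g).powi(2)).sum();
            let d = (0.5 * sum2 + b) / (n / 2.0 + a - 1.0);
            // Track the largest relative change over all genes (assumed stopping rule).
            change = change
                .max(((g - g_old[gene]) / g_old[gene]).abs())
                .max(((d - d_old[gene]) / d_old[gene]).abs());
            g_new.push(g);
            d_new.push(d);
        }
        g_old = g_new;
        d_old = d_new;
        if change <= conv {
            return (g_old, d_old);
        }
    }
}
```

Calling it with conv = 1e-4 and a prior mean of 0 pulls each gene's γ estimate toward 0, more strongly when the batch has few cells.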

Usage

rsomics-sc-combat filtered_feature_bc_matrix/ -b batches.tsv -o corrected.mtx

# pick the label column by name when the TSV has a header
rsomics-sc-combat mtx_dir/ -b meta.tsv --key sample -o corrected.mtx

Input is a 10x MTX directory (matrix.mtx[.gz] + barcodes.tsv[.gz], genes × cells) and a barcode → batch-label TSV: column 1 is the barcode, column 2 the label (or the --key column when a header is present). At least two batches are required.
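To make the batch-label file layout concrete, the sketch below parses such a TSV from a string. The function name and error messages are hypothetical, and it assumes the first line is treated as a header only when a key is given; how the tool itself detects a header is not specified here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: parse a barcode -> batch-label table from TSV text.
/// With `key = Some(name)` the first line is read as a header and the label
/// comes from the column called `name`; otherwise column 2 is used.
pub fn parse_batches(tsv: &str, key: Option<&str>) -> Result<HashMap<String, String>, String> {
    let mut lines = tsv.lines().filter(|l| !l.trim().is_empty());
    let label_col = match key {
        Some(name) => {
            let header = lines.next().ok_or("empty batch file")?;
            header
                .split('\t')
                .position(|c| c == name)
                .ok_or_else(|| format!("no column named {}", name))?
        }
        None => 1, // zero-based index of column 2
    };
    let mut map = HashMap::new();
    for line in lines {
        let fields: Vec<&str> = line.split('\t').collect();
        let label = fields
            .get(label_col)
            .ok_or_else(|| format!("row has too few columns: {}", line))?;
        // Column 1 is always the barcode.
        map.insert(fields[0].to_string(), label.to_string());
    }
    // The correction needs at least two distinct batches.
    let distinct: HashSet<&String> = map.values().collect();
    if distinct.len() < 2 {
        return Err("at least two batches are required".to_string());
    }
    Ok(map)
}
```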

Output is a dense MatrixMarket matrix (array real general) in genes × cells layout, written one value per line in column-major (cell-major) order. ComBat densifies the matrix because the additive correction moves the implicit zeros of the sparse input away from zero.
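For readers unfamiliar with the dense MatrixMarket layout, the following sketch serializes a small genes × cells matrix in that form. The function name is hypothetical and the number formatting is Rust's default, which is an assumption for illustration; the crate's exact output formatting may differ.

```rust
use std::fmt::Write;

/// Hypothetical helper: serialize a dense genes x cells matrix as
/// MatrixMarket `array real general`. `data[g][c]` is gene g in cell c.
pub fn to_matrix_market_array(data: &[Vec<f64>]) -> String {
    let n_genes = data.len();
    let n_cells = data.first().map_or(0, |row| row.len());
    // Banner line, then the "rows cols" size line (no nonzero count for arrays).
    let mut out = String::from("%%MatrixMarket matrix array real general\n");
    writeln!(out, "{} {}", n_genes, n_cells).unwrap();
    // Column-major: all genes of cell 0, then all genes of cell 1, and so on.
    for c in 0..n_cells {
        for row in data {
            writeln!(out, "{}", row[c]).unwrap();
        }
    }
    out
}
```

A reader of this format recovers entry (g, c) from the value at position c * n_genes + g after the size line.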

Covariates beyond the batch variable (scanpy's covariates= argument, or the mod design matrix in sva) are not implemented. The batch-only model is scanpy's default and the operation pipelines typically reach for; covariate-aware correction is a separate, rarely used mode.

Origin

This crate is an independent Rust reimplementation of scanpy's sc.pp.combat based on:

  • The published method (Johnson, Li & Rabinovic, "Adjusting batch effects in microarray expression data using empirical Bayes methods", Biostatistics 2007, doi:10.1093/biostatistics/kxj037).
  • The public MatrixMarket and 10x Genomics matrix file-format specs.
  • Reading scanpy's preprocessing/_combat.py (BSD-3-Clause) to match the exact numeric conventions: the population (ddof=0) pooled variance, the ddof=1 per-batch δ̂ variances, the ddof=0 prior-variance t2, the parametric a/b priors, and the iterative _it_sol posterior with conv = 1e-4.
  • Black-box value-level testing against the scanpy Python package.
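The ddof conventions listed above only change the divisor of a variance, and the parametric a/b priors are the method-of-moments inverse-gamma hyperparameters from the paper. A small sketch, with hypothetical function names, shows both. The prior functions take the mean and variance as inputs so the example does not assert which ddof feeds them.

```rust
/// Variance with divisor n - ddof (ddof = 0: population, ddof = 1: sample).
pub fn variance(xs: &[f64], ddof: usize) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - ddof as f64)
}

/// Inverse-gamma shape matched to mean `m` and variance `s2`
/// of one batch's per-gene delta-hat values.
pub fn a_prior(m: f64, s2: f64) -> f64 {
    (2.0 * s2 + m * m) / s2
}

/// Inverse-gamma scale matched to the same mean and variance.
pub fn b_prior(m: f64, s2: f64) -> f64 {
    (m * s2 + m.powi(3)) / s2
}
```

As a sanity check, an inverse-gamma distribution with shape a and scale b has mean b / (a - 1). With m = 1 and s2 = 0.5 the functions give a = 4 and b = 3, which reproduces the mean of 1.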

License: MIT OR Apache-2.0. Upstream credit: scanpy https://github.com/scverse/scanpy (BSD-3-Clause), which itself follows the combat.py port by Pedersen.