rsomics-sc-scale
Per-gene z-score scaling of a single-cell count matrix, numerically matching
scanpy's sc.pp.scale.
Each gene is centered and scaled across cells: z = (x − mean) / std, where
mean and std are computed over all cells (the implicit zeros of the
sparse matrix included) and the standard deviation uses the ddof=1 (sample)
convention scanpy enforces. A gene with zero variance keeps std = 1, leaving
its centered row at exactly zero.
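The per-gene statistics described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the function name `gene_mean_std` and its signature are hypothetical, and it assumes each gene arrives as a slice of its explicitly stored nonzero values plus the total cell count (at least one cell).

```rust
/// Mean and sample standard deviation (ddof = 1) of one gene across
/// `n_cells` cells, given only its explicitly stored nonzero values.
/// Hypothetical helper for illustration; assumes `n_cells >= 1` and
/// `nonzeros.len() <= n_cells`.
fn gene_mean_std(nonzeros: &[f64], n_cells: usize) -> (f64, f64) {
    let n = n_cells as f64;

    // The implicit zeros add nothing to the sum but do count in the divisor.
    let mean = nonzeros.iter().sum::<f64>() / n;

    // Squared deviations of the stored values ...
    let ss_stored: f64 = nonzeros.iter().map(|x| (x - mean).powi(2)).sum();
    // ... plus (0 - mean)^2 once for every implicit zero.
    let n_zeros = (n_cells - nonzeros.len()) as f64;
    let ss = ss_stored + n_zeros * mean * mean;

    // ddof = 1: divide by n - 1. A single cell has no sample variance.
    let var = if n_cells > 1 { ss / (n - 1.0) } else { 0.0 };
    let std = var.sqrt();

    // Zero-variance rule: std is reported as 1 so that the centered
    // values stay at exactly zero after division.
    (mean, if std == 0.0 { 1.0 } else { std })
}
```

For a gene with counts 1, 0, 0, 3 across four cells, this yields mean 1 and std √2 (sum of squared deviations 6, divided by n − 1 = 3).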
Scaling densifies the matrix: subtracting a nonzero gene mean turns every
implicit zero into −mean/std, so the output is a full genes × cells dense
matrix written in MatrixMarket array (column-major) layout.
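To make the densification concrete, here is a hedged sketch of scaling one sparse gene row into a dense row. The function name `scale_gene_dense` and its parameters are assumptions made for illustration; the mean and std are taken as already computed.

```rust
/// Expand one sparse gene row into a dense, z-scored row.
/// `cols` holds the cell indices of the stored entries and `vals` their
/// values. Hypothetical helper; assumes every index is `< n_cells` and
/// `std` is nonzero.
fn scale_gene_dense(
    cols: &[usize],
    vals: &[f64],
    n_cells: usize,
    mean: f64,
    std: f64,
) -> Vec<f64> {
    // Every implicit zero becomes (0 - mean) / std, so the row starts out
    // filled with that value rather than with zeros.
    let mut row = vec![(0.0 - mean) / std; n_cells];

    // Stored entries overwrite the fill value with their own z-score.
    for (&c, &v) in cols.iter().zip(vals) {
        row[c] = (v - mean) / std;
    }
    row
}
```

With a mean of 1 and a std of 2, a row storing 1 at cell 0 and 3 at cell 3 (out of four cells) becomes `[0.0, -0.5, -0.5, 1.0]`: no entry of the output can be left implicit unless the gene mean is zero.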
With --max-value, the z-scores are symmetrically clipped to
[−max-value, max-value] after scaling (scanpy's zero_center=True clip).
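The clipping step amounts to a symmetric clamp applied after scaling. A minimal sketch, assuming a hypothetical helper named `clip_symmetric` and a positive, non-NaN `max_value`:

```rust
/// Clamp every z-score into [-max_value, max_value] in place.
/// Hypothetical helper; `f64::clamp` panics if the bounds are NaN or
/// inverted, so `max_value` is assumed to be a positive finite number.
fn clip_symmetric(values: &mut [f64], max_value: f64) {
    for v in values.iter_mut() {
        *v = v.clamp(-max_value, max_value);
    }
}
```

Applied with `max_value = 10.0`, the values `[-12.0, -3.0, 0.0, 15.0]` become `[-10.0, -3.0, 0.0, 10.0]`; entries already inside the interval are unchanged.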
Usage
# scanpy default: zero-center, no clipping
# scale and clip to ±10 (a common scanpy idiom)
Input is a 10x MTX directory (matrix.mtx or matrix.mtx.gz, genes × cells).
Output is a dense MatrixMarket array real general matrix in genes × cells
layout, one value per line in column-major (cell-major) order.
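The output layout can be illustrated with a small serializer. This is a sketch under stated assumptions rather than the crate's writer: the function name `to_matrix_market_array` is hypothetical, the matrix is held as one `Vec<f64>` per gene with equal lengths, and numbers are formatted with Rust's default `Display`, which may differ from the precision the tool actually emits.

```rust
use std::fmt::Write;

/// Serialize a genes x cells matrix (one inner Vec per gene) as a
/// MatrixMarket dense array. Hypothetical helper for illustration.
fn to_matrix_market_array(rows: &[Vec<f64>]) -> String {
    let n_genes = rows.len();
    let n_cells = rows.first().map_or(0, |r| r.len());

    let mut out = String::new();
    // Banner line, then the dimensions as "rows cols".
    writeln!(out, "%%MatrixMarket matrix array real general").unwrap();
    writeln!(out, "{} {}", n_genes, n_cells).unwrap();

    // Column-major order: all genes of cell 0, then all genes of cell 1, ...
    for c in 0..n_cells {
        for row in rows {
            writeln!(out, "{}", row[c]).unwrap();
        }
    }
    out
}
```

For a 2 × 2 matrix with gene rows `[1, 2]` and `[3, 4]`, the values after the two header lines appear in the order 1, 3, 2, 4, so a reader consuming the file sequentially sees one complete cell at a time.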
zero_center=False (scanpy's optional mode, which divides by std without
centering and therefore preserves sparsity) is not implemented. Centered
z-scoring is scanpy's default, and the uncentered variant is a niche memory
optimization for matrices that stay sparse downstream.
Origin
This crate is an independent Rust reimplementation of scanpy's sc.pp.scale
based on:
- The published method (Wolf, Angerer & Theis, "SCANPY: large-scale single-cell gene expression data analysis", Genome Biology 2018, doi:10.1186/s13059-017-1382-0).
- The public MatrixMarket and 10x Genomics matrix file-format specs.
- Reading scanpy's _scale.py / _utils._get_mean_var (BSD-3-Clause) to match the exact std convention (ddof=1), the zero-variance std=1 rule, and the symmetric clip semantics.
- Black-box value-level testing against the scanpy Python package.
License: MIT OR Apache-2.0. Upstream credit: scanpy https://github.com/scverse/scanpy (BSD-3-Clause).