rsomics-sc-cell-cycle 0.1.0

Per-cell cell-cycle phase scoring (S_score, G2M_score, phase) from a single-cell matrix — matches scanpy tl.score_genes_cell_cycle(random_state=0) value-exactly
Documentation

rsomics-sc-cell-cycle

Per-cell cell-cycle phase scoring for single-cell expression. Given an S-phase and a G2M-phase gene list, it computes an S_score and a G2M_score per cell (each a score_genes-style set mean minus a bin-matched control mean) and assigns each cell a phase — S, G2M, or G1. This is scanpy's sc.tl.score_genes_cell_cycle.

rsomics-sc-cell-cycle <10x-mtx-dir> --s-genes s.txt --g2m-genes g2m.txt \
    -o phase.tsv [--n-bins 25 --seed 0]

Input is a 10x MatrixMarket directory (matrix.mtx[.gz], features.tsv[.gz], barcodes.tsv[.gz]) plus two gene-list files (one gene id — features.tsv column 1 — per line). Output is a cell_id<TAB>S_score<TAB>G2M_score<TAB>phase TSV, one line per cell.

The two scores are computed with ctrl_size = min(len(s_genes), len(g2m_genes)), exactly as scanpy fixes it. The phase call is then: G1 if both scores are negative, else G2M if the G2M score exceeds the S score, else S. With --seed 0 the control draw — and therefore every score and phase — is value-exact against scanpy.tl.score_genes_cell_cycle(adata, random_state=0).

Origin

Independent Rust reimplementation of scanpy's sc.tl.score_genes_cell_cycle, based on the scanpy source (scanpy/tools/_score_genes.py, BSD-3-Clause — readable, cited) and the Seurat cell-cycle scoring method:

  • Satija et al., "Spatial reconstruction of single-cell gene expression data", Nature Biotechnology 33, 495–502 (2015), DOI 10.1038/nbt.3192.

The scoring core (bin-matched control sampling reproducing numpy's RandomState MT19937 + Fisher-Yates choice(replace=False) bit-for-bit) is shared with rsomics-sc-score-genes; this crate adds the deterministic two-score phase call.

License: MIT OR Apache-2.0. Upstream credit: scanpy (https://github.com/scverse/scanpy, BSD-3-Clause).