mcpcounter-rust
mcpcounter-rust is a pure-Rust, zero-dependency port of the MCPcounter family of cell-population quantification methods — MCP-counter for human tissue (Becht et al. 2016) and mMCP-counter for mouse (Petitprez et al. 2020). It reproduces the original R packages bit-for-bit on synthetic inputs and to the last bit of an IEEE-754 double (worst relative error 2.2e-16, ~1 ULP) on real public RNA-seq, with no BLAS, LAPACK, Python, or R at runtime — and no third-party crates at all.
Disclaimer. mcpcounter-rust is an independent, unofficial Rust port and is not affiliated with, endorsed by, or supported by the authors of MCP-counter or mMCP-counter. For the canonical, authoritative implementations, use the upstream R packages ebecht/MCPcounter and cit-bioinfo/mMCP-counter.
Highlights
- Numeric parity with R, validated at two levels. Seeded synthetic fixtures
match the R packages to a relative error of
0(bit-identical); real public RNA-seq matches to ≤ 1 ULP (2.18e-16 human, 1.69e-16 mouse). Nothing is accepted unless it reproduces R 4.6.0. - Both family methods, one engine. Human MCP-counter (10 immune & stromal populations, arithmetic-mean scores) and mouse mMCP-counter (16 populations, median scores) share a single R-faithful mean/median core.
- Faithful to R's quirks, on purpose. Reproduces R's order-sensitive
two-pass mean, even-
nmedian (mean(c(a, b)), not(a + b) / 2), MCP-counter'sintersectde-duplication, and mMCP-counter's marker multiplicity (a marker listed k times counts k times) — the details that make last-bit parity hold. See Fidelity notes. - Zero dependencies. No native libraries, no build script, no transitive crates — a single, audit-tiny library. Marker signatures are embedded at compile time.
- Library-first.
use mcpcounter_rust::..., plus a dependency-free TSV reader for matrix IO.
Installation
The crate is mcpcounter-rust; the library imports as mcpcounter_rust (Cargo
maps the dash to an underscore). It builds on Rust 1.74+ and pulls in no
dependencies.
Quick start
mcp_counter on a tiny human matrix — the canonical call, exactly as
MCPcounter.estimate in R. Markers are matched by name against the bundled
signature; genes that aren't markers are ignored, and populations with no
present markers are dropped:
use ;
T cells 7.850 7.250 1.900
CD8 T cells 6.900 6.200 1.500
B lineage 4.100 8.150 2.000
The mouse estimator has the same shape —
mmcp_counter(&expr, MurineFeaturesType::GeneSymbol, GenomeVersion::GCRm39) —
returning median scores over 16 murine populations. The runnable version of the
above is examples/quickstart.rs;
the real-data parity harness is
examples/parity_realdata.rs.
Methods, inputs & licensing
Each method re-implements a published algorithm and inherits its upstream license; the crate is therefore distributed under GPL-3.0-or-later.
| Method | Function | Populations | Score | Feature namespaces | Upstream / license |
|---|---|---|---|---|---|
| MCP-counter (human) | mcp_counter |
10 | mean | HugoSymbols, Affy133P2Probesets, EntrezId, EnsemblId |
ebecht/MCPcounter — code GPL-3; signatures CC0 / public domain |
| mMCP-counter (mouse) | mmcp_counter |
16 | median | GeneSymbol, EnsemblId, Probes (× GCRm38 / GCRm39) |
cit-bioinfo/mMCP-counter — code + signatures GPL-3 |
Input is an ExprMatrix (genes × samples; build with ExprMatrix::new, or parse
a TSV via mcpcounter_rust::io::read_tsv_matrix). No internal transform or
normalization is applied, matching R; NA maps to f64::NAN and is dropped per
population (na.rm = TRUE).
Scope. This crate is the MCPcounter family only. Other deconvolution methods (ssGSEA-based, solver-based, …) are intentionally not bundled here — they belong in their own focused crates.
Validation
Every method is checked against the unmodified upstream R package (R 4.6.0), never against hand- or AI-computed values. Parity is verified at two levels, and both are required — the real-data level catches what synthetic data hides:
| Method | Dataset | Organism | Shape (genes × samples) | Worst relative error |
|---|---|---|---|---|
| MCP-counter | seeded synthetic | — | committed fixtures | 0 (bit-identical) |
| mMCP-counter | seeded synthetic | — | committed fixtures | 0 (bit-identical) |
| MCP-counter | dataset_racle (Racle 2017, eLife) |
human | 32,467 × 4 | 2.18e-16 |
| mMCP-counter | dataset_petitprez (Petitprez 2020, Genome Med) |
mouse | 53,819 × 6 | 1.69e-16 |
Synthetic inputs report a perfect 0; only real expression — with its hundreds
of present markers per population — surfaces the sub-ULP rounding, because R
accumulates means and medians in 80-bit long double that an f64 cannot match
to the final bit. A flawless synthetic score would have hidden this, so
real-data validation is treated as a required gate, not a bonus.
Reproduce it from public data only (the large input matrices are never committed):
Synthetic fixtures run as plain unit tests (cargo test); aggregate real-data
metrics live in
benchmarks/parity_realdata_results.json.
Fidelity notes
The behaviours below are reproduced deliberately — they are why the numbers land on the last bit rather than merely "close":
- Order-sensitive two-pass mean. R's
meanaccumulates in 80-bit long double over two passes; marker rows are summed in ascending matrix-row order (R'sintersect(rownames, markers)order) so the rounding matches. - Even-
nmedian =mean(c(a, b)). R averages the two central order statistics with that same two-pass mean, not the naive(a + b) / 2— they can differ in the last ULP. - MCP-counter de-duplicates; mMCP-counter does not. MCP-counter selects rows
via
intersect(one row per marker, first occurrence); mMCP-counter selects viaexp[sig[, features], ], so a marker listed k times contributes k copies to the median. - Fixed 16-population mMCP output order, with all-
NA/absent populations dropped but all-zero populations kept. - Empty-result divergence (documented). When no markers match, R
stop()s; this port returns an emptyDeconvResultinstead.
R → mcpcounter-rust name map
| R | mcpcounter-rust |
|---|---|
MCPcounter.estimate(expr, featuresType = …) |
mcp_counter(&expr, FeaturesType::…) |
mMCPcounter.estimate(exp, features = …, genomeVersion = …) |
mmcp_counter(&expr, MurineFeaturesType::…, GenomeVersion::…) |
"HUGO_symbols" / "affy133P2_probesets" |
FeaturesType::HugoSymbols / Affy133P2Probesets |
result matrix [population, sample] |
DeconvResult::score(pop, sample) |
Performance
The crate is dependency-free and resolves each population's markers by direct row lookup, so scoring is O(present markers) per population rather than a per-population rescan of the whole gene panel — microsecond-scale per call on a 32k-gene human matrix (compute only). Note the shipped benchmark validates numeric parity, not wall-clock speed: there is no head-to-head timing harness against R, so no speedup multiple is claimed here.
Building from source
The crate has a single build configuration (no feature flags, no dependencies),
so cargo build and cargo build --all-features are identical. The local CI
gate is cargo fmt --all -- --check, then
cargo clippy --all-targets --all-features -- -D warnings, then cargo test.
How this port was produced
mcpcounter-rust was written with AI assistance (Anthropic's Claude, via Claude
Code); the Co-Authored-By trailers in the git history record this. Its outputs
are checked numerically against the upstream R packages, as described under
Validation — never against AI-generated expected values. Upstream
sources and signature tables are fetched/exported locally for study; nothing is
vendored beyond the embedded marker signatures.
License
MCP-counter and mMCP-counter are licensed GPL-3. As a derivative port,
mcpcounter-rust is distributed under GPL-3.0-or-later; the full text is in
LICENSE. The
bundled MCP-counter marker signatures are upstream CC0 / public-domain data; the
mMCP-counter signatures are GPL-3 (exported from the package's data tables).
Credits & citation
MCP-counter and mMCP-counter are the work of Etienne Becht, Florent Petitprez, Aurélien de Reyniès, and colleagues. This port reimplements their algorithms; it does not originate them. If you use mcpcounter-rust in research, please cite the original papers:
Becht, E., Giraldo, N.A., Lacroix, L., et al. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology 17, 218. doi:10.1186/s13059-016-1070-5
Petitprez, F., Levy, S., Sun, C.-M., et al. (2020). The murine Microenvironment Cell Population counter method to estimate abundance of tissue-infiltrating immune and stromal cell populations in murine samples using gene expression. Genome Medicine 12, 86. doi:10.1186/s13073-020-00783-w