# mcpcounter-rust
[](https://crates.io/crates/mcpcounter-rust)
[](https://docs.rs/mcpcounter-rust)
[](https://github.com/sinmojito/MCPcounter-rust/actions/workflows/ci.yml)
[](https://github.com/sinmojito/MCPcounter-rust/blob/main/LICENSE)
**mcpcounter-rust** is a pure-Rust, **zero-dependency** port of the **MCPcounter**
family of cell-population quantification methods — **MCP-counter** for human
tissue (Becht et al. 2016) and **mMCP-counter** for mouse (Petitprez et al.
2020). It reproduces the original R packages **bit-for-bit on synthetic inputs**
and to **the last bit of an IEEE-754 double** (worst relative error **2.2e-16**,
~1 ULP) on real public RNA-seq, with **no BLAS, LAPACK, Python, or R at
runtime** — and no third-party crates at all.
> **Disclaimer.** mcpcounter-rust is an independent, unofficial Rust port and is
> **not** affiliated with, endorsed by, or supported by the authors of
> MCP-counter or mMCP-counter. For the canonical, authoritative implementations,
> use the upstream R packages [ebecht/MCPcounter](https://github.com/ebecht/MCPcounter)
> and [cit-bioinfo/mMCP-counter](https://github.com/cit-bioinfo/mMCP-counter).
## Highlights
- **Numeric parity with R, validated at two levels.** Seeded synthetic fixtures
match the R packages to a **relative error of `0` (bit-identical)**; real
public RNA-seq matches to **≤ 1 ULP** (2.18e-16 human, 1.69e-16 mouse).
Nothing is accepted unless it reproduces R 4.6.0.
- **Both family methods, one engine.** Human **MCP-counter** (10 immune &
stromal populations, arithmetic-mean scores) and mouse **mMCP-counter** (16
populations, median scores) share a single R-faithful mean/median core.
- **Faithful to R's quirks, on purpose.** Reproduces R's order-sensitive
two-pass mean, even-`n` median (`mean(c(a, b))`, *not* `(a + b) / 2`),
MCP-counter's `intersect` de-duplication, and mMCP-counter's marker
*multiplicity* (a marker listed *k* times counts *k* times) — the details that
make last-bit parity hold. See [Fidelity notes](#fidelity-notes).
- **Zero dependencies.** No native libraries, no build script, no transitive
crates — a single, audit-tiny library. Marker signatures are embedded at
compile time.
- **Library-first.** `use mcpcounter_rust::...`, plus a dependency-free TSV
reader for matrix IO.
## Installation
```bash
cargo add mcpcounter-rust
```
The crate is `mcpcounter-rust`; the library imports as `mcpcounter_rust` (Cargo
maps the dash to an underscore). It builds on Rust **1.74+** and pulls in **no
dependencies**.
## Quick start
`mcp_counter` on a tiny human matrix — the canonical call, exactly as
`MCPcounter.estimate` in R. Markers are matched by name against the bundled
signature; genes that aren't markers are ignored, and populations with no
present markers are dropped:
```rust
use mcpcounter_rust::{mcp_counter, ExprMatrix, FeaturesType};
fn main() {
// 6 genes (rows) x 3 samples (cols), log2-scale expression, stored row-major.
// Five are MCP-counter markers; GAPDH is a non-marker housekeeping gene and
// is ignored, exactly as in R.
let genes = vec![
"CD3D".to_string(), // T cells
"CD28".to_string(), // T cells
"CD8B".to_string(), // CD8 T cells
"CD19".to_string(), // B lineage
"CD79A".to_string(), // B lineage
"GAPDH".to_string(), // not a marker -> ignored
];
let samples = vec![
"tumor_A".to_string(),
"tumor_B".to_string(),
"normal".to_string(),
];
#[rustfmt::skip]
let data = vec![
// tumor_A tumor_B normal
8.1, 7.4, 2.0, // CD3D
7.6, 7.1, 1.8, // CD28
6.9, 6.2, 1.5, // CD8B
4.2, 8.3, 2.1, // CD19
4.0, 8.0, 1.9, // CD79A
11.0, 11.1, 11.0, // GAPDH (flat housekeeping)
];
let expr = ExprMatrix::new(genes, samples, data);
let result = mcp_counter(&expr, FeaturesType::HugoSymbols);
// Each score is the mean of a population's present markers; populations
// appear in MCP-counter's signature order.
for (p, pop) in result.populations.iter().enumerate() {
print!("{pop:<14}");
for s in 0..result.samples.len() {
print!("{:>10.3}", result.score(p, s));
}
println!();
}
}
```
```text
T cells 7.850 7.250 1.900
CD8 T cells 6.900 6.200 1.500
B lineage 4.100 8.150 2.000
```
The mouse estimator has the same shape —
`mmcp_counter(&expr, MurineFeaturesType::GeneSymbol, GenomeVersion::GCRm39)` —
returning median scores over 16 murine populations. The runnable version of the
above is [`examples/quickstart.rs`](https://github.com/sinmojito/MCPcounter-rust/blob/main/examples/quickstart.rs);
the real-data parity harness is
[`examples/parity_realdata.rs`](https://github.com/sinmojito/MCPcounter-rust/blob/main/examples/parity_realdata.rs).
## Methods, inputs & licensing
Each method re-implements a published algorithm and inherits its upstream
license; the crate is therefore distributed under **GPL-3.0-or-later**.
| MCP-counter (human) | `mcp_counter` | 10 | mean | `HugoSymbols`, `Affy133P2Probesets`, `EntrezId`, `EnsemblId` | [ebecht/MCPcounter](https://github.com/ebecht/MCPcounter) — code GPL-3; signatures CC0 / public domain |
| mMCP-counter (mouse) | `mmcp_counter` | 16 | median | `GeneSymbol`, `EnsemblId`, `Probes` (× `GCRm38` / `GCRm39`) | [cit-bioinfo/mMCP-counter](https://github.com/cit-bioinfo/mMCP-counter) — code + signatures GPL-3 |
Input is an `ExprMatrix` (genes × samples; build with `ExprMatrix::new`, or parse
a TSV via `mcpcounter_rust::io::read_tsv_matrix`). No internal transform or
normalization is applied, matching R; `NA` maps to `f64::NAN` and is dropped per
population (`na.rm = TRUE`).
> **Scope.** This crate is the **MCPcounter family only**. Other deconvolution
> methods (ssGSEA-based, solver-based, …) are intentionally *not* bundled here —
> they belong in their own focused crates.
## Validation
Every method is checked against the unmodified upstream R package (R 4.6.0),
never against hand- or AI-computed values. Parity is verified at two levels, and
**both are required** — the real-data level catches what synthetic data hides:
| MCP-counter | seeded synthetic | — | committed fixtures | **0** (bit-identical) |
| mMCP-counter | seeded synthetic | — | committed fixtures | **0** (bit-identical) |
| MCP-counter | `dataset_racle` (Racle 2017, eLife) | human | 32,467 × 4 | **2.18e-16** |
| mMCP-counter | `dataset_petitprez` (Petitprez 2020, Genome Med) | mouse | 53,819 × 6 | **1.69e-16** |
Synthetic inputs report a perfect `0`; only real expression — with its hundreds
of present markers per population — surfaces the sub-ULP rounding, because R
accumulates means and medians in 80-bit `long double` that an `f64` cannot match
to the final bit. A flawless synthetic score would have *hidden* this, so
real-data validation is treated as a required gate, not a bonus.
Reproduce it from public data only (the large input matrices are never
committed):
```bash
Rscript benchmarks/parity_realdata.R # dump R reference outputs to scratch/
cargo run --example parity_realdata # revalidate; writes the metrics JSON
```
Synthetic fixtures run as plain unit tests (`cargo test`); aggregate real-data
metrics live in
[`benchmarks/parity_realdata_results.json`](https://github.com/sinmojito/MCPcounter-rust/blob/main/benchmarks/parity_realdata_results.json).
## Fidelity notes
The behaviours below are reproduced deliberately — they are why the numbers land
on the last bit rather than merely "close":
- **Order-sensitive two-pass mean.** R's `mean` accumulates in 80-bit long
double over two passes; marker rows are summed in ascending matrix-row order
(R's `intersect(rownames, markers)` order) so the rounding matches.
- **Even-`n` median = `mean(c(a, b))`.** R averages the two central order
statistics with that same two-pass mean, not the naive `(a + b) / 2` — they
can differ in the last ULP.
- **MCP-counter de-duplicates; mMCP-counter does not.** MCP-counter selects rows
via `intersect` (one row per marker, first occurrence); mMCP-counter selects
via `exp[sig[, features], ]`, so a marker listed *k* times contributes *k*
copies to the median.
- **Fixed 16-population mMCP output order**, with all-`NA`/absent populations
dropped but all-zero populations kept.
- **Empty-result divergence (documented).** When no markers match, R `stop()`s;
this port returns an empty `DeconvResult` instead.
## R → mcpcounter-rust name map
| `MCPcounter.estimate(expr, featuresType = …)` | `mcp_counter(&expr, FeaturesType::…)` |
| `mMCPcounter.estimate(exp, features = …, genomeVersion = …)` | `mmcp_counter(&expr, MurineFeaturesType::…, GenomeVersion::…)` |
| `"HUGO_symbols"` / `"affy133P2_probesets"` | `FeaturesType::HugoSymbols` / `Affy133P2Probesets` |
| result matrix `[population, sample]` | `DeconvResult::score(pop, sample)` |
## Performance
The crate is dependency-free and resolves each population's markers by direct
row lookup, so scoring is **O(present markers)** per population rather than a
per-population rescan of the whole gene panel — microsecond-scale per call on a
32k-gene human matrix (compute only). Note the shipped benchmark validates
**numeric parity, not wall-clock speed**: there is no head-to-head timing
harness against R, so no speedup multiple is claimed here.
## Building from source
```bash
cargo build --release
cargo test # unit + synthetic-parity tests
cargo run --example quickstart
```
The crate has a single build configuration (no feature flags, no dependencies),
so `cargo build` and `cargo build --all-features` are identical. The local CI
gate is `cargo fmt --all -- --check`, then
`cargo clippy --all-targets --all-features -- -D warnings`, then `cargo test`.
## How this port was produced
mcpcounter-rust was written with AI assistance (Anthropic's Claude, via Claude
Code); the `Co-Authored-By` trailers in the git history record this. Its outputs
are checked numerically against the upstream R packages, as described under
[Validation](#validation) — never against AI-generated expected values. Upstream
sources and signature tables are fetched/exported locally for study; nothing is
vendored beyond the embedded marker signatures.
## License
MCP-counter and mMCP-counter are licensed GPL-3. As a derivative port,
mcpcounter-rust is distributed under **GPL-3.0-or-later**; the full text is in
[`LICENSE`](https://github.com/sinmojito/MCPcounter-rust/blob/main/LICENSE). The
bundled MCP-counter marker signatures are upstream CC0 / public-domain data; the
mMCP-counter signatures are GPL-3 (exported from the package's data tables).
## Credits & citation
MCP-counter and mMCP-counter are the work of Etienne Becht, Florent Petitprez,
Aurélien de Reyniès, and colleagues. This port reimplements their algorithms; it
does not originate them. If you use mcpcounter-rust in research, please cite the
original papers:
> Becht, E., Giraldo, N.A., Lacroix, L., et al. (2016). Estimating the
> population abundance of tissue-infiltrating immune and stromal cell
> populations using gene expression. *Genome Biology* 17, 218.
> doi:[10.1186/s13059-016-1070-5](https://doi.org/10.1186/s13059-016-1070-5)
> Petitprez, F., Levy, S., Sun, C.-M., et al. (2020). The murine Microenvironment
> Cell Population counter method to estimate abundance of tissue-infiltrating
> immune and stromal cell populations in murine samples using gene expression.
> *Genome Medicine* 12, 86.
> doi:[10.1186/s13073-020-00783-w](https://doi.org/10.1186/s13073-020-00783-w)