xcell-rust 0.1.0

Pure-Rust port of xCell (Aran et al. 2017) cell-type enrichment — ssGSEA, spillover-corrected — validated for numeric parity against the R xCell package. Built on gsva-rust.
Documentation
# Changelog

All notable changes to this project are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] - 2026-06-01

A pure-Rust port of xCell 1.1.0 (Aran, Hu & Butte 2017), built on the `gsva`
ssGSEA engine and validated for numeric parity against the original R package
(R 4.6.0, xCell 1.1.0, GSVA 2.6.2). The default build depends on `gsva-rust`
(taken faer-free, as its ssGSEA core) and `rayon` for parity-preserving
parallelism; `--no-default-features` drops rayon for a serial crate whose only
dependency is gsva-rust's zero-dependency core. Either way, no BLAS/LAPACK,
`quadprog`/`pracma`, Python, or R is required at runtime — even the spillover
non-negative least squares is hand-rolled.

### Added

- Embedded xCell 1.1.0 model data (`XCellModel::load`): the 489 signatures, the
  10 808-gene scoring universe, the 64-cell-type canonical order, and the
  spillover matrix `K` + calibration table `fv` for both the RNA-seq and
  microarray spill objects, exported from the GPL-3 xCell package by
  `benchmarks/export_xcell_data.R`.
- `raw_enrichment` — xCell `rawEnrichmentAnalysis`: intersect with the scoring
  universe (≥ 5000 genes required, else a hard error), per-sample average-tie
  rank, unnormalized ssGSEA (`gsva::ssgsea`, `normalize = false`) over the 489
  signatures, per-signature minimum subtraction, and per-cell-type averaging in
  xCell's canonical order. Matches R to ~1 ULP on real data (worst abs ≈ 1.8e-12
  over the GBM cohort).
- `transform_scores` — xCell `transformScores`: per-cell-type power-curve
  calibration with the `scale` toggle. Matches R to ~1 ULP (worst abs ≈ 1.1e-16).
- `spill_over` — xCell `spillOver`: per-sample non-negative least squares against
  the `αK`-with-unit-diagonal compensation matrix, via a hand-rolled
  Lawson–Hanson active-set NNLS solver (normal-equations passive-set solves with
  partial-pivoting Gaussian elimination). The QP is strictly convex (unique
  optimum), so this matches R's `pracma::lsqlincon` to ~1e-11 relative
  (worst abs ≈ 2.4e-15) without any linear-algebra dependency.
- `microenvironment_scores` — xCell `microenvironmentScores`: appends
  `ImmuneScore` (Σ immune / 1.5), `StromaScore` (Σ stroma / 2), and
  `MicroenvironmentScore`.
- `xcell_analysis` — the end-to-end `xCellAnalysis` entry point with
  `XCellParams { rnaseq, scale, alpha }` (defaults `true` / `true` / `0.5`).
  Whole-pipeline parity vs R is ~1.5e-11 relative (worst abs ≈ 2.9e-15) over the
  67 × 173 GBM output.
- Validation: a real-data parity gate (`examples/parity_realdata.rs`, against the
  public `gbm_VerhaakEtAl` cohort) and committed offline synthetic fixtures
  (`tests/parity_synthetic.rs`) covering stages 2–4 with no R installation.
- `parallel` Cargo feature (on by default): parallelizes the per-sample spillover
  NNLS over rayon and forwards `gsva-rust/parallel` so the ssGSEA enrichment stage
  parallelizes too. Both axes only reorder independent per-sample work, so results
  are bit-identical to the serial build (and to R). Build `--no-default-features`
  for a serial crate whose sole dependency is gsva-rust's zero-dependency core;
  xcell takes gsva with default features off, so the `faer` SVD backend is never
  pulled (xcell uses only gsva's ssGSEA path, never plage).
- Timing + peak-memory benchmarks: a cross-process harness (`examples/xcell_bench.rs`
  driven by `benchmarks/bench_xcell.R` on the real GBM cohort and
  `benchmarks/bench_scaling_xcell.R` on a seeded synthetic generator), with full
  methodology, scaling sweeps, and memory curves in
  [`benchmarks/REPORT.md`]benchmarks/REPORT.md. Against single-threaded R xCell
  1.1.0 the default build runs the end-to-end `xCellAnalysis` **~23× faster** and
  the isolated per-sample spillover NNLS **~7.9× faster** (~7× from rayon alone),
  with the serial and parallel builds asserted bit-identical at every benchmarked
  size.

### Notes

- MSRV is **1.85**, inherited from the `gsva-rust` dependency. gsva-rust declares
  `rust-version = 1.85` (its default faer build needs the edition2024 feature,
  stabilized in Rust 1.85), and cargo enforces a dependency's `rust-version` on
  every build that compiles it — so xcell-rust cannot build on an older toolchain
  even though its own code, and its lean faer-free view of gsva, otherwise could.
  CI builds both feature configurations (default and `--no-default-features`).

[Unreleased]: https://github.com/sinmojito/xcell-rust/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/sinmojito/xcell-rust/releases/tag/v0.1.0