limma-rust

limma-rust is a pure-Rust port of the Bioconductor limma package: linear models and empirical-Bayes moderation for differential expression in microarray, RNA-seq, proteomics, and metabolomics data. It reproduces limma 3.68.3's output to 8+ significant figures (checked function by function against R 4.6.0), makes identical differential-expression calls (zero discordant at FDR 5% across six public accessions and four organisms), and runs 2-15× faster at 4.8-11.8× lower peak memory, with no BLAS, LAPACK, Python, or R at runtime.

Disclaimer. limma-rust is an independent, unofficial Rust port and is not affiliated with, endorsed by, or supported by the authors of limma or the Walter and Eliza Hall Institute (WEHI). For the canonical, authoritative implementation, use the Bioconductor limma package.

Highlights

Function-by-function parity. 229 tests assert that each ported function reproduces Bioconductor limma 3.68.3 (on R 4.6.0) to 8+ significant figures; nothing is accepted unless it matches.
Validated on real data. End-to-end agreement on six public accessions across three omics modalities (RNA-seq, proteomics, metabolomics) and four organisms (human, mouse, fruit fly, fission yeast), with 100% decideTests concordance and zero discordant significant calls.
Bit-exact where it counts. The rotation gene-set tests (roast/mroast/romer) reproduce R's Mersenne-Twister draws exactly, so their Monte-Carlo p-values match to the last digit.
Fast, and parallel by default. Pure-Rust SIMD linear algebra (faer) plus across-gene rayon parallelism: 2.0× on microarrays, 7.5× on a 204-sample RNA-seq voom set, 15.4× on synthetic voom, and up to 243× on per-gene workloads like duplicateCorrelation.
No native dependencies. All linear algebra and special functions are pure Rust (distributions from statrs): a single static binary, nothing to link, no R or Python to install.
Library or CLI. Use it as a crate (use limma::...) or drive the bundled limma command-line tool over CSV. --no-default-features builds a minimal, dependency-light pure-ndarray library.

Installation

The crate is published as limma-rust; the library is imported as limma (so use limma::... mirrors R's limma::).

# Library

cargo add limma-rust


# CLI binary (installs a `limma` executable)

cargo install limma-rust

The default build requires Rust 1.85+. Building --no-default-features (no faer/rayon/clap) drops that requirement for older toolchains.

Quick start: library

lmFit -> eBayes -> topTable, the canonical limma pipeline, exactly as in R:

use limma::{ebayes, lmfit, top_table, SortBy};
use ndarray::Array2;

// Expression matrix: 6 genes (rows) × 6 samples (columns), on the log2 scale.
let exprs = Array2::from_shape_vec((6, 6), vec![
    5.1, 4.9, 5.0, 7.0, 7.2, 6.9, // g1: up in group B
    3.0, 3.1, 2.9, 3.0, 2.8, 3.1, // g2: flat
    8.0, 8.2, 7.9, 6.0, 5.9, 6.1, // g3: down in group B
    1.0, 1.2, 0.9, 1.1, 1.0, 1.3, // g4: flat
    4.0, 4.1, 3.9, 4.2, 4.0, 4.1, // g5: flat
    2.0, 2.1, 1.9, 5.0, 5.1, 4.9, // g6: up in group B
]).unwrap();

// Design: 6 samples (rows) × 2 coefficients: intercept + a group-B indicator.
let design = Array2::from_shape_vec((6, 2), vec![
    1.0, 0.0,  1.0, 0.0,  1.0, 0.0,
    1.0, 1.0,  1.0, 1.0,  1.0, 1.0,
]).unwrap();

let genes: Vec<String> = (1..=6).map(|i| format!("g{i}")).collect();
let coefs = vec!["Intercept".to_string(), "grpB".to_string()];

let mut fit = lmfit(&exprs, &design, genes, coefs).unwrap();
ebayes(&mut fit, 0.01, (0.1, 4.0), /* trend */ false, /* robust */ false).unwrap();

// Tabulate the "grpB" coefficient (index 1), ranked by the B-statistic.
for row in top_table(&fit, 1, SortBy::B, None).unwrap() {
    println!(
        "{:<3} log2FC={:+.2}  t={:+.2}  adj.P={:.1e}  B={:+.2}",
        row.id, row.log2_fold_change, row.t, row.adj_p_value, row.b,
    );
}
// g6 (the +3 log2FC gene) tops the table; g3 (down) and g1 (up) tie next;
// the three flat genes fall to the bottom with strongly negative B.

Quick start: CLI

# lmFit -> eBayes -> topTable for one coefficient

limma --exprs exprs.csv --design design.csv --coef grpB --out results.csv


# Overall moderated F-statistic across all coefficients

limma --exprs exprs.csv --design design.csv --f-test --out ftable.csv


# eBayes with an intensity-dependent variance trend + robust hyperparameters

limma --exprs exprs.csv --design design.csv --coef grpB --trend --robust


# TREAT: moderated t against an absolute log2 fold-change threshold of 1.0

limma --exprs exprs.csv --design design.csv --coef grpB --treat 1.0


# Apply a contrast matrix, then write decideTests calls (BH, p < 0.05)

limma --exprs exprs.csv --design design.csv \

      --contrasts contrasts.csv --coef AS_vs_AR --decide calls.csv

Input format

Both inputs are delimited text whose first column is an identifier and whose header names the remaining columns. The delimiter is detected from the file extension (tab for .tsv / .tab, comma otherwise) and can be forced with --delimiter (a single character, or one of comma, tab, semicolon, space):

# exprs.csv: genes × samples (first column = gene id, header = sample names)
gene,s1,s2,s3,s4,s5,s6
g1,5.1,4.9,5.0,7.0,7.2,6.9
g2,3.0,3.1,2.9,3.0,2.8,3.1

# design.csv: samples × coefficients (first column = sample id, header = coef names)
sample,Intercept,grpB
s1,1,0
s4,1,1

Common options

Option	Description	Default
`--exprs <FILE>`	Expression matrix CSV/TSV (genes × samples)	required
`--design <FILE>`	Design matrix CSV/TSV (samples × coefficients)	required
`--contrasts <FILE>`	Contrast matrix CSV/TSV (coefficients × contrasts)	none
`--delimiter <CH>`	Force the input field delimiter	auto by extension
`--coef <NAME\|IDX>`	Coefficient/contrast to tabulate (moderated-t table)	last coef
`--f-test`	Tabulate the overall moderated F instead	`false`
`--treat <LFC>`	Run TREAT against an absolute log2-FC threshold	none
`--trend` / `--robust`	Intensity trend / robust eBayes hyperparameters	`false`
`--proportion <P>`	Assumed proportion of DE genes (eBayes)	`0.01`
`--sort <KEY>`	`logFC` \| `AveExpr` \| `P` \| `t` \| `B` \| `none`	`B`
`--number <N>`	Maximum rows to emit	all
`--out <FILE>`	Write the ranked top table (CSV)	stdout
`--write-fit <FILE>`	`write.fit`-style table, all coefficients (TSV)	none
`--decide <FILE>`	`decideTests` matrix (-1/0/1 per gene per contrast)	none

Run limma --help for the full list.

Validation & benchmarks

Every ported function is checked against the installed R limma package: file-based R scripts under reference/ dump per-(gene, coefficient) reference values from the original package, and the Rust test suite asserts the port reproduces them. The end-to-end pipeline is additionally validated and benchmarked on real datasets. Full methodology, datasets, and honest caveats are in benchmarks/REPORT.md.

Numerical parity. The worst-case relative error vs Bioconductor limma is 4.5e-08 (on lods, the cancellation-prone log-odds B-statistic); most statistics agree to ~1e-10, and p-values to ~1e-11. Across every dataset, decideTests agreement is 100% with zero discordant significant calls, cross-checked exactly against limma's own decideTests.

Speed & memory (median, vs R 4.6.0 / limma 3.68.3 on a 16-core Windows 11 machine; compute only, IO excluded):

Workload	Shape (features × samples)	Pipeline	limma-rust vs R
Microarray	12,048 × 37	`lmFit -> eBayes`	2.0×
Microarray	11,623 × 80	`lmFit -> eBayes`	2.6×
Real RNA-seq (human lung)	16,413 × 204	`voom -> lmFit -> eBayes`	7.5×
Synthetic RNA-seq	11,996 × 12	`voom -> lmFit -> eBayes`	15.4×
`camera` (gene-set test)	7,919 × 7	competitive set test	3.0×
`roast`/`mroast`/`romer`	7,919 × 7	rotation set tests	7-25×
`duplicateCorrelation`	5,861 × 36	per-gene REML	243×

Peak resident memory is 4.8-11.8× below R across a 2k-32k-gene / 12-384-sample sweep (whole-process RSS; R's ~110 MB interpreter-plus-limma baseline inflates the ratio on small inputs, settling to ~5× once the data dominates).

Where R still wins. On a wide, unweighted design (181 genes against 1,335 samples), R is ~3× faster: it factorises the shared design once and applies it to every gene via batched LAPACK, whereas the port solves per gene. That is the one documented regime where the reference's batched linear algebra wins; it is a trade-off, not a correctness gap.

Function coverage

limma-rust ports limma's statistical core and most of the analyses built on top of it. The table tracks per-area status against the upstream R source; see reference/inventory.tsv for the full list.

Area	limma functions	Status
Linear model fit	`lmFit`, `lm.series`, `mrlm`, `gls.series`, `nonEstimable`, `is.fullrank`	ported
Contrasts	`contrasts.fit`, `makeContrasts`	ported
Empirical Bayes	`eBayes` (incl. `trend`/`robust`), `squeezeVar`, `fitFDist`, `fitFDistRobustly`, `fitFDistUnequalDF1`, `tmixture.matrix`	ported
Lowess / loess	`loessFit`, `weightedLowess`, `tricubeMovingAverage`	ported
Result tables	`topTable`, `topTableF`, `toptable`, BH/BY adjust, `write.fit`	ported
Decision tests	`decideTests` (separate/global/hierarchical/nestedF), `classifyTestsF`, `p.adjust` (none/bonferroni/holm/BH/BY)	ported
TREAT	`treat`, `topTreat`	ported
RNA-seq weights	`voom`, `voomWithQualityWeights`, `vooma`, `voomaByGroup`, `voomaLmFit`, `arrayWeights`, `printtipWeights`, `beadCountWeights`	ported
Correlation / weights	`duplicateCorrelation`, `interGeneCorrelation`, `removeBatchEffect`	ported
Normalization	`normalizeBetweenArrays`, `normalizeQuantiles`, `normalizeCyclicLoess`, `normalizeWithinArrays`, `MA.RG`/`RG.MA`	ported
Background correction	`backgroundCorrect`, `normexp.fit` (saddle-point MLE), `nec`, `neqc`, `ma3x3`	ported
Gene-set tests	`camera`, `cameraPR`, `fry`, `geneSetTest`, `wilcoxGST`, `ids2indices`	ported
Gene-set tests (rotation)	`roast`, `mroast`, `romer`, `topRomer` (bit-exact Mersenne-Twister RNG)	ported
Exon / splicing	`diffSplice`, `topSplice`, `genas`, `wsva`	ported
Two-color helpers	`exprs.MA`, `designI2A`, `designI2M`, `lmscFit`	ported
External-numerics norm	`normalizeVSN`, `normalizeRobustSpline`, `normalizeForPrintorder`, `kooperberg`, `intraspotCorrelation`	out of scope
Microarray IO readers	`read.maimages`, `read.ilmn`, ...	out of scope
Containers / S4 methods	`RGList`/`MAList`/`EList` classes, `cbind`/`rbind`/`merge`/`dim`	out of scope
Plotting	`plotMD`, `plotMDS`, `volcanoplot`, `venn`, ...	out of scope
Annotation	`goana`, `kegga`	out of scope

R to limma-rust name map

Names follow Rust's snake_case; the mapping is otherwise one-to-one. A few common entry points:

R (limma)	limma-rust
`lmFit`	`lmfit`, `lmfit_weighted`
`contrasts.fit` / `makeContrasts`	`contrasts_fit` / `make_contrasts`
`eBayes` / `treat`	`ebayes` / `treat`
`topTable` / `topTableF`	`top_table` / `top_table_f`
`decideTests`	`decide_tests`
`voom` / `voomWithQualityWeights`	`voom` / `voom_with_quality_weights`
`duplicateCorrelation`	`duplicate_correlation`
`removeBatchEffect`	`remove_batch_effect`
`camera` / `cameraPR`	`camera` / `camera_pr`
`roast` / `mroast` / `romer`	`roast` / `mroast` / `romer`
`normalizeBetweenArrays`	`normalize_between_arrays`

Building from source

cargo build --release              # default: parallel + faer + CLI

cargo test                         # 229 parity tests + CLI integration test

cargo build --no-default-features  # minimal pure-ndarray library

How this port was produced

limma-rust was written with AI assistance (Anthropic's Claude, via Claude Code); the Co-Authored-By trailers in the git history record this. Its outputs are checked numerically against the upstream package, as described under Validation & benchmarks. The limma source is fetched locally to reference/limma-src/ for study (via reference/get_source.R) and is not vendored into this repository.

License

limma is licensed GPL (>= 2). As a derivative port, limma-rust is distributed under GPL-3.0-or-later; the full license text is in LICENSE.

Credits & citation

limma is the work of Gordon K. Smyth and colleagues. This port reimplements their algorithms; it does not originate them. If you use limma-rust in research, please cite the original limma paper:

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47. doi:10.1093/nar/gkv007

limma-rust 0.1.0