limma-rust
limma-rust is a pure-Rust port of the Bioconductor limma package: linear models and empirical-Bayes moderation for differential expression in microarray, RNA-seq, proteomics, and metabolomics data. It reproduces limma 3.68.3's output to 8+ significant figures (checked function by function against R 4.6.0), makes identical differential-expression calls (zero discordant at FDR 5% across six public accessions and four organisms), and runs 2-15× faster at 4.8-11.8× lower peak memory, with no BLAS, LAPACK, Python, or R at runtime.
Disclaimer. limma-rust is an independent, unofficial Rust port and is not affiliated with, endorsed by, or supported by the authors of limma or the Walter and Eliza Hall Institute (WEHI). For the canonical, authoritative implementation, use the Bioconductor limma package.
Highlights
- Function-by-function parity. 229 tests assert that each ported function reproduces Bioconductor limma 3.68.3 (on R 4.6.0) to 8+ significant figures; nothing is accepted unless it matches.
- Validated on real data. End-to-end agreement on six public accessions across three omics modalities (RNA-seq, proteomics, metabolomics) and four organisms (human, mouse, fruit fly, fission yeast), with 100% decideTests concordance and zero discordant significant calls.
- Bit-exact where it counts. The rotation gene-set tests (roast/mroast/romer) reproduce R's Mersenne-Twister draws exactly, so their Monte-Carlo p-values match to the last digit.
- Fast, and parallel by default. Pure-Rust SIMD linear algebra (faer) plus across-gene rayon parallelism: 2.0× on microarrays, 7.5× on a 204-sample RNA-seq voom set, 15.4× on synthetic voom, and up to 243× on per-gene workloads like duplicateCorrelation.
- No native dependencies. All linear algebra and special functions are pure Rust (distributions from statrs): a single static binary, nothing to link, no R or Python to install.
- Library or CLI. Use it as a crate (use limma::...) or drive the bundled limma command-line tool over CSV. --no-default-features builds a minimal, dependency-light pure-ndarray library.
Installation
The crate is published as limma-rust; the library is imported as limma
(so use limma::... mirrors R's limma::).
# Library
# CLI binary (installs a `limma` executable)
The default build requires Rust 1.85+. Building with --no-default-features (no
faer/rayon/clap) drops that requirement, so older toolchains can be used.
Quick start: library
lmFit -> eBayes -> topTable, the canonical limma pipeline, exactly as in R:
use ;
use Array2;
// Expression matrix: 6 genes (rows) × 6 samples (columns), on the log2 scale.
let exprs = from_shape_vec.unwrap;
// Design: 6 samples (rows) × 2 coefficients: intercept + a group-B indicator.
let design = from_shape_vec.unwrap;
let genes: = .map.collect;
let coefs = vec!;
let mut fit = lmfit.unwrap;
ebayes.unwrap;
// Tabulate the "grpB" coefficient (index 1), ranked by the B-statistic.
for row in top_table.unwrap
// g6 (the +3 log2FC gene) tops the table; g3 (down) and g1 (up) tie next;
// the three flat genes fall to the bottom with strongly negative B.
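For readers new to the eBayes step, the core of the moderation is a shrinkage of each gene's residual variance towards a prior value. The sketch below shows only that arithmetic, using the standard posterior-variance formula from the limma literature. The function names `squeeze_var` and `moderated_t` are hypothetical and are not this crate's API, and the prior hyperparameters are taken as given, whereas eBayes estimates them from all genes.

```rust
/// Posterior (moderated) variance: a degrees-of-freedom-weighted average of
/// the gene's own variance `s2` (on `df` residual d.f.) and the prior
/// variance `s0_sq` (on `d0` prior d.f.).
/// Hypothetical helper for illustration, not the crate's API.
fn squeeze_var(s2: f64, df: f64, s0_sq: f64, d0: f64) -> f64 {
    if d0.is_infinite() {
        // An infinitely informative prior replaces the gene-wise variance.
        return s0_sq;
    }
    (d0 * s0_sq + df * s2) / (d0 + df)
}

/// Moderated t-statistic: the coefficient divided by its standard error,
/// with the posterior variance in place of the gene-wise one.
/// It is referred to a t-distribution on `d0 + df` degrees of freedom.
fn moderated_t(coef: f64, stdev_unscaled: f64, s2_post: f64) -> f64 {
    coef / (stdev_unscaled * s2_post.sqrt())
}

fn main() {
    // Illustrative numbers only; they are not taken from any dataset.
    let (d0, s0_sq) = (4.0, 1.0);
    let s2_post = squeeze_var(2.0, 4.0, s0_sq, d0);
    let t = moderated_t(3.0, 0.5, s2_post);
    println!("s2_post = {s2_post}, moderated t = {t:.4}");
}
```

Because the prior contributes extra degrees of freedom, genes with few replicates borrow stability from the rest of the dataset, which is what makes the ranking usable on small designs like the one above.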
Quick start: CLI
# lmFit -> eBayes -> topTable for one coefficient
# Overall moderated F-statistic across all coefficients
# eBayes with an intensity-dependent variance trend + robust hyperparameters
# TREAT: moderated t against an absolute log2 fold-change threshold of 1.0
# Apply a contrast matrix, then write decideTests calls (BH, p < 0.05)
Input format
Both inputs are delimited text whose first column is an identifier and whose
header names the remaining columns. The delimiter is detected from the file
extension (tab for .tsv / .tab, comma otherwise) and can be forced with
--delimiter (a single character, or one of comma, tab, semicolon,
space):
# exprs.csv: genes × samples (first column = gene id, header = sample names)
gene,s1,s2,s3,s4,s5,s6
g1,5.1,4.9,5.0,7.0,7.2,6.9
g2,3.0,3.1,2.9,3.0,2.8,3.1
# design.csv: samples × coefficients (first column = sample id, header = coef names)
sample,Intercept,grpB
s1,1,0
s4,1,1
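The delimiter rule described above can be sketched as a small function. This is a minimal illustration, not the crate's actual code: `resolve_delimiter` is a hypothetical name, the case-insensitive extension match is an assumption, and "a single character" is taken to mean a single ASCII byte.

```rust
use std::path::Path;

/// Hypothetical helper (not the crate's API): choose the field delimiter.
/// `forced` stands for the value given to `--delimiter`, if any.
fn resolve_delimiter(path: &Path, forced: Option<&str>) -> Result<u8, String> {
    if let Some(spec) = forced {
        return match spec {
            "comma" => Ok(b','),
            "tab" => Ok(b'\t'),
            "semicolon" => Ok(b';'),
            "space" => Ok(b' '),
            // Otherwise accept exactly one single-byte character.
            s if s.len() == 1 => Ok(s.as_bytes()[0]),
            other => Err(format!("unsupported delimiter: {other:?}")),
        };
    }
    // No override: tab for .tsv / .tab, comma for everything else.
    // Lower-casing the extension is an assumption of this sketch.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("tsv") | Some("tab") => Ok(b'\t'),
        _ => Ok(b','),
    }
}

fn main() {
    for (file, forced) in [("exprs.csv", None), ("exprs.tsv", None), ("exprs.txt", Some("semicolon"))] {
        let delim = resolve_delimiter(Path::new(file), forced).unwrap();
        println!("{file}: {:?}", delim as char);
    }
}
```

Checking the explicit override before the extension keeps the two sources of truth from conflicting: a forced delimiter always wins, and the extension is only a fallback.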
Common options
| Option | Description | Default |
|---|---|---|
| `--exprs <FILE>` | Expression matrix CSV/TSV (genes × samples) | required |
| `--design <FILE>` | Design matrix CSV/TSV (samples × coefficients) | required |
| `--contrasts <FILE>` | Contrast matrix CSV/TSV (coefficients × contrasts) | none |
| `--delimiter <CH>` | Force the input field delimiter | auto by extension |
| `--coef <NAME\|IDX>` | Coefficient/contrast to tabulate (moderated-t table) | last coef |
| `--f-test` | Tabulate the overall moderated F instead | false |
| `--treat <LFC>` | Run TREAT against an absolute log2-FC threshold | none |
| `--trend` / `--robust` | Intensity trend / robust eBayes hyperparameters | false |
| `--proportion <P>` | Assumed proportion of DE genes (eBayes) | 0.01 |
| `--sort <KEY>` | logFC \| AveExpr \| P \| t \| B \| none | B |
| `--number <N>` | Maximum rows to emit | all |
| `--out <FILE>` | Write the ranked top table (CSV) | stdout |
| `--write-fit <FILE>` | write.fit-style table, all coefficients (TSV) | none |
| `--decide <FILE>` | decideTests matrix (-1/0/1 per gene per contrast) | none |
Run limma --help for the full list.
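To make the -1/0/1 coding of the --decide output concrete, here is a minimal sketch of Benjamini-Hochberg adjustment followed by sign-based calls for a single contrast. It illustrates the general technique only: `bh_adjust` and `decide` are hypothetical names rather than the crate's functions, and missing values and the multi-contrast methods (global, hierarchical, nestedF) are not handled.

```rust
/// Benjamini-Hochberg adjusted p-values, returned in the input order.
/// Hypothetical helper for illustration; assumes no NaN p-values.
fn bh_adjust(p: &[f64]) -> Vec<f64> {
    let n = p.len();
    let mut order: Vec<usize> = (0..n).collect();
    // Visit p-values from largest to smallest.
    order.sort_by(|&a, &b| p[b].total_cmp(&p[a]));
    let mut adj = vec![0.0; n];
    let mut running_min = 1.0_f64;
    for (k, &i) in order.iter().enumerate() {
        let rank = (n - k) as f64; // 1-based rank of p[i] in ascending order
        // Enforce monotonicity with a running minimum from the top down.
        running_min = running_min.min(p[i] * n as f64 / rank);
        adj[i] = running_min;
    }
    adj
}

/// Code each test as 1 (up), -1 (down) or 0 (not significant at `alpha`).
fn decide(t: &[f64], p: &[f64], alpha: f64) -> Vec<i8> {
    bh_adjust(p)
        .iter()
        .zip(t)
        .map(|(&q, &t)| {
            if q < alpha {
                if t > 0.0 { 1 } else { -1 }
            } else {
                0
            }
        })
        .collect()
}

fn main() {
    // Illustrative statistics, not taken from any dataset.
    let t = [5.0, -4.0, 2.5, 0.3];
    let p = [0.001, 0.004, 0.03, 0.5];
    println!("{:?}", decide(&t, &p, 0.05)); // prints [1, -1, 1, 0]
}
```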
Validation & benchmarks
Every ported function is checked against the installed R limma package: file-based
R scripts under reference/ dump per-(gene, coefficient) reference values from the
original package, and the Rust test suite asserts the port reproduces them. The
end-to-end pipeline is additionally validated and benchmarked on real datasets.
Full methodology, datasets, and honest caveats are in
benchmarks/REPORT.md.
Numerical parity. The worst-case relative error vs Bioconductor limma is
4.5e-08 (on lods, the cancellation-prone log-odds B-statistic); most
statistics agree to ~1e-10, and p-values to ~1e-11. Across every dataset,
decideTests agreement is 100% with zero discordant significant calls,
cross-checked exactly against limma's own decideTests.
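The two parity metrics quoted here, worst-case relative error and the count of discordant calls, reduce to simple comparisons. The sketch below shows one way to compute them; the helper names and the 5e-8 tolerance are assumptions chosen for illustration (the tolerance sits just above the reported worst case), and the numbers in `main` are made up rather than taken from the benchmark.

```rust
/// Relative error of `got` against the reference value `want`.
/// Hypothetical helper for illustration, not the crate's test code.
fn rel_err(got: f64, want: f64) -> f64 {
    if got == want {
        return 0.0; // exact match, including a reference value of zero
    }
    // Guard the denominator so a zero reference does not divide by zero.
    (got - want).abs() / want.abs().max(f64::MIN_POSITIVE)
}

/// Largest relative error across two equally long slices.
fn worst_rel_err(got: &[f64], want: &[f64]) -> f64 {
    assert_eq!(got.len(), want.len(), "slices must have equal length");
    got.iter()
        .zip(want)
        .map(|(&g, &w)| rel_err(g, w))
        .fold(0.0, f64::max)
}

/// Number of positions where two -1/0/1 call vectors disagree.
fn discordant(a: &[i8], b: &[i8]) -> usize {
    assert_eq!(a.len(), b.len(), "call vectors must have equal length");
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

fn main() {
    // Made-up values standing in for a Rust result and its R reference.
    let rust_t = [1.000_000_000_1, -2.5, 0.0];
    let r_t = [1.0, -2.5, 0.0];
    let worst = worst_rel_err(&rust_t, &r_t);
    assert!(worst < 5e-8, "parity tolerance exceeded: {worst:e}");

    let rust_calls = [1, 0, -1, 0];
    let r_calls = [1, 0, -1, 0];
    println!("worst rel. error = {worst:e}, discordant = {}", discordant(&rust_calls, &r_calls));
}
```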
Speed & memory (median, vs R 4.6.0 / limma 3.68.3 on a 16-core Windows 11 machine; compute only, IO excluded):
| Workload | Shape (features × samples) | Pipeline | limma-rust vs R |
|---|---|---|---|
| Microarray | 12,048 × 37 | lmFit -> eBayes | 2.0× |
| Microarray | 11,623 × 80 | lmFit -> eBayes | 2.6× |
| Real RNA-seq (human lung) | 16,413 × 204 | voom -> lmFit -> eBayes | 7.5× |
| Synthetic RNA-seq | 11,996 × 12 | voom -> lmFit -> eBayes | 15.4× |
| camera (gene-set test) | 7,919 × 7 | competitive set test | 3.0× |
| roast/mroast/romer | 7,919 × 7 | rotation set tests | 7-25× |
| duplicateCorrelation | 5,861 × 36 | per-gene REML | 243× |
Peak resident memory is 4.8-11.8× below R across a 2k-32k-gene / 12-384-sample sweep (whole-process RSS; R's ~110 MB interpreter-plus-limma baseline inflates the ratio on small inputs, settling to ~5× once the data dominates).
Where R still wins. On a wide, unweighted design (181 genes against 1,335 samples), R is ~3× faster: it factorises the shared design once and applies it to every gene via batched LAPACK, whereas the port solves per gene. That is the one documented regime where the reference's batched linear algebra wins; it is a trade-off, not a correctness gap.
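The "factorise once, apply to every gene" idea can be shown on a toy two-column design. The sketch below is neither R's nor the port's code: it uses the normal equations with an explicit 2 × 2 inverse instead of a QR factorisation through LAPACK, the function names are hypothetical, and the group layout (s1-s3 in group A, s4-s6 in group B) is an assumption, since the design rows for the other samples are not listed in the input example.

```rust
/// For an n × 2 design X, return the n rows of ((XᵀX)⁻¹Xᵀ)ᵀ, computed once.
/// Hypothetical helper; normal equations are used here for brevity and are
/// less numerically stable than the QR approach a real implementation uses.
fn shared_projection(design: &[[f64; 2]]) -> Option<Vec<[f64; 2]>> {
    // Accumulate the symmetric 2 × 2 Gram matrix XᵀX = [[a, b], [b, d]].
    let (mut a, mut b, mut d) = (0.0_f64, 0.0_f64, 0.0_f64);
    for r in design {
        a += r[0] * r[0];
        b += r[0] * r[1];
        d += r[1] * r[1];
    }
    let det = a * d - b * b;
    if det.abs() < 1e-12 {
        return None; // crude rank check: the design is not of full column rank
    }
    // Row j holds (XᵀX)⁻¹ xⱼ, the weights sample j contributes to each coefficient.
    Some(
        design
            .iter()
            .map(|r| [(d * r[0] - b * r[1]) / det, (-b * r[0] + a * r[1]) / det])
            .collect(),
    )
}

/// Apply the shared projection to one gene: two dot products, no solve.
fn fit_gene(proj: &[[f64; 2]], y: &[f64]) -> [f64; 2] {
    let mut beta = [0.0; 2];
    for (w, &yi) in proj.iter().zip(y) {
        beta[0] += w[0] * yi;
        beta[1] += w[1] * yi;
    }
    beta
}

fn main() {
    // Assumed layout: s1-s3 in group A, s4-s6 in group B.
    let design = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]];
    let proj = shared_projection(&design).expect("design must have full column rank");
    let genes = [
        ("g1", [5.1, 4.9, 5.0, 7.0, 7.2, 6.9]),
        ("g2", [3.0, 3.1, 2.9, 3.0, 2.8, 3.1]),
    ];
    for (id, y) in &genes {
        let [intercept, grp_b] = fit_gene(&proj, y);
        println!("{id}: Intercept = {intercept:.4}, grpB = {grp_b:.4}");
    }
}
```

The shortcut applies only when every gene shares the same design and there are no per-gene weights or missing values, which matches the "wide, unweighted design" regime described above.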
Function coverage
limma-rust ports limma's statistical core and most of the analyses built on top of
it. The table tracks per-area status against the upstream R source; see
reference/inventory.tsv
for the full list.
| Area | limma functions | Status |
|---|---|---|
| Linear model fit | lmFit, lm.series, mrlm, gls.series, nonEstimable, is.fullrank | ported |
| Contrasts | contrasts.fit, makeContrasts | ported |
| Empirical Bayes | eBayes (incl. trend/robust), squeezeVar, fitFDist, fitFDistRobustly, fitFDistUnequalDF1, tmixture.matrix | ported |
| Lowess / loess | loessFit, weightedLowess, tricubeMovingAverage | ported |
| Result tables | topTable, topTableF, toptable, BH/BY adjust, write.fit | ported |
| Decision tests | decideTests (separate/global/hierarchical/nestedF), classifyTestsF, p.adjust (none/bonferroni/holm/BH/BY) | ported |
| TREAT | treat, topTreat | ported |
| RNA-seq weights | voom, voomWithQualityWeights, vooma, voomaByGroup, voomaLmFit, arrayWeights, printtipWeights, beadCountWeights | ported |
| Correlation / weights | duplicateCorrelation, interGeneCorrelation, removeBatchEffect | ported |
| Normalization | normalizeBetweenArrays, normalizeQuantiles, normalizeCyclicLoess, normalizeWithinArrays, MA.RG/RG.MA | ported |
| Background correction | backgroundCorrect, normexp.fit (saddle-point MLE), nec, neqc, ma3x3 | ported |
| Gene-set tests | camera, cameraPR, fry, geneSetTest, wilcoxGST, ids2indices | ported |
| Gene-set tests (rotation) | roast, mroast, romer, topRomer (bit-exact Mersenne-Twister RNG) | ported |
| Exon / splicing | diffSplice, topSplice, genas, wsva | ported |
| Two-color helpers | exprs.MA, designI2A, designI2M, lmscFit | ported |
| External-numerics norm | normalizeVSN, normalizeRobustSpline, normalizeForPrintorder, kooperberg, intraspotCorrelation | out of scope |
| Microarray IO readers | read.maimages, read.ilmn, ... | out of scope |
| Containers / S4 methods | RGList/MAList/EList classes, cbind/rbind/merge/dim | out of scope |
| Plotting | plotMD, plotMDS, volcanoplot, venn, ... | out of scope |
| Annotation | goana, kegga | out of scope |
R to limma-rust name map
Names follow Rust's snake_case; the mapping is otherwise one-to-one. A few
common entry points:
| R (limma) | limma-rust |
|---|---|
| lmFit | lmfit, lmfit_weighted |
| contrasts.fit / makeContrasts | contrasts_fit / make_contrasts |
| eBayes / treat | ebayes / treat |
| topTable / topTableF | top_table / top_table_f |
| decideTests | decide_tests |
| voom / voomWithQualityWeights | voom / voom_with_quality_weights |
| duplicateCorrelation | duplicate_correlation |
| removeBatchEffect | remove_batch_effect |
| camera / cameraPR | camera / camera_pr |
| roast / mroast / romer | roast / mroast / romer |
| normalizeBetweenArrays | normalize_between_arrays |
Building from source
How this port was produced
limma-rust was written with AI assistance (Anthropic's Claude, via Claude Code);
the Co-Authored-By trailers in the git history record this. Its outputs are
checked numerically against the upstream package, as described under
Validation & benchmarks. The limma source is fetched
locally to reference/limma-src/ for study (via reference/get_source.R) and is
not vendored into this repository.
License
limma is licensed GPL (>= 2). As a derivative port, limma-rust is distributed under
GPL-3.0-or-later; the full license text is in LICENSE.
Credits & citation
limma is the work of Gordon K. Smyth and colleagues. This port reimplements their algorithms; it does not originate them. If you use limma-rust in research, please cite the original limma paper:
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47. doi:10.1093/nar/gkv007