rsomics-limma-array-weights 0.1.0

Estimate per-sample (array) quality weights by REML for a log-expression matrix + design — a clean-room Rust reimplementation of limma's arrayWeights
Documentation

rsomics-limma-array-weights

Estimate per-sample (array) quality weights for a log-expression matrix by REML — a clean-room Rust reimplementation of limma's arrayWeights.

Arrays whose expression values follow the linear model poorly across genes get down-weighted; the weights feed back into a weighted lmFit. One weight per array; weights multiply to 1.

Usage

rsomics-limma-array-weights expr.tsv --design design.tsv -o weights.tsv
  • expr.tsv — log-expression matrix, tab-separated. Header row = sample ids; first column = gene ids.
  • --design — design (model) matrix, tab-separated. Header = coefficient names; first column = sample ids, one row per sample in the same order as the expression columns.
  • output — sample<TAB>weight, in input column order.

Flags: --prior-n (default 10, squeezes weights toward equality), --maxiter (50), --tol (1e-5).

Origin

This crate is an independent Rust reimplementation of limma's arrayWeights based on:

  • The published method: Ritchie, Diyagama, Neilson, van Laar, Dobrovic, Holloway, Smyth (2006), "Empirical array quality weights in the analysis of microarray data", BMC Bioinformatics 7:261, doi:10.1186/1471-2105-7-261; with the REML scoring adapted from Smyth (2002), "An efficient algorithm for REML in heteroscedastic regression", J. Comput. Graph. Statist. 11:836.
  • The public TSV input formats.
  • Black-box behaviour testing against the upstream binary (limma::arrayWeights).

No source code from limma (GPL) was used as reference during implementation.

The variance model is REML-exact: each gene is fit by weighted least squares, a per-array log-variance offset is shared across genes and estimated by Fisher scoring on the pooled REML deviance, and the array variances are converted to weights normalised to geometric mean 1. limma's exact prior.n moderation coupling inside the scoring iteration cannot be reproduced bit-exactly from the published method alone (clean-room: no GPL source), so a small relative residual remains — under 0.1% on the committed fixture and well under 0.5% on realistic gene/array counts; it grows for degenerate designs with very few arrays. The compat test asserts agreement to 0.5% relative.

License: MIT OR Apache-2.0. Upstream credit: limma (https://bioconductor.org/packages/limma/), GPL (>=2).