# rsomics-limma-array-weights
Estimate per-sample (array) quality weights for a log-expression matrix by REML — a clean-room
Rust reimplementation of limma's `arrayWeights`.

Arrays whose expression values follow the linear model poorly across genes get down-weighted; the
weights feed back into a weighted `lmFit`. Each array receives one weight, and the weights are
normalised so that their product is 1 (geometric mean 1).
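
The product-equals-1 constraint can be illustrated with a minimal sketch. It assumes each weight is
the reciprocal of a per-array variance factor held on the log scale; the function name
`weights_from_log_variances` is hypothetical and is not part of this crate's public interface.

```rust
/// Convert per-array log-variance offsets into weights whose product is 1.
/// Hypothetical helper for illustration only; not this crate's API.
pub fn weights_from_log_variances(log_var: &[f64]) -> Vec<f64> {
    if log_var.is_empty() {
        return Vec::new();
    }
    // Centre the offsets so they sum to zero.
    let mean = log_var.iter().sum::<f64>() / log_var.len() as f64;
    // weight = 1 / variance = exp(-offset); zero-sum offsets give a
    // product of 1, i.e. a geometric mean of 1.
    log_var.iter().map(|g| (-(g - mean)).exp()).collect()
}
```

Centring on the log scale keeps the ratios between weights unchanged, so only their overall scale
is fixed. An array with a larger variance offset always ends up with a smaller weight.
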
## Usage
```
rsomics-limma-array-weights expr.tsv --design design.tsv -o weights.tsv
```
- `expr.tsv` — log-expression matrix, tab-separated. Header row = sample ids; first column = gene ids.
- `--design` — design (model) matrix, tab-separated. Header = coefficient names; first column = sample
ids, one row per sample in the same order as the expression columns.
- output — `sample<TAB>weight`, in input column order.

Flags: `--prior-n` (default 10, squeezes weights toward equality), `--maxiter` (50), `--tol` (1e-5).
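
For readers producing these files programmatically, the expression layout described above can be
read with a few lines of standard-library Rust. This is a sketch under stated assumptions, not the
crate's actual parser: the `ExprMatrix` type and `parse_expr_tsv` function are hypothetical, the
header is assumed to carry a leading cell above the gene-id column, and every data cell is assumed
to be numeric.

```rust
/// Parsed log-expression matrix (hypothetical type for illustration).
pub struct ExprMatrix {
    pub samples: Vec<String>,
    pub genes: Vec<String>,
    /// values[gene][sample]
    pub values: Vec<Vec<f64>>,
}

/// Parse a tab-separated expression matrix from a string.
/// Assumption: the first header cell labels the gene-id column.
pub fn parse_expr_tsv(text: &str) -> Result<ExprMatrix, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or("empty input")?;
    // Skip the leading header cell; the remaining cells are sample ids.
    let samples: Vec<String> = header
        .split('\t')
        .skip(1)
        .map(|s| s.trim().to_string())
        .collect();

    let mut genes = Vec::new();
    let mut values = Vec::new();
    for (i, line) in lines.enumerate() {
        let mut cells = line.split('\t');
        // First column is the gene id; the rest are log-expression values.
        let gene = cells.next().unwrap_or("").trim().to_string();
        let row = cells
            .map(|c| {
                c.trim()
                    .parse::<f64>()
                    .map_err(|e| format!("data row {}: {}", i + 1, e))
            })
            .collect::<Result<Vec<f64>, String>>()?;
        if row.len() != samples.len() {
            return Err(format!(
                "data row {}: expected {} values, found {}",
                i + 1,
                samples.len(),
                row.len()
            ));
        }
        genes.push(gene);
        values.push(row);
    }
    Ok(ExprMatrix { samples, genes, values })
}
```

Rejecting rows whose length differs from the header catches the most common formatting mistake, a
missing or extra tab, before any model fitting starts.
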
## Origin
This crate is an independent Rust reimplementation of limma's `arrayWeights` based on:
- The published method: Ritchie, Diyagama, Neilson, van Laar, Dobrovic, Holloway, Smyth (2006),
"Empirical array quality weights in the analysis of microarray data", BMC Bioinformatics 7:261,
doi:10.1186/1471-2105-7-261; with the REML scoring adapted from Smyth (2002), "An efficient
algorithm for REML in heteroscedastic regression", J. Comput. Graph. Statist. 11:836.
- The public TSV input formats.
- Black-box behaviour testing against the upstream R function (`limma::arrayWeights`).

No source code from limma (GPL) was used as reference during implementation.

The variance model is REML-exact: each gene is fit by weighted least squares, a per-array
log-variance offset is shared across genes and estimated by Fisher scoring on the pooled REML
deviance, and the array variances are converted to weights normalised to geometric mean 1. Because
this is a clean-room implementation (no GPL source was consulted), limma's exact `prior.n`
moderation coupling inside the scoring iteration cannot be reproduced bit-exactly from the
published method alone, so a small relative residual remains: under 0.1% on the committed fixture
and well under 0.5% on realistic gene/array counts. The residual grows for degenerate designs with
very few arrays. The compat test asserts agreement to within 0.5% relative error.
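
A relative-agreement check of this kind can be sketched as follows. The helper name
`max_relative_diff` is hypothetical, and the crate's real compat test may be structured
differently; the sketch only shows what "agreement to 0.5% relative" means for a pair of weight
vectors.

```rust
/// Largest relative difference between computed and reference weights.
/// Hypothetical helper; not taken from this crate's test suite.
pub fn max_relative_diff(got: &[f64], want: &[f64]) -> f64 {
    assert_eq!(got.len(), want.len(), "weight vectors must have equal length");
    got.iter()
        .zip(want)
        // Relative error is measured against the reference value.
        .map(|(g, w)| ((g - w) / w).abs())
        .fold(0.0, f64::max)
}
```

A test would then assert `max_relative_diff(&got, &want) < 0.005`. Reference weights are strictly
positive, so dividing by the reference value is safe.
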

License: MIT OR Apache-2.0.

Upstream credit: limma (https://bioconductor.org/packages/limma/), GPL (>=2).