rsomics-limma-array-weights 0.1.0

Estimate per-sample (array) quality weights by REML for a log-expression matrix + design — a clean-room Rust reimplementation of limma's arrayWeights
Documentation
# rsomics-limma-array-weights

Estimate per-sample (array) quality weights for a log-expression matrix by REML — a clean-room
Rust reimplementation of limma's `arrayWeights`.

Arrays whose expression values follow the linear model poorly across genes get down-weighted; the
weights feed back into a weighted `lmFit`. One weight per array; weights multiply to 1.

## Usage

```
rsomics-limma-array-weights expr.tsv --design design.tsv -o weights.tsv
```

- `expr.tsv` — log-expression matrix, tab-separated. Header row = sample ids; first column = gene ids.
- `--design` — design (model) matrix, tab-separated. Header = coefficient names; first column = sample
  ids, one row per sample in the same order as the expression columns.
- output — `sample<TAB>weight`, in input column order.

Flags: `--prior-n` (default 10, squeezes weights toward equality), `--maxiter` (50), `--tol` (1e-5).

## Origin

This crate is an independent Rust reimplementation of limma's `arrayWeights` based on:

- The published method: Ritchie, Diyagama, Neilson, van Laar, Dobrovic, Holloway, Smyth (2006),
  "Empirical array quality weights in the analysis of microarray data", BMC Bioinformatics 7:261,
  doi:10.1186/1471-2105-7-261; with the REML scoring adapted from Smyth (2002), "An efficient
  algorithm for REML in heteroscedastic regression", J. Comput. Graph. Statist. 11:836.
- The public TSV input formats.
- Black-box behaviour testing against the upstream binary (`limma::arrayWeights`).

No source code from limma (GPL) was used as reference during implementation.

The variance model is REML-exact: each gene is fit by weighted least squares, a per-array
log-variance offset is shared across genes and estimated by Fisher scoring on the pooled REML
deviance, and the array variances are converted to weights normalised to geometric mean 1. limma's
exact `prior.n` moderation coupling inside the scoring iteration cannot be reproduced bit-exactly
from the published method alone (clean-room: no GPL source), so a small relative residual remains —
under 0.1% on the committed fixture and well under 0.5% on realistic gene/array counts; it grows for
degenerate designs with very few arrays. The compat test asserts agreement to 0.5% relative.

License: MIT OR Apache-2.0.
Upstream credit: limma (https://bioconductor.org/packages/limma/), GPL (>=2).