# rsomics-gradient-trajectory
Gradient/trajectory ANOVA over ordination coordinates — the QIIME-style
microbiome trajectory analysis of `skbio.stats.gradient`, as a single fast
binary. Given a precomputed ordination (e.g. PCoA), the per-axis proportion
explained, and sample metadata, it builds a trajectory through the ordination
space for each group in a category and runs one-way ANOVA to test whether the
groups differ.
This crate analyses trajectories **through** an ordination; it does not compute
the ordination itself.
## Install
```
cargo install rsomics-gradient-trajectory
```
## Usage
```
rsomics-gradient-trajectory coords.tsv \
--prop prop.tsv \
--metadata meta.tsv \
--algorithm trajectory \
--trajectory-categories Treatment \
--sort-category Time \
--axes 3
```
- `coords.tsv` — samples × PC axes (`#id` header, then one row per sample).
Pass `-` to read the coordinates from stdin instead.
- `--prop` — proportion-explained vector, one value per axis.
- `--metadata` — sample metadata (`#SampleID` header then id + columns).
- `--algorithm` — `trajectory` (RMS, default), `average`, `first-difference`,
or `window-difference`.
- `--sort-category` — metadata column whose value orders samples within a group
(the gradient axis, e.g. time). Natural-sorted exactly as scikit-bio does.
- `--trajectory-categories` — comma-separated categories to analyse (all if
omitted).
- `--axes` — number of PC axes to use (default 3).
- `--weighted` — weight trajectories by spacing in the numeric sort category.
- `--window-size` — window for `window-difference` (default 3).
- `--csv` — parse inputs as comma-separated.
Output is one block per category: a line with the ANOVA probability (or the
skip message when ANOVA cannot run), then a line per group with its mean and
trajectory components.
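To make the `trajectory` and `first-difference` options concrete, here is a minimal Rust sketch of the arithmetic for one group. It assumes the group's samples are already ordered by the sort category and restricted to the chosen axes. The function names (`step_lengths`, `trajectory_norm`, `first_differences`) are hypothetical and are not this crate's API. The sketch follows the scikit-bio semantics as understood here, and it leaves out weighting and the special handling of groups with one or two samples.

```rust
/// Euclidean distance between two points with the same number of axes.
fn distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Lengths of the steps between consecutive samples of one group.
/// `points` must already be ordered along the gradient (e.g. by time).
fn step_lengths(points: &[Vec<f64>]) -> Vec<f64> {
    points.windows(2).map(|w| distance(&w[0], &w[1])).collect()
}

/// `trajectory`-style summary: the 2-norm of the step lengths.
fn trajectory_norm(steps: &[f64]) -> f64 {
    steps.iter().map(|s| s * s).sum::<f64>().sqrt()
}

/// `first-difference`-style values: change between consecutive step lengths.
fn first_differences(steps: &[f64]) -> Vec<f64> {
    steps.windows(2).map(|w| w[1] - w[0]).collect()
}
```

For three samples at (0, 0), (3, 4) and (3, 16), the step lengths are 5 and 12, the trajectory norm is 13, and the single first difference is 7. The per-group values produced this way are what the ANOVA then compares across groups.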
## Origin
This crate is a Rust reimplementation of scikit-bio's
`skbio.stats.gradient` (the `GradientANOVA` family — RMS trajectory, RMS
average, first-difference, and windowed-difference algorithms) and the
closed-form one-way ANOVA of `scipy.stats.f_oneway`. scikit-bio and scipy are
BSD-licensed, so their source was read and is cited:
- Method: ordination-trajectory gradient analysis, as in the QIIME 1 pipeline
and the microbiome "movement through ordination space" framework.
Caporaso et al., *QIIME allows analysis of high-throughput community
sequencing data*, Nat. Methods 7:335-336 (2010), DOI 10.1038/nmeth.f.303;
and Vázquez-Baeza et al., *EMPeror: a tool for visualizing high-throughput
microbial community data*, GigaScience 2:16 (2013),
DOI 10.1186/2047-217X-2-16.
- `skbio.stats.gradient` (scikit-bio 0.7.2, BSD-3-Clause) — algorithm semantics.
- `scipy.stats.f_oneway` / `scipy.special.betainc` (scipy, BSD) — the
closed-form ANOVA p-value (F-distribution survival via the regularised
incomplete beta function).
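As an illustration of the closed-form ANOVA referred to above, the following Rust sketch computes the one-way F statistic and its degrees of freedom from per-group values. The function name `one_way_anova_f` is hypothetical and does not describe this crate's internals. The p-value step (the F-distribution survival function, evaluated through the regularised incomplete beta function) is deliberately not shown, and the zero within-group variance case is not treated specially.

```rust
/// F statistic and (between, within) degrees of freedom for a one-way ANOVA.
/// Returns `None` when the test cannot run: fewer than two groups, an empty
/// group, or no residual degrees of freedom.
fn one_way_anova_f(groups: &[Vec<f64>]) -> Option<(f64, usize, usize)> {
    let k = groups.len();
    let n: usize = groups.iter().map(Vec::len).sum();
    if k < 2 || n <= k || groups.iter().any(|g| g.is_empty()) {
        return None;
    }

    let grand_mean = groups.iter().flatten().sum::<f64>() / n as f64;

    let mut ss_between = 0.0;
    let mut ss_within = 0.0;
    for g in groups {
        let mean = g.iter().sum::<f64>() / g.len() as f64;
        // Between-group sum of squares: group size times squared mean offset.
        ss_between += g.len() as f64 * (mean - grand_mean).powi(2);
        // Within-group sum of squares: squared residuals around the group mean.
        ss_within += g.iter().map(|x| (x - mean).powi(2)).sum::<f64>();
    }

    let (df_between, df_within) = (k - 1, n - k);
    let f = (ss_between / df_between as f64) / (ss_within / df_within as f64);
    Some((f, df_between, df_within))
}
```

For the groups `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]`, both sums of squares equal 6, so the mean squares are 3 and 1 and F is 3 on (2, 6) degrees of freedom.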
The computation is pure linear algebra (vector norms, means, differences) plus a
closed-form ANOVA: no RNG, no iterative solver. Results therefore agree with
scikit-bio to within about 1e-12. `tests/compat.rs` runs the differential
comparison, and a committed golden file captured from scikit-bio serves as the
always-on regression test.
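The "norms and means" involved can be illustrated with the `average` algorithm. The Rust sketch below assumes that this algorithm measures each sample's distance from its group centroid and then averages those distances, which is how scikit-bio's average variant is understood here. The helper names (`centroid`, `distances_from_centroid`, `mean`) are hypothetical, the input is assumed to be non-empty with equal-length rows, and the single-sample case is not handled.

```rust
/// Per-axis mean of a non-empty set of points with equal dimensions.
fn centroid(points: &[Vec<f64>]) -> Vec<f64> {
    let n = points.len() as f64;
    let dims = points[0].len();
    (0..dims)
        .map(|d| points.iter().map(|p| p[d]).sum::<f64>() / n)
        .collect()
}

/// Euclidean distance of every sample from the group centroid.
fn distances_from_centroid(points: &[Vec<f64>]) -> Vec<f64> {
    let c = centroid(points);
    points
        .iter()
        .map(|p| {
            p.iter()
                .zip(&c)
                .map(|(x, m)| (x - m).powi(2))
                .sum::<f64>()
                .sqrt()
        })
        .collect()
}

/// Arithmetic mean of a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}
```

With two samples at (0, 0) and (6, 8), the centroid is (3, 4), each sample lies 5 units from it, and the group average is 5. Every step is a finite sum, product, or square root, which is why agreement with a reference implementation at floating-point precision is a reasonable expectation.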
License: MIT OR Apache-2.0.
Upstream credit: scikit-bio <https://scikit-bio.org> (BSD-3-Clause),
scipy <https://scipy.org> (BSD-3-Clause).