rsomics-gradient-trajectory
Gradient/trajectory ANOVA over ordination coordinates — the QIIME-style
microbiome trajectory analysis of skbio.stats.gradient, as a single fast
binary. Given a precomputed ordination (e.g. PCoA), the per-axis proportion
explained, and sample metadata, it builds a trajectory through the ordination
space for each group in a category and runs one-way ANOVA to test whether the
groups differ.
This crate analyses trajectories through an ordination; it does not compute the ordination itself.
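The ANOVA step can be illustrated with a minimal sketch of the one-way F statistic. This is not the crate's source: the function name `one_way_anova_f` and its signature are hypothetical, and each group is assumed to contribute a plain list of trajectory values.

```rust
/// One-way ANOVA F statistic (illustrative helper, not the crate's API).
/// Returns `(f, df_between, df_within)`, or `None` when the test is undefined:
/// fewer than two groups, an empty group, or no within-group degrees of freedom.
fn one_way_anova_f(groups: &[Vec<f64>]) -> Option<(f64, f64, f64)> {
    let k = groups.len();
    let n: usize = groups.iter().map(|g| g.len()).sum();
    if k < 2 || groups.iter().any(|g| g.is_empty()) || n <= k {
        return None;
    }

    // Mean of all observations pooled together.
    let grand_mean = groups.iter().flatten().sum::<f64>() / n as f64;

    let mut ss_between = 0.0_f64;
    let mut ss_within = 0.0_f64;
    for g in groups {
        let mean = g.iter().sum::<f64>() / g.len() as f64;
        // Spread of the group means around the grand mean.
        ss_between += g.len() as f64 * (mean - grand_mean).powi(2);
        // Spread of the observations around their own group mean.
        ss_within += g.iter().map(|x| (x - mean).powi(2)).sum::<f64>();
    }

    let df_between = (k - 1) as f64;
    let df_within = (n - k) as f64;
    // If every group is constant, ss_within is 0 and the ratio is inf or NaN.
    let f = (ss_between / df_between) / (ss_within / df_within);
    Some((f, df_between, df_within))
}
```

A large F means the group means differ by more than the within-group scatter would suggest. The p-value is the upper tail of the F distribution at that value.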
Install
cargo install rsomics-gradient-trajectory
Usage
rsomics-gradient-trajectory coords.tsv \
--prop prop.tsv \
--metadata meta.tsv \
--algorithm trajectory \
--trajectory-categories Treatment \
--sort-category Time \
--axes 3
- coords.tsv: samples × PC axes (#id header, then one row per sample), or stdin with -.
- --prop: proportion-explained vector, one value per axis.
- --metadata: sample metadata (#SampleID header, then id + columns).
- --algorithm: trajectory (RMS, default), average, first-difference, or window-difference.
- --sort-category: metadata column whose value orders samples within a group (the gradient axis, e.g. time). Natural-sorted exactly as scikit-bio does.
- --trajectory-categories: comma-separated categories to analyse (all if omitted).
- --axes: number of PC axes to use (default 3).
- --weighted: weight trajectories by spacing in the numeric sort category.
- --window-size: window for window-difference (default 3).
- --csv: parse inputs as comma-separated.
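As a rough illustration of what the trajectory-style algorithms work from, the sketch below computes the Euclidean distance between consecutive samples of one group and then the first difference of those distances. It assumes the samples are already ordered by the sort category and truncated to the chosen axes. The names `step_norms` and `first_difference` are hypothetical, and weighting, windowing, and single-sample groups are left out.

```rust
/// Euclidean distance between each pair of consecutive points in an
/// ordered trajectory (illustrative helper, not the crate's API).
/// A trajectory of n points yields n - 1 step lengths.
fn step_norms(points: &[Vec<f64>]) -> Vec<f64> {
    points
        .windows(2)
        .map(|w| {
            w[0].iter()
                .zip(&w[1])
                .map(|(a, b)| (b - a).powi(2))
                .sum::<f64>()
                .sqrt()
        })
        .collect()
}

/// First difference of a sequence: values[i + 1] - values[i].
fn first_difference(values: &[f64]) -> Vec<f64> {
    values.windows(2).map(|w| w[1] - w[0]).collect()
}
```

For three samples at (0, 0, 0), (3, 4, 0), and (3, 4, 12), the step lengths are 5 and 12, and their first difference is 7.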
Output is one block per category: a line with the ANOVA probability (or the skip message when ANOVA cannot run), then a line per group with its mean and trajectory components.
Origin
This crate is a Rust reimplementation of scikit-bio's
skbio.stats.gradient (the GradientANOVA family — RMS trajectory, RMS
average, first-difference, and windowed-difference algorithms) and the
closed-form one-way ANOVA of scipy.stats.f_oneway. scikit-bio and scipy are
BSD-licensed, so their source was read and is cited:
- Method: ordination-trajectory gradient analysis, as in the QIIME 1 pipeline and the microbiome "movement through ordination space" framework. Caporaso et al., QIIME allows analysis of high-throughput community sequencing data, Nat. Methods 7:335-336 (2010), DOI 10.1038/nmeth.f.303; and Vázquez-Baeza et al., EMPeror: a tool for visualizing high-throughput microbial community data, GigaScience 2:16 (2013), DOI 10.1186/2047-217X-2-16.
- skbio.stats.gradient (scikit-bio 0.7.2, BSD-3-Clause): algorithm semantics.
- scipy.stats.f_oneway / scipy.special.betainc (scipy, BSD): the closed-form ANOVA p-value (F-distribution survival via the regularised incomplete beta function).
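For reference, the closed form can be written out as follows. The symbols are introduced here for illustration: k groups, N observations in total, between-group and within-group sums of squares SS_b and SS_w, and the regularised incomplete beta function I_x(a, b).

```latex
F = \frac{SS_b / d_1}{SS_w / d_2},
\qquad d_1 = k - 1,
\qquad d_2 = N - k,

p = \Pr\left(F_{d_1, d_2} \ge F\right)
  = I_x\!\left(\tfrac{d_2}{2}, \tfrac{d_1}{2}\right),
\qquad x = \frac{d_2}{d_2 + d_1 F}.
```

This is the standard identity for the upper tail of the F distribution, so the p-value needs no numerical integration or simulation.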
The computation is pure linear algebra (vector norms, means, differences) plus a
closed-form ANOVA: no RNG, no iterative solver. Results therefore agree with
scikit-bio to within ~1e-12. tests/compat.rs runs the differential comparison, and a
committed golden file captured from scikit-bio serves as the always-on regression test.
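A comparison at this tolerance might look like the sketch below. It is a generic helper written for illustration and is not taken from tests/compat.rs: the name `approx_eq` and the choice of a relative tolerance with a floor of 1.0 are assumptions.

```rust
/// True when `a` and `b` agree within `tol`, scaled by the larger magnitude
/// (with a floor of 1.0, so values near zero are compared absolutely).
/// Illustrative helper only.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    // Treat NaN as matching only NaN, so "undefined" results can be compared.
    if a.is_nan() || b.is_nan() {
        return a.is_nan() && b.is_nan();
    }
    // Covers exact matches, including equal infinities.
    if a == b {
        return true;
    }
    // An infinity never approximately equals anything else.
    if a.is_infinite() || b.is_infinite() {
        return false;
    }
    let scale = a.abs().max(b.abs()).max(1.0);
    (a - b).abs() <= tol * scale
}
```

Scaling by magnitude keeps the check meaningful for both small p-values and larger trajectory norms.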
License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause), scipy https://scipy.org (BSD-3-Clause).