rsomics-tree-tipdist
Patristic tip-to-tip distance matrix from a phylogenetic tree: for every pair
of tips, the sum of branch lengths along the path connecting them. Output is a
square, labelled, symmetric distance matrix in scikit-bio DistanceMatrix
(lsmat) TSV form, byte-identical to skbio.
rsomics-tree-tipdist tree.nwk -o dm.tsv
cat tree.nwk | rsomics-tree-tipdist --count # count branches, not lengths
Tip labels are emitted in postorder tip order, exactly as skbio's
TreeNode.cophenet does. Missing branch lengths are treated as 0. The output
feeds directly into rsomics-pcoa / rsomics-permanova.
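The definition above can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's actual implementation: the Node layout and the names example_tree, postorder_tips, path_to_root and tip_distances are all hypothetical. It uses a simple path-to-root search per pair, so floating-point accumulation order may differ from skbio's and byte-exact output is not guaranteed.

```rust
/// One node of a rooted tree stored in a flat arena (hypothetical layout).
struct Node {
    name: Option<String>,
    parent: Option<usize>,
    length: Option<f64>, // length of the branch leading to the parent
    children: Vec<usize>,
}

/// Example tree ((a:1,b:2)c:0.5,d:3); the root is node 0.
fn example_tree() -> Vec<Node> {
    let mk = |name: &str, parent: Option<usize>, length: Option<f64>, children: Vec<usize>| Node {
        name: Some(name.to_string()),
        parent,
        length,
        children,
    };
    vec![
        mk("root", None, None, vec![1, 4]),
        mk("c", Some(0), Some(0.5), vec![2, 3]),
        mk("a", Some(1), Some(1.0), vec![]),
        mk("b", Some(1), Some(2.0), vec![]),
        mk("d", Some(0), Some(3.0), vec![]),
    ]
}

/// Tips in left-to-right order, which is the order a postorder traversal visits them.
fn postorder_tips(nodes: &[Node], root: usize) -> Vec<usize> {
    let mut tips = Vec::new();
    let mut stack = vec![root];
    while let Some(i) = stack.pop() {
        if nodes[i].children.is_empty() {
            tips.push(i);
        } else {
            // Push children reversed so the leftmost child is popped first.
            stack.extend(nodes[i].children.iter().rev().copied());
        }
    }
    tips
}

/// (ancestor, distance from tip) for the tip itself and every ancestor up to the root.
/// With `count` set, every branch contributes 1 instead of its length.
fn path_to_root(nodes: &[Node], tip: usize, count: bool) -> Vec<(usize, f64)> {
    let mut path = vec![(tip, 0.0)];
    let (mut cur, mut dist) = (tip, 0.0);
    while let Some(p) = nodes[cur].parent {
        // A missing branch length is treated as 0.
        dist += if count { 1.0 } else { nodes[cur].length.unwrap_or(0.0) };
        path.push((p, dist));
        cur = p;
    }
    path
}

/// Labels plus a symmetric matrix of tip-to-tip path sums.
fn tip_distances(nodes: &[Node], root: usize, count: bool) -> (Vec<String>, Vec<Vec<f64>>) {
    let tips = postorder_tips(nodes, root);
    let paths: Vec<Vec<(usize, f64)>> =
        tips.iter().map(|&t| path_to_root(nodes, t, count)).collect();
    let n = tips.len();
    let mut dm = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in (i + 1)..n {
            // The first node on i's path that also lies on j's path is their
            // lowest common ancestor; the distance is the sum of both legs.
            let d = paths[i]
                .iter()
                .find_map(|&(a, da)| {
                    paths[j].iter().find(|&&(b, _)| b == a).map(|&(_, db)| da + db)
                })
                .unwrap_or(0.0);
            dm[i][j] = d;
            dm[j][i] = d;
        }
    }
    let labels: Vec<String> = tips
        .iter()
        .map(|&t| nodes[t].name.clone().unwrap_or_default())
        .collect();
    (labels, dm)
}
```

Calling tip_distances(&example_tree(), 0, false) yields the labels a, b, d and the distances a-b = 3.0, a-d = 4.5, b-d = 5.5. With count set to true the same pairs give 2, 3 and 3 branches.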
This is the phylogenetic cophenetic (patristic) distance. It is distinct from
rsomics-cophenet, which computes the scipy linkage-matrix cophenetic distance
of a hierarchical clustering (a different algorithm with a different input).
Origin
This crate is an independent Rust reimplementation of scikit-bio's
skbio.tree.TreeNode.cophenet / TreeNode.tip_tip_distances (skbio 0.7.2),
based on:
- The published patristic-distance method: Fourment, M., & Gibbs, M. J. (2006). PATRISTIC: a program for calculating patristic distances. BMC Evolutionary Biology, 6, 1. https://doi.org/10.1186/1471-2148-6-1
- The scikit-bio BSD-3-Clause source (skbio/tree/_tree.py, skbio/io/format/lsmat.py), which is permissively licensed and may be read and cited. The accumulation order and the lsmat TSV float formatting (CPython repr shortest round-trip) were matched against it for byte-exact output.
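As a rough illustration of the lsmat layout mentioned above, the sketch below writes a labelled square matrix as TSV: a header row with an empty first field followed by the labels, then one row per label. The function name write_lsmat is hypothetical and this is not the crate's writer. It formats floats with Rust's `{:?}`, which also produces shortest round-trip digits but is assumed here not to match CPython repr in every case (for example in exponent notation), so a byte-exact writer would need more care.

```rust
use std::fmt::Write;

/// Render a labelled square matrix in lsmat-style TSV (hypothetical helper).
fn write_lsmat(labels: &[&str], dm: &[Vec<f64>]) -> String {
    let mut out = String::new();
    // Header: empty first field, then one tab-separated label per column.
    for label in labels {
        out.push('\t');
        out.push_str(label);
    }
    out.push('\n');
    // One row per label: the label, then that row's distances.
    for (label, row) in labels.iter().zip(dm) {
        out.push_str(label);
        for v in row {
            // `{:?}` keeps the trailing ".0" on whole numbers (3.0, not 3).
            write!(out, "\t{:?}", v).unwrap();
        }
        out.push('\n');
    }
    out
}
```

For two tips a and b at distance 3.0, this produces a header line with a leading tab followed by a and b, then the rows "a 0.0 3.0" and "b 3.0 0.0" with tabs between fields.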
Test fixtures are independently generated and verified against the upstream binary.
License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause).