rsomics-tree-tipdist 0.1.0

Patristic tip-to-tip distance matrix from a phylogenetic tree (sum of branch lengths between every pair of tips). Equivalent to scikit-bio's TreeNode.cophenet, with byte-exact TSV output.

rsomics-tree-tipdist

Patristic tip-to-tip distance matrix from a phylogenetic tree: for every pair of tips, the sum of branch lengths along the path connecting them. Output is a square, labelled, symmetric distance matrix in scikit-bio DistanceMatrix (lsmat) TSV form, byte-identical to what skbio writes.

rsomics-tree-tipdist tree.nwk -o dm.tsv
cat tree.nwk | rsomics-tree-tipdist --count        # count branches, not lengths

Tip labels are emitted in postorder tip order, exactly as in skbio's TreeNode.cophenet. Missing branch lengths are treated as 0. The output is a natural input for rsomics-pcoa / rsomics-permanova.
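
The rules above (postorder tip order, missing lengths counted as 0, and branch counting with --count) can be illustrated with a minimal sketch. It is not the crate's implementation: the Node arena, demo_tree, postorder_tips, edge and tip_distance are hypothetical names introduced here, and the sketch makes no attempt to reproduce the floating-point accumulation order needed for byte-exact output.

```rust
use std::collections::HashMap;

/// Hypothetical arena node (not the crate's type): parent index,
/// optional branch length to the parent, and child indices.
struct Node {
    parent: Option<usize>,
    length: Option<f64>, // None is treated as 0.0
    children: Vec<usize>,
}

/// Example tree ((A:1,B:2):3,C); tip C has no branch length.
fn demo_tree() -> Vec<Node> {
    vec![
        Node { parent: None, length: None, children: vec![1, 4] }, // 0: root
        Node { parent: Some(0), length: Some(3.0), children: vec![2, 3] }, // 1: (A,B)
        Node { parent: Some(1), length: Some(1.0), children: vec![] }, // 2: A
        Node { parent: Some(1), length: Some(2.0), children: vec![] }, // 3: B
        Node { parent: Some(0), length: None, children: vec![] }, // 4: C
    ]
}

/// Collect tip indices in postorder (children left to right).
fn postorder_tips(nodes: &[Node], at: usize, out: &mut Vec<usize>) {
    for &child in &nodes[at].children {
        postorder_tips(nodes, child, out);
    }
    if nodes[at].children.is_empty() {
        out.push(at);
    }
}

/// Weight of the branch above node `i`: 1 when counting branches,
/// otherwise its length, with a missing length read as 0.
fn edge(nodes: &[Node], i: usize, count: bool) -> f64 {
    if count { 1.0 } else { nodes[i].length.unwrap_or(0.0) }
}

/// Path distance between nodes `a` and `b` through their closest shared ancestor.
fn tip_distance(nodes: &[Node], a: usize, b: usize, count: bool) -> f64 {
    // Cumulative distance from `a` to itself and each of its ancestors.
    let mut up = HashMap::new();
    let (mut cur, mut acc) = (a, 0.0);
    up.insert(cur, acc);
    while let Some(p) = nodes[cur].parent {
        acc += edge(nodes, cur, count);
        cur = p;
        up.insert(cur, acc);
    }
    // Climb from `b` until reaching a node already seen from `a`.
    let (mut cur, mut acc) = (b, 0.0);
    loop {
        if let Some(&d) = up.get(&cur) {
            return d + acc;
        }
        acc += edge(nodes, cur, count);
        cur = nodes[cur].parent.expect("both tips must share a root");
    }
}
```

On the example tree, postorder_tips starting at index 0 visits A, B, C. The distance between A and C is 1 + 3 + 0, because the missing length above C contributes nothing; with count set to true the same path is three branches long. Recomputing ancestor paths for every pair is quadratic and kept only for clarity.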

This is the phylogenetic cophenetic (patristic) distance. It is distinct from rsomics-cophenet, which computes the scipy linkage-matrix cophenetic distance of a hierarchical clustering: a different algorithm with a different input.

Origin

This crate is an independent Rust reimplementation of scikit-bio's skbio.tree.TreeNode.cophenet / TreeNode.tip_tip_distances (skbio 0.7.2), based on:

  • The published patristic-distance method: Fourment, M., & Gibbs, M. J. (2006). PATRISTIC: a program for calculating patristic distances. BMC Evolutionary Biology, 6, 1. https://doi.org/10.1186/1471-2148-6-1
  • The scikit-bio BSD-3-Clause source (skbio/tree/_tree.py, skbio/io/format/lsmat.py), which is permissively licensed and may be read and cited. The accumulation order and the lsmat TSV float formatting (CPython repr shortest round-trip) were matched against it for byte-exact output.
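
To make the output format concrete, here is a rough sketch of writing a labelled square matrix as tab-separated text. The function name to_lsmat is hypothetical, and the layout (a header line starting with an empty cell, then one line per tip) is an assumption about the lsmat form rather than something taken from this crate. Floats are printed with Rust's {:?}, which gives shortest round-trip digits and keeps the trailing ".0" for everyday magnitudes; it is not claimed to match CPython's repr for very large or very small values, where exponent formatting is likely to differ and byte-exact output would need extra handling.

```rust
/// Render ids and a square matrix as tab-separated text (illustrative only).
fn to_lsmat(ids: &[&str], dm: &[Vec<f64>]) -> String {
    let mut out = String::new();
    // Header: an empty first cell, then every id.
    for id in ids {
        out.push('\t');
        out.push_str(id);
    }
    out.push('\n');
    // One row per id: the label, then that row's distances.
    for (id, row) in ids.iter().zip(dm) {
        out.push_str(id);
        for v in row {
            out.push('\t');
            // {:?} prints 3.0 as "3.0"; plain {} would print "3".
            out.push_str(&format!("{:?}", v));
        }
        out.push('\n');
    }
    out
}
```

For a two-tip matrix with ids A and B and a distance of 3.0, this yields three lines: the header, then "A", "0.0", "3.0" and "B", "3.0", "0.0", each separated by tabs.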

Test fixtures are independently generated and verified against the upstream binary.

License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause).