Crate speedytree

Source
Expand description

Canonical and RapidNJ implementations of Neighbor-joining in Rust

Speedytree is a Rust implementation of Neighbor-Joining for building phylogenetic trees from large Phylip distance matrices.

There are two strategies: the Canonical algorithm (as QuickTree) and something in the spirit of RapidNJ but with B-trees. You can read more about Neighbor-Joining here. The RapidNJ algorithm should be faster for very big problems at the cost of a larger memory overhead.

A command line application (that reads PHYLIP distance matrix) is also provided. Please, read more in the GitHub repository. You will also find a few slides I made there.

§Example

A minimal example of the library is provided here. You can read more about the command line app by running speedytree -h

use speedytree::DistanceMatrix;
use speedytree::robinson_foulds;
use speedytree::{NeighborJoiningSolver, Canonical, RapidBtrees, Hybrid};
use rayon;
// Raw Phylip format
let input = "5
   a    0    5    9    9    8
   b    5    0    10    10    9
   c    9    10    0    8    7
   d    9    10    8    0    3
   e    8    9    7    3    0
".as_bytes();
let d = DistanceMatrix::read_from_phylip(input).unwrap();
// Canonical Neighbor Joining. Optimal for small problems.
let tree1 = NeighborJoiningSolver::<Canonical>::default(d.clone())
  .solve()
  .unwrap();
// An algorithm in the spirit of RapidNJ using B-trees. Optimal for big problems.
let tree2 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
  .solve()
  .unwrap();
assert_eq!(robinson_foulds(&tree1, &tree2), 0);

// You can improve the speed a lot by using multiple threads (see rayon::ThreadPoolBuilder::new())
// If so, you may want to tune the chunk size every worker uses
  let tree3 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
  .set_chunk_size(4)
  .solve()
  .unwrap();
// An algorithm that starts using RapidBtrees, later uses Canonical
// Optimal for intermediate problems (only if you tune the number of "canonical steps")
let tree4 = NeighborJoiningSolver::<Hybrid>::default(d.clone())
  .set_canonical_steps(2)
  .solve()
  .unwrap();
assert_eq!(robinson_foulds(&tree3, &tree4), 0);

Structs§

Canonical
Canonical Neighbor-Joining, similar to QuickTree. It runs on cubic time (worst and best case). It uses quadratic memory.
DistanceMatrix
Distance matrix data structure
Hybrid
A mix of the Canonical and RapidBtrees. First, it starts with RapidBtrees (less lookups, but with an overhead), and then it changes the strategy.
NeighborJoiningSolver
Generic solver
RapidBtrees
In the spirit of RapidNJ, but with B-trees. It runs on n^2 log(n) time best case and cubic time worst case. It uses quadratic memory (with a higher constant).

Functions§

branch_score
Calculate the Branch-Score distance between two trees. It takes the branch length into account.
robinson_foulds
Calculate the Robinson-Foulds distance between two trees. It doesn’t take branch length into account.
to_newick
Convert a Tree to a string according to the Newick format

Type Aliases§

Tree
An undirected network built in top of Petgraph. Internal nodes have empty names.