Crate speedytree

source ·
Expand description

Canonical and RapidNJ implementations of Neighbor-joining in Rust

Speedytree is a Rust implementation of Neighbor-Joining for building phylogenetic trees from large Phylip distance matrices.

There are two strategies: the Canonical algorithm (as QuickTree) and something in the spirit of RapidNJ but with B-trees. You can read more about Neighbor-Joining here. The RapidNJ algorithm should be faster for very big problems at the cost of a larger memory overhead.

A command line application (that reads PHYLIP distance matrix) is also provided. Please, read more in the GitHub repository. You will also find a few slides I made there.

§Example

A minimal example of the library is provided here. You can read more about the command line app by running speedytree -h

use speedytree::DistanceMatrix;
use speedytree::robinson_foulds;
use speedytree::{NeighborJoiningSolver, Canonical, RapidBtrees, Hybrid};
use rayon;
// Raw Phylip format
let input = "5
   a    0    5    9    9    8
   b    5    0    10    10    9
   c    9    10    0    8    7
   d    9    10    8    0    3
   e    8    9    7    3    0
".as_bytes();
let d = DistanceMatrix::read_from_phylip(input).unwrap();
// Canonical Neighbor Joining. Optimal for small problems.
let tree1 = NeighborJoiningSolver::<Canonical>::default(d.clone())
  .solve()
  .unwrap();
// An algorithm in the spirit of RapidNJ using B-trees. Optimal for big problems.
let tree2 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
  .solve()
  .unwrap();
assert_eq!(robinson_foulds(&tree1, &tree2), 0);

// You can improve the speed a lot by using multiple threads (see rayon::ThreadPoolBuilder::new())
// If so, you may want to tune the chunk size every worker uses
  let tree3 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
  .set_chunk_size(4)
  .solve()
  .unwrap();
// An algorithm that starts using RapidBtrees, later uses Canonical
// Optimal for intermediate problems (only if you tune the number of "canonical steps")
let tree4 = NeighborJoiningSolver::<Hybrid>::default(d.clone())
  .set_canonical_steps(2)
  .solve()
  .unwrap();
assert_eq!(robinson_foulds(&tree3, &tree4), 0);

Structs§

  • Canonical Neighbor-Joining, similar to QuickTree. It runs on cubic time (worst and best case). It uses quadratic memory.
  • Distance matrix data structure
  • A mix of the Canonical and RapidBtrees. First, it starts with RapidBtrees (less lookups, but with an overhead), and then it changes the strategy.
  • Generic solver
  • In the spirit of RapidNJ, but with B-trees. It runs on n^2 log(n) time best case and cubic time worst case. It uses quadratic memory (with a higher constant).

Functions§

Type Aliases§

  • An undirected network built in top of Petgraph. Internal nodes have empty names.