Crate speedytree
source ·Expand description
Canonical and RapidNJ implementations of Neighbor-joining in Rust
Speedytree is a Rust implementation of Neighbor-Joining for building phylogenetic trees from large Phylip distance matrices.
There are two strategies: the Canonical algorithm (as QuickTree) and something in the spirit of RapidNJ but with B-trees. You can read more about Neighbor-Joining here. The RapidNJ algorithm should be faster for very big problems at the cost of a larger memory overhead.
A command line application (that reads PHYLIP distance matrix) is also provided. Please, read more in the GitHub repository. You will also find a few slides I made there.
§Example
A minimal example of the library is provided here. You can read more about the command line app by running speedytree -h
use speedytree::DistanceMatrix;
use speedytree::robinson_foulds;
use speedytree::{NeighborJoiningSolver, Canonical, RapidBtrees, Hybrid};
use rayon;
// Raw Phylip format
let input = "5
a 0 5 9 9 8
b 5 0 10 10 9
c 9 10 0 8 7
d 9 10 8 0 3
e 8 9 7 3 0
".as_bytes();
let d = DistanceMatrix::read_from_phylip(input).unwrap();
// Canonical Neighbor Joining. Optimal for small problems.
let tree1 = NeighborJoiningSolver::<Canonical>::default(d.clone())
.solve()
.unwrap();
// An algorithm in the spirit of RapidNJ using B-trees. Optimal for big problems.
let tree2 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
.solve()
.unwrap();
assert_eq!(robinson_foulds(&tree1, &tree2), 0);
// You can improve the speed a lot by using multiple threads (see rayon::ThreadPoolBuilder::new())
// If so, you may want to tune the chunk size every worker uses
let tree3 = NeighborJoiningSolver::<RapidBtrees>::default(d.clone())
.set_chunk_size(4)
.solve()
.unwrap();
// An algorithm that starts using RapidBtrees, later uses Canonical
// Optimal for intermediate problems (only if you tune the number of "canonical steps")
let tree4 = NeighborJoiningSolver::<Hybrid>::default(d.clone())
.set_canonical_steps(2)
.solve()
.unwrap();
assert_eq!(robinson_foulds(&tree3, &tree4), 0);
Structs§
- Canonical Neighbor-Joining, similar to QuickTree. It runs on cubic time (worst and best case). It uses quadratic memory.
- Distance matrix data structure
- A mix of the Canonical and RapidBtrees. First, it starts with RapidBtrees (less lookups, but with an overhead), and then it changes the strategy.
- Generic solver
- In the spirit of RapidNJ, but with B-trees. It runs on n^2 log(n) time best case and cubic time worst case. It uses quadratic memory (with a higher constant).
Functions§
- Calculate the Branch-Score distance between two trees. It takes the branch length into account.
- Calculate the Robinson-Foulds distance between two trees. It doesn’t take branch length into account.
- Convert a
Tree
to a string according to the Newick format
Type Aliases§
- An undirected network built in top of Petgraph. Internal nodes have empty names.