🌲⚡ rapidtrees
🗺️ Overview
rapidtrees computes pairwise tree distances from BEAST/NEXUS .trees files and writes a labeled distance matrix. Three metrics are supported:
| Metric | Flag | Output | Description |
|---|---|---|---|
| Robinson–Foulds | --metric rf |
integer | Symmetric difference of bipartitions |
| Weighted RF | --metric weighted |
float | Branch-length-weighted bipartition difference |
| Kuhner–Felsenstein | --metric kf |
float | Euclidean distance on branch lengths |
✨ Why rapidtrees?
- 🦀 Rust core — zero-overhead bitset operations with a cache-friendly memory layout
- 🔀 Parallel by default — powered by
rayon, automatically scales across all cores - 🐍 Python bindings — drop into any Python/NumPy workflow via
PyO3 - 📦 No Rust toolchain required — pre-built wheels on PyPI for Linux, macOS, and Windows
- 🗜️ Gzip output — stream directly to
.tsv.gzwithout a separate compression step
🚀 Performance
Benchmarked on a ZIKA dataset (283 taxa · 4 000 trees · ~8 M comparisons):
| Metric | Total time | Throughput |
|---|---|---|
| Robinson-Foulds | ~3.5 s | ~2.3 M comparisons/sec |
| Weighted RF | ~3.5 s | ~2.3 M comparisons/sec |
| Kuhner-Felsenstein | ~3.5 s | ~2.3 M comparisons/sec |
🔧 Installing
🐍 Python (PyPI) — recommended
Pre-built wheels for Linux, macOS, and Windows. No Rust toolchain needed.
🦀 CLI (crates.io)
Install the standalone command-line binary. Requires the Rust toolchain.
🛠️ From source
💻 Usage
| Flag | Description |
|---|---|
-i, --input <INPUT> |
Path to BEAST .trees (NEXUS) file |
-o, --output <OUTPUT> |
Output path. Use .gz suffix for gzip compression; - for stdout |
-t, --burnin-trees <N> |
Drop the first N trees (default: 0) |
-s, --burnin-states <STATE> |
Keep only trees with STATE > STATE (default: 0) |
--use-real-taxa |
Map numeric taxon IDs via the TRANSLATE block |
--metric <rf|weighted|kf> |
Distance metric (default: rf) |
-q, --quiet |
Suppress progress messages (errors still go to stderr) |
The output is a square TSV matrix where both the header row and first column contain tree names formatted as <file_basename>_tree_STATE<state>. Use -o - to write to stdout for easy piping.
💡 Examples
# Reading in beast 0.003s
# Read in 162 taxons for 21 trees
# Creating tree bit snapshots 0.002s
# Determining distances using RF for 210 combinations
# Determining distances using RF 0.000s
# Writing to output 0.000s
# Reading in beast 0.003s
# Read in 162 taxons for 19 trees
# Creating tree bit snapshots 0.001s
# Determining distances using RF for 171 combinations
# Determining distances using RF 0.000s
# Writing to output 0.000s
🐍 Python API
rapidtrees ships Python bindings for seamless integration into Python workflows.
# Robinson-Foulds distances
, =
# Weighted RF (considers branch lengths)
, =
# Kuhner-Felsenstein distances
, =
⏱️ Benchmarks
Benchmarks were run on a MacBook Pro M1. Trees are parsed once and bitset snapshots are reused across all pairwise comparisons. Parallelism is provided by rayon — no manual thread management needed.
| Taxa (N) | Trees (T) | Combinations | Est. Memory | Actual Memory | Wall Time | CPU Time |
|---|---|---|---|---|---|---|
| 10 | 100 | 10.0K | 51.56 KB | 64.00 KB | 391.71 µs | 1.77 ms |
| 10 | 1000 | 1.0M | 515.62 KB | 448.00 KB | 1.16 ms | 9.52 ms |
| 10 | 10000 | 100.0M | 5.04 MB | 5.02 MB | 81.86 ms | 757.66 ms |
| 10 | 100000 | 10.0B | 50.35 MB | 52.42 MB | 8.17 s (est) | 1.27 min (est) |
| 100 | 100 | 10.0K | 481.25 KB | 464.00 KB | 230.54 µs | 1.45 ms |
| 100 | 1000 | 1.0M | 4.70 MB | 1.17 MB | 42.00 ms | 147.22 ms |
| 100 | 10000 | 100.0M | 47.00 MB | 38.42 MB | 1.78 s | 14.75 s |
| 100 | 100000 | 10.0B | 469.97 MB | 451.47 MB | 2.77 min (est) | 25.29 min (est) |
| 500 | 100 | 10.0K | 4.59 MB | 80.00 KB | 1.61 ms | 14.18 ms |
| 500 | 1000 | 1.0M | 45.90 MB | 25.95 MB | 165.58 ms | 1.46 s |
| 500 | 10000 | 100.0M | 458.98 MB | 399.44 MB | 19.75 s | 2.87 min |
| 500 | 100000 | 10.0B | 4.48 GB | 4.51 GB | 31.75 min (est) | 294.98 min (est) |
| 1000 | 100 | 10.0K | 15.27 MB | 5.30 MB | 4.69 ms | 44.25 ms |
| 1000 | 1000 | 1.0M | 152.71 MB | 126.77 MB | 660.03 ms | 5.30 s |
| 1000 | 10000 | 100.0M | 1.49 GB | 1.44 GB | 1.13 min | 9.72 min |
| 1000 | 100000 | 10.0B | 14.91 GB | 9.41 GB | 111.42 min (est) | 983.70 min (est) |
| 2000 | 100 | 10.0K | 54.94 MB | 34.56 MB | 23.11 ms | 182.28 ms |
| 2000 | 1000 | 1.0M | 549.44 MB | 480.02 MB | 2.89 s | 19.48 s |
| 2000 | 10000 | 100.0M | 5.37 GB | 5.10 GB | 3.92 min | 34.76 min |
| 2000 | 100000 | 10.0B | 53.66 GB | Skipped (>30GB) | - | - |
| 5000 | 100 | 10.0K | 316.63 MB | 299.31 MB | 270.13 ms | 2.09 s |
| 5000 | 1000 | 1.0M | 3.09 GB | 3.06 GB | 27.61 s | 4.05 min |
| 5000 | 10000 | 100.0M | 30.92 GB | Skipped (>30GB) | - | - |
| 5000 | 100000 | 10.0B | 309.21 GB | Skipped (>30GB) | - | - |
Note: Weighted RF and KF produce floating-point matrices; RF produces integer matrices.
🔍 Troubleshooting
- No trees parsed? Verify the input is a valid NEXUS
.treesfile and adjust--burnin-*settings. - Piping to other tools? Use
-qto suppress timing messages on stdout. - Gzipped output not working? Ensure the output filename ends with
.gz.
⚖️ License
rapidtrees is provided under the MIT License.