Clade
A tool for building and pruning phylogenetic trees from NCBI taxonomy data and GTDB (Genome Taxonomy Database) data.
Features
- Fetch and process NCBI taxonomy data
- Fetch and process GTDB (Genome Taxonomy Database) data
- Parse taxonomy data into efficient vector structures
- Prune phylogenetic trees based on user input
- Generate Newick format output from pruned trees
Workflow
- Data Retrieval:
  - Fetch the latest taxonomy data from NCBI
  - Fetch the latest tree and taxonomy data from GTDB
- Data Processing:
  - Decompress the downloaded data
  - Parse the taxonomy information into five synchronized vectors:
    - taxid vector
    - parentid vector
    - name vector
    - rank vector
    - parent_distances vector
  - All five vectors have the same length, and the entries at a given index describe the same taxon
- Tree Pruning and Newick Generation:
  - Accept user input in the form of:
    - A list of taxids, or
    - A list of taxonomic names
  - Prune the phylogenetic tree to include only the branches related to the input
  - Generate a Newick format file representing the pruned phylogenetic tree
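The processing and pruning steps above can be illustrated with a minimal Python sketch. It is not Clade's actual implementation: the function names `prune` and `to_newick`, the toy taxonomy (a simplified lineage with intermediate ranks left out), and the choice to keep every ancestor of the requested taxa are all assumptions made for illustration. It also assumes the root is the entry whose parent is itself, as in the NCBI taxdump.

```python
from collections import defaultdict

# Toy taxonomy as five index-aligned lists (illustrative values only).
# Entry i of every list describes the same taxon.
taxid = [1, 2, 2157, 543, 561, 562, 590]
parentid = [1, 1, 1, 2, 543, 561, 543]  # the root is its own parent
name = ["root", "Bacteria", "Archaea", "Enterobacteriaceae",
        "Escherichia", "Escherichia coli", "Salmonella"]
rank = ["no rank", "domain", "domain", "family", "genus", "species", "genus"]
parent_distances = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def prune(taxid, parentid, wanted):
    """Return the row indices of the wanted taxids and all their ancestors."""
    index_of = {t: i for i, t in enumerate(taxid)}
    keep = set()
    for t in wanted:
        i = index_of[t]
        # Walk towards the root; stop once we reach a row already kept.
        while i not in keep:
            keep.add(i)
            i = index_of[parentid[i]]
    return keep


def to_newick(taxid, parentid, name, parent_distances, keep):
    """Serialize the kept rows as a Newick string with branch lengths."""
    children = defaultdict(list)
    root = None
    for i in sorted(keep):
        if parentid[i] == taxid[i]:
            root = i
        else:
            children[parentid[i]].append(i)

    def render(i):
        # Unquoted Newick labels cannot contain spaces, so use underscores.
        label = name[i].replace(" ", "_")
        kids = children.get(taxid[i], [])
        subtree = "(" + ",".join(render(k) for k in kids) + ")" if kids else ""
        if i == root:
            return subtree + label
        return f"{subtree}{label}:{parent_distances[i]}"

    return render(root) + ";"


# Input given as names is first resolved to taxids through the name list.
taxid_of = dict(zip(name, taxid))
wanted = [taxid_of["Escherichia coli"], taxid_of["Salmonella"]]
keep = prune(taxid, parentid, wanted)
print(to_newick(taxid, parentid, name, parent_distances, keep))
```

Keeping the data in parallel lists means a single index lookup gives access to a taxon's parent, name, rank, and branch length, so pruning only needs to collect a set of indices rather than copy tree nodes.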
Usage
The Clade tool supports the following commands:
- update: Update taxdump files from NCBI
- generate: Generate and print taxonomy summary from taxdump files
- prune: Prune the taxonomy tree based on given taxids or names and generate a Newick format file
- gtdb: Download and process GTDB data
Example usage: