fastax 1.5.0

Make phylogenetic trees and lineages from the NCBI Taxonomy database
Documentation
Fastax
======

[![crates.io badge](https://img.shields.io/crates/v/fastax?color=green)][4]


Fastax is a command-line tool that makes phylogenetic trees and lineages
from the NCBI Taxonomy database. It uses a local copy of the database,
which makes it really fast.

By default, all results are pretty-printed. In addition, it can output trees
as [Newick][1] and lineages as CSV.

It can also be used to get information about some taxa like there alternative
scientific names or the genetic code they use.

Installation
------------

Fastax is written in [Rust][2], which makes it safe, fast and portable. The
code is managed using [Cargo][3] and published on [crates.io][4]. If Cargo
is already installed, just open a terminal and type:

```
$ cargo install fastax
```

Et voilà !

Alternatively, you can compile it from sources:

```
$ git clone https://github.com/Picani/fastax.git
$ cd fastax
$ cargo build --release
```

The executable file is `target/release/fastax`. Just move it somewhere on
your `PATH`.

Populate the local database
---------------------------

First, you need to get the local copy of the NCBI Taxonomy database.

    $ fastax populate -ve plop@example.com
    
`populate` will download the latest database dumps, extract them, and load
them in a local SQLite database. `-v` asks fastax to tell what it's doing.
`-e` asks to connect to the NCBI with that email address. Note that giving
your email is optional but preferred.

The database is located in a `fastax` folder inside your local data folder,
which should be `$HOME/.local/share`.

Usage
-----

For each command, you need to query at least one node. The term used to get
a node can be either its unique NCBI Taxonomy ID (so called taxid), its
binomial scientific name or its binomial scientific name with the two part
separated by an underscore (the character `_`). This last option is useful
for scripting.

Note also that for some species, multiple binomial scientific names are in
use. Fastax looks for each of them.

### The `show` command

You can get general information about a node:

```
$ fastax show 4932
Saccharomyces cerevisiae - species
----------------------------------
NCBI Taxonomy ID: 4932
Same as:
* Saccharomyces capensis
* Saccharomyces italicus
* Saccharomyces oviformis
* Saccharomyces uvarum var. melibiosus
Commonly named baker's yeast.
Also known as:
* S. cerevisiae
* brewer's yeast
Part of the Plants and Fungi.
Uses the Standard genetic code.
Its mitochondria use the Yeast Mitochondrial genetic code.
```

or:

```
$ fastax show "Homo sapiens"
Homo sapiens - species
----------------------
NCBI Taxonomy ID: 9606
Commonly named human.
Also known as:
* man
First description:
* Homo sapiens Linnaeus, 1758
Part of the Primates.
Uses the Standard genetic code.
Its mitochondria use the Vertebrate Mitochondrial genetic code.
```

or also:

```
$ fastax show Tyrannosaurus_rex
Tyrannosaurus rex - species
---------------------------
NCBI Taxonomy ID: 436495
Part of the Vertebrates.
Uses the Standard genetic code.
Its mitochondria use the Vertebrate Mitochondrial genetic code.
```


### The `lineage` command

You can get the lineage of a node:

```
$ fastax lineage 4932
root
  └┬─ no rank: cellular organisms (taxid: 131567)
   └┬─ superkingdom: Eukaryota (taxid: 2759)
    └┬─ no rank: Opisthokonta (taxid: 33154)
     └┬─ kingdom: Fungi (taxid: 4751)
      └┬─ subkingdom: Dikarya (taxid: 451864)
       └┬─ phylum: Ascomycota (taxid: 4890)
        └┬─ no rank: saccharomyceta (taxid: 716545)
         └┬─ subphylum: Saccharomycotina (taxid: 147537)
          └┬─ class: Saccharomycetes (taxid: 4891)
           └┬─ order: Saccharomycetales (taxid: 4892)
            └┬─ family: Saccharomycetaceae (taxid: 4893)
             └┬─ genus: Saccharomyces (taxid: 4930)
              └── species: Saccharomyces cerevisiae (taxid: 4932)
```

The same lineage in CSV:

```
$ fastax lineage Saccharomyces_cerevisiae
no rank:root:1,no rank:cellular organisms:131567,superkingdom:Eukaryota:2759,no rank:Opisthokonta:33154,kingdom:Fungi:4751,subkingdom:Dikarya:451864,phylum:Ascomycota:4890,no rank:saccharomyceta:716545,subphylum:Saccharomycotina:147537,class:Saccharomycetes:4891,order:Saccharomycetales:4892,family:Saccharomycetaceae:4893,genus:Saccharomyces:4930,species:Saccharomyces cerevisiae:4932
```


### The `tree` command

You can get a phylogenetic tree:

```
$ fastax tree "Escherichia coli" 4932 Drosophila_melanogaster 9606 "Mus musculus"
 ─┬─ no rank: root
  └─┬─ no rank: cellular organisms
    ├─┬─ no rank: Opisthokonta
    │ ├─┬─ no rank: Bilateria
    │ │ ├─┬─ superorder: Euarchontoglires
    │ │ │ ├── species: Mus musculus
    │ │ │ └── species: Homo sapiens
    │ │ └── species: Drosophila melanogaster
    │ └── species: Saccharomyces cerevisiae
    └── species: Escherichia coli
```

The same tree in Newick:

```
$ fastax tree -n 562 4932 7227 9606 10090
(root,(cellular organisms,(Escherichia coli,Opisthokonta,(Saccharomyces cerevisiae,Bilateria,(Drosophila melanogaster,Euarchontoglires,(Homo sapiens,Mus musculus))))));
```

With `-f/--format`, you can also change the default node formatting:

```
$ fastax tree -f "%taxid (%name)" "Escherichia coli" 4932 Drosophila_melanogaster 9606 "Mus musculus"
 ─┬─ 1 (root)
  └─┬─ 131567 (cellular organisms)
    ├─┬─ 33154 (Opisthokonta)
    │ ├─┬─ 33213 (Bilateria)
    │ │ ├─┬─ 314146 (Euarchontoglires)
    │ │ │ ├── 10090 (Mus musculus)
    │ │ │ └── 9606 (Homo sapiens)
    │ │ └── 7227 (Drosophila melanogaster)
    │ └── 4932 (Saccharomyces cerevisiae)
    └── 562 (Escherichia coli)
```

The available tags are

* `%name` which is replaced by the scientific name,
* `%rank` which is replaced by the rank,
* `%taxid` which is replaced by the NCBI Taxonomy ID.

By default, the nodes with only one child are hidden. You can show them with
the `-i/--internal` option:

```
$ fastax tree -i Mus_musculus Rattus_norvegicus
 ─┬─ no rank: root
  └─┬─ no rank: cellular organisms
    └─┬─ superkingdom: Eukaryota
      └─┬─ no rank: Opisthokonta
        └─┬─ kingdom: Metazoa
          └─┬─ no rank: Eumetazoa
            └─┬─ no rank: Bilateria
              └─┬─ no rank: Deuterostomia
                └─┬─ phylum: Chordata
                  └─┬─ subphylum: Craniata
                    └─┬─ no rank: Vertebrata
                      └─┬─ no rank: Gnathostomata
                        └─┬─ no rank: Teleostomi
                          └─┬─ no rank: Euteleostomi
                            └─┬─ superclass: Sarcopterygii
                              └─┬─ no rank: Dipnotetrapodomorpha
                                └─┬─ no rank: Tetrapoda
                                  └─┬─ no rank: Amniota
                                    └─┬─ class: Mammalia
                                      └─┬─ no rank: Theria
                                        └─┬─ no rank: Eutheria
                                          └─┬─ no rank: Boreoeutheria
                                            └─┬─ superorder: Euarchontoglires
                                              └─┬─ no rank: Glires
                                                └─┬─ order: Rodentia
                                                  └─┬─ suborder: Myomorpha
                                                    └─┬─ no rank: Muroidea
                                                      └─┬─ family: Muridae
                                                        └─┬─ subfamily: Murinae
                                                          ├─┬─ genus: Rattus
                                                          │ └── species: Rattus norvegicus
                                                          └─┬─ genus: Mus
                                                            └─┬─ subgenus: Mus
                                                              └── species: Mus musculus
```


### The `subtree` command

You can get the phylogenetic tree of the children of a node:

```
$ fastax subtree Homininae
 ─┬─ subfamily: Homininae
  ├─┬─ genus: Homo
  │ ├── species: Homo heidelbergensis
  │ └─┬─ species: Homo sapiens
  │   ├── subspecies: Homo sapiens subsp. 'Denisova'
  │   └── subspecies: Homo sapiens neanderthalensis
  ├─┬─ genus: Pan
  │ ├─┬─ species: Pan troglodytes
  │ │ ├── subspecies: Pan troglodytes verus x troglodytes
  │ │ ├── subspecies: Pan troglodytes ellioti
  │ │ ├── subspecies: Pan troglodytes vellerosus
  │ │ ├── subspecies: Pan troglodytes verus
  │ │ ├── subspecies: Pan troglodytes troglodytes
  │ │ └── subspecies: Pan troglodytes schweinfurthii
  │ └── species: Pan paniscus
  └─┬─ genus: Gorilla
    ├─┬─ species: Gorilla beringei
    │ ├── subspecies: Gorilla beringei beringei
    │ └── subspecies: Gorilla beringei graueri
    └─┬─ species: Gorilla gorilla
      ├── subspecies: Gorilla gorilla diehli
      ├── subspecies: Gorilla gorilla uellensis
      └── subspecies: Gorilla gorilla gorilla
```

If you only want the species:

```
$ fastax subtree -s Homininae
 ─┬─ subfamily: Homininae
  ├─┬─ genus: Homo
  │ ├── species: Homo heidelbergensis
  │ └── species: Homo sapiens
  ├─┬─ genus: Pan
  │ ├── species: Pan troglodytes
  │ └── species: Pan paniscus
  └─┬─ genus: Gorilla
    ├── species: Gorilla beringei
    └── species: Gorilla gorilla
```

The same tree in newick:

```
$ fastax subtree -sn Homininae
(Homininae,(Homo,(Homo sapiens,Homo heidelbergensis),Gorilla,(Gorilla beringei,Gorilla gorilla),Pan,(Pan paniscus,Pan troglodytes)));
```

As with the `tree` command, you can format the node with the `-f/--format`
option, and show the internal nodes with the `-i/--internal` option. See
above for more information.

License
-------

Copyright © 2019 Sylvain PULICANI picani@laposte.net

This work is free. You can redistribute it and/or modify it under the terms
of the MIT license. See the `LICENSE` file for more details.


[1]: http://evolution.genetics.washington.edu/phylip/newicktree.html
[2]: https://www.rust-lang.org
[3]: https://crates.io
[4]: https://crates.io/crates/fastax