<br><br>
<p align="center">
<img src="logo_kamino.svg" alt="kamino logo" width="400">
</p>
<br><br>
From the Spanish word for *path*.
Builds an amino-acid alignment in a reference-free, alignment-free manner from a set of proteomes.
Not ‘better’ than traditional marker-based pipelines, but simpler and faster to run.
Typical usages range from between-species to within-phylum phylogenetic analyses (bacteria, archaea and eukaryotes).
<br>
---
## under the hood
kamino performs the following successive steps:
- lists proteome files from the input directory (-i or -I)
- recodes proteins with a 6-letter recoding scheme (-r)
- simplifies proteomes by discarding *non-conserved* proteins
- simplifies proteomes by discarding out-branching k-mers
- builds a global assembly graph and identifies variant groups as described <a href="https://academic.oup.com/mbe/article/42/4/msaf077/8103706">here</a> (-d)
- converts variant group paths back to amino acids using a sliding window
- masks long polymorphism runs within variant groups (-m)
- filters variant groups by missing data and middle-length thresholds (-f and -l)
- extracts middle positions and incorporates 'constant' positions (-c)
- outputs the final amino acid alignment (-o)
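To make the recoding step more concrete, here is a minimal sketch that collapses the 20 amino acids into six classes using the Dayhoff grouping (AGPST, DENQ, HKR, ILMV, FWY, C). The grouping, the digit labels and the function name `recode_dayhoff6` are assumptions chosen for illustration; the schemes kamino actually offers through -r may differ.

```bash
# Hypothetical helper, not part of kamino: recode FASTA sequences read from
# stdin into six classes (Dayhoff grouping assumed). Header lines starting
# with '>' are left untouched, as are characters outside the 20 uppercase
# amino acids (e.g. X or lowercase residues).
recode_dayhoff6() {
  sed '/^>/!y/AGPSTDENQHKRILMVFWYC/00000111122233334445/'
}

# Example: recode a single toy protein
printf '>seq1\nMKVLAC\n' | recode_dayhoff6
# prints:
# >seq1
# 323305
```

Recoding of this kind makes k-mers tolerant to conservative substitutions, since residues from the same class become indistinguishable.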
---
## installation
You can either compile the code locally using rustc, or install a precompiled binary from Bioconda:
```bash
conda install bioconda::kamino
```
---
## running kamino
Input consists of proteome files in FASTA format (gzipped or not), with one file per sample. Files can be placed in a single directory (specified with the -i argument), or their paths can be provided in a tab-delimited file using -I.
A basic run using four threads can be performed with either of the following commands:
```bash
kamino -i <input_dir> -t 4
kamino -I <tabular_file> -t 4
```
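Before launching a run, it can be useful to check that every input file is readable and actually contains sequences. The sketch below counts FASTA records in a plain or gzipped file; the function name `count_proteins` and the reliance on a `.gz` suffix are assumptions for illustration and are not part of kamino.

```bash
# Hypothetical helper, not part of kamino: count FASTA records (header lines
# starting with '>') in a plain or gzip-compressed proteome file.
count_proteins() {
  case "$1" in
    *.gz) gzip -dc "$1" | grep -c '^>' ;;  # decompress to stdout, then count
    *)    grep -c '^>' "$1" ;;             # plain-text FASTA
  esac
}
```

For instance, `count_proteins sample1.faa.gz` (a placeholder file name) prints the number of proteins in that file. A count of zero usually points to an empty or non-FASTA file.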
---
## examples
All analyses were performed on a MacBook "M4 Pro" using v0.4.0 and 4 threads (other parameters set to default unless specified):
| dataset | taxonomic range | | | |
|---|---|---|---|---|
| 50 *Mycobacterium* | within-genus | 0.1 | 2 | 19,283 |
| 400 *Mycobacterium* | within-genus | 0.9 | 8 | 13,753 |
| 50 Polyporales (fungi) | within-order | 0.5 | 8 | 21,808 |
| 46 *Drosophila* | within-genus | 0.7 | 7 | 194,021 |
| 55 Mammalia | within-class | 1.6 | 14 | 291,437 |
| 55 Mammalia `-k 13` | within-class | 1.9 | 8 | 191,962 |
---
## FAQ
- **When not to use kamino?**
  * low-diversity datasets (i.e., within-species), for which genome-based approaches will be more powerful
  * very large datasets (e.g., thousands of bacterial proteomes or hundreds of vertebrate proteomes)
  * very divergent datasets (e.g., animal kingdom)
* distant outgroup composed of a few isolates: these might have disproportionately more missing data
* list to be completed ...
- **Is the output reproducible?**
<p>Yes. kamino is fully deterministic and will produce exactly the same alignment for a given version, set of parameters and set of input proteomes.</p>
- **How to get more phylogenetic positions?**
<p>Increase the maximum depth of the graph traversal (-d), or lower the minimum proportion of isolates with an amino acid at each position (-m) if that is acceptable for downstream analyses.</p>
---
This codebase is provided under the MIT License. Some parts of the code were drafted using AI assistance.