Matchtigs & Eulertigs: minimum plain text representation of kmer sets - with and without repetitions
This is an implementation of different algorithms for computing small and minimum plain text representations of kmer sets. The algorithms expect unitigs as an input, which can e.g. be computed with BCALM2.
Features
- Compute matchtigs and greedy matchtigs with multiple threads
- Compute Eulertigs
- Compute pathtigs (heuristical Eulertigs similar to ProphAsm)
- Both fasta and GFA format supported, as well as gzip compression if the files end in
.gz - The annotations in a fasta file output by bcalm2 can be used to speed up loading (use
--bcalm-ininstead of--fasta-in) - Output (ASCII-) bitvectors of duplicate kmers for applications that require unique kmers
Installation via conda/mamba
Install matchtigs with
Installation via cargo
Requirements
Rust >= 1.60.0, best installed via rustup.
Installation
Install matchtigs with
Usage
Computing matchtigs and greedy matchtigs from a fasta file and saving them as GFA (without topology):
Computing Eulertigs from a GFA file and saving them as both GFA (without topology) and fasta:
Note: when computing unitigs with bcalm2, it is much faster to use --bcalm-in:
Use the --help option to get an overview of available options.
Citation
matchtigs preprint
Schmidt, S., Khan, S., Alanko, J., & Tomescu, A. I. (2021). Matchtigs: minimum plain text representation of kmer sets. bioRxiv. https://doi.org/10.1101/2021.12.15.472871.
Eulertigs (WABI 2022 best paper award)
Schmidt, S. and Alanko, J. (2022). Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time. WABI 2022. 10.4230/LIPIcs.WABI.2022.2.