ATGlib
ATGlib is a Rust library for working with genomic transcript data. It handles several file formats, such as GTF, GenePred(ext) and RefGene. You can generate BED files, FASTA sequences or custom feature sequences.
ATGlib is useful for bioinformaticians and geneticists. It can be used in all kinds of data processing workflows that must handle transcript annotations. The main use cases are conversions between file formats and sanity checks for large-scale QC. It is primarily targeted at the human genome, but should work equally well with other genomes.
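Much of a format conversion is coordinate bookkeeping: GTF uses 1-based intervals with both ends inclusive, while GenePred and RefGene use a 0-based start and an exclusive end. The following is a minimal, standalone sketch of that shift. The `GtfInterval` and `GenePredInterval` types and the two conversion functions are hypothetical and not part of ATGlib's API.

```rust
/// Hypothetical: an interval in GTF convention (1-based, start and end inclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GtfInterval {
    start: u32,
    end: u32,
}

/// Hypothetical: the same interval in GenePred/RefGene convention (0-based start, exclusive end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GenePredInterval {
    start: u32,
    end: u32,
}

/// Convert GTF to GenePred coordinates. GTF starts are always >= 1, so `start - 1` cannot underflow.
fn gtf_to_genepred(iv: GtfInterval) -> GenePredInterval {
    // Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end.
    GenePredInterval { start: iv.start - 1, end: iv.end }
}

/// Convert GenePred coordinates back to GTF.
fn genepred_to_gtf(iv: GenePredInterval) -> GtfInterval {
    GtfInterval { start: iv.start + 1, end: iv.end }
}

fn main() {
    let exon = GtfInterval { start: 155160639, end: 155161619 };
    let gp = gtf_to_genepred(exon);
    // Both representations cover the same number of bases.
    assert_eq!(exon.end - exon.start + 1, gp.end - gp.start);
    // The conversion round-trips.
    assert_eq!(genepred_to_gtf(gp), exon);
    // prints: GTF 155160639-155161619 -> GenePred 155160638-155161619
    println!("GTF {}-{} -> GenePred {}-{}", exon.start, exon.end, gp.start, gp.end);
}
```

Keeping the end coordinate untouched is what makes half-open intervals convenient: the length is simply `end - start`.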
If you are looking for an actual application or a command line tool to work with transcripts, GTF files, etc., use ATG instead. It uses ATGlib behind the scenes and provides a simple interface.
Documentation
The library API is mostly documented inline and available on docs.rs
Examples
For high-level examples of how to use ATGlib, check the source code of the CLI tool ATG.
Convert GTF to RefGene
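use atglib::gtf::Reader;
use atglib::refgene::Writer;
use atglib::models::{TranscriptRead, TranscriptWrite};

// Placeholder paths: replace them with your own input and output files
let mut reader = Reader::from_file("path/to/input.gtf")
    .unwrap_or_else(|_| panic!("Error opening input file"));
let mut writer = Writer::from_file("path/to/output.refgene")
    .unwrap_or_else(|_| panic!("Unable to open output file"));
// Parse all transcripts from the GTF file
let transcripts = reader.transcripts()
    .unwrap_or_else(|err| panic!("Error parsing GTF: {}", err));
// Write them out in RefGene format
match writer.write_transcripts(&transcripts) {
    Ok(_) => println!("Success"),
    Err(err) => panic!("Error writing RefGene file: {}", err),
};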
Work with transcripts directly
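use atglib::tests::transcripts::standard_transcript;
let tx = standard_transcript();
println!("Transcript: {}", tx.name());
if tx.is_coding() {
    println!("{} is a coding transcript", tx.name());
} else {
    println!("{} is a non-coding transcript", tx.name());
}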
Limitations
I started ATGlib as a private side project to learn Rust. I'm sure there are many parts of the code that are not idiomatic and could be improved. I'd be more than happy to receive feedback and suggestions for improvement. I also encourage everyone who is interested to help and contribute to ATGlib.
When it comes to functional correctness, I try my best to test the functionality at all levels. ATGlib has very good test coverage, and before every release I also run manual checks against a large test set. However, I cannot guarantee that everything is correct, so please use it at your own risk and report any bugs and issues via GitHub.
ToDo / Next tasks
- Add function to compare transcripts from two different inputs
- Use Smartstring or Smallstr for gene symbol, transcript name and chromosome
- Parallelize input parsing
- Check if exons can be stored in a smaller vec
- Use std::mem::replace to move out of attributes, e.g. in TranscriptBuilder, and remove the Copy/Clone traits (see https://stackoverflow.com/questions/31307680/how-to-move-one-field-out-of-a-struct-that-implements-drop-trait)
- Change Codon to GenomicCodon
- Update error handling and streamline error types
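The `std::mem::replace` item in the list above refers to a standard Rust pattern: you cannot move a field out of a struct that implements `Drop` (compiler error E0509), but you can swap a placeholder value in and take ownership of the original. A minimal sketch, using a hypothetical `Builder` and `Transcript` rather than ATGlib's actual `TranscriptBuilder`:

```rust
use std::mem;

/// Hypothetical builder; not ATGlib's TranscriptBuilder.
struct Builder {
    name: String,
    exons: Vec<(u32, u32)>,
}

impl Drop for Builder {
    fn drop(&mut self) {
        // Because Builder implements Drop, `let name = self.name;`
        // inside `build` would not compile (E0509).
    }
}

/// Hypothetical output type.
struct Transcript {
    name: String,
    exons: Vec<(u32, u32)>,
}

impl Builder {
    fn build(mut self) -> Transcript {
        Transcript {
            // Swap in an empty String and keep the original, without cloning.
            name: mem::replace(&mut self.name, String::new()),
            // mem::take is shorthand for mem::replace(.., Default::default()).
            exons: mem::take(&mut self.exons),
        }
        // `self`, now holding only empty values, is dropped when `build` returns.
    }
}

fn main() {
    let builder = Builder {
        name: "NM_001371720.1".to_string(),
        exons: vec![(155160639, 155161619)],
    };
    let tx = builder.build();
    assert_eq!(tx.name, "NM_001371720.1");
    // prints: NM_001371720.1 has 1 exon(s)
    println!("{} has {} exon(s)", tx.name, tx.exons.len());
}
```

Since the values are moved rather than copied, the fields no longer need to implement `Clone` for the builder to hand them over.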
Known issues
GTF parsing
- NM_001371720.1 has two book-ended exons (155160639-155161619 || 155161620-155162101). During input parsing, book-ended features are merged into one exon
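To make the issue concrete: in 1-based, inclusive GTF coordinates, two features are book-ended when the second starts exactly one base after the first ends. The following standalone sketch models the merging behaviour described above with plain tuples. The function names are hypothetical, and this is not ATGlib's parser code.

```rust
/// Hypothetical: two 1-based, inclusive intervals are book-ended if `b` starts right after `a` ends.
fn is_book_ended(a: (u32, u32), b: (u32, u32)) -> bool {
    b.0 == a.1 + 1
}

/// Hypothetical: merge overlapping or book-ended intervals into single intervals.
fn merge_book_ended(mut exons: Vec<(u32, u32)>) -> Vec<(u32, u32)> {
    exons.sort_unstable();
    let mut merged: Vec<(u32, u32)> = Vec::with_capacity(exons.len());
    for (start, end) in exons {
        if let Some(last) = merged.last_mut() {
            if start <= last.1 + 1 {
                // Extend the previous interval instead of starting a new one.
                last.1 = last.1.max(end);
                continue;
            }
        }
        merged.push((start, end));
    }
    merged
}

fn main() {
    // The two exons of NM_001371720.1 listed above.
    let exons = vec![(155160639, 155161619), (155161620, 155162101)];
    assert!(is_book_ended(exons[0], exons[1]));
    let merged = merge_book_ended(exons);
    // prints: [(155160639, 155162101)]
    println!("{:?}", merged);
}
```

A parser that wants to keep such exons separate could use a check like `is_book_ended` to flag these cases instead of merging them.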