Crate uniprot


Rust data structures and parser for the UniProtKB database(s).


🔌 Usage

All parse functions take a BufRead implementor as input. Additionally, when compiling with the threading feature, they also require the input to be Send and 'static. They use uniprot::Parser, which is either SequentialParser or ThreadedParser depending on the compilation features.
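The input bounds above can be checked at compile time with a small std-only sketch. The helper `as_parser_input` and the two demo functions are hypothetical and not part of the crate. They only encode the `BufRead + Send + 'static` requirement that applies with the threading feature, so the compiler rejects inputs the parse functions would not accept.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Hypothetical helper (not part of the uniprot crate): it mirrors the
/// bounds the parse functions require when `threading` is enabled.
fn as_parser_input<B: BufRead + Send + 'static>(input: B) -> B {
    input
}

/// Reads the first line of an owned in-memory buffer that satisfies the bounds.
fn first_line_of_owned_buffer() -> String {
    // `Cursor<Vec<u8>>` owns its data, so it is both `Send` and `'static`.
    let owned = Cursor::new(b"<uniprot>\n</uniprot>\n".to_vec());
    let mut input = as_parser_input(owned);
    let mut line = String::new();
    input
        .read_line(&mut line)
        .expect("reading from memory cannot fail");
    line.trim_end().to_string()
}

/// An owned `Read + Send + 'static` source becomes valid input once wrapped in a `BufReader`.
fn wrapped_source_is_accepted() -> bool {
    let raw = Cursor::new(Vec::<u8>::new());
    let mut input = as_parser_input(BufReader::new(raw));
    let mut buf = String::new();
    // An empty source reports 0 bytes read.
    input.read_line(&mut buf).map(|n| n == 0).unwrap_or(false)
}
```

By contrast, a `Cursor<&[u8]>` borrowing a local `String` is not `'static`, so it would be rejected at compile time when the threading feature is enabled.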

🗄️ Databases

UniProt

The uniprot::uniprot::parse function can be used to obtain an iterator over the entries (uniprot::uniprot::Entry) of a UniProtKB database in XML format (either SwissProt or TrEMBL).

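```rust
extern crate uniprot;

// Open the file and wrap it in a buffered reader, since parse needs a BufRead.
let f = std::fs::File::open("tests/uniprot.xml")
    .map(std::io::BufReader::new)
    .unwrap();

// Each item is a Result, because parsing can fail on malformed input.
for r in uniprot::uniprot::parse(f) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}
```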

UniRef

The uniprot::uniref::parse function can be used to obtain an iterator over the entries (uniprot::uniref::Entry) of a UniRef database in XML format (UniRef100, UniRef90, or UniRef50).

UniParc

The uniprot::uniparc::parse function can be used to obtain an iterator over the entries (uniprot::uniparc::Entry) of a UniParc database in XML format.

📦 Decoding Gzip

If parsing a Gzipped file, you can use flate2::read::GzDecoder or libflate::gzip::Decoder to decode the input stream, and then wrap it in a std::io::BufReader. Note that flate2 has slightly better performance but binds to C, while libflate is a pure Rust implementation.
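The layering described above (raw file, then decoder, then buffer) can be sketched generically. `decode_and_buffer` and `count_lines_through_identity_decoder` are hypothetical helpers, and the demo uses an identity "decoder" so the sketch runs with the standard library alone; in practice the decoder would come from flate2 or libflate.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: layers a decoder over a raw byte source and buffers
/// the decoded stream, producing the `BufRead` input the parse functions expect.
fn decode_and_buffer<R, D, F>(raw: R, decode: F) -> BufReader<D>
where
    R: Read,
    D: Read,
    F: FnOnce(R) -> D,
{
    // Decompress first, then buffer the decompressed bytes.
    BufReader::new(decode(raw))
}

/// Demonstration with an identity "decoder" (no decompression at all).
fn count_lines_through_identity_decoder(data: &'static [u8]) -> usize {
    let input = decode_and_buffer(data, |r| r);
    input.lines().count()
}
```

With flate2 added as a dependency, a call could look like `decode_and_buffer(std::fs::File::open(path)?, flate2::read::GzDecoder::new)`, where `path` is whatever `.xml.gz` file you are reading. Buffering the decoder's output, rather than the compressed file, is what gives the parser cheap line and tag reads.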

📧 Downloading from FTP

UniProt is available from two locations: ftp.ebi.ac.uk and ftp.uniprot.org, the former located in Europe and the latter in the United States. The ftp crate can be used to open a connection and parse the databases on the fly; see the documentation of uniprot::uniprot::parse for a code snippet.

📧 Downloading from HTTP

If FTP is not available, the EBI FTP server can also be reached over HTTP at http://ftp.ebi.ac.uk. This lets you use HTTP libraries instead of FTP ones to fetch the release files.

📝 Features

threading: enabled by default.

The threading feature compiles the parser module in multi-threaded mode. This greatly improves parsing speed and efficiency, but removes any guarantee about the order in which entries are yielded.
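When a deterministic order matters, one option is to collect the results and sort them afterwards. The sketch below is a hypothetical, std-only helper that works with any iterator of `Result` items, such as the one returned by a parse function; the sort key is supplied by the caller and must own its data.

```rust
/// Hypothetical helper: drains an iterator of fallible results and sorts the
/// successful items by a caller-supplied key, restoring a deterministic order
/// no matter which thread produced each item first.
fn collect_sorted<T, E, K, I, F>(results: I, key: F) -> Result<Vec<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    K: Ord,
    F: FnMut(&T) -> K,
{
    // Stop at the first error, like the `?` operator would.
    let mut items: Vec<T> = results.into_iter().collect::<Result<_, _>>()?;
    // `sort_by_key` is stable, so items with equal keys keep their relative order.
    items.sort_by_key(key);
    Ok(items)
}
```

For UniProt entries, a natural key would be the primary accession, cloned so the key owns its data. Note that sorting requires every entry in memory, so it gives up the streaming behaviour of the iterator; when order does not matter, processing entries as they arrive is cheaper.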

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

📜 License

This library is provided under the open-source MIT license.

Modules

error: Ubiquitous types for error management.

parser: XML parser implementation.

uniparc: Data types for the UniParc database.

uniprot: Data types for the UniProtKB databases.

uniref: Data types for the UniRef databases.