Crate uniprot

Rust data structures and parser for the UniProtKB databases.

🔌 Usage

All parse functions take a BufRead implementor as input. Additionally, when compiling with the threading feature, the input must also be Send and 'static. The functions use uniprot::Parser, which is either SequentialParser or ThreadedParser depending on the compilation features.
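To illustrate why the extra bounds matter, here is a minimal sketch that uses only the standard library. The helper count_lines_in_thread is hypothetical and not part of this crate; it simply moves a reader into a background thread, which is the kind of operation that requires Send and 'static.

```rust
use std::io::{self, BufRead};
use std::thread;

/// Hypothetical helper (not part of the uniprot crate): moves `reader` into
/// a worker thread and counts its lines there. `thread::spawn` only accepts
/// closures that are `Send + 'static`, so the reader must be as well.
fn count_lines_in_thread<B>(reader: B) -> io::Result<usize>
where
    B: BufRead + Send + 'static,
{
    let handle = thread::spawn(move || -> io::Result<usize> {
        let mut count = 0;
        for line in reader.lines() {
            line?; // propagate I/O errors from the worker
            count += 1;
        }
        Ok(count)
    });
    handle.join().expect("reader thread panicked")
}
```

Owned readers such as std::io::BufReader<std::fs::File> or std::io::Cursor<Vec<u8>> satisfy these bounds, whereas a reader that borrows from a local buffer does not, because it is not 'static.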

🗄️ Databases

UniProt

The uniprot::uniprot::parse function can be used to obtain an iterator over the entries (uniprot::uniprot::Entry) of a UniProtKB database in XML format (either Swiss-Prot or TrEMBL).

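extern crate uniprot;

// Open the XML file and wrap it in a buffered reader.
let f = std::fs::File::open("tests/uniprot.xml")
    .map(std::io::BufReader::new)
    .unwrap();

// Each item is a Result, so parsing errors surface per entry.
for r in uniprot::uniprot::parse(f) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}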

The XML format is compatible with the results returned by the UniProt API, so you can also use uniprot::uniprot::parse to parse search results:

extern crate ureq;
extern crate libflate;
extern crate uniprot;

// Build a search request for reviewed entries matching the query,
// asking for gzip-compressed XML.
let query = "colicin";
let req = ureq::get("https://rest.uniprot.org/uniprotkb/search")
    .set("Accept", "application/xml")
    .query("query", &format!("reviewed:true AND {}", query))
    .query("format", "xml")
    .query("compress", "true");

// Decompress the response body on the fly.
let reader = libflate::gzip::Decoder::new(req.call().unwrap().into_reader()).unwrap();

for r in uniprot::uniprot::parse(std::io::BufReader::new(reader)) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}

UniRef

The uniprot::uniref::parse function can be used to obtain an iterator over the entries (uniprot::uniref::Entry) of a UniRef database in XML format (UniRef100, UniRef90, or UniRef50).

UniParc

The uniprot::uniparc::parse function can be used to obtain an iterator over the entries (uniprot::uniparc::Entry) of a UniParc database in XML format.

📦 Decoding Gzip

If parsing a gzipped file, you can use flate2::read::GzDecoder or libflate::gzip::Decoder to decode the input stream, and then simply wrap it in a BufReader. Note that flate2 has slightly better performance but binds to C, while libflate is a pure Rust implementation.
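When it is not known in advance whether a file is compressed, one option is to peek at the first bytes before choosing a decoder. The sketch below uses only the standard library; looks_gzipped is a hypothetical helper, not something this crate provides. It relies on the fact that every gzip stream starts with the bytes 0x1f 0x8b.

```rust
use std::io::{self, BufRead};

/// The two magic bytes that start every gzip member (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: inspects the bytes currently buffered by `reader`
/// without consuming them, and reports whether they start with the gzip
/// magic number. Assumes the first `fill_buf` call yields at least two
/// bytes when the input has them, which holds for typical file readers.
fn looks_gzipped<B: BufRead>(reader: &mut B) -> io::Result<bool> {
    let buf = reader.fill_buf()?;
    Ok(buf.starts_with(&GZIP_MAGIC))
}
```

Because fill_buf does not advance the reader, the same reader can afterwards be handed either to a gzip decoder or directly to a parse function.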

📧 Downloading from FTP

UniProt is available from two locations: ftp.ebi.ac.uk and ftp.uniprot.org, the former being located in Europe while the latter is in the United States. The ftp crate can be used to open a connection and parse the databases on-the-fly: see the uniprot::uniprot::parse documentation for a code snippet.

📧 Downloading from HTTP

If FTP is not available, note that the EBI FTP server can also be reached using HTTP at http://ftp.ebi.ac.uk. This allows using HTTP libraries instead of FTP ones to reach the release files.

📝 Features

threading - enabled by default.

The threading feature compiles the parser module in multi-threaded mode. This feature greatly improves parsing speed and efficiency, but removes any guarantee about the order the entries are yielded in.
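If a deterministic order is needed while keeping the threaded parser, one possible approach is to collect the entries and sort them afterwards. The sketch below is standard-library only and uses a stand-in Record type with a single accession field; both the type and the sort_by_accession helper are assumptions made for illustration, not part of this crate.

```rust
/// Stand-in for a parsed entry (hypothetical); only the field used as the
/// sort key is modeled here.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    accession: String,
}

/// Hypothetical helper: takes records in whatever order they were yielded
/// and returns them sorted by accession, giving a reproducible order.
fn sort_by_accession(mut records: Vec<Record>) -> Vec<Record> {
    records.sort_by(|a, b| a.accession.cmp(&b.accession));
    records
}
```

This trades memory for determinism, since all records must be held at once; for very large databases, disabling the threading feature may be the simpler choice.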

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

📜 License

This library is provided under the open-source MIT license.

Modules

  • error: Ubiquitous types for error management.
  • parser: XML parser implementation.
  • uniparc: Data types for the UniParc database.
  • uniprot: Data types for the UniProtKB databases.
  • uniref: Data types for the UniRef databases.