Crate uniprot

Rust data structures and parser for the UniprotKB database(s).

🔌 Usage

All parse functions take a BufRead implementor as input. Additionally, when compiling with the threading feature, the input must also be Send and 'static. The functions use uniprot::Parser, which is either SequentialParser or ThreadedParser depending on the compilation features.
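The extra Send and 'static bounds follow from how a threaded parser works: the reader is moved into a background thread, and std::thread::spawn only accepts closures that own their data and can cross thread boundaries. The std-only sketch below illustrates that constraint. The helper count_lines_in_background is hypothetical and is not part of the uniprot API.

```rust
use std::io::{BufRead, Cursor};
use std::thread::{self, JoinHandle};

/// Hypothetical helper (not part of the `uniprot` crate): it moves the reader
/// into a worker thread, which is why `R` must be `Send + 'static`.
fn count_lines_in_background<R>(reader: R) -> JoinHandle<usize>
where
    R: BufRead + Send + 'static,
{
    // `thread::spawn` requires the closure and everything it captures to be
    // `Send + 'static`, so the reader cannot borrow data owned by the caller.
    thread::spawn(move || reader.lines().filter_map(Result::ok).count())
}

fn main() {
    // An owned in-memory buffer satisfies BufRead + Send + 'static.
    let data = Cursor::new(b"<entry/>\n<entry/>\n".to_vec());
    let handle = count_lines_in_background(data);
    println!("{} lines", handle.join().unwrap());
}
```

A reader borrowing a local buffer (non-'static lifetime) or wrapping an Rc would be rejected by the compiler here, which mirrors the constraint described above for the threading feature.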

🗄️ Databases

UniProt

The uniprot::uniprot::parse function can be used to obtain an iterator over the entries (uniprot::uniprot::Entry) of a UniprotKB database in XML format (either SwissProt or TrEMBL).

extern crate uniprot;

let f = std::fs::File::open("tests/uniprot.xml")
   .map(std::io::BufReader::new)
   .unwrap();

for r in uniprot::uniprot::parse(f) {
   let entry = r.unwrap();
   // ... process the UniProt entry ...
}

The XML format is compatible with the results returned by the UniProt API, so you can also use uniprot::uniprot::parse to parse search results:

extern crate ureq;
extern crate libflate;
extern crate uniprot;

let query = "colicin";
let req = ureq::get("https://rest.uniprot.org/uniprotkb/search")
    .set("Accept", "application/xml")
    .query("query", &format!("reviewed:true AND {}", query))
    .query("format", "xml")
    .query("compressed", "true");
let reader = libflate::gzip::Decoder::new(req.call().unwrap().into_reader()).unwrap();

for r in uniprot::uniprot::parse(std::io::BufReader::new(reader)) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}

UniRef

The uniprot::uniref::parse function can be used to obtain an iterator over the entries (uniprot::uniref::Entry) of a UniRef database in XML format (UniRef100, UniRef90, or UniRef50).

UniParc

The uniprot::uniparc::parse function can be used to obtain an iterator over the entries (uniprot::uniparc::Entry) of a UniParc database in XML format.

📦 Decoding Gzip

If parsing a gzipped file, you can use flate2::read::GzDecoder or libflate::gzip::Decoder to decode the input stream, and then wrap it in a std::io::BufReader. Note that flate2 has slightly better performance but can bind to C (depending on its selected backend), while libflate is a pure Rust implementation.
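When the same code path must accept both plain and gzipped XML, one option is to peek at the first two bytes, the gzip magic number 0x1f 0x8b defined in RFC 1952, before deciding whether to wrap the stream in a decoder. The std-only sketch below shows only the detection step; the decoding itself would still be done with flate2 or libflate. The helper is_gzip is hypothetical, not part of any of these crates.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: returns true if the buffered stream starts with the
/// gzip magic number (RFC 1952). It peeks with `fill_buf` and consumes
/// nothing, so the same reader can then be handed to a decoder or parser.
/// Assumption: the first `fill_buf` call returns at least two bytes, which
/// a `BufReader` over a regular file normally does.
fn is_gzip<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let buf = reader.fill_buf()?;
    Ok(buf.len() >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
}

fn main() -> io::Result<()> {
    let plain = b"<?xml version=\"1.0\"?>".to_vec();
    let gz_header = vec![0x1f, 0x8b, 0x08, 0x00];

    let mut a = BufReader::new(&plain[..]);
    let mut b = BufReader::new(&gz_header[..]);
    println!("plain: {}, gzip: {}", is_gzip(&mut a)?, is_gzip(&mut b)?);

    // Peeking left the stream untouched: the XML prolog is still readable.
    let mut s = String::new();
    a.read_to_string(&mut s)?;
    assert!(s.starts_with("<?xml"));
    Ok(())
}
```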

📧 Downloading from FTP

UniProt is available from two locations: ftp.ebi.ac.uk and ftp.uniprot.org, the former located in Europe and the latter in the United States. The ftp crate can be used to open a connection and parse the databases on the fly: see the documentation of uniprot::uniprot::parse for a code snippet.

📧 Downloading from HTTP

If FTP is not available, note that the EBI FTP server can also be reached using HTTP at http://ftp.ebi.ac.uk. This allows using HTTP libraries instead of FTP ones to reach the release files.
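Since the EBI server exposes the same files over both protocols, an FTP URL on ftp.ebi.ac.uk can be turned into its HTTP counterpart by swapping the scheme. A minimal sketch, assuming the path layout is identical on both sides; the helper to_http_url is hypothetical and deliberately only handles the EBI host, since HTTP access is only stated here for that server.

```rust
/// Hypothetical helper: rewrites `ftp://ftp.ebi.ac.uk/...` to
/// `http://ftp.ebi.ac.uk/...`, keeping the path unchanged.
/// Returns `None` for any other host (e.g. ftp.uniprot.org).
fn to_http_url(ftp_url: &str) -> Option<String> {
    let rest = ftp_url.strip_prefix("ftp://ftp.ebi.ac.uk/")?;
    Some(format!("http://ftp.ebi.ac.uk/{}", rest))
}

fn main() {
    // Example directory on the EBI server; browse it to locate release files.
    let ftp = "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/";
    match to_http_url(ftp) {
        Some(http) => println!("{}", http),
        None => println!("no HTTP equivalent known for {}", ftp),
    }
}
```

The resulting URL can then be fetched with any HTTP client (for instance ureq, as in the search example above) and passed through a gzip decoder and BufReader before parsing.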

📝 Features

threading (enabled by default)

The threading feature compiles the parser module in multi-threaded mode, so that uniprot::Parser is ThreadedParser. This greatly improves parsing speed, but removes any guarantee about the order in which entries are yielded.
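If downstream code needs a deterministic order while threading is enabled, the order has to be restored after parsing. The std-only sketch below shows one general pattern: workers tag each result with its input index, and the consumer sorts on that index. It illustrates the technique only and is not how the crate's ThreadedParser is implemented; parse_record and parse_in_order are hypothetical stand-ins.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the work of parsing one record.
fn parse_record(raw: &str) -> String {
    raw.trim().to_uppercase()
}

/// Parses records on several threads, then restores input order using the
/// index each record was tagged with before it was sent to a worker.
fn parse_in_order(records: Vec<String>, workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        // Static partition: worker `w` takes every `workers`-th record.
        let chunk: Vec<(usize, String)> = records
            .iter()
            .cloned()
            .enumerate()
            .filter(|(i, _)| i % workers == w)
            .collect();
        handles.push(thread::spawn(move || {
            for (i, raw) in chunk {
                tx.send((i, parse_record(&raw))).unwrap();
            }
        }));
    }
    drop(tx); // only worker senders remain, so `rx.iter()` ends when they finish
    // Results arrive in completion order, which can vary between runs.
    let mut results: Vec<(usize, String)> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, r)| r).collect()
}

fn main() {
    let records = vec![" first ".to_string(), "second".to_string(), "third ".to_string()];
    println!("{:?}", parse_in_order(records, 2));
}
```

Restoring order this way buffers every result before yielding any, so for a full release it trades memory for determinism; disabling the threading feature is the simpler alternative when order matters more than speed.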

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

📜 License

This library is provided under the open-source MIT license.

Modules

error
Ubiquitous types for error management.
parser
XML parser implementation.
uniparc
Data types for the UniParc database.
uniprot
Data types for the UniProtKB databases.
uniref
Data types for the UniRef databases.