Rust data structures and parser for the UniProtKB database(s).
§ Usage
All parse functions take a BufRead implementor as input. Additionally, when compiled with the threading feature, they require the input to be Send and 'static as well. They use the uniprot::Parser, which is either SequentialParser or ThreadedParser depending on the compilation features.
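To illustrate why a threaded build asks for Send and 'static, here is a minimal std-only sketch. It is not the crate's implementation: count_lines_in_background is a hypothetical helper that moves its reader into a worker thread, which the compiler only allows when the reader type satisfies those bounds.

```rust
use std::io::BufRead;
use std::thread;

/// Hypothetical helper (not part of the uniprot crate): counts the lines of
/// `reader` on a worker thread and returns the total.
///
/// `thread::spawn` requires everything moved into the closure to be
/// `Send + 'static`, which is the same kind of bound a threaded parser has
/// to place on its input.
fn count_lines_in_background<B>(reader: B) -> usize
where
    B: BufRead + Send + 'static,
{
    // The reader is moved into the worker; counting stops at the first I/O error.
    let worker = thread::spawn(move || reader.lines().take_while(|line| line.is_ok()).count());
    worker.join().expect("worker thread panicked")
}
```

An owned std::io::Cursor<Vec<u8>> or a std::io::BufReader<std::fs::File> satisfies the bound. A reader that borrows from a local buffer does not, because the borrow is not 'static.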
§ Databases
§ UniProt
The uniprot::uniprot::parse function can be used to obtain an iterator over the entries (uniprot::uniprot::Entry) of a UniProtKB database in XML format (either SwissProt or TrEMBL).
extern crate uniprot;

let f = std::fs::File::open("tests/uniprot.xml")
    .map(std::io::BufReader::new)
    .unwrap();

for r in uniprot::uniprot::parse(f) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}
The XML format is compatible with the results returned by the UniProt API, so you can also use uniprot::uniprot::parse to parse search results:
extern crate ureq;
extern crate libflate;
extern crate uniprot;

let query = "colicin";
let req = ureq::get("https://rest.uniprot.org/uniprotkb/search")
    .set("Accept", "application/xml")
    .query("query", &format!("reviewed:true AND {}", query))
    .query("format", "xml")
    .query("compress", "true");
let reader = libflate::gzip::Decoder::new(req.call().unwrap().into_reader()).unwrap();

for r in uniprot::uniprot::parse(std::io::BufReader::new(reader)) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}
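Both loops above call unwrap on every item, which panics on the first entry that fails to parse. As a hedged alternative, the std-only sketch below separates successes from failures. split_results is a hypothetical helper, not part of the crate, and whether the parser keeps yielding items after an error is an assumption that is not stated here.

```rust
/// Hypothetical helper: splits an iterator of `Result`s into the values that
/// parsed successfully and the errors that were reported along the way.
fn split_results<T, E, I>(results: I) -> (Vec<T>, Vec<E>)
where
    I: IntoIterator<Item = Result<T, E>>,
{
    let mut entries = Vec::new();
    let mut errors = Vec::new();
    for result in results {
        match result {
            Ok(entry) => entries.push(entry),
            Err(error) => errors.push(error),
        }
    }
    (entries, errors)
}

// Possible use with the parser shown above:
//
// let (entries, errors) = split_results(uniprot::uniprot::parse(f));
// eprintln!("{} entries parsed, {} failed", entries.len(), errors.len());
```

This buffers every entry in memory, so for a full TrEMBL release it is probably better to handle each Result inside the loop instead of collecting.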
§ UniRef
The uniprot::uniref::parse function can be used to obtain an iterator over the entries (uniprot::uniref::Entry) of a UniRef database in XML format (UniRef100, UniRef90, or UniRef50).
§ UniParc
The uniprot::uniparc::parse function can be used to obtain an iterator over the entries (uniprot::uniparc::Entry) of a UniParc database in XML format.
§ Decoding Gzip
If parsing a gzipped file, you can use flate2::read::GzDecoder or libflate::gzip::Decoder to decode the input stream, and then simply wrap it in a std::io::BufReader. Note that flate2 has slightly better performance but binds to C, while libflate is a pure Rust implementation.
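When it is not known in advance whether a file is compressed, one option is to peek at the first two bytes before choosing a decoder. The sketch below uses only the standard library. looks_gzipped is a hypothetical helper, and the commented usage assumes flate2 is a dependency and that path names the input file.

```rust
use std::io::{self, BufRead};

/// Gzip streams start with the two magic bytes 0x1f 0x8b (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: peeks at the buffered input without consuming it and
/// reports whether it looks like a gzip stream.
fn looks_gzipped<B: BufRead>(reader: &mut B) -> io::Result<bool> {
    // `fill_buf` exposes the buffered bytes but does not advance the reader.
    let buffered = reader.fill_buf()?;
    Ok(buffered.starts_with(&GZIP_MAGIC))
}

// Possible use, assuming the flate2 crate is available:
//
// let mut raw = std::io::BufReader::new(std::fs::File::open(path)?);
// if looks_gzipped(&mut raw)? {
//     let decoded = std::io::BufReader::new(flate2::read::GzDecoder::new(raw));
//     // pass `decoded` to one of the parse functions
// } else {
//     // pass `raw` to one of the parse functions
// }
```

The check relies on fill_buf returning at least two bytes, which holds for a BufReader over a regular file but is not guaranteed for every BufRead implementor.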
§ Downloading from FTP
UniProt is available from the following two locations: ftp.ebi.ac.uk and ftp.uniprot.org, the former being located in Europe while the latter is in the United States. The ftp crate can be used to open a connection and parse the databases on the fly: see the uniprot::uniprot::parse example for a code snippet.
§ Downloading from HTTP
If FTP is not available, note that the EBI FTP server can also be reached using HTTP at http://ftp.ebi.ac.uk. This allows using HTTP libraries instead of FTP ones to reach the release files.
§ Features
§ threading - enabled by default.
The threading feature compiles the parser module in multi-threaded mode. This feature greatly improves parsing speed and efficiency, but removes any guarantee about the order the entries are yielded in.
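If a downstream step needs a reproducible order while threading stays enabled, one option is to collect the entries and sort them by a stable key. The sketch below is only an illustration: DemoEntry and its accession field are stand-ins invented for this example, not the crate's Entry type.

```rust
/// Stand-in for a parsed entry; the real `Entry` types are richer. The
/// `accession` field is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct DemoEntry {
    accession: String,
}

/// Collects entries that may arrive in any order and sorts them by
/// accession, trading memory (everything is buffered) for determinism.
fn collect_sorted<I>(entries: I) -> Vec<DemoEntry>
where
    I: IntoIterator<Item = DemoEntry>,
{
    let mut collected: Vec<DemoEntry> = entries.into_iter().collect();
    collected.sort_by(|a, b| a.accession.cmp(&b.accession));
    collected
}
```

The result is deterministic across runs but is not necessarily the order of the source file. When the original file order matters, compiling without the threading feature is the simpler route.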
§ Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
§ License
This library is provided under the open-source MIT license.