Rust data structures and parsers for the UniProtKB databases.
🔌 Usage
All parse functions take a BufRead implementor as the input. Additionally, if compiled with the threading feature, they require the input to be Send and 'static as well. They use the uniprot::Parser, which is either SequentialParser or ThreadedParser depending on the compilation features.
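To illustrate why a multi-threaded parser would ask for Send and 'static on top of BufRead, here is a minimal sketch using only the standard library. The function count_lines_in_background is hypothetical and is not part of the uniprot crate; it only shows that a reader moved to a worker thread has to satisfy those bounds, and it makes no claim about how ThreadedParser is actually implemented.

```rust
use std::io::BufRead;
use std::thread;

/// Hypothetical helper (NOT part of the `uniprot` crate): counts the lines of
/// a reader on a background thread. Moving `reader` into the spawned thread is
/// what forces the `Send + 'static` bounds.
fn count_lines_in_background<B>(reader: B) -> std::io::Result<usize>
where
    B: BufRead + Send + 'static,
{
    let handle = thread::spawn(move || -> std::io::Result<usize> {
        let mut count = 0usize;
        for line in reader.lines() {
            line?; // propagate I/O errors from the underlying reader
            count += 1;
        }
        Ok(count)
    });
    // Wait for the worker and hand its result back to the caller.
    handle.join().expect("worker thread panicked")
}
```

A std::io::BufReader wrapping a std::fs::File, or a std::io::Cursor over an owned Vec<u8>, satisfies all three bounds. A reader that borrows from a local buffer does not, because it is not 'static.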
🗄️ Databases
UniProt
The uniprot::uniprot::parse function can be used to obtain an iterator over the entries (uniprot::uniprot::Entry) of a UniProtKB database in XML format (either SwissProt or TrEMBL).
extern crate uniprot;

let f = std::fs::File::open("tests/uniprot.xml")
    .map(std::io::BufReader::new)
    .unwrap();

for r in uniprot::uniprot::parse(f) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}
The XML format is compatible with the results returned by the UniProt API, so you can also use uniprot::uniprot::parse to parse search results:
extern crate ureq;
extern crate libflate;
extern crate uniprot;

let query = "colicin";
let req = ureq::get("https://rest.uniprot.org/uniprotkb/search")
    .set("Accept", "application/xml")
    .query("query", &format!("reviewed:true AND {}", query))
    .query("format", "xml")
    .query("compress", "true");

let reader = libflate::gzip::Decoder::new(req.call().unwrap().into_reader()).unwrap();
for r in uniprot::uniprot::parse(std::io::BufReader::new(reader)) {
    let entry = r.unwrap();
    // ... process the UniProt entry ...
}
UniRef
The uniprot::uniref::parse function can be used to obtain an iterator over the entries (uniprot::uniref::Entry) of a UniRef database in XML format (UniRef100, UniRef90, or UniRef50).
UniParc
The uniprot::uniparc::parse function can be used to obtain an iterator over the entries (uniprot::uniparc::Entry) of a UniParc database in XML format.
📦 Decoding Gzip
If parsing a gzipped file, you can use flate2::read::GzDecoder or libflate::gzip::Decoder to decode the input stream, and then simply wrap it in a std::io::BufReader. Note that flate2 has slightly better performance but binds to C, while libflate is a pure Rust implementation.
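When it is not known in advance whether an input is compressed, one option is to peek at the first bytes before choosing a decoder. The sketch below uses only the standard library and checks for the gzip magic number (0x1f 0x8b). The helper looks_gzipped is hypothetical and belongs to none of the crates mentioned here. It also assumes that the first call to fill_buf yields at least two bytes, which holds for typical buffered file readers but is not guaranteed for every BufRead implementor.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (NOT part of `uniprot`, `flate2` or `libflate`):
/// reports whether the buffered input starts with the gzip magic number.
fn looks_gzipped<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    // `fill_buf` exposes the currently buffered bytes without consuming them,
    // so the same reader can still be handed to a decoder or parser afterwards.
    let buf = reader.fill_buf()?;
    Ok(buf.starts_with(&[0x1f, 0x8b]))
}
```

If the check returns true, the reader can be passed to one of the decoders named above and the result wrapped in a std::io::BufReader. Otherwise it can be given to the parse function directly.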
📧 Downloading from FTP
UniProt is available from the two following locations: ftp.ebi.ac.uk and ftp.uniprot.org, the former being located in Europe while the latter is in the United States. The ftp crate can be used to open a connection and parse the databases on-the-fly: see the uniprot::uniprot::parse example for a code snippet.
📧 Downloading from HTTP
If FTP is not available, note that the EBI FTP server can also be reached using HTTP at http://ftp.ebi.ac.uk. This allows using HTTP libraries instead of FTP ones to reach the release files.
📝 Features
threading - enabled by default.
The threading feature compiles the parser module in multi-threaded mode. This feature greatly improves parsing speed and efficiency, but removes any guarantee about the order the entries are yielded in.
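If a deterministic order matters downstream, one way to restore it is to collect the entries and sort them by a stable key. The sketch below is generic and uses only the standard library. The helper collect_sorted is hypothetical and not part of the uniprot crate, and the choice of sort key (for instance an accession string) is left to the caller.

```rust
/// Hypothetical helper (NOT part of the `uniprot` crate): drains an iterator
/// of results, returns the first error encountered if there is one, and
/// otherwise sorts the collected items by a caller-supplied key.
fn collect_sorted<I, T, E, K, F>(entries: I, key: F) -> Result<Vec<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    K: Ord,
    F: FnMut(&T) -> K,
{
    // Collecting into `Result<Vec<_>, _>` short-circuits on the first `Err`.
    let mut items: Vec<T> = entries.into_iter().collect::<Result<_, _>>()?;
    // `sort_by_key` is a stable sort, so items with equal keys keep their
    // arrival order.
    items.sort_by_key(key);
    Ok(items)
}
```

This keeps every entry in memory at once, so it only suits inputs that fit in RAM. For full database releases, compiling without the threading feature is likely the simpler way to get entries in file order.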
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
📜 License
This library is provided under the open-source MIT license.
Modules
- Ubiquitous types for error management.
- XML parser implementation.
- Data types for the UniParc database.
- Data types for the UniProtKB databases.
- Data types for the UniRef databases.