Async streaming BLAST parsers for outfmt 6 and outfmt 5, with a gzip-transparent reader and an optional --json streaming output
Async streaming BLAST parsers
- outfmt 6 (tabular): streamed, async line reader -> NDJSON (newline-delimited JSON)
- outfmt 5 (XML): async streaming XML iterator returning Iteration structs
- Gzip auto-detection by filename (“.gz”)
- CLI with --json to stream newline-delimited JSON to stdout
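The filename-based gzip detection listed above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `looks_gzipped` is a hypothetical helper name, and the exact, case-sensitive match on the `gz` extension is an assumption.

```rust
use std::path::Path;

/// Returns true when the file name carries a ".gz" extension.
/// Hypothetical helper for illustration; not part of the microBioRust API.
/// Assumption: the match is exact and case-sensitive.
fn looks_gzipped(path: &Path) -> bool {
    // `extension()` yields the part after the final dot, e.g. "gz" for "hits.tsv.gz".
    path.extension().map_or(false, |ext| ext == "gz")
}
```

A caller would check `looks_gzipped(Path::new("hits.tsv.gz"))` before choosing between a plain reader and a decompressing one. Detecting by name is cheap but trusts the file name; inspecting the gzip magic bytes would be the stricter alternative.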
§A Genbank to GFF parser
You can parse GenBank files and save them in GFF (gff3) format, as well as extract DNA sequences, gene DNA sequences (ffn) and protein FASTA sequences (faa)
You can also create new records and save them in GenBank (gbk) format
§Detailed Explanation
The Genbank parser contains:
BLAST records are produced by any of these well-known similarity-search programs: BLAST+ from NCBI (BLASTP, BLASTX, BLASTN, TBLASTN or TBLASTX), Diamond BLAST (protein only) or MMseqs2. Several output formats can be specified; here we provide parsers for two common ones: BLAST+ outfmt 5 (a verbose XML format) and outfmt 6 (the single-line tabular format).
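To make the tabular format concrete, the sketch below parses one outfmt 6 line into a struct using only the standard library. It assumes the 12 default BLAST+ columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). `TabularHit` and `parse_outfmt6_line` are hypothetical names for illustration and are not the types this crate exposes.

```rust
/// One alignment from a BLAST outfmt 6 line (the 12 default columns).
/// Hypothetical type for illustration; not the crate's own record type.
#[derive(Debug, PartialEq)]
struct TabularHit {
    qseqid: String,
    sseqid: String,
    pident: f64,
    length: u64,
    mismatch: u64,
    gapopen: u64,
    qstart: u64,
    qend: u64,
    sstart: u64,
    send: u64,
    evalue: f64,
    bitscore: f64,
}

/// Parses one tab-delimited line; returns None when a column is missing or malformed.
fn parse_outfmt6_line(line: &str) -> Option<TabularHit> {
    let cols: Vec<&str> = line.trim_end().split('\t').collect();
    if cols.len() < 12 {
        return None;
    }
    Some(TabularHit {
        qseqid: cols[0].to_string(),
        sseqid: cols[1].to_string(),
        pident: cols[2].parse().ok()?,
        length: cols[3].parse().ok()?,
        mismatch: cols[4].parse().ok()?,
        gapopen: cols[5].parse().ok()?,
        qstart: cols[6].parse().ok()?,
        qend: cols[7].parse().ok()?,
        sstart: cols[8].parse().ok()?,
        send: cols[9].parse().ok()?,
        evalue: cols[10].parse().ok()?,
        bitscore: cols[11].parse().ok()?,
    })
}
```

Returning `None` on a short or malformed line lets a streaming caller skip bad rows without aborting the whole file; a real parser would more likely return an error that names the offending column.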
Example: stream a BLAST outfmt 6 (single-line tabular) record to JSON
use std::io::Cursor;
use tokio::io::BufReader as TokioBufReader;
use microBioRust::blast::stream_outfmt6_to_json;

async fn test_stream_tab_to_json() {
    // outfmt 6 columns are tab-delimited
    let data = "q1\th1\t99.0\t10\t0\t0\t1\t10\t1\t10\t1e-5\t50";
    // Wrap the in-memory bytes in an async buffered reader
    let cursor = Cursor::new(data.as_bytes());
    let reader = TokioBufReader::new(cursor);
    let res = stream_outfmt6_to_json(reader).await;
    println!("results are {:?}", &res);
}
Example: iterate over a BLAST XML (outfmt 5) record
use std::io::Cursor;
use tokio::io::BufReader as TokioBufReader;
use microBioRust::blast::AsyncBlastXmlIter;

async fn test_async_xml_iter_simple() {
    // Minimal BLAST XML document with one iteration, one hit and one HSP
let xml = r#"<?xml version="1.0"?>
<BlastOutput>
<BlastOutput_iterations>
<Iteration>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>My query</Iteration_query-def>
<Iteration_query-len>100</Iteration_query-len>
<Hit>
<Hit_id>gi|1</Hit_id>
<Hit_def>Some hit</Hit_def>
<Hit_accession>ABC123</Hit_accession>
<Hit_len>100</Hit_len>
<Hsp>
<Hsp_bit-score>50.0</Hsp_bit-score>
<Hsp_evalue>1e-5</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>100</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>100</Hsp_hit-to>
<Hsp_identity>90</Hsp_identity>
<Hsp_align-len>100</Hsp_align-len>
</Hsp>
</Hit>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>"#;
    // Wrap the in-memory XML in an async buffered reader
    let cursor = Cursor::new(xml.as_bytes());
    let reader = TokioBufReader::new(cursor);
    let mut iter = AsyncBlastXmlIter::from_reader(reader);
    // Pull the first (and only) iteration from the stream
    let next = iter.next_iteration().await;
    assert!(next.is_some());
    let it = next.unwrap().unwrap();
    assert_eq!(it.query_id.unwrap(), "Query_1");
    assert_eq!(it.query_def.unwrap(), "My query");
    assert_eq!(it.hits.len(), 1);
}
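The XML format reports identical positions (`Hsp_identity`) and alignment length (`Hsp_align-len`) separately, whereas outfmt 6 reports a percentage in its `pident` column. A small sketch of the conversion follows. `percent_identity` is a hypothetical helper, not a function of this crate, and it takes plain integers rather than the crate's HSP struct, whose field names are not shown here.

```rust
/// Percent identity of an HSP: identical positions * 100 / alignment length.
/// Hypothetical helper for illustration; returns None for a zero-length alignment.
fn percent_identity(identity: u32, align_len: u32) -> Option<f64> {
    if align_len == 0 {
        return None;
    }
    Some(f64::from(identity) * 100.0 / f64::from(align_len))
}
```

For the HSP in the example above (90 identical positions over an alignment length of 100), this gives 90 percent.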