Module blast

Async streaming parsers for BLAST outfmt 6 (tabular) and outfmt 5 (XML), with a gzip-transparent reader and optional --json streaming output.

Async streaming BLAST parsers

  • outfmt 6 (tabular): async streaming line reader that emits newline-delimited JSON (NDJSON)
  • outfmt 5 (XML): async streaming XML iterator returning Iteration structs
  • Gzip auto-detection by filename (".gz" suffix)
  • CLI with --json to stream newline-delimited JSON to stdout
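
As a rough illustration of filename-based gzip detection, the sketch below checks for a ".gz" extension using only the standard library. The helper name `has_gz_extension` is hypothetical and is not part of this module's API; the module's own `infer_format` and `open_async_reader` may behave differently.

```rust
use std::path::Path;

/// Hypothetical helper (not this crate's API): returns true when the
/// final extension of the file name is "gz", ignoring ASCII case.
fn has_gz_extension(path: &str) -> bool {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("gz"))
        .unwrap_or(false)
}
```

Under this sketch, `has_gz_extension("hits.tsv.gz")` is true and `has_gz_extension("hits.tsv")` is false. A name-based check cannot recognise gzipped data stored under another name; inspecting the leading magic bytes (0x1f 0x8b) is a common alternative.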

§A Genbank to GFF parser

You can parse GenBank files and save them in GFF (gff3) format, and also extract DNA sequences, gene DNA sequences (ffn) and protein FASTA sequences (faa).

You can also create new records and save them in GenBank (gbk) format.

§Detailed Explanation

The BLAST parsers work on the following input:

BLAST records are produced by any of these well-known similarity-search programs: BLAST+ from NCBI (BLASTP, BLASTX, BLASTN, TBLASTN or TBLASTX), Diamond BLAST (protein only), or MMseqs2. Several output formats can be specified; this module provides parsers for two common ones: BLAST+ outfmt 5 (a verbose XML format) and outfmt 6 (the single-line tabular format).
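
To make the outfmt 6 layout concrete, here is a minimal sketch that splits one line into the 12 default BLAST+ tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The `TabHit` type and `parse_tab_line` function are hypothetical names for illustration; the module's own `BlastTabRecord` may use different fields and types.

```rust
/// Hypothetical record type for illustration only; it mirrors the
/// 12 default outfmt 6 columns, not necessarily `BlastTabRecord`.
#[derive(Debug, PartialEq)]
struct TabHit {
    qseqid: String,
    sseqid: String,
    pident: f64,
    length: u64,
    mismatch: u64,
    gapopen: u64,
    qstart: u64,
    qend: u64,
    sstart: u64,
    send: u64,
    evalue: f64,
    bitscore: f64,
}

/// Parses one tab-separated line; returns None when the line has fewer
/// than 12 columns or when a numeric column does not parse.
fn parse_tab_line(line: &str) -> Option<TabHit> {
    // Strip a trailing line ending but keep the tabs inside the record.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 12 {
        return None;
    }
    Some(TabHit {
        qseqid: cols[0].to_string(),
        sseqid: cols[1].to_string(),
        pident: cols[2].parse().ok()?,
        length: cols[3].parse().ok()?,
        mismatch: cols[4].parse().ok()?,
        gapopen: cols[5].parse().ok()?,
        qstart: cols[6].parse().ok()?,
        qend: cols[7].parse().ok()?,
        sstart: cols[8].parse().ok()?,
        send: cols[9].parse().ok()?,
        evalue: cols[10].parse().ok()?,
        bitscore: cols[11].parse().ok()?,
    })
}
```

Returning `Option` keeps the sketch short; a production parser would more likely report which column failed. Custom column layouts (for example `-outfmt "6 qseqid sseqid evalue"`) would not match this fixed 12-column assumption.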

Example: stream BLAST outfmt 6 (single-line tabular) records to JSON

    use std::io::Cursor;
    use tokio::io::BufReader as TokioBufReader;
    use microBioRust::blast::stream_outfmt6_to_json;

    async fn test_stream_tab_to_json() {
        // One outfmt 6 record: 12 tab-separated columns.
        let data = "q1\th1\t99.0\t10\t0\t0\t1\t10\t1\t10\t1e-5\t50";
        // Wrap the in-memory bytes in an async buffered reader.
        let cursor = Cursor::new(data.as_bytes());
        let reader = TokioBufReader::new(cursor);
        let res = stream_outfmt6_to_json(reader).await;
        println!("results are {:?}", &res);
    }
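
For readers unfamiliar with NDJSON, the sketch below shows the shape of the output: one complete JSON object per line, terminated by a newline. It is a hand-rolled, standard-library-only illustration with hypothetical helper names (`escape_json`, `ndjson_line`) and a reduced set of fields; it does not claim to reproduce what `stream_outfmt6_to_json` emits, and a real implementation would more likely rely on a JSON library.

```rust
use std::fmt::Write as _;

/// Hypothetical helper: escapes a string for use inside a JSON string
/// literal (quotes, backslashes and control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \uXXXX form.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: renders a few hit fields as one NDJSON line.
/// Assumes finite numbers, since NaN and infinity are not valid JSON.
fn ndjson_line(qseqid: &str, sseqid: &str, evalue: f64, bitscore: f64) -> String {
    format!(
        "{{\"qseqid\":\"{}\",\"sseqid\":\"{}\",\"evalue\":{},\"bitscore\":{}}}\n",
        escape_json(qseqid),
        escape_json(sseqid),
        evalue,
        bitscore
    )
}
```

Because every record ends in exactly one newline and embedded newlines are escaped, a consumer can process the stream line by line without buffering the whole result set.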

Example: iterate over a BLAST outfmt 5 (XML) record with the async iterator

    use std::io::Cursor;
    use tokio::io::BufReader as TokioBufReader;
    use microBioRust::blast::AsyncBlastXmlIter;

    async fn test_async_xml_iter_simple() {
        let xml = r#"<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>My query</Iteration_query-def>
      <Iteration_query-len>100</Iteration_query-len>
      <Hit>
        <Hit_id>gi|1</Hit_id>
        <Hit_def>Some hit</Hit_def>
        <Hit_accession>ABC123</Hit_accession>
        <Hit_len>100</Hit_len>
        <Hsp>
          <Hsp_bit-score>50.0</Hsp_bit-score>
          <Hsp_evalue>1e-5</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>100</Hsp_query-to>
          <Hsp_hit-from>1</Hsp_hit-from>
          <Hsp_hit-to>100</Hsp_hit-to>
          <Hsp_identity>90</Hsp_identity>
          <Hsp_align-len>100</Hsp_align-len>
        </Hsp>
      </Hit>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"#;

        let cursor = Cursor::new(xml.as_bytes());
        let reader = TokioBufReader::new(cursor);
        let mut iter = AsyncBlastXmlIter::from_reader(reader);
        let next = iter.next_iteration().await;

        assert!(next.is_some());
        let it = next.unwrap().unwrap();
        assert_eq!(it.query_id.unwrap(), "Query_1");
        assert_eq!(it.query_def.unwrap(), "My query");
        assert_eq!(it.hits.len(), 1);
    }
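
The XML format reports raw counts (`Hsp_identity`, `Hsp_align-len`) rather than the percent identity column found in outfmt 6. As a small sketch of how the two relate, the hypothetical helper below derives percent identity from those counts; the function name and the use of plain integers are assumptions, not this module's `Hsp` API.

```rust
/// Hypothetical helper: percent identity from the number of identical
/// positions and the alignment length. Returns None for an empty
/// alignment to avoid dividing by zero.
fn percent_identity(identity: u64, align_len: u64) -> Option<f64> {
    if align_len == 0 {
        return None;
    }
    Some(100.0 * identity as f64 / align_len as f64)
}
```

For the HSP in the example above (90 identical positions over an alignment length of 100), this gives 90.0.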

Structs§

AsyncBlastXmlIter
BlastTabRecord
BlastXmlIteration
Hit
Hsp
Statistics

Functions§

infer_format
open_async_reader
stream_outfmt6_to_json