bio-streams
Types and data structures for streaming genomics data
This crate is in early development. Contributions are very welcome.
WebAssembly examples: remove non-M. TB reads from streaming fastqs; amplicon-based SARS-CoV-2 assembly
Features
Fastq and Fasta streams share a single Record type.
Records can be read into custom sequence types through the stream's type parameter T, which defaults to Seq<Dna>: pub struct Fastq<R: BufRead, T = Seq<Dna>>
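To make the idea of a shared record with a pluggable sequence type concrete, here is a minimal, standard-library-only sketch. It does not reproduce this crate's definitions: the field names `fields`, `seq` and `quality`, the `UpperDna` type and the `make_record` helper are all assumptions chosen for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a record shared by FASTA and FASTQ parsers.
#[derive(Debug, PartialEq)]
pub struct Record<T = Vec<u8>> {
    pub fields: Vec<u8>,          // raw header line (assumed field name)
    pub seq: T,                   // sequence converted into the caller's type
    pub quality: Option<Vec<u8>>, // `None` for FASTA-style records
}

/// Hypothetical custom sequence type: upper-cased DNA that rejects other bytes.
#[derive(Debug, PartialEq)]
pub struct UpperDna(pub Vec<u8>);

impl<'a> TryFrom<&'a [u8]> for UpperDna {
    type Error = String;

    fn try_from(bytes: &'a [u8]) -> Result<Self, Self::Error> {
        let mut out = Vec::with_capacity(bytes.len());
        for &b in bytes {
            match b.to_ascii_uppercase() {
                c @ (b'A' | b'C' | b'G' | b'T' | b'N') => out.push(c),
                other => return Err(format!("invalid base: {}", other as char)),
            }
        }
        Ok(UpperDna(out))
    }
}

/// Build a record from raw bytes, converting the sequence into `T`.
pub fn make_record<'a, T>(
    fields: &[u8],
    seq: &'a [u8],
    quality: Option<&[u8]>,
) -> Result<Record<T>, <T as TryFrom<&'a [u8]>>::Error>
where
    T: TryFrom<&'a [u8]>,
{
    Ok(Record {
        fields: fields.to_vec(),
        seq: T::try_from(seq)?,
        quality: quality.map(|q| q.to_vec()),
    })
}
```

In this sketch any type implementing `TryFrom<&[u8]>` can serve as the sequence type, so a parser written once can yield plain byte vectors or a validated custom type depending on what the caller requests.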
Examples
Stream a pair of fastqs and check some conditions on their name fields
// Open a pair of gzipped fastq files as streams of `Record`s with `Seq<Dna>` sequences
let fq1: Fastq<…> = Fastq::new(…);
let fq2: Fastq<…> = Fastq::new(…);
for zipped in fq1.zip(fq2) { … }
To run the fqcheck example program with read files r1.fq.gz and r2.fq.gz:
$ cargo build --example fqcheck --release
$ target/release/examples/fqcheck r1.fq.gz r2.fq.gz
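The kind of name check described above can be sketched with the standard library alone. The specific condition used here, that mate names are identical apart from a trailing `/1` and `/2`, is an assumption for illustration, and `is_mate_pair` and `count_mismatched` are hypothetical helpers rather than part of this crate.

```rust
/// Return true when two read names look like mates of the same fragment:
/// identical up to a trailing `/1` on the first and `/2` on the second.
/// Only the first whitespace-delimited token of each name is compared.
pub fn is_mate_pair(name1: &[u8], name2: &[u8]) -> bool {
    let id1 = name1
        .split(|b| b.is_ascii_whitespace())
        .next()
        .unwrap_or(&[]);
    let id2 = name2
        .split(|b| b.is_ascii_whitespace())
        .next()
        .unwrap_or(&[]);
    match (id1.strip_suffix(b"/1"), id2.strip_suffix(b"/2")) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}

/// Walk two streams of names in lockstep and count the pairs that fail the check.
pub fn count_mismatched<I, J>(names1: I, names2: J) -> usize
where
    I: IntoIterator<Item = Vec<u8>>,
    J: IntoIterator<Item = Vec<u8>>,
{
    names1
        .into_iter()
        .zip(names2)
        .filter(|(a, b)| !is_mate_pair(a, b))
        .count()
}
```

Because `zip` stops at the shorter input, a real check would also want to report a length mismatch between the two files; that is left out of this sketch.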
Count amino acid k-mers
// this opens a gzipped data stream and parses it into `Record`s with `Seq<Amino>` sequence fields
let faa: Fasta<…> = Fasta::new(…);
// we can convert amino acid k-mers directly into usizes and use them to index into a table
let mut histogram = Box::new(…);
for contig in faa { … }
To run the aminokmers example program with fasta file proteins.faa:
$ cargo build --example aminokmers --release
$ target/release/examples/aminokmers proteins.faa
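The idea of turning an amino acid k-mer into a `usize` table index can be illustrated without the crate. The sketch below uses a plain base-20 encoding over the 20 standard residues, which is a swapped-in technique and not necessarily the encoding this crate uses; `AMINO`, `kmer_index` and `kmer_histogram` are hypothetical names.

```rust
/// The 20 standard amino acids in a fixed order (the order is an assumption).
const AMINO: &[u8; 20] = b"ACDEFGHIKLMNPQRSTVWY";

/// Map one residue to a code in 0..20, or `None` for anything else.
fn amino_code(residue: u8) -> Option<usize> {
    let upper = residue.to_ascii_uppercase();
    AMINO.iter().position(|&a| a == upper)
}

/// Interpret a k-mer as a base-20 number so it can index a table directly.
pub fn kmer_index(kmer: &[u8]) -> Option<usize> {
    kmer.iter()
        .try_fold(0usize, |acc, &r| Some(acc * 20 + amino_code(r)?))
}

/// Count every k-mer of `seq` in a table with 20^k slots.
/// Windows containing a non-standard residue are skipped.
pub fn kmer_histogram(seq: &[u8], k: usize) -> Vec<u32> {
    assert!(k > 0, "k must be at least 1");
    let mut histogram = vec![0u32; 20usize.pow(k as u32)];
    for window in seq.windows(k) {
        if let Some(i) = kmer_index(window) {
            histogram[i] += 1;
        }
    }
    histogram
}
```

The table grows as 20^k, so this direct-indexing approach only suits small k; for k = 3 the table has 8000 slots.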
Roadmap
input streams:
- fastq
- fasta
- TODO sam/bam
- TODO gfa
todo:
- quality score trait, Phred alias for u8
- futures::streams for async
- GAT lending iterator
- benchmark
- examples