Crate seq_io

This library provides an(other) attempt at high-performance FASTA and FASTQ parsing and writing. The FASTA parser can read and write multi-line files. The FASTQ parser supports only single-line records (sequence and quality are not wrapped across lines). The sequence length of records in the FASTA/FASTQ files is not limited by the size of the buffer. Instead, the buffer grows until the record fits, which lets the parsers work with a minimum of copying. How it grows can be configured (see BufStrategy).

See also the documentation for the FASTA Reader and the FASTQ Reader. The methods for writing are documented in the fasta and fastq modules.

Example FASTQ parser:

This code prints the ID string from each FASTQ record.

use seq_io::fastq::{Reader,Record};

let mut reader = Reader::from_path("seqs.fastq").unwrap();

while let Some(record) = reader.next() {
    let record = record.expect("Error reading record");
    println!("{}", record.id().unwrap());
}
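
What counts as the ID can be illustrated without the crate. The following is a minimal sketch, assuming the common convention that the ID is the header text after the leading '@' up to the first space. The helper name header_id is hypothetical and not part of seq_io. Like id() in the example above, it returns a Result because the header bytes may not be valid UTF-8.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not part of seq_io): extracts the ID from a FASTQ
/// header line, assuming the ID ends at the first space.
fn header_id(header_line: &[u8]) -> Result<&str, Utf8Error> {
    // Drop the leading '@' marker if it is present.
    let head = match header_line.first() {
        Some(b'@') => &header_line[1..],
        _ => header_line,
    };
    // Everything before the first space is the ID; the rest is a description.
    let end = head.iter().position(|&b| b == b' ').unwrap_or(head.len());
    std::str::from_utf8(&head[..end])
}

fn main() {
    let id = header_id(b"@read1 lane=3").expect("header is not valid UTF-8");
    println!("{}", id);
}
```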

Example FASTA parser calculating mean sequence length:

The FASTA reader works just the same. One challenge with the FASTA format is that the sequence can be broken into multiple lines. Therefore, it is not always possible to get a slice to the whole sequence without copying the data. But it is possible to use seq_lines() for efficiently iterating over each sequence line:

use seq_io::fasta::{Reader,Record};

let mut reader = Reader::from_path("seqs.fasta").unwrap();

let mut n = 0;
let mut sum = 0;
while let Some(record) = reader.next() {
    let record = record.expect("Error reading record");
    for s in record.seq_lines() {
        sum += s.len();
    }
    n += 1;
}
println!("mean sequence length of {} records: {:.1} bp", n, sum as f32 / n as f32);

If the whole sequence is required at once, there is the full_seq method, which allocates a copy of the sequence only if it spans multiple lines.
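
The idea of allocating only for multi-line sequences can be sketched with std::borrow::Cow. This is an illustration under assumptions, not the crate's implementation: the function join_seq_lines and its signature are hypothetical, and the lines are assumed to be already split and stripped of line breaks.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the seq_io API): returns the full sequence of a
/// record whose lines have already been split.
fn join_seq_lines<'a>(lines: &[&'a [u8]]) -> Cow<'a, [u8]> {
    match lines {
        // A single line is already contiguous, so it can be borrowed as is.
        [line] => Cow::Borrowed(*line),
        // Several lines are separated by line breaks in the buffer, so a
        // contiguous sequence requires copying them into a new vector.
        _ => Cow::Owned(lines.concat()),
    }
}

fn main() {
    let single: [&[u8]; 1] = [b"ACGT"];
    let multi: [&[u8]; 2] = [b"ACGT", b"TTGA"];
    for lines in [&single[..], &multi[..]] {
        let seq = join_seq_lines(lines);
        let kind = match seq {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{} ({})", String::from_utf8_lossy(&seq), kind);
    }
}
```

Callers that only read the sequence can treat both cases the same way, since Cow dereferences to a byte slice.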

Owned records

Both readers also provide iterators similar to Rust-Bio, which return owned data. This is slower, but makes sense, e.g., if the records are collected into a vector:

use seq_io::fasta::Reader;

let mut reader = Reader::from_path("input.fasta").unwrap();

let records: Result<Vec<_>, _> = reader.records().collect();
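
Why owned records cost more, and how collecting into a Result behaves, can be shown with a self-contained sketch. It assumes a simplified single-line FASTA layout (a '>' header line followed by one sequence line). The type OwnedRec and the functions parse_pair and collect_owned are hypothetical and only mimic the pattern; they are not the seq_io types.

```rust
/// Hypothetical owned record: holds its own copies of header and sequence.
#[derive(Debug, PartialEq)]
struct OwnedRec {
    head: Vec<u8>,
    seq: Vec<u8>,
}

/// Turns a (header, sequence) pair of lines into an owned record.
fn parse_pair(pair: &[&[u8]]) -> Result<OwnedRec, String> {
    match pair {
        [head, seq] if head.first() == Some(&b'>') => Ok(OwnedRec {
            // to_vec() copies the bytes out of the input buffer; this copy
            // is the extra cost of owned records.
            head: head[1..].to_vec(),
            seq: seq.to_vec(),
        }),
        _ => Err("malformed record".to_string()),
    }
}

/// Collects all records; collect() stops at the first error and returns it.
fn collect_owned(buffer: &[u8]) -> Result<Vec<OwnedRec>, String> {
    let lines: Vec<&[u8]> = buffer
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .collect();
    lines.chunks(2).map(parse_pair).collect()
}

fn main() {
    let records = collect_owned(b">id1\nACGT\n>id2\nTTGA\n").expect("parse error");
    for rec in &records {
        println!("{}: {} bp", String::from_utf8_lossy(&rec.head), rec.seq.len());
    }
}
```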

Parallel processing

Functions for parallel processing can be found in the parallel module.

Modules

fasta

Efficient FASTA reading and writing

fastq

Efficient FASTQ reading and writing

parallel

Experiments with parallel processing

Macros

parallel_record_impl

Structs

DoubleUntil

Buffer size doubles until it reaches double_size_limit (in bytes). Above that, it increases in steps of double_size_limit.

DoubleUntil8M

Buffer size doubles until it reaches 8 MB. Above that, it increases in steps of 8 MB.

Traits

BufStrategy

Strategy that decides how a buffer should grow
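
The growth rule described for DoubleUntil can be sketched as follows. This is one reading of the description, not the crate's code: the trait GrowthStrategy, its method next_size, and the struct DoubleThenStep are hypothetical stand-ins, and the actual BufStrategy signature may differ. Returning None to refuse further growth is also an assumption, as is treating 8 MB as 8 * 1024 * 1024 bytes.

```rust
/// Hypothetical trait for illustration; the real BufStrategy may differ.
trait GrowthStrategy {
    /// Returns the next buffer size in bytes, or None if it cannot grow.
    fn next_size(&self, current: usize) -> Option<usize>;
}

/// Doubles the size below the limit, then grows linearly in limit-sized steps.
struct DoubleThenStep {
    double_size_limit: usize,
}

impl GrowthStrategy for DoubleThenStep {
    fn next_size(&self, current: usize) -> Option<usize> {
        if current < self.double_size_limit {
            // Exponential phase: few reallocations while the buffer is small.
            current.checked_mul(2)
        } else {
            // Linear phase: bounded extra memory per step for huge records.
            current.checked_add(self.double_size_limit)
        }
    }
}

fn main() {
    const MIB: usize = 1024 * 1024;
    let strategy = DoubleThenStep { double_size_limit: 8 * MIB };
    let mut size = MIB; // hypothetical starting size
    for _ in 0..5 {
        size = strategy.next_size(size).expect("buffer size overflow");
        println!("{} MiB", size / MIB);
    }
    // prints 2, 4, 8, 16, 24 (one value per line)
}
```

Switching from doubling to fixed steps keeps reallocations rare for typical records while avoiding a jump from, say, 1 GiB to 2 GiB for a single very long sequence.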