Crate seq_io
This library provides an(other) attempt at high-performance FASTA and FASTQ parsing and writing.
The FASTA parser can read and write multi-line files. The FASTQ parser supports only
single-line records. The sequence length of records in the FASTA/FASTQ files
is not limited by the size of the buffer. Instead, the buffer will grow until
the record fits, which lets the parsers work with a minimal amount of copying.
How it grows can be configured (see BufStrategy).
See also the documentation for the FASTA Reader and the FASTQ Reader. The methods for writing are documented in the fasta and fastq modules.
Example FASTQ parser:
This code prints the ID string from each FASTQ record.
    use seq_io::fastq::{Reader, Record};

    // Open a FASTQ file; the path is only a placeholder.
    let mut reader = Reader::from_path("seqs.fastq").unwrap();

    while let Some(record) = reader.next() {
        let record = record.expect("Error reading record");
        // id() fails if the header is not valid UTF-8.
        println!("{}", record.id().unwrap());
    }
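To make concrete what "single-line" FASTQ means for the parser, here is a minimal std-only sketch. It is not seq_io's implementation: `FastqRecord` and `parse_fastq` are hypothetical names, and the sketch works on an in-memory string instead of a growing buffer. It assumes each record is exactly four lines and that the ID is the header text up to the first space.

```rust
/// Borrowed view of one single-line FASTQ record.
/// Hypothetical type for illustration, not part of seq_io.
#[derive(Debug, PartialEq)]
pub struct FastqRecord<'a> {
    pub head: &'a str,
    pub seq: &'a str,
    pub qual: &'a str,
}

impl<'a> FastqRecord<'a> {
    /// The ID is taken to be the header up to the first space (assumption).
    pub fn id(&self) -> &'a str {
        self.head.split(' ').next().unwrap_or("")
    }
}

/// Parses single-line FASTQ text: `@header`, sequence, `+` separator, quality.
/// All fields borrow from `text`, so nothing is copied.
pub fn parse_fastq(text: &str) -> Result<Vec<FastqRecord<'_>>, String> {
    let mut lines = text.lines();
    let mut records = Vec::new();

    while let Some(line) = lines.next() {
        if line.is_empty() {
            continue; // tolerate blank lines between records
        }
        let head = line
            .strip_prefix('@')
            .ok_or_else(|| format!("expected '@', found {:?}", line))?;
        let seq = lines.next().ok_or("missing sequence line")?;
        let sep = lines.next().ok_or("missing separator line")?;
        let qual = lines.next().ok_or("missing quality line")?;

        if !sep.starts_with('+') {
            return Err(format!("expected '+', found {:?}", sep));
        }
        if seq.len() != qual.len() {
            return Err(format!("sequence/quality length mismatch in {:?}", head));
        }
        records.push(FastqRecord { head, seq, qual });
    }
    Ok(records)
}
```

Because the sequence and the quality string each occupy one line, every field can be handed out as a slice of the input, which is the property that makes the single-line format cheap to parse.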
Example FASTA parser calculating mean sequence length:
The FASTA reader works just the same. One challenge with the FASTA
format is that the sequence can be broken into multiple lines.
Therefore, it is not always possible to get a slice to the whole sequence
without copying the data. But it is possible to use seq_lines()
for efficiently iterating over each sequence line:
    use seq_io::fasta::{Reader, Record};

    let mut reader = Reader::from_path("seqs.fasta").unwrap();

    let mut n = 0;
    let mut sum = 0;
    while let Some(record) = reader.next() {
        let record = record.expect("Error reading record");
        // Sum the line lengths instead of building the full sequence.
        for s in record.seq_lines() {
            sum += s.len();
        }
        n += 1;
    }
    println!("mean sequence length of {} records: {:.1} bp", n, sum as f32 / n as f32);
If the whole sequence is required at once, use the full_seq method,
which allocates the sequence only if it spans multiple lines.
use seq_io::fasta::{Reader,OwnedRecord};
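The "allocate only if there are multiple lines" behaviour can be sketched with `std::borrow::Cow`. This is a std-only illustration of the idea, not the crate's code: `FastaRecord` and its `raw_seq` field are hypothetical, and the actual return type of seq_io's `full_seq` is not shown in this page.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a FASTA record whose sequence may span lines.
pub struct FastaRecord<'a> {
    pub id: &'a str,
    /// Raw sequence block as found in the buffer, line breaks included.
    pub raw_seq: &'a str,
}

impl<'a> FastaRecord<'a> {
    /// Iterates over the sequence lines without copying.
    pub fn seq_lines(&self) -> std::str::Lines<'a> {
        self.raw_seq.lines()
    }

    /// Returns the whole sequence: borrowed when it sits on a single line,
    /// freshly allocated only when several lines must be joined.
    pub fn full_seq(&self) -> Cow<'a, str> {
        let mut lines = self.seq_lines();
        match (lines.next(), lines.next()) {
            (None, _) => Cow::Borrowed(""),
            (Some(only), None) => Cow::Borrowed(only),
            (Some(first), Some(second)) => {
                // raw_seq.len() is an upper bound on the joined length.
                let mut owned = String::with_capacity(self.raw_seq.len());
                owned.push_str(first);
                owned.push_str(second);
                for line in lines {
                    owned.push_str(line);
                }
                Cow::Owned(owned)
            }
        }
    }
}
```

A caller that only needs to read the sequence can treat both cases the same through `Deref<Target = str>`, while single-line records never pay for an allocation.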
Owned records
Both readers also provide iterators similar to Rust-Bio, which return owned data. This is slower, but makes sense, e.g., if the records are collected into a vector:
    use seq_io::fasta::Reader;

    let mut reader = Reader::from_path("input.fasta").unwrap();

    // Collecting into a Result stops at the first parsing error.
    let records: Result<Vec<_>, _> = reader.records().collect();
Parallel processing
Functions for parallel processing can be found in the parallel module.
Modules
fasta | Efficient FASTA reading and writing
fastq | Efficient FASTQ reading and writing
parallel | Experiments with parallel processing
Macros
parallel_record_impl
Structs
DoubleUntil | Buffer size doubles until it reaches
DoubleUntil8M | Buffer size doubles until it reaches 8 MB. Above that, it increases in steps of 8 MB
Traits
BufStrategy | Strategy that decides how a buffer should grow
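
The growth rule described for DoubleUntil8M (double up to 8 MB, then grow in 8 MB steps) can be written down in a few lines. The following is a sketch of that rule only: it does not use the BufStrategy trait, whose signature is not shown on this page, and `next_buffer_size`, `grow_until_fits` and `LIMIT` are hypothetical names. It also assumes that "8 MB" means 8 * 1024 * 1024 bytes.

```rust
/// Assumed meaning of "8 MB": 8 * 1024 * 1024 bytes.
pub const LIMIT: usize = 8 * 1024 * 1024;

/// Hypothetical growth rule: double while below LIMIT,
/// afterwards grow linearly in LIMIT-sized steps.
pub fn next_buffer_size(current: usize) -> usize {
    if current < LIMIT {
        // max(1) keeps an empty buffer from staying at size 0.
        current.max(1).saturating_mul(2)
    } else {
        current.saturating_add(LIMIT)
    }
}

/// Applies the rule repeatedly until a record of `needed` bytes fits.
pub fn grow_until_fits(start: usize, needed: usize) -> usize {
    let mut size = start;
    while size < needed {
        size = next_buffer_size(size);
    }
    size
}
```

Doubling keeps the number of reallocations small for typical records, while the linear phase avoids doubling an already large buffer for a single very long sequence.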