1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
//! This library provides parsing and writing of FASTA, FASTQ and FASTX at a
//! high performance.
//!
//! For a detailed documentation of the components, please refer to the
//! module pages listed here:
//!
//! * **FASTA**: See [`fasta`](crate::fasta) module for an introduction.
//! * **FASTQ**: See [`fastq`](crate::fastq) module. A separate parser supporting
//!   multi-line FASTQ is found in [`fastq::multiline`](crate::fastq::multiline).
//! * **FASTX**: There are two approaches:
//!     - The parsers from the [`fastx`](crate::fastx) and
//!       [`fastx::multiline_qual`](crate::fastx::multiline_qual) module
//!     - An approach based on trait objects and dynamic dispatch found in the
//!       [`fastx::dynamic`](crate::fastx::dynamic) module.
//!
//! # Special features
//!
//! The following features are not covered in the module docs for the individual
//! formats:
//!
//! ## Parallel processing
//!
//! All readers allow reading sequence records into chunks called record sets.
//! Effectively, they are just copies of the internal buffer with associated
//! positional information. These record sets can be sent around in channels
//! without the overhead that sending single records would have. The idea for
//! this was borrowed from [fastq-rs](https://github.com/aseyboldt/fastq-rs).
//!
//! The [`parallel`](crate::parallel) module offers a few functions for
//! processing sequence records and record sets in a worker pool and then
//! sending them along with the processing results to the main thread.
//! The functions work; the API design may not be optimal yet.
//!
//! ## Position tracking and seeking
//!
//! All readers keep track of the byte offset, line number and record number
//! while parsing. The current position can be stored and used later for seeking
//! back to the same position. See [`here`](crate::fasta::Reader::seek) for
//! an example.
//!
//! It is not yet possible to restore a record completely given positional
//! information (such as from a `.fai` file). All that is done currently is
//! to set the position, so `next()` will return the correct record.
//!
//! # Design notes
//!
//! Apart from `R: io::Read`, all readers have two additional generic
//! parameters. It is normally not necessary to change the defaults, but in
//! some cases this may be relevant.
//!
//! ## Buffer growth policy
//!
//! The parsers avoid allocations and copying as much as possible.
//! To achieve this, each sequence record must fit into the underlying
//! buffer as a whole. This may not be possible if dealing with large sequences.
//! Therefore, the internal buffer of the reader will grow automatically to fit
//! the whole sequence record again. The buffer may grow until it reaches 1 GiB;
//! larger records will cause an error.
//!
//! The behaviour of buffer growth can be further configured by applying
//! a different policy. This is documented in the [`policy`](policy) module.
//!
//! ## Position stores
//!
//! At the core of the different parsers is the same code, which is called
//! with different parameters. While searching the buffer for sequence records,
//! the position of the different features is stored. This allows to later
//! access the header, sequence and quality features directly as slices taken
//! from the internal buffer. Which positional information needs to be stored
//! depends on the format. For example, the [`fasta`](crate::fasta) reader
//! stores the position of every sequence line in order to allow fast iteration
//! lines later. The [`fastq`](crate::fastq) reader needs to remember the
//! position of the quality scores, but doesn't need to store information about
//! multiple lines, which allows for a simpler data structure. In turn,
//! the [`fastx`](crate::fastx) reader needs to store FASTA lines *and* quality
//! scores.
//!
//! Therefore, all readers have a third generic parameter, which allows
//! assigning a specific "storage backend" implementing the
//! [`core::PositionStore`](crate::core::PositionStore) trait. Usually, it is
//! not necessary to deal with this parameter since each parser has a reasonable
//! default. The only case where it is changed in this crate is with the trait
//! object approach implemented in [`fastx::dynamic`](crate::fastx::dynamic).
//!
//! Note that not all combinations of readers and `PositionStore` types have
//! currently been tested, and some combinations are known to be problematic.
//! Others just don't make sense. For example, the API does not prohibit
//! combining `fastq::Reader` with  `fasta::LineStore`, but this will return
//! everything after the header as sequence, and no quality scores are stored.
//! TODO: document possible combinations

pub use self::error::*;
pub use self::helpers::*;
pub use self::record::*;

mod helpers;
mod record;
#[macro_use]
mod error;

#[macro_use]
pub mod core;
#[macro_use]
pub mod fastx;
pub mod fasta;
pub mod fastq;
pub mod parallel;
pub mod policy;
pub mod prelude;