//! A vectorized library for FASTA/FASTQ parsing and bitpacking.
//!
//! # Requirements
//!
//! This library requires a CPU that supports the AVX2, SSE3, or NEON instruction set.
//! Enable `target-cpu=native` when building:
//!
//! ```sh
//! RUSTFLAGS="-C target-cpu=native" cargo run --release
//! ```
//!
//! If your CPU has poor support for the
//! [PDEP instruction](https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#Parallel_bit_deposit_and_extract)
//! (e.g. AMD CPUs prior to 2020), use the `no-pdep` feature:
//!
//! ```sh
//! RUSTFLAGS="-C target-cpu=native" cargo run --release -F no-pdep
//! ```
//!
//! # Minimal example
//!
//! To get started, define a configuration via [`ParserOptions`]
//! and build a [`FastxParser`] with this configuration.
//!
//! ```rust,no_run
//! use helicase::input::*;
//! use helicase::*;
//!
//! // set the options of the parser (at compile-time)
//! const CONFIG: Config = ParserOptions::default().config();
//!
//! fn main() {
//!     let path = "...";
//!
//!     // create a parser with the desired options
//!     let mut parser = FastxParser::<CONFIG>::from_file(&path).expect("Cannot open file");
//!
//!     // iterate over records
//!     while let Some(_event) = parser.next() {
//!         // get a reference to the header
//!         let header = parser.get_header();
//!
//!         // get a reference to the sequence (without newlines)
//!         let seq = parser.get_dna_string();
//!
//!         // ...
//!     }
//! }
//! ```
//!
//! # Adjusting the configuration
//!
//! The parser is configured at compile-time via [`ParserOptions`].
//! For example, to ignore headers and split the sequence at non-ACTG bases:
//!
//! ```rust
//! use helicase::*;
//!
//! const CONFIG: Config = ParserOptions::default()
//!     .ignore_headers()
//!     .split_non_actg()
//!     .config();
//! ```
//!
//! # Bitpacked DNA formats
//!
//! The parser can output a bitpacked representation of the sequence in two formats:
//! - [`PackedDNA`](dna_format::PackedDNA) maps each base to two bits and packs them
//!   (compatible with [packed-seq](https://github.com/rust-seq/packed-seq) via the `packed-seq` feature).
//! - [`ColumnarDNA`](dna_format::ColumnarDNA) separates the high bit and the low bit of each base into two bitmasks.
//!
//! Since each base is encoded in two bits, non-ACTG bases must be handled explicitly. Three
//! options are available via [`ParserOptions`]:
//! - [`split_non_actg`](ParserOptions::split_non_actg) splits the sequence at non-ACTG bases,
//!   yielding one [`DnaChunk`](parser::Event::DnaChunk) event per contiguous ACTG run (default for bitpacked formats).
//! - [`skip_non_actg`](ParserOptions::skip_non_actg) skips non-ACTG bases and merges the remaining chunks,
//!   yielding one [`Record`](parser::Event::Record) event per record.
//! - [`keep_non_actg`](ParserOptions::keep_non_actg) keeps non-ACTG bases and encodes them lossily,
//!   yielding one [`Record`](parser::Event::Record) event per record (default for string format).
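//!
//! As a rough illustration of the two layouts, and of why non-ACTG bases need explicit
//! handling, here is a minimal standalone sketch. It does not use this crate: the helper
//! names `encode_base`, `pack`, and `columnar` are hypothetical, and the base-to-bits
//! mapping and bit order are assumptions that may differ from the actual in-memory layout.
//!
//! ```rust
//! // Assumed mapping (illustrative only): A=0, C=1, T=2, G=3.
//! // Anything else has no 2-bit code, hence the `Option`.
//! fn encode_base(b: u8) -> Option<u8> {
//!     match b.to_ascii_uppercase() {
//!         b'A' => Some(0),
//!         b'C' => Some(1),
//!         b'T' => Some(2),
//!         b'G' => Some(3),
//!         _ => None,
//!     }
//! }
//!
//! // Packed layout: four bases per byte, first base in the lowest two bits.
//! fn pack(seq: &[u8]) -> Option<Vec<u8>> {
//!     let mut out = vec![0u8; (seq.len() + 3) / 4];
//!     for (i, &b) in seq.iter().enumerate() {
//!         out[i / 4] |= encode_base(b)? << (2 * (i % 4));
//!     }
//!     Some(out)
//! }
//!
//! // Columnar layout: one mask of high bits and one mask of low bits,
//! // where bit `i` of each mask belongs to base `i` (up to 64 bases here).
//! fn columnar(seq: &[u8]) -> Option<(u64, u64)> {
//!     if seq.len() > 64 {
//!         return None;
//!     }
//!     let (mut hi, mut lo) = (0u64, 0u64);
//!     for (i, &b) in seq.iter().enumerate() {
//!         let code = encode_base(b)? as u64;
//!         hi |= (code >> 1) << i;
//!         lo |= (code & 1) << i;
//!     }
//!     Some((hi, lo))
//! }
//! ```
//!
//! Under these assumptions, `pack(b"ACTG")` gives the single byte `0b11100100` and
//! `columnar(b"ACTG")` gives the masks `(0b1100, 0b1010)`, while an input such as
//! `b"ACNG"` has no two-bit encoding and must be split, skipped, or encoded lossily.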
//!
//! # Events
//!
//! The parser is an iterator that yields [`Event`](parser::Event) values.
//! An event signals a record boundary or a contiguous DNA chunk,
//! but the data is always read from the parser itself via [`get_header`](HelicaseParser::get_header), [`get_dna_string`](HelicaseParser::get_dna_string), etc.
//!
//! There are two kinds of events:
//! - [`Event::Record`](parser::Event::Record) is emitted once per record, after all of its DNA
//!   chunks. Enabled by [`return_record`](ParserOptions::return_record) (on by default).
//! - [`Event::DnaChunk`](parser::Event::DnaChunk) is emitted for each contiguous ACTG run.
//!   Enabled by [`return_dna_chunk`](ParserOptions::return_dna_chunk) (on by default with
//!   [`dna_packed`](ParserOptions::dna_packed) and [`dna_columnar`](ParserOptions::dna_columnar)).
//!
//! When both are active, you need to match on the event to distinguish them:
//!
//! ```rust,no_run
//! use helicase::input::*;
//! use helicase::parser::Event;
//! use helicase::*;
//!
//! // dna_packed enables DnaChunk events; Record events stay enabled by default.
//! const CONFIG: Config = ParserOptions::default().dna_packed().config();
//!
//! fn main() {
//!     let path = "...";
//!     let mut parser = FastxParser::<CONFIG>::from_file(&path).expect("Cannot open file");
//!
//!     while let Some(event) = parser.next() {
//!         match event {
//!             Event::Record(_) => {
//!                 // all chunks of this record have been processed
//!             }
//!             Event::DnaChunk(_) => {
//!                 // one contiguous ACTG run is ready
//!                 let seq = parser.get_dna_packed();
//!             }
//!         }
//!     }
//! }
//! ```
//!
//! When only one type of event is active, the event value can be safely ignored:
//!
//! ```rust,no_run
//! use helicase::input::*;
//! use helicase::*;
//!
//! // Default config: only Record events, one per record.
//! const CONFIG: Config = ParserOptions::default().config();
//!
//! fn main() {
//!     let path = "...";
//!     let mut parser = FastxParser::<CONFIG>::from_file(&path).expect("Cannot open file");
//!
//!     while let Some(_event) = parser.next() {
//!         let header = parser.get_header();
//!         let seq = parser.get_dna_string();
//!     }
//! }
//! ```
//!
//! It is also possible to disable all events and process the entire file in one go,
//! for instance if you simply want to count bases.
//!
//! # Iterating over chunks of packed DNA
//!
//! ```rust,no_run
//! use helicase::input::*;
//! use helicase::*;
//!
//! const CONFIG: Config = ParserOptions::default()
//!     // by default, dna_packed splits the sequence at non-ACTG bases and stops after each chunk
//!     .dna_packed()
//!     // don't stop the iterator at the end of a record
//!     .return_record(false)
//!     .config();
//!
//! fn main() {
//!     let path = "...";
//!
//!     let mut parser = FastxParser::<CONFIG>::from_file(&path).expect("Cannot open file");
//!
//!     // iterate over each chunk of ACTG bases
//!     while let Some(_event) = parser.next() {
//!         // headers are still accessible between chunks
//!         let header = parser.get_header();
//!
//!         // get a reference to the packed sequence
//!         let seq = parser.get_dna_packed();
//!
//!         // or directly get a PackedSeq (requires the packed-seq feature)
//!         // let packed_seq = parser.get_packed_seq();
//!     }
//! }
//! ```
//!
//! # Crate features
//!
//! | Feature | Default | Description |
//! |--------------|---------|-------------|
//! | `packed-seq` | no | conversion to [packed-seq](https://github.com/rust-seq/packed-seq) types |
//! | `no-pdep` | no | disable PDEP instruction (recommended for AMD CPUs prior to 2020) |
//! | `gz` | yes | gzip decompression |
//! | `zstd` | yes | zstd decompression |
//! | `bz2` | no | bzip2 decompression |
//! | `xz` | no | xz decompression |
pub
pub
pub use ;
pub use ;
pub
pub
pub
pub