1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
//! Various types of parsing iterators for range formats.
//!
//! To work with genomic data in GRanges, one first needs to read it off disk and parse it. This is
//! done with *parsing iterators*. These iterators work on both plaintext and gzip-compressed
//! files (and bgzip and binary format support will hopefully be added). Each row of a file is
//! yielded as a particular parsed range type, which then can be filtered or altered in while
//! in the iterator, using Rust's powerful [`Iterator`] trait methods.
//!
//! Under the hood, this used the [`csv`] crate with [`serde`]. Initial parsers were handrolled,
//! but about 20%-30% less performant.
//!
//! ## Parsing Iterator Item Types
//!
//! These are the generic types that the iterator yields for each call of [`Iterator::next()`],
//! which represent a single row of data.
//!
//! There are two key item types that flow through parsing iterators, each representing a row of a
//! different kind of data file.
//!
//! 1. [`GenomicRangeEmptyRecord`], when a BED3 is loaded. This type indicates the incoming ranges
//! *do not* have associated data that would go into a data container. It is *empty* of data.
//!
//! 2. [`GenomicRangeRecord<Option<String>>`], which is a range record *possibly* (hence,
//! the [`Option`]) with remaining *unparsed* `String` data (e.g. the remaining unparsed
//! columns of a VCF file).
//!
//! 3. [`GenomicRangeRecord<U>`], which is a range record with an *addition* set of data.
//! For example, the standard BED5 has a two column [`Bed5Addition`], over the standard
//! first three range columns. Thus, the [`Bed5Iterator`] yields
//! [`GenomicRangeRecord<Bed5Addition>`] types.
//!
//! ## Why Empty Types?
//!
//! Most genomic file formats can be thought of as a genomic range with some associated data,
//! for example:
//!
//! - VCF files will have information on the alleles and a vector of genotypes.
//!
//! - GTF/GFF files store details about the annotated feature (e.g. name, protein ID, etc).
//!
//! However, often empty ranges (those that do not carry any data) are useful: for example, ranges
//! that just define genomic windows at regular widths, or masked repeat ranges. In this case, when
//! data is loaded into a [`GRanges<R, T>`] type, the data container type `T` is `()`. This is a
//! degenerate case that GRanges handles with a specific type called [`GRangesEmpty`], which indicates
//! the ranges in the ranges container in [`GRangesEmpty`] do not have any associated data.
//!
//! In most cases, people working with genomic file formats will know its type at compile-time.
//! However, if this isn't the case, see [⚙️ Handling Range Data when the type isn't know at runtime] below.
//!
//! # ⚙️ Handling Range Data when the type isn't know at runtime
//!
//! Because GRanges is a compile-time library, the type of incoming data is not runtime-polymorphic.
//! GRanges implements file detection methods to infer the filetype from the extension and
//! does additional type validation checks. When you want to use GRanges to handle a type
//! not known at compile-time, you will need to use `match` statements against the parsing
//! iterator types defined in [`GenomicRangesParser`]. The best examples of this are the
//! `granges` subcommands implementations.
//!
//! [`GRanges`]: crate::granges::GRanges
//! [`GRanges<R, T>`]: crate::granges::GRanges
//! [`GRangesEmpty`]: crate::granges::GRangesEmpty
//!
pub use ;
pub use ;
pub use ;