chainfile/lib.rs
//! `chainfile` is a crate for reading and processing genomic chain files.
//!
//! The crate provides two main points of entry:
//!
//! - Parsing and reading chain files directly.
//! - Providing a machine for lifting over intervals given a chain file.
//!
//! Since the main purpose of a chain file is to lift over intervals from one
//! genome build to another, we expect that most users will be interested in the
//! latter functionality. However, we have exposed the former functionality in
//! the event that it is needed for some other purpose.
//!
//! ## Parsing and reading chain files
//!
//! If you're interested in parsing and reading chain files directly, you can
//! use the [`Reader`] facility to accomplish that. Most users will want to read
//! the parsed [alignment sections](crate::alignment::section::Sections) using
//! [`Reader::sections()`](crate::Reader::sections()). For each data section,
//! you can access the [header](crate::alignment::section::header::Record) (via
//! [`Section::header()`](crate::alignment::Section::header)) and the subsequent
//! [data records](crate::alignment::section::data::Record) for that section
//! (via [`Section::data()`](crate::alignment::Section::data)). However, most
//! users will not be interested in working with the raw alignment data records.
//!
//! Generally, what one _actually_ wants is the mapping between contiguous
//! regions of the reference and query genomes that are defined by the alignment
//! data section. The translation between raw alignment data records and this
//! mapping can be tricky, especially considering gotchas such as coordinates on
//! the reverse strand being stored relative to the reverse complement of the
//! sequence. Instead of computing these yourself, you should use the
//! [`liftover::StepThrough`] facility that can be obtained from each alignment
//! data section via
//! [`Section::stepthrough()`](crate::alignment::Section::stepthrough).
//!
//! Iterating over this stepthrough provides a series of
//! [`ContiguousIntervalPair`](crate::liftover::stepthrough::interval_pair::ContiguousIntervalPair)s
//! that represent contiguous alignments between the two genomes. This struct
//! includes the ever-important
//! [`ContiguousIntervalPair::liftover()`](crate::liftover::stepthrough::interval_pair::ContiguousIntervalPair::liftover)
//! method to attempt to translate a
//! [`omics::coordinate::interbase::Coordinate`] from the reference
//! [`omics::coordinate::interval::interbase::Interval`] to the query
//! [`omics::coordinate::interval::interbase::Interval`] if the coordinate falls
//! within the [`Section`](crate::alignment::Section).
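//!
//! As a rough illustration of the reverse-strand gotcha mentioned above, the
//! sketch below converts a zero-based, half-open span given on the `-` strand
//! into forward-strand coordinates. The `reverse_to_forward` helper is
//! hypothetical and is not part of this crate's API; it only shows the
//! arithmetic involved, which the stepthrough is meant to handle for you.
//!
//! ```rust
//! // Hypothetical helper (not part of `chainfile`): maps a zero-based,
//! // half-open span `[start, end)` expressed on the reverse complement of a
//! // sequence of length `size` back onto the forward strand.
//! fn reverse_to_forward(size: u64, start: u64, end: u64) -> Option<(u64, u64)> {
//!     // Reject spans that are inverted or that run past the sequence end.
//!     if start > end || end > size {
//!         return None;
//!     }
//!
//!     // The reverse-strand end becomes the forward-strand start, and the
//!     // reverse-strand start becomes the forward-strand end.
//!     Some((size - end, size - start))
//! }
//!
//! // A span covering the whole sequence is the same on both strands.
//! assert_eq!(reverse_to_forward(5, 0, 5), Some((0, 5)));
//! // The first three reverse-strand bases are the last three forward bases.
//! assert_eq!(reverse_to_forward(5, 0, 3), Some((2, 5)));
//! // Spans that extend past the end of the sequence are rejected.
//! assert_eq!(reverse_to_forward(5, 4, 6), None);
//! ```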
//!
//! Below is a representative example of how you might want to access and
//! explore a chain file with the facilities discussed above.
//!
//! ```
//! use chainfile as chain;
//!
//! let data = b"chain 0 seq0 4 + 0 4 seq0 5 - 0 5 1\n3\t0\t1\n1";
//! let mut reader = chain::Reader::new(&data[..]);
//!
//! for result in reader.sections() {
//!     let section = result?;
//!     println!("{}", section.header());
//!
//!     for result in section.stepthrough()? {
//!         let pair = result?;
//!         println!("{} -> {}", pair.reference(), pair.query());
//!     }
//! }
//!
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
//!
//! ## Liftover Machine
//!
//! Most often, users won't want to deal with the accounting that goes into
//! iterating through [`Section`](crate::alignment::Section)s manually. To that
//! end, this crate provides the [`liftover::Machine`] facility to ease the
//! experience of lifting over intervals and coordinates.
//!
//! Foundationally, [`liftover::Machine`] provides the capability to attempt to
//! lift over an [`omics::coordinate::interval::interbase::Interval`] from the
//! reference genome to the query genome via [`liftover::Machine::liftover()`].
//! Importantly (and unlike most other liftover tools that the author is aware
//! of), this method provides the complete list of mapped contiguous interval
//! pairs that are encompassed by the provided interval rather than providing
//! an inexact mapping and/or lifting over a single position.
//!
//! Note that, while [`liftover::Machine::liftover()`] accepts an
//! [`omics::coordinate::interval::interbase::Interval`], one can always lift
//! over a single position by constructing a 1-sized
//! [`omics::coordinate::interval::interbase::Interval`] containing only the
//! position in question.
//!
//! A [`liftover::Machine`] cannot be instantiated directly. Instead, you should
//! use [`liftover::machine::Builder`] and the associated
//! [`liftover::machine::Builder::try_build_from()`] method to construct a
//! liftover machine.
//!
//! Below is a representative example of how one might read in a chain file,
//! construct a liftover machine, parse an interval of interest, and then lift
//! over that interval from the reference genome to the query genome.
//!
//! ```
//! use chainfile as chain;
//! use omics::coordinate::interval::interbase::Interval;
//!
//! let data = b"chain 0 seq0 4 + 0 4 seq0 5 - 0 5 1\n3\t0\t1\n1";
//! let reader = chain::Reader::new(&data[..]);
//! let machine = chain::liftover::machine::Builder::default().try_build_from(reader)?;
//!
//! let interval = "seq0:+:3-4".parse::<Interval>()?;
//! for chain_liftover in machine.liftover(interval).unwrap() {
//!     println!(
//!         "chain {} (score {})",
//!         chain_liftover.chain().id(),
//!         chain_liftover.chain().score()
//!     );
//!     for segment in chain_liftover.segments() {
//!         println!("  {} -> {}", segment.reference(), segment.query());
//!     }
//! }
//!
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```

#![warn(missing_docs)]
#![warn(rust_2018_idioms)]
#![warn(rust_2021_compatibility)]
#![warn(missing_debug_implementations)]
#![warn(clippy::missing_docs_in_private_items)]
#![warn(rustdoc::broken_intra_doc_links)]

pub mod alignment;
pub mod liftover;
pub mod line;
pub mod reader;

pub use line::Line;

pub use self::reader::Reader;