Crate chainfile

chainfile is a crate for reading and processing genomic chain files.

The crate provides two main points of entry:

  • Parsing and reading chain files directly.
  • Providing a machine for lifting over intervals given a chain file.

Since the main purpose of a chain file is to lift over intervals from one genome build to another, we expect that most users will be interested in the latter functionality. However, we have exposed the former functionality in the event that it is needed for some other purpose.

Parsing and reading chain files

If you’re interested in parsing and reading chain files directly, you can use the reader::Reader facility to accomplish that. Most users will want to read the parsed reader::AlignmentDataSections using reader::Reader::sections(). For each data section, you can access the record::HeaderRecord (via reader::AlignmentDataSection::header()) and the subsequent record::AlignmentDataRecords for that section (via reader::AlignmentDataSection::alignment_data_records()). However, most users will not be interested in working with the raw alignment data records.

Generally, what one actually wants is the mapping between contiguous regions of the reference and query genomes that are defined by the alignment data section. The translation between raw alignment data records and this mapping can be tricky, especially considering gotchas such as coordinates on the reverse strand being stored relative to the reverse complement of the sequence. Instead of computing these mappings yourself, you should use the liftover::StepThrough facility that can be obtained from each alignment data section via reader::AlignmentDataSection::stepthrough().
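To make the translation concrete, here is a minimal standard-library sketch of the stepping logic. It assumes the usual chain format conventions: each alignment data record holds an ungapped block size followed by the gap to the next block in the reference and then in the query, and the final record holds only a size. The names parse_record, step_through, Block, and reverse_to_forward are hypothetical and are not part of chainfile; this is an illustration, not the crate's implementation.

```rust
/// An ungapped block: `size` bases starting at `ref_start` in the reference
/// and at `query_start` in the query, both in chain-file coordinates
/// (0-based, half-open, relative to each sequence's own strand).
#[derive(Debug, PartialEq)]
struct Block {
    ref_start: u64,
    query_start: u64,
    size: u64,
}

/// Parses one alignment data line into (size, reference gap, query gap).
/// The last line of a section has only a size, so missing gaps become 0.
fn parse_record(line: &str) -> Option<(u64, u64, u64)> {
    let mut fields = line.split_whitespace().map(|f| f.parse::<u64>());
    let size = fields.next()?.ok()?;
    let dt = match fields.next() {
        Some(v) => v.ok()?,
        None => 0,
    };
    let dq = match fields.next() {
        Some(v) => v.ok()?,
        None => 0,
    };
    Some((size, dt, dq))
}

/// Walks the records, emitting one block per record and advancing both
/// cursors by the block size plus the gap that follows it.
fn step_through(ref_start: u64, query_start: u64, records: &[(u64, u64, u64)]) -> Vec<Block> {
    let mut blocks = Vec::new();
    let (mut r, mut q) = (ref_start, query_start);
    for &(size, dt, dq) in records {
        blocks.push(Block { ref_start: r, query_start: q, size });
        r += size + dt;
        q += size + dq;
    }
    blocks
}

/// Converts a half-open interval given on the reverse strand into
/// forward-strand coordinates for a sequence of length `seq_size`.
fn reverse_to_forward(start: u64, end: u64, seq_size: u64) -> (u64, u64) {
    (seq_size - end, seq_size - start)
}
```

For the data lines "3\t0\t1" and "1" with both cursors starting at 0, step_through yields blocks starting at (0, 0) and (3, 4). If the query is on the reverse strand of a sequence of size 5, reverse_to_forward(0, 3, 5) places the first query block at 2..5 on the forward strand.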

Iterating over this stepthrough provides a series of liftover::stepthrough::interval_pair::ContiguousIntervalPairs that represent contiguous mappings between the two genomes. This struct includes the ever-important liftover::stepthrough::interval_pair::ContiguousIntervalPair::liftover() method to translate a core::Coordinate from the reference core::Interval to the query core::Interval.
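As a rough illustration of what lifting a single coordinate through one contiguous pair involves, the sketch below uses a simplified model: 0-based positions, half-open intervals, a forward-strand reference, and a query interval stored in forward-strand coordinates with a strand flag. The Pair and Strand types and the lift method are hypothetical stand-ins, and the crate's own types may represent coordinates differently.

```rust
/// Strand of the query interval in this simplified model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strand {
    Forward,
    Reverse,
}

/// A contiguous mapping between a reference interval and a query interval
/// of equal length. Both intervals are 0-based, half-open, and given in
/// forward-strand coordinates (an assumption of this sketch).
#[derive(Debug)]
struct Pair {
    ref_start: u64,
    ref_end: u64,
    query_start: u64,
    query_end: u64,
    query_strand: Strand,
}

impl Pair {
    /// Maps a reference position to the query, or returns None when the
    /// position lies outside the reference interval of this pair.
    fn lift(&self, pos: u64) -> Option<u64> {
        if pos < self.ref_start || pos >= self.ref_end {
            return None;
        }
        let offset = pos - self.ref_start;
        match self.query_strand {
            // Same direction: walk forward from the query start.
            Strand::Forward => Some(self.query_start + offset),
            // Opposite direction: walk backward from the last query base.
            Strand::Reverse => Some(self.query_end - 1 - offset),
        }
    }
}
```

Under this model, a pair mapping reference 0..3 to a reverse-strand query occupying 2..5 sends position 0 to 4 and position 2 to 2, while position 3 falls outside the pair and yields None.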

Below is a representative example of how you might want to access and explore a chain file with the facilities discussed above.

use chainfile as chain;

let data = b"chain 0 seq0 4 + 0 4 seq0 5 - 0 5 1\n3\t0\t1\n1";
let mut reader = chain::Reader::new(&data[..]);

for result in reader.sections() {
    let section = result?;
    println!("{}", section.header());

    for result in section.stepthrough()? {
        let pair = result?;
        println!("{} -> {}", pair.reference(), pair.query());
    }
}

Liftover Machine

Most often, users will be interested in the liftover::Machine facility. This struct provides the capability to attempt a lift over of a core::Interval from the reference genome to the query genome via liftover::Machine::liftover(). Importantly (and unlike most other liftover tools that the author is aware of), this method provides the complete list of mapped contiguous interval pairs that are encompassed by the provided interval rather than providing an inexact mapping and/or lifting over a single position. Note that, if you’re interested in lifting over a single position, you can still construct a core::Interval of size 1 that contains your position and lift that over instead.
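The idea of returning every mapped piece of an interval, rather than a single approximate result, can be sketched as follows. This is a simplified model and not the crate's algorithm: blocks are (reference start, query start, size) triples in chain-file coordinates on a single sequence, strands are ignored, and the Mapped type and lift_interval function are hypothetical names. Returning None when nothing overlaps is likewise an assumption of this sketch.

```rust
/// One clipped piece of the requested interval together with the query
/// range it maps to. Both ranges are 0-based and half-open.
#[derive(Debug, PartialEq)]
struct Mapped {
    reference: (u64, u64),
    query: (u64, u64),
}

/// Intersects the interval `start..end` with every block and returns the
/// clipped pairs in block order, or None when no block overlaps.
fn lift_interval(blocks: &[(u64, u64, u64)], start: u64, end: u64) -> Option<Vec<Mapped>> {
    let mut out = Vec::new();
    for &(r, q, size) in blocks {
        // Clip the requested interval to this block's reference range.
        let lo = start.max(r);
        let hi = end.min(r + size);
        if lo < hi {
            out.push(Mapped {
                reference: (lo, hi),
                // Shift the clipped range by the same offsets in the query.
                query: (q + (lo - r), q + (hi - r)),
            });
        }
    }
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}
```

With blocks (0, 0, 3) and (3, 4, 1), the interval 2..4 spans a gap in the query and therefore comes back as two pairs, 2..3 to 2..3 and 3..4 to 4..5. A single position is lifted by passing an interval of size 1, such as 1..2.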

A liftover::Machine cannot be instantiated directly. Instead, you should use liftover::machine::Builder and the associated liftover::machine::Builder::try_build_from() method to construct a liftover machine.

Below is a representative example of how one might read in a chain file, construct a liftover machine, parse an interval of interest, and then lift over that interval of interest from the reference genome to the query genome.

use chainfile as chain;

let data = b"chain 0 seq0 4 + 0 4 seq0 5 - 0 5 1\n3\t0\t1\n1";
let mut reader = chain::Reader::new(&data[..]);
let machine = chain::liftover::machine::Builder::default().try_build_from(reader)?;

let interval = "seq0:3-4".parse::<chain::core::Interval>()?;
for result in machine.liftover(&interval).unwrap() {
    println!("{} -> {}", result.reference(), result.query());
}

Re-exports

Modules

  • Core functionality used across the crate.
  • Utilities for lifting over from a reference genome to a query genome.
  • A line within a chain file.
  • A chain file reader.
  • A chain file record.