Struct AlignmentBlock 

pub struct AlignmentBlock {
    pub min_handle: Option<usize>,
    pub max_handle: Option<usize>,
    pub alignments: usize,
    pub read_length: Option<usize>,
    pub gbwt_starts: Vec<u8>,
    pub names: Vec<u8>,
    pub quality_strings: Vec<u8>,
    pub difference_strings: Vec<u8>,
    pub flags: Flags,
    pub numbers: Vec<u8>,
    pub optional: Vec<u8>,
}

An encoded block of Alignment objects.

This is a compressed representation of multiple sequences aligned to the same graph. With short reads, a properly chosen block size enables both column-based compression and random access to the alignments. Long reads do not benefit from column-based compression, as the amount of metadata is insignificant. Reasonable block sizes could be 1000 alignments for short reads and 10 for long reads.

For best results, node identifiers should approximate a topological order in the graph. The range of node identifiers in a path is typically proportional to the length of the path. If the alignments are sorted by (min id, max id), a block will then consist of alignments that are close to each other in the graph.
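The sorting and blocking strategy described above can be sketched with the standard library alone. This is only an illustration, not the crate's implementation: `NodeRange` and `sorted_blocks` are hypothetical names, and each alignment is reduced to the minimum and maximum node identifier on its target path.

```rust
/// Hypothetical summary of one alignment: the smallest and largest node
/// identifiers on its target path.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRange {
    pub min_id: usize,
    pub max_id: usize,
}

/// Sorts the ranges by (min id, max id) and splits them into blocks of at
/// most `block_size` alignments.
pub fn sorted_blocks(mut ranges: Vec<NodeRange>, block_size: usize) -> Vec<Vec<NodeRange>> {
    assert!(block_size > 0, "block size must be positive");
    ranges.sort_by_key(|r| (r.min_id, r.max_id));
    ranges.chunks(block_size).map(|chunk| chunk.to_vec()).collect()
}
```

If node identifiers approximate a topological order, each resulting block covers a narrow range of identifiers, so its alignments lie close to each other in the graph.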

§Notes

  • Target paths must be stored separately (e.g. using a GBWT-based encoding).
  • A block must contain either aligned reads or unaligned reads, but not both.

§Examples

use gbz_base::{Alignment, AlignmentBlock};
use gbz_base::utils::PathStartSource;
use gbz_base::{formats, utils};
use gbz::GBWT;
use gbz::support;
use simple_sds::serialize;
// `read_until` is a `BufRead` method, so the trait must be in scope.
use std::io::BufRead;

// We assume that the target paths are stored separately in a GBWT index.
let gbwt_filename = utils::get_test_data("micb-kir3dl1_HG003.gbwt");
let index: GBWT = serialize::load_from(&gbwt_filename).unwrap();

// We use the GBWT index for determining the starting position for each path.
// But we could also compute it on the fly with `PathStartSource::new()`.
let mut source = PathStartSource::from(&index);

// Open a GAF file and skip the header.
let gaf_filename = utils::get_test_data("micb-kir3dl1_HG003.gaf");
let mut gaf_file = utils::open_file(&gaf_filename).unwrap();
while formats::peek_gaf_header_line(&mut gaf_file).unwrap() {
    let mut buf: Vec<u8> = Vec::new();
    let _ = gaf_file.read_until(b'\n', &mut buf).unwrap();
}

// Read some alignments from the GAF file.
let mut alignments = Vec::new();
for _ in 0..10 {
    let mut buf: Vec<u8> = Vec::new();
    let _ = gaf_file.read_until(b'\n', &mut buf).unwrap();
    let aln = Alignment::from_gaf(&buf).unwrap();
    // A block cannot have a mix of aligned and unaligned reads.
    assert!(!aln.is_unaligned());
    alignments.push(aln);
}

// Compress the block.
let mut first_id = 0;
let block = AlignmentBlock::new(&alignments, &mut source, first_id).unwrap();
assert_eq!(block.len(), alignments.len());
first_id += alignments.len(); // Next block would start there.

// Decompress the block and extract the paths from the GBWT.
let mut decompressed = block.decode().unwrap();
assert_eq!(decompressed.len(), alignments.len());
for (i, aln) in decompressed.iter_mut().enumerate() {
    aln.extract_target_path(&index);
    // NOTE: We need the reference graph to determine the true target path length.
    aln.path_len = alignments[i].path_len;
    assert_eq!(*aln, alignments[i]);
}

Fields§

§min_handle: Option<usize>

Minimum GBWT node identifier in the target paths, or None if this is a block of unaligned reads.

§max_handle: Option<usize>

Maximum GBWT node identifier in the target paths, or None if this is a block of unaligned reads.
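As a rough sketch of how min_handle and max_handle relate to the target paths, the function below computes both bounds in one pass. It assumes each target path is available as a vector of handles and that unaligned reads have empty paths; `handle_range` is a hypothetical helper, not part of this crate.

```rust
/// Returns the smallest and largest handle over all target paths, or `None`
/// if every path is empty (a block of unaligned reads).
pub fn handle_range(paths: &[Vec<usize>]) -> Option<(usize, usize)> {
    let mut handles = paths.iter().flatten().copied();
    // No handles at all means that there are no aligned reads.
    let first = handles.next()?;
    Some(handles.fold((first, first), |(lo, hi), h| (lo.min(h), hi.max(h))))
}
```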

§alignments: usize

Number of alignments in the block.

§read_length: Option<usize>

Expected read length in the block, or None if the lengths vary.
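One way to derive such a value, shown here as a sketch with a hypothetical `common_read_length` helper: return the shared length when all reads agree, and None otherwise. Returning None for an empty input is an assumption of this sketch, not documented behavior.

```rust
/// Returns the common read length, or `None` if the lengths vary or there
/// are no reads.
pub fn common_read_length(lengths: &[usize]) -> Option<usize> {
    let (&first, rest) = lengths.split_first()?;
    rest.iter().all(|&len| len == first).then_some(first)
}
```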

§gbwt_starts: Vec<u8>

GBWT starting positions for the target paths.

§names: Vec<u8>

Read and pair names.

§quality_strings: Vec<u8>

Quality strings.

§difference_strings: Vec<u8>

Difference strings.

§flags: Flags

Binary flags for each alignment.

§numbers: Vec<u8>

Encoded numerical information that cannot be derived from the other fields.

§optional: Vec<u8>

Optional typed fields that have not been interpreted.

Implementations§

impl AlignmentBlock

pub const COMPRESSION_LEVEL: i32 = 7

Compression level for Zstandard.

pub fn new(alignments: &[Alignment], source: &mut PathStartSource<'_>, first_id: usize) -> Result<Self, String>

Creates a new alignment block from the given read alignments and GBWT index.

If the reads are aligned, they correspond to paths first_id to first_id + alignments.len() - 1 in the GBWT index. The GBWT index may be bidirectional or unidirectional.
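The path identifier arithmetic for consecutive blocks of aligned reads can be sketched as follows. `block_path_ranges` is a hypothetical helper that assumes every block contains aligned reads and that path identifiers are assigned in block order; each range is half-open, so the last path in a block is `first_id + len - 1`.

```rust
use std::ops::Range;

/// Returns the half-open path identifier range for each block, given the
/// number of alignments per block and the identifier of the first path.
pub fn block_path_ranges(block_sizes: &[usize], mut first_id: usize) -> Vec<Range<usize>> {
    let mut ranges = Vec::with_capacity(block_sizes.len());
    for &size in block_sizes {
        ranges.push(first_id..first_id + size);
        first_id += size; // The next block starts where this one ends.
    }
    ranges
}
```

The start of each range is the value that would be passed as first_id when building the corresponding block.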

§Arguments

  • alignments: The alignments to include in the block.
  • source: A source for GBWT starting positions of the target paths.
  • first_id: Path identifier of the first alignment in the block.

§Errors

Returns an error if:

  • The block contains a mix of aligned and unaligned reads.
  • The source computes GBWT starts on the fly, but an alignment does not store the target path explicitly.
  • Compression fails.

pub fn len(&self) -> usize

Returns the number of alignments in the block.

pub fn is_empty(&self) -> bool

Returns true if the block contains no alignments.

pub fn decode(&self) -> Result<Vec<Alignment>, String>

Decompresses the block into a vector of alignments.

Aligned query sequences have target paths represented as GBWT starting positions. The path can be set with Alignment::set_target_path or extracted from a GBWT index with Alignment::extract_target_path. Unaligned query sequences have empty target paths.

The true length of the target path cannot be determined from the alignment block alone. It can be set later using Alignment::set_target_path_len.

§Errors

Returns an error if decompression fails or if data required for decoding the block is missing.

Trait Implementations§

impl Clone for AlignmentBlock

fn clone(&self) -> AlignmentBlock

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for AlignmentBlock

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.