Skip to main content

Alignment

Struct Alignment 

Source
pub struct Alignment {
Show 14 fields pub name: String, pub seq_len: usize, pub seq_interval: Range<usize>, pub path: TargetPath, pub path_len: usize, pub path_interval: Range<usize>, pub matches: usize, pub edits: usize, pub mapq: Option<usize>, pub score: Option<isize>, pub base_quality: Vec<u8>, pub difference: Vec<Difference>, pub pair: Option<PairedRead>, pub optional: Vec<TypedField>,
}
Expand description

An alignment between a query sequence and a target path in a graph.

This object corresponds either to a line in a GAF file or to a row in table Alignments in crate::GAFBase. When the alignment is built from a GAF line, the target path is stored explicitly. For alignments stored in a database, only the GBWT starting position is stored. See TargetPath for details.

A GAF line can be converted to an Alignment object with Alignment::from_gaf. An Alignment object can be serialized as a GAF line with Alignment::to_gaf. The conversion can be lossy (see crate::alignment for details).

§Examples

use gbz_base::Alignment;
use gbz::Orientation;
use gbz::support;

// Unnamed empty sequence.
let empty = Alignment::new();
assert!(empty.name.is_empty());
assert_eq!(empty.seq_len, 0);
assert!(empty.is_unaligned());

// Construct a GAF line.
let name = "query";
let seq_len = 7;
let seq_interval = 0..6;
let path = ">1<2>3";
let path_len = 8;
let path_interval = 2..8;
let matches = 5;
let edits = 1;
let mapq = 60;
let line = format!("{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}",
    name,
    seq_len, seq_interval.start, seq_interval.end,
    '+', path,
    path_len, path_interval.start, path_interval.end,
    matches, matches + edits, mapq
);

// Create an alignment from the line.
let aln = Alignment::from_gaf(line.as_bytes()).unwrap();
assert_eq!(aln.name, name);
assert_eq!(aln.seq_len, seq_len);
assert_eq!(aln.seq_interval, seq_interval);
assert!(aln.has_target_path());
let path = vec!(
    support::encode_node(1, Orientation::Forward),
    support::encode_node(2, Orientation::Reverse),
    support::encode_node(3, Orientation::Forward)
);
assert_eq!(aln.target_path(), Some(path.as_ref()));
assert_eq!(aln.path_len, path_len);
assert_eq!(aln.path_interval, path_interval);
assert_eq!(aln.matches, matches);
assert_eq!(aln.edits, edits);
assert_eq!(aln.mapq, Some(mapq));

// All optional fields are missing.
assert!(aln.score.is_none());
assert!(aln.base_quality.is_empty());
assert!(aln.difference.is_empty());
assert!(aln.pair.is_none());
assert!(aln.optional.is_empty());

// Convert back to GAF.
let _query_sequence = b"GATTACA"; // Not used yet.
let target_sequence = b"GAGATCAC".to_vec();
let from_aln = aln.to_gaf(&target_sequence);
assert_eq!(&from_aln, line.as_bytes());

Fields§

§name: String

Name of the query sequence.

§seq_len: usize

Length of the query sequence.

§seq_interval: Range<usize>

Aligned interval of the query sequence.

§path: TargetPath

Target path in the orientation of the query sequence.

§path_len: usize

Length of the target path.

§path_interval: Range<usize>

Aligned interval of the target path.

§matches: usize

Number of matches in the alignment.

§edits: usize

Number of mismatches and gaps in the alignment.

§mapq: Option<usize>

Mapping quality.

§score: Option<isize>

Alignment score.

§base_quality: Vec<u8>

Base quality values for the query sequence.

§difference: Vec<Difference>

Difference string, or an empty vector if not present.

§pair: Option<PairedRead>

Information about the paired alignment.

§optional: Vec<TypedField>

Optional typed fields that have not been interpreted.

Implementations§

Source§

impl Alignment

Construction; conversions between GAF lines and Alignment objects.

Source

pub fn new() -> Self

Creates an empty alignment.

Source

pub fn from_gaf(line: &[u8]) -> Result<Self, String>

Parses an alignment from a GAF line.

Returns an error if the line cannot be parsed. The line may end with up to one endline character, which is ignored. The returned alignment stores the target path explicitly. Parsing is based on bytes rather than characters to avoid unnecessary UTF-8 validation.

If a difference string is present, some numerical fields will be recalculated from it. These include interval ends on both the query and the target, as well as the number of matches and edits. This behavior is justified, because some aligners may not calculate these values correctly.

Source

pub fn to_gaf(&self, target_sequence: &[u8]) -> Vec<u8>

Converts the alignment to a GAF line.

If the target path is stored as a GBWT starting position, it will be missing (*). The path can be set with Alignment::set_target_path or extracted from a GBWT index with Alignment::extract_target_path. The returned line does not end with an endline character.

A target sequence is necessary for reconstructing the difference string, if present. It must correspond to the entire target path.

Source

pub fn validate(&self) -> Result<(), String>

Validates the internal consistency of the alignment.

Returns an error message if the alignment is inconsistent.

Source§

impl Alignment

Operations on the Alignment object.

Source

pub fn is_unaligned(&self) -> bool

Returns true if the query sequence is unaligned.

An aligned sequence has a non-empty query interval aligned to a non-empty target interval. NOTE: An empty sequence is by definition unaligned.

Source

pub fn is_full(&self) -> bool

Returns true if this is an alignment of the entire query sequence.

NOTE: An empty sequence is by definition unaligned.

Source

pub fn is_exact(&self) -> bool

Returns true if this is an exact alignment.

NOTE: An empty sequence is by definition unaligned.

Source

pub fn has_difference_string(&self) -> bool

Returns true if there is a non-empty difference string for the alignment.

Source

pub fn min_handle(&self) -> Option<usize>

Returns the minimum GBWT node identifier in the target path, or None if there is no path.

Source

pub fn max_handle(&self) -> Option<usize>

Returns the maximum GBWT node identifier in the target path, or None if there is no path.

Source

pub fn has_target_path(&self) -> bool

Returns true if the target path is stored explicitly.

Source

pub fn has_non_empty_target_path(&self) -> bool

Returns true if the target path is stored explicitly and is non-empty.

Source

pub fn target_path(&self) -> Option<&[usize]>

Returns the target path if it is stored explicitly.

Source

pub fn extract_target_path(&mut self, index: &GBWT)

Sets the target path from the GBWT index if it is not already present.

Source

pub fn set_target_path(&mut self, path: Vec<usize>)

Sets the given path as the target path.

Source

pub fn set_target_path_len(&mut self, len: usize)

Sets the length of the target path.

Source

pub fn iter<'a>( &'a self, sequence_len: Arc<dyn Fn(usize) -> Option<usize> + 'a>, ) -> Option<AlignmentIter<'a>>

Returns an iterator over the alignment as a sequence of mappings.

Returns None if a valid iterator cannot be built. The iterator requires a difference string and an explicitly stored target path. It may stop early if the alignment is invalid.

The iterator needs a function that provides the sequence length for each node. This function may be based on gbz::GBZ, Subgraph, or crate::ReadSet.

Source

pub fn clip<'a>( &self, subgraph: &Subgraph, sequence_len: Arc<impl Fn(usize) -> Option<usize> + 'a>, ) -> Result<Vec<Alignment>, String>

Clips the alignment into fragments that are fully contained in the given subgraph.

Returns an empty vector if the read is unaligned or lacks an explicit non-empty target path or a difference string. There will be one alignment fragment for every maximal subpath in the subgraph. The fragments have the same name, pair, optional fields, and statistics as the original alignment. Only the aligned intervals and difference strings depend on the fragment.

Fragments of the same alignment are identified by a fragment index stored as an optional field fi:i. The indexes start from 1. Fragment index is omitted if the entire alignment is in the subgraph.

Clipping requires a function that returns the sequence length for the node with the given handle. This function may be based on gbz::GBZ, Subgraph, or crate::ReadSet.

§Errors

Returns an error if an AlignmentIter cannot be created. May return an error if the alignment is invalid and the iterator returns non-consecutive mappings.

§Examples
use gbz::{GBZ, Orientation};
use gbz::support;
use gbz_base::{Alignment, Difference, Subgraph};
use gbz_base::utils;
use simple_sds::serialize;
use std::sync::Arc;

let gbz_filename = utils::get_test_data("micb-kir3dl1.gbz");
let graph: GBZ = serialize::load_from(&gbz_filename).unwrap();

// Create a perfect alignment with node lengths of 68 bp, 1 bp, and 6 bp.
let mut aln = Alignment::new();
aln.seq_len = 50;
aln.seq_interval = 0..50;
let target_path = vec![
    support::encode_node(13, Orientation::Forward),
    support::encode_node(14, Orientation::Forward),
    support::encode_node(16, Orientation::Forward),
];
aln.set_target_path(target_path.clone());
aln.path_len = 75;
aln.path_interval = 20..70;
aln.matches = 50;
aln.difference.push(Difference::Match(50));

// Create a subgraph without the middle node.
let mut subgraph = Subgraph::new();
let _ = subgraph.add_node_from_gbz(&graph, 13);
let _ = subgraph.add_node_from_gbz(&graph, 16);

// Clip the alignment to the subgraph.
let sequence_len = Arc::new(|handle| {
    let (node_id, _) = support::decode_node(handle);
    graph.sequence_len(node_id)
});
let result = aln.clip(&subgraph, sequence_len.clone());
assert!(result.is_ok());
let clipped = result.unwrap();

// We should have two fragments.
assert_eq!(clipped.len(), 2);
assert_eq!(clipped[0].seq_interval, 0..48);
assert_eq!(clipped[0].target_path().unwrap(), &target_path[0..1]);
assert_eq!(clipped[0].path_interval, 20..68);
assert_eq!(clipped[0].difference, vec![Difference::Match(48)]);
assert_eq!(clipped[1].seq_interval, 49..50);
assert_eq!(clipped[1].target_path().unwrap(), &target_path[2..3]);
assert_eq!(clipped[1].path_interval, 0..1);
assert_eq!(clipped[1].difference, vec![Difference::Match(1)]);

Trait Implementations§

Source§

impl Clone for Alignment

Source§

fn clone(&self) -> Alignment

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for Alignment

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for Alignment

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl PartialEq for Alignment

Source§

fn eq(&self, other: &Alignment) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl StructuralPartialEq for Alignment

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.