pub struct Alignment {Show 14 fields
pub name: String,
pub seq_len: usize,
pub seq_interval: Range<usize>,
pub path: TargetPath,
pub path_len: usize,
pub path_interval: Range<usize>,
pub matches: usize,
pub edits: usize,
pub mapq: Option<usize>,
pub score: Option<isize>,
pub base_quality: Vec<u8>,
pub difference: Vec<Difference>,
pub pair: Option<PairedRead>,
pub optional: Vec<TypedField>,
}Expand description
An alignment between a query sequence and a target path in a graph.
This object corresponds either to a line in a GAF file or to a row in table Alignments in crate::GAFBase.
When the alignment is built from a GAF line, the target path is stored explicitly.
For alignments stored in a database, only the GBWT starting position is stored.
See TargetPath for details.
A GAF line can be converted to an Alignment object with Alignment::from_gaf.
An Alignment object can be serialized as a GAF line with Alignment::to_gaf.
The conversion can be lossy (see crate::alignment for details).
§Examples
use gbz_base::Alignment;
use gbz::Orientation;
use gbz::support;
// Unnamed empty sequence.
let empty = Alignment::new();
assert!(empty.name.is_empty());
assert_eq!(empty.seq_len, 0);
assert!(empty.is_unaligned());
// Construct a GAF line.
let name = "query";
let seq_len = 7;
let seq_interval = 0..6;
let path = ">1<2>3";
let path_len = 8;
let path_interval = 2..8;
let matches = 5;
let edits = 1;
let mapq = 60;
let line = format!("{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}",
name,
seq_len, seq_interval.start, seq_interval.end,
'+', path,
path_len, path_interval.start, path_interval.end,
matches, matches + edits, mapq
);
// Create an alignment from the line.
let aln = Alignment::from_gaf(line.as_bytes()).unwrap();
assert_eq!(aln.name, name);
assert_eq!(aln.seq_len, seq_len);
assert_eq!(aln.seq_interval, seq_interval);
assert!(aln.has_target_path());
let path = vec!(
support::encode_node(1, Orientation::Forward),
support::encode_node(2, Orientation::Reverse),
support::encode_node(3, Orientation::Forward)
);
assert_eq!(aln.target_path(), Some(path.as_ref()));
assert_eq!(aln.path_len, path_len);
assert_eq!(aln.path_interval, path_interval);
assert_eq!(aln.matches, matches);
assert_eq!(aln.edits, edits);
assert_eq!(aln.mapq, Some(mapq));
// All optional fields are missing.
assert!(aln.score.is_none());
assert!(aln.base_quality.is_empty());
assert!(aln.difference.is_empty());
assert!(aln.pair.is_none());
assert!(aln.optional.is_empty());
// Convert back to GAF.
let _query_sequence = b"GATTACA"; // Not used yet.
let target_sequence = b"GAGATCAC".to_vec();
let from_aln = aln.to_gaf(&target_sequence);
assert_eq!(&from_aln, line.as_bytes());Fields§
§name: StringName of the query sequence.
seq_len: usizeLength of the query sequence.
seq_interval: Range<usize>Aligned interval of the query sequence.
path: TargetPathTarget path in the orientation of the query sequence.
path_len: usizeLength of the target path.
path_interval: Range<usize>Aligned interval of the target path.
matches: usizeNumber of matches in the alignment.
edits: usizeNumber of mismatches and gaps in the alignment.
mapq: Option<usize>Mapping quality.
score: Option<isize>Alignment score.
base_quality: Vec<u8>Base quality values for the query sequence.
difference: Vec<Difference>Difference string, or an empty vector if not present.
pair: Option<PairedRead>Information about the paired alignment.
optional: Vec<TypedField>Optional typed fields that have not been interpreted.
Implementations§
Source§impl Alignment
Construction; conversions between GAF lines and Alignment objects.
impl Alignment
Construction; conversions between GAF lines and Alignment objects.
Sourcepub fn from_gaf(line: &[u8]) -> Result<Self, String>
pub fn from_gaf(line: &[u8]) -> Result<Self, String>
Parses an alignment from a GAF line.
Returns an error if the line cannot be parsed. The line may end with up to one endline character, which is ignored. The returned alignment stores the target path explicitly. Parsing is based on bytes rather than characters to avoid unnecessary UTF-8 validation.
If a difference string is present, some numerical fields will be recalculated from it. These include interval ends on both the query and the target, as well as the number of matches and edits. This behavior is justified, because some aligners may not calculate these values correctly.
Sourcepub fn to_gaf(&self, target_sequence: &[u8]) -> Vec<u8> ⓘ
pub fn to_gaf(&self, target_sequence: &[u8]) -> Vec<u8> ⓘ
Converts the alignment to a GAF line.
If the target path is stored as a GBWT starting position, it will be missing (*).
The path can be set with Alignment::set_target_path or extracted from a GBWT index with Alignment::extract_target_path.
The returned line does not end with an endline character.
A target sequence is necessary for reconstructing the difference string, if present. It must correspond to the entire target path.
Source§impl Alignment
Operations on the Alignment object.
impl Alignment
Operations on the Alignment object.
Sourcepub fn is_unaligned(&self) -> bool
pub fn is_unaligned(&self) -> bool
Returns true if the query sequence is unaligned.
An aligned sequence has a non-empty query interval aligned to a non-empty target interval. NOTE: An empty sequence is by definition unaligned.
Sourcepub fn is_full(&self) -> bool
pub fn is_full(&self) -> bool
Returns true if this is an alignment of the entire query sequence.
NOTE: An empty sequence is by definition unaligned.
Sourcepub fn is_exact(&self) -> bool
pub fn is_exact(&self) -> bool
Returns true if this is an exact alignment.
NOTE: An empty sequence is by definition unaligned.
Sourcepub fn has_difference_string(&self) -> bool
pub fn has_difference_string(&self) -> bool
Returns true if there is a non-empty difference string for the alignment.
Sourcepub fn min_handle(&self) -> Option<usize>
pub fn min_handle(&self) -> Option<usize>
Returns the minimum GBWT node identifier in the target path, or None if there is no path.
Sourcepub fn max_handle(&self) -> Option<usize>
pub fn max_handle(&self) -> Option<usize>
Returns the maximum GBWT node identifier in the target path, or None if there is no path.
Sourcepub fn has_target_path(&self) -> bool
pub fn has_target_path(&self) -> bool
Returns true if the target path is stored explicitly.
Sourcepub fn has_non_empty_target_path(&self) -> bool
pub fn has_non_empty_target_path(&self) -> bool
Returns true if the target path is stored explicitly and is non-empty.
Sourcepub fn target_path(&self) -> Option<&[usize]>
pub fn target_path(&self) -> Option<&[usize]>
Returns the target path if it is stored explicitly.
Sourcepub fn extract_target_path(&mut self, index: &GBWT)
pub fn extract_target_path(&mut self, index: &GBWT)
Sets the target path from the GBWT index if it is not already present.
Sourcepub fn set_target_path(&mut self, path: Vec<usize>)
pub fn set_target_path(&mut self, path: Vec<usize>)
Sets the given path as the target path.
Sourcepub fn set_target_path_len(&mut self, len: usize)
pub fn set_target_path_len(&mut self, len: usize)
Sets the length of the target path.
Sourcepub fn iter<'a>(
&'a self,
sequence_len: Arc<dyn Fn(usize) -> Option<usize> + 'a>,
) -> Option<AlignmentIter<'a>>
pub fn iter<'a>( &'a self, sequence_len: Arc<dyn Fn(usize) -> Option<usize> + 'a>, ) -> Option<AlignmentIter<'a>>
Returns an iterator over the alignment as a sequence of mappings.
Returns None if a valid iterator cannot be built.
The iterator requires a difference string and an explicitly stored target path.
It may stop early if the alignment is invalid.
The iterator needs a function that provides the sequence length for each node.
This function may be based on gbz::GBZ, Subgraph, or crate::ReadSet.
Sourcepub fn clip<'a>(
&self,
subgraph: &Subgraph,
sequence_len: Arc<impl Fn(usize) -> Option<usize> + 'a>,
) -> Result<Vec<Alignment>, String>
pub fn clip<'a>( &self, subgraph: &Subgraph, sequence_len: Arc<impl Fn(usize) -> Option<usize> + 'a>, ) -> Result<Vec<Alignment>, String>
Clips the alignment into fragments that are fully contained in the given subgraph.
Returns an empty vector if the read is unaligned or lacks an explicit non-empty target path or a difference string. There will be one alignment fragment for every maximal subpath in the subgraph. The fragments have the same name, pair, optional fields, and statistics as the original alignment. Only the aligned intervals and difference strings depend on the fragment.
Fragments of the same alignment are identified by a fragment index stored as an optional field fi:i.
The indexes start from 1.
Fragment index is omitted if the entire alignment is in the subgraph.
Clipping requires a function that returns the sequence length for the node with the given handle.
This function may be based on gbz::GBZ, Subgraph, or crate::ReadSet.
§Errors
Returns an error if an AlignmentIter cannot be created.
May return an error if the alignment is invalid and the iterator returns non-consecutive mappings.
§Examples
use gbz::{GBZ, Orientation};
use gbz::support;
use gbz_base::{Alignment, Difference, Subgraph};
use gbz_base::utils;
use simple_sds::serialize;
use std::sync::Arc;
let gbz_filename = utils::get_test_data("micb-kir3dl1.gbz");
let graph: GBZ = serialize::load_from(&gbz_filename).unwrap();
// Create a perfect alignment with node lengths of 68 bp, 1 bp, and 6 bp.
let mut aln = Alignment::new();
aln.seq_len = 50;
aln.seq_interval = 0..50;
let target_path = vec![
support::encode_node(13, Orientation::Forward),
support::encode_node(14, Orientation::Forward),
support::encode_node(16, Orientation::Forward),
];
aln.set_target_path(target_path.clone());
aln.path_len = 75;
aln.path_interval = 20..70;
aln.matches = 50;
aln.difference.push(Difference::Match(50));
// Create a subgraph without the middle node.
let mut subgraph = Subgraph::new();
let _ = subgraph.add_node_from_gbz(&graph, 13);
let _ = subgraph.add_node_from_gbz(&graph, 16);
// Clip the alignment to the subgraph.
let sequence_len = Arc::new(|handle| {
let (node_id, _) = support::decode_node(handle);
graph.sequence_len(node_id)
});
let result = aln.clip(&subgraph, sequence_len.clone());
assert!(result.is_ok());
let clipped = result.unwrap();
// We should have two fragments.
assert_eq!(clipped.len(), 2);
assert_eq!(clipped[0].seq_interval, 0..48);
assert_eq!(clipped[0].target_path().unwrap(), &target_path[0..1]);
assert_eq!(clipped[0].path_interval, 20..68);
assert_eq!(clipped[0].difference, vec![Difference::Match(48)]);
assert_eq!(clipped[1].seq_interval, 49..50);
assert_eq!(clipped[1].target_path().unwrap(), &target_path[2..3]);
assert_eq!(clipped[1].path_interval, 0..1);
assert_eq!(clipped[1].difference, vec![Difference::Match(1)]);