pub struct AlignmentBlock {
    pub min_handle: Option<usize>,
    pub max_handle: Option<usize>,
    pub alignments: usize,
    pub read_length: Option<usize>,
    pub gbwt_starts: Vec<u8>,
    pub names: Vec<u8>,
    pub quality_strings: Vec<u8>,
    pub difference_strings: Vec<u8>,
    pub flags: Flags,
    pub numbers: Vec<u8>,
    pub optional: Vec<u8>,
}
An encoded block of Alignment objects.
This is a compressed representation of multiple sequences aligned to the same graph. With short reads, a properly chosen block size enables both column-based compression and random access to the alignments. Long reads do not benefit from column-based compression, as the amount of metadata is insignificant. Reasonable block sizes could be 1000 alignments for short reads and 10 for long reads.
For best results, node identifiers should approximate a topological order in the graph. The range of node identifiers in a path is typically proportional to the length of the path. If the alignments are sorted by (min id, max id), a block will then consist of alignments that are close to each other in the graph.
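The sorting and blocking strategy described above can be sketched as follows. This is a minimal illustration using only the standard library, not the crate's own implementation: `NodeRange`, `sorted_blocks`, and `block_span` are hypothetical names, and each alignment is reduced to the range of node identifiers on its target path.

```rust
/// Hypothetical summary of one alignment: the range of node identifiers
/// on its target path. Not a type from gbz_base.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRange {
    pub min_id: usize,
    pub max_id: usize,
}

/// Sorts the ranges by (min id, max id) and splits them into blocks of
/// at most `block_size` alignments.
pub fn sorted_blocks(mut ranges: Vec<NodeRange>, block_size: usize) -> Vec<Vec<NodeRange>> {
    assert!(block_size > 0, "block size must be positive");
    ranges.sort_by_key(|r| (r.min_id, r.max_id));
    ranges.chunks(block_size).map(|chunk| chunk.to_vec()).collect()
}

/// Returns the span of node identifiers covered by a block,
/// or `None` if the block is empty.
pub fn block_span(block: &[NodeRange]) -> Option<(usize, usize)> {
    let min = block.iter().map(|r| r.min_id).min()?;
    let max = block.iter().map(|r| r.max_id).max()?;
    Some((min, max))
}
```

If node identifiers approximate a topological order, alignments that are adjacent after sorting cover nearby parts of the graph, so each block should span a narrow range of identifiers.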
§Notes
- Target paths must be stored separately (e.g. using a GBWT-based encoding).
- A block must contain either aligned reads or unaligned reads, but not both.
§Examples
use gbz_base::{Alignment, AlignmentBlock};
use gbz_base::utils::PathStartSource;
use gbz_base::{formats, utils};
use gbz::GBWT;
use gbz::support;
use simple_sds::serialize;
// We assume that the target paths are stored separately in a GBWT index.
let gbwt_filename = utils::get_test_data("micb-kir3dl1_HG003.gbwt");
let index: GBWT = serialize::load_from(&gbwt_filename).unwrap();
// We use the GBWT index for determining the starting position for each path.
// But we could also compute it on the fly with `PathStartSource::new()`.
let mut source = PathStartSource::from(&index);
// Open a GAF file and skip the header.
let gaf_filename = utils::get_test_data("micb-kir3dl1_HG003.gaf");
let mut gaf_file = utils::open_file(&gaf_filename).unwrap();
while formats::peek_gaf_header_line(&mut gaf_file).unwrap() {
    let mut buf: Vec<u8> = Vec::new();
    let _ = gaf_file.read_until(b'\n', &mut buf).unwrap();
}
// Read some alignments from the GAF file.
let mut alignments = Vec::new();
for _ in 0..10 {
    let mut buf: Vec<u8> = Vec::new();
    let _ = gaf_file.read_until(b'\n', &mut buf).unwrap();
    let aln = Alignment::from_gaf(&buf).unwrap();
    // A block cannot have a mix of aligned and unaligned reads.
    assert!(!aln.is_unaligned());
    alignments.push(aln);
}
// Compress the block.
let mut first_id = 0;
let block = AlignmentBlock::new(&alignments, &mut source, first_id).unwrap();
assert_eq!(block.len(), alignments.len());
first_id += alignments.len(); // Next block would start there.
// Decompress the block and extract the paths from the GBWT.
let mut decompressed = block.decode().unwrap();
assert_eq!(decompressed.len(), alignments.len());
for (i, aln) in decompressed.iter_mut().enumerate() {
    aln.extract_target_path(&index);
    // NOTE: We need the reference graph to determine the true target path length.
    aln.path_len = alignments[i].path_len;
    assert_eq!(*aln, alignments[i]);
}
Fields§
min_handle: Option<usize>
Minimum GBWT node identifier in the target paths, or None if this is a block of unaligned reads.
max_handle: Option<usize>
Maximum GBWT node identifier in the target paths, or None if this is a block of unaligned reads.
alignments: usize
Number of alignments in the block.
read_length: Option<usize>
Expected read length in the block, or None if the lengths vary.
gbwt_starts: Vec<u8>
GBWT starting positions for the target paths.
names: Vec<u8>
Read and pair names.
quality_strings: Vec<u8>
Quality strings.
difference_strings: Vec<u8>
Difference strings.
flags: Flags
Binary flags for each alignment.
numbers: Vec<u8>
Encoded numerical information that cannot be derived from the other fields.
optional: Vec<u8>
Optional typed fields that have not been interpreted.
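To make the semantics of the optional header fields concrete, here is a small sketch of how values like `read_length`, `min_handle`, and `max_handle` could be derived. The helper names `common_read_length` and `handle_range` are hypothetical, paths are modeled as plain vectors of node identifiers, and the treatment of empty input is an assumption rather than documented behavior of the crate.

```rust
/// Returns the shared read length, or `None` if the lengths vary.
/// Assumption: an empty slice also yields `None`.
pub fn common_read_length(lengths: &[usize]) -> Option<usize> {
    let first = *lengths.first()?;
    if lengths.iter().all(|&len| len == first) {
        Some(first)
    } else {
        None
    }
}

/// Returns the minimum and maximum node identifier over all target paths.
/// Both are `None` when every path is empty, as with unaligned reads.
pub fn handle_range(paths: &[Vec<usize>]) -> (Option<usize>, Option<usize>) {
    let min = paths.iter().flatten().copied().min();
    let max = paths.iter().flatten().copied().max();
    (min, max)
}
```

For example, `common_read_length(&[150, 150, 150])` returns `Some(150)`, while a block with lengths 150 and 151 gives `None`.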
Implementations§
impl AlignmentBlock
pub const COMPRESSION_LEVEL: i32 = 7
Compression level for Zstandard.
pub fn new(
    alignments: &[Alignment],
    source: &mut PathStartSource<'_>,
    first_id: usize,
) -> Result<Self, String>
Creates a new alignment block from the given read alignments and GBWT index.
If the reads are aligned, they correspond to paths first_id to first_id + alignments.len() - 1 in the GBWT index.
The GBWT index may be bidirectional or unidirectional.
§Arguments
- alignments: The alignments to include in the block.
- source: A source for GBWT starting positions of the target paths.
- first_id: Path identifier of the first alignment in the block.
§Errors
Returns an error if:
- The block contains a mix of aligned and unaligned reads.
- The source computes GBWT starts on the fly, but an alignment does not store the target path explicitly.
- Compression fails.
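The path identifier bookkeeping for consecutive blocks can be sketched as below. The function `path_id_ranges` is a hypothetical helper, not part of the crate. It assumes that the first block starts at path 0 and that every block consists of aligned reads, so each block of n alignments takes the next n path identifiers.

```rust
use std::ops::Range;

/// Assigns each block a half-open range of path identifiers.
/// A block of `n` alignments starting at `first_id` covers
/// paths `first_id` to `first_id + n - 1`.
pub fn path_id_ranges(block_sizes: &[usize]) -> Vec<Range<usize>> {
    let mut first_id = 0;
    let mut result = Vec::with_capacity(block_sizes.len());
    for &size in block_sizes {
        result.push(first_id..first_id + size);
        first_id += size; // The next block starts here.
    }
    result
}
```

The start of each range is the value that would be passed as `first_id` when building the corresponding block.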
pub fn decode(&self) -> Result<Vec<Alignment>, String>
Decompresses the block into a vector of alignments.
Aligned query sequences have target paths represented as GBWT starting positions.
The path can be set with Alignment::set_target_path or extracted from a GBWT index with Alignment::extract_target_path.
Unaligned query sequences have empty target paths.
The true length of the target path cannot be determined from the alignment block alone.
It can be set later using Alignment::set_target_path_len.
§Errors
Returns an error if decompression fails or if data required for decoding the block is missing.
Trait Implementations§
impl Clone for AlignmentBlock
fn clone(&self) -> AlignmentBlock
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.