Crate block_aligner
source ·Expand description
SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.
Currently, SSE2, AVX2, Neon, and WASM SIMD are supported.
Example
use block_aligner::{cigar::*, scan_block::*, scores::*};
let min_block_size = 32;
let max_block_size = 256;
// A gap of length n will cost: open + extend * (n - 1)
let gaps = Gaps { open: -2, extend: -1 };
// Note that PaddedBytes, Block, and Cigar can be initialized with sequence length
// and block size upper bounds and be reused later for shorter sequences, to avoid
// repeated allocations.
let r = PaddedBytes::from_bytes::<NucMatrix>(b"TTAAAAAAATTTTTTTTTTTT", max_block_size);
let q = PaddedBytes::from_bytes::<NucMatrix>(b"TTTTTTTTAAAAAAATTTTTTTTT", max_block_size);
// Align with traceback, but no X-drop threshold (global alignment).
let mut a = Block::<true, false>::new(q.len(), r.len(), max_block_size);
a.align(&q, &r, &NW1, gaps, min_block_size..=max_block_size, 0);
let res = a.res();
assert_eq!(res, AlignResult { score: 7, query_idx: 24, reference_idx: 21 });
let mut cigar = Cigar::new(res.query_idx, res.reference_idx);
// Compute traceback and resolve =/X (matches/mismatches).
a.trace().cigar_eq(&q, &r, res.query_idx, res.reference_idx, &mut cigar);
assert_eq!(cigar.to_string(), "2=6I16=3D");
Tuning block sizes
For long, noisy Nanopore reads, a min block size of ~1% sequence length and a max block size
of ~10% sequence length performs well (tested with reads up to ~50kbps).
For proteins, a min block size of 32 and a max block size of 256 performs well.
Using a minimum block size that is at least 32 is recommended for most applications.
Using a maximum block size greater than 2^14 = 16384
is not recommended.
If the alignment scores are saturating (score too large), then use a smaller block size.
Let me know how block aligner performs on your data!
When building your code that uses this library, it is important to specify the
correct feature flags: simd_sse2
, simd_avx2
, simd_neon
, or simd_wasm
.
More information on specifying different features for different platforms
with the same dependency here.
Modules
- Data structures and functions for working with CIGAR strings.
- Main block aligner algorithm and supporting data structures.
- Structs for representing match/mismatch scoring matrices.
Constants
- Number of 16-bit lanes in a SIMD vector.
Functions
- Calculate the percentage of a length, rounded to the next power of two.