[][src]Module bio::alignment::pairwise::banded

Banded Smith-Waterman alignment for fast comparison of long strings. Use sparse dynamic programming to find a 'backbone' alignment from exact k-mer matches, then compute the SW alignment in a 'band' surrounding the backbone, with a configurable width w. This method is not guaranteed to recover the Smith-Waterman alignment, but will usually find the same alignment if a) there is a reasonable density of exact k-mer matches between the sequences, and b) the width parameter w is larger than the excursion of the alignment path from diagonal between successive kmer matches. This technique is employed in long-read aligners (e.g. BLASR and BWA) to drastically reduce runtime compared to Smith Waterman. Complexity roughly O(min(m,n) * w)

Example

use bio::alignment::pairwise::banded::*;
use bio::alignment::sparse::hash_kmers;
use bio::alignment::pairwise::{MIN_SCORE, Scoring};
use bio::alignment::AlignmentOperation::*;
use std::iter::repeat;

let x = b"AGCACACGTGTGCGCTATACAGTAAGTAGTAGTACACGTGTCACAGTTGTACTAGCATGAC";
let y = b"AGCACACGTGTGCGCTATACAGTACACGTGTCACAGTTGTACTAGCATGAC";
let score = |a: u8, b: u8| if a == b {1i32} else {-1i32};
let k = 8;  // kmer match length
let w = 6;  // Window size for creating the band
let mut aligner = Aligner::new(-5, -1, score, k, w);
let alignment = aligner.local(x, y);
// aligner.global(x, y), aligner.semiglobal(x, y) are also supported
assert_eq!(alignment.ystart, 0);
assert_eq!(alignment.xstart, 0);

// For cases where the reference is reused multiple times, we can invoke the
// pre-hashed version of the solver
let x = b"AGCACAAGTGTGCGCTATACAGGAAGTAGGAGTACACGTGTCA";
let y = b"CAGTTGTACTAGCATGACCAGTTGTACTAGCATGACAGCACACGTGTGCGCTATACAGTAAGTAGTAGTACACGTGTCA\
    CAGTTGTACTAGCATGACCAGTTGTACTAGCATGAC";
let y_kmers_hash = hash_kmers(y, k);
let alignment = aligner.semiglobal_with_prehash(x, y, &y_kmers_hash);
assert_eq!(alignment.score, 37);

// In addition to the standard modes (Global, Semiglobal and Local), a custom alignment
// mode is supported which supports a user-specified clipping penalty. Clipping is a
// special boundary condition where you are allowed to clip off the beginning/end of
// the sequence for a fixed penalty. See bio::alignment::pairwise for a more detailed
// explanation

// The following example considers a modification of the semiglobal mode where you are allowed
// to skip a prefix of the target sequence x, for a penalty of -10, but you have to consume
// the rest of the string in the alignment

let scoring = Scoring {
    gap_open: -5,
    gap_extend: -1,
    match_fn: |a: u8, b: u8| if a == b {1i32} else {-3i32},
    match_scores: Some((1, -3)),
    xclip_prefix: -10,
    xclip_suffix: MIN_SCORE,
    yclip_prefix: 0,
    yclip_suffix: 0
};
let x = b"GGGGGGACGTACGTACGTGTGCATCATCATGTGCGTATCATAGATAGATGTAGATGATCCACAGT";
let y = b"AAAAACGTACGTACGTGTGCATCATCATGTGCGTATCATAGATAGATGTAGATGATCCACAGTAAAA";
let mut aligner = Aligner::with_capacity_and_scoring(x.len(), y.len(), scoring, k, w);
let alignment = aligner.custom(x, y);
println!("{}", alignment.pretty(x,y));
assert_eq!(alignment.score, 49);
let mut correct_ops = Vec::new();
correct_ops.push(Yclip(4));
correct_ops.push(Xclip(6));
correct_ops.extend(repeat(Match).take(59));
correct_ops.push(Yclip(4));
assert_eq!(alignment.operations, correct_ops);

// aligner.custom_with_prehash(x, y, &y_kmers_hash) is also supported

Structs

Aligner

A banded implementation of Smith-Waterman aligner (SWA). Unlike the full SWA, this implementation computes the alignment between a pair of sequences only inside a 'band' withing the dynamic programming matrix. The band is constructed using the Sparse DP routine (see sparse::sdpkpp), which uses kmer matches to build the best common subsequence (including gap penalties) between the two strings. The band is constructed around this subsequence (using the window length 'w'), filling in the gaps.