[][src]Module bio::alignment::pairwise

Calculate alignments with a generalized variant of the Smith Waterman algorithm. Complexity: O(n * m) for strings of length m and n.

For quick computation of alignments and alignment scores there are 6 simple functions.

Example

use bio::alignment::pairwise::*;
use bio::alignment::AlignmentOperation::*;

let x = b"ACCGTGGAT";
let y = b"AAAAACCGTTGAT";
let score = |a: u8, b: u8| if a == b {1i32} else {-1i32};
let mut aligner = Aligner::with_capacity(x.len(), y.len(), -5, -1, &score);
let alignment = aligner.semiglobal(x, y);
// x is global (target sequence) and y is local (reference sequence)
assert_eq!(alignment.ystart, 4);
assert_eq!(alignment.xstart, 0);
assert_eq!(alignment.operations,
    [Match, Match, Match, Match, Match, Subst, Match, Match, Match]);

// If you don't known sizes of future sequences, you could
// use Aligner::new().
// Global alignment:
let mut aligner = Aligner::new(-5, -1, &score);
let x = b"ACCGTGGAT";
let y = b"AAAAACCGTTGAT";
let alignment = aligner.global(x, y);
assert_eq!(alignment.ystart, 0);
assert_eq!(alignment.xstart, 0);
assert_eq!(aligner.local(x, y).score, 7);

// In addition to the standard modes (Global, Semiglobal and Local), a custom alignment
// mode is supported which supports a user-specified clipping penalty. Clipping is a
// special boundary condition where you are allowed to clip off the beginning/end of
// the sequence for a fixed penalty. As a starting example, we can use the custom mode
// for achieving the three standard modes as follows.

// scoring for semiglobal mode
let scoring = Scoring::new( -5, -1, &score) // Gap open, gap extend and match score function
   .xclip(MIN_SCORE) // Clipping penalty for x set to 'negative infinity', hence global in x
   .yclip(0); // Clipping penalty for y set to 0, hence local in y
let mut aligner = Aligner::with_scoring(scoring);
let alignment = aligner.custom(x,y); // The custom aligner invocation
assert_eq!(alignment.ystart, 4);
assert_eq!(alignment.xstart, 0);
// Note that in the custom mode, the clips are explicitly mentioned in the operations
assert_eq!(alignment.operations,
    [Yclip(4), Match, Match, Match, Match, Match, Subst, Match, Match, Match]);

// scoring for global mode
// scoring can also be created usinf from_scores if the match and mismatch scores are constants
let scoring = Scoring::from_scores( -5, -1, 1, -1) // Gap open, extend, match, mismatch score
    .xclip(MIN_SCORE)  // Clipping penalty for x set to 'negative infinity', hence global in x
    .yclip(MIN_SCORE); // Clipping penalty for y set to 'negative infinity', hence global in y
let mut aligner = Aligner::with_scoring(scoring);
let alignment = aligner.custom(x,y); // The custom aligner invocation
assert_eq!(alignment.ystart, 0);
assert_eq!(alignment.xstart, 0);
// Note that in the custom mode, the clips are explicitly mentioned in the operations
assert_eq!(alignment.operations,
    [Del, Del, Del, Del, Match, Match, Match, Match, Match, Subst, Match, Match, Match]);

// Similarly if the clip penalties are both set to 0, we have local alignment mode. The scoring
// struct also lets users set different penalties for prefix/suffix clipping, thereby letting
// users have the flexibility to create a wide variety of boundary conditions. The xclip() and
// yclip() methods sets the prefix and suffix penalties to be equal. The scoring struct can be
// explicitly constructed for full flexibility.

// The following example considers a modification of the semiglobal mode where you are allowed
// to skip a prefix of the target sequence x, for a penalty of -10, but you have to consume
// the rest of the string in the alignment

let scoring = Scoring {
    gap_open: -5,
    gap_extend: -1,
    match_fn: |a: u8, b: u8| if a == b {1i32} else {-3i32},
    match_scores: Some((1, -3)),
    xclip_prefix: -10,
    xclip_suffix: MIN_SCORE,
    yclip_prefix: 0,
    yclip_suffix: 0
};
let x = b"GGGGGGACGTACGTACGT";
let y = b"AAAAACGTACGTACGTAAAA";
let mut aligner = Aligner::with_capacity_and_scoring(x.len(), y.len(), scoring);
let alignment = aligner.custom(x, y);
println!("{}", alignment.pretty(x,y));
assert_eq!(alignment.score, 2);
assert_eq!(alignment.operations, [Yclip(4), Xclip(6), Match, Match, Match, Match,
   Match, Match, Match, Match, Match, Match, Match, Match, Yclip(4)]);

Modules

banded

Banded Smith-Waterman alignment for fast comparison of long strings. Use sparse dynamic programming to find a 'backbone' alignment from exact k-mer matches, then compute the SW alignment in a 'band' surrounding the backbone, with a configurable width w. This method is not guaranteed to recover the Smith-Waterman alignment, but will usually find the same alignment if a) there is a reasonable density of exact k-mer matches between the sequences, and b) the width parameter w is larger than the excursion of the alignment path from diagonal between successive kmer matches. This technique is employed in long-read aligners (e.g. BLASR and BWA) to drastically reduce runtime compared to Smith Waterman. Complexity roughly O(min(m,n) * w)

Structs

Aligner

A generalized Smith-Waterman aligner.

MatchParams

A concrete data structure which implements trait MatchFunc with constant match and mismatch scores

Scoring

Details of scoring are encapsulated in this structure. An affine gap score model is used so that the gap score for a length 'k' is: GapScore(k) = gap_open + gap_extend * k

TracebackCell

Packed representation of one cell of a Smith-Waterman traceback matrix. Stores the I, D and S traceback matrix values in two bytes. Possible traceback moves include : start, insert, delete, match, substitute, prefix clip and suffix clip for x & y. So we need 4 bits each for matrices I, D, S to keep track of these 9 moves.

Constants

MIN_SCORE

Value to use as a 'negative infinity' score. Should be close to i32::MIN, but avoid underflow when used with reasonable scoring parameters or even adding two negative infinities. Use ~ 0.4 * i32::MIN

Traits

MatchFunc

Trait required to instantiate a Scoring instance