[−][src]Module bio::alignment::pairwise
Calculate alignments with a generalized variant of the Smith Waterman algorithm. Complexity: O(n * m) for strings of length m and n.
For quick computation of alignments and alignment scores there are 6 simple functions.
Example
use bio::alignment::pairwise::*; use bio::alignment::AlignmentOperation::*; let x = b"ACCGTGGAT"; let y = b"AAAAACCGTTGAT"; let score = |a: u8, b: u8| if a == b {1i32} else {-1i32}; let mut aligner = Aligner::with_capacity(x.len(), y.len(), -5, -1, &score); let alignment = aligner.semiglobal(x, y); // x is global (target sequence) and y is local (reference sequence) assert_eq!(alignment.ystart, 4); assert_eq!(alignment.xstart, 0); assert_eq!(alignment.operations, [Match, Match, Match, Match, Match, Subst, Match, Match, Match]); // If you don't known sizes of future sequences, you could // use Aligner::new(). // Global alignment: let mut aligner = Aligner::new(-5, -1, &score); let x = b"ACCGTGGAT"; let y = b"AAAAACCGTTGAT"; let alignment = aligner.global(x, y); assert_eq!(alignment.ystart, 0); assert_eq!(alignment.xstart, 0); assert_eq!(aligner.local(x, y).score, 7); // In addition to the standard modes (Global, Semiglobal and Local), a custom alignment // mode is supported which supports a user-specified clipping penalty. Clipping is a // special boundary condition where you are allowed to clip off the beginning/end of // the sequence for a fixed penalty. As a starting example, we can use the custom mode // for achieving the three standard modes as follows. // scoring for semiglobal mode let scoring = Scoring::new( -5, -1, &score) // Gap open, gap extend and match score function .xclip(MIN_SCORE) // Clipping penalty for x set to 'negative infinity', hence global in x .yclip(0); // Clipping penalty for y set to 0, hence local in y let mut aligner = Aligner::with_scoring(scoring); let alignment = aligner.custom(x,y); // The custom aligner invocation assert_eq!(alignment.ystart, 4); assert_eq!(alignment.xstart, 0); // Note that in the custom mode, the clips are explicitly mentioned in the operations assert_eq!(alignment.operations, [Yclip(4), Match, Match, Match, Match, Match, Subst, Match, Match, Match]); // scoring for global mode // scoring can also be created usinf from_scores if the match and mismatch scores are constants let scoring = Scoring::from_scores( -5, -1, 1, -1) // Gap open, extend, match, mismatch score .xclip(MIN_SCORE) // Clipping penalty for x set to 'negative infinity', hence global in x .yclip(MIN_SCORE); // Clipping penalty for y set to 'negative infinity', hence global in y let mut aligner = Aligner::with_scoring(scoring); let alignment = aligner.custom(x,y); // The custom aligner invocation assert_eq!(alignment.ystart, 0); assert_eq!(alignment.xstart, 0); // Note that in the custom mode, the clips are explicitly mentioned in the operations assert_eq!(alignment.operations, [Del, Del, Del, Del, Match, Match, Match, Match, Match, Subst, Match, Match, Match]); // Similarly if the clip penalties are both set to 0, we have local alignment mode. The scoring // struct also lets users set different penalties for prefix/suffix clipping, thereby letting // users have the flexibility to create a wide variety of boundary conditions. The xclip() and // yclip() methods sets the prefix and suffix penalties to be equal. The scoring struct can be // explicitly constructed for full flexibility. // The following example considers a modification of the semiglobal mode where you are allowed // to skip a prefix of the target sequence x, for a penalty of -10, but you have to consume // the rest of the string in the alignment let scoring = Scoring { gap_open: -5, gap_extend: -1, match_fn: |a: u8, b: u8| if a == b {1i32} else {-3i32}, match_scores: Some((1, -3)), xclip_prefix: -10, xclip_suffix: MIN_SCORE, yclip_prefix: 0, yclip_suffix: 0 }; let x = b"GGGGGGACGTACGTACGT"; let y = b"AAAAACGTACGTACGTAAAA"; let mut aligner = Aligner::with_capacity_and_scoring(x.len(), y.len(), scoring); let alignment = aligner.custom(x, y); println!("{}", alignment.pretty(x,y)); assert_eq!(alignment.score, 2); assert_eq!(alignment.operations, [Yclip(4), Xclip(6), Match, Match, Match, Match, Match, Match, Match, Match, Match, Match, Match, Match, Yclip(4)]);
Modules
banded | Banded Smith-Waterman alignment for fast comparison of long strings. Use sparse dynamic programming to find a 'backbone' alignment from exact k-mer matches, then compute the SW alignment in a 'band' surrounding the backbone, with a configurable width w. This method is not guaranteed to recover the Smith-Waterman alignment, but will usually find the same alignment if a) there is a reasonable density of exact k-mer matches between the sequences, and b) the width parameter w is larger than the excursion of the alignment path from diagonal between successive kmer matches. This technique is employed in long-read aligners (e.g. BLASR and BWA) to drastically reduce runtime compared to Smith Waterman. Complexity roughly O(min(m,n) * w) |
Structs
Aligner | A generalized Smith-Waterman aligner. |
MatchParams | A concrete data structure which implements trait MatchFunc with constant match and mismatch scores |
Scoring | Details of scoring are encapsulated in this structure. An affine gap score model is used so that the gap score for a length 'k' is: GapScore(k) = gap_open + gap_extend * k |
TracebackCell | Packed representation of one cell of a Smith-Waterman traceback matrix. Stores the I, D and S traceback matrix values in two bytes. Possible traceback moves include : start, insert, delete, match, substitute, prefix clip and suffix clip for x & y. So we need 4 bits each for matrices I, D, S to keep track of these 9 moves. |
Constants
MIN_SCORE | Value to use as a 'negative infinity' score. Should be close to i32::MIN, but avoid underflow when used with reasonable scoring parameters or even adding two negative infinities. Use ~ 0.4 * i32::MIN |
Traits
MatchFunc | Trait required to instantiate a Scoring instance |