Module bio::alignment [] [src]

Various alignment and distance computing algorithms.

Modules

distance

Various subroutines for computing a distance between sequences.

pairwise

Calculate alignments with a generalized variant of the Smith Waterman algorithm. Complexity: O(n * m) for strings of length m and n.

sparse

Calculate 'sparse' alignments from kmer matches. Can be much faster than Smith-Waterman for long string, when a large enough k is used. Complexity: O(n * log(n)) for a pair of strings with n k-kmer matches. This approach is useful for generating an approximate 'backbone' alignments between two long sequences, for example in long-read alignment or genome-genome alignment. The backbone alignment can be used as-is, or can serve as a guide for a banded alignment. By tuning k so that len(query) + len(reference) < 4k, the number of false positive kmer matches is kept small, resulting in very fast run times for long strings.

Structs

Alignment

We consider alignment between two sequences x and y. x is the query or read sequence and y is the reference or template sequence. An alignment, consisting of a score, the start and end position of the alignment on sequence x and sequence y, the lengths of sequences x and y, and the alignment edit operations. The start position and end position of the alignment does not include the clipped regions. The length of clipped regions are already encapsulated in the Alignment Operation.

Enums

AlignmentMode

The modes of alignment supported by the aligner include standard modes such as Global, Semi-Global and Local alignment. In addition to this, user can also invoke the custom mode. In the custom mode, users can explicitly specify the clipping penalties for prefix and suffix of strings 'x' and 'y' independently. Under the hood the standard modes are implemented as special cases of the custom mode with the clipping penalties appropriately set

AlignmentOperation

Alignment operations supported are match, substitution, insertion, deletion and clipping. Clipping is a special boundary condition where you are allowed to clip off the beginning/end of the sequence for a fixed clip penalty. The clip penalty could be different for the two sequences x and y, and the clipping operations on both are distinguishable (Xclip and Yclip). The usize value associated with the clipping operations are the lengths clipped. In case of standard modes like Global, Semi-Global and Local alignment, the clip operations are filtered out