Expand description
Various alignment and distance computing algorithms.
Modules
Various subroutines for computing a distance between sequences.
Calculate alignments with a generalized variant of the Smith Waterman algorithm.
Complexity: O(n * m) for strings of length m and n.
Calculate ‘sparse’ alignments from kmer matches. Can be much faster than
Smith-Waterman for long string, when a large enough k is used.
Complexity: O(n * log(n)) for a pair of strings with n k-kmer matches. This
approach is useful for generating an approximate ‘backbone’ alignments
between two long sequences, for example in long-read alignment or
genome-genome alignment. The backbone alignment can be used as-is, or can serve
as a guide for a banded alignment. By tuning k so that len(query) + len(reference) < 4^k,
the number of false positive kmer matches is kept small, resulting in very
fast run times for long strings.
Structs
We consider alignment between two sequences x and y. x is the query or read sequence
and y is the reference or template sequence. An alignment, consisting of a score,
the start and end position of the alignment on sequence x and sequence y, the
lengths of sequences x and y, and the alignment edit operations. The start position
and end position of the alignment does not include the clipped regions. The length
of clipped regions are already encapsulated in the Alignment Operation.
Enums
The modes of alignment supported by the aligner include standard modes such as
Global, Semi-Global and Local alignment. In addition to this, user can also invoke
the custom mode. In the custom mode, users can explicitly specify the clipping penalties
for prefix and suffix of strings ‘x’ and ‘y’ independently. Under the hood the standard
modes are implemented as special cases of the custom mode with the clipping penalties
appropriately set.
Alignment operations supported are match, substitution, insertion, deletion
and clipping. Clipping is a special boundary condition where you are allowed
to clip off the beginning/end of the sequence for a fixed clip penalty. The
clip penalty could be different for the two sequences x and y, and the
clipping operations on both are distinguishable (Xclip and Yclip). The usize
value associated with the clipping operations are the lengths clipped. In case
of standard modes like Global, Semi-Global and Local alignment, the clip operations
are filtered out