[−][src]Module bio::pattern_matching
This module contains various useful pattern matching algorithms. The implementations are based on the lecture notes "Algorithmen auf Sequenzen", Kopczynski, Marschall, Martin and Rahmann, 2008 - 2015.
- Algorithm of Horspool: fastest for a sufficiently large alphabet
- Shift And algorithm: fast for patterns with less than 64 symbols and very small alphabets.
- BNDM algorithm: fast for patterns with less than 64 symbols.
- BOM algorithm: fast for long patterns and small alphabet.
- KMP algorithm: the classical ancestor.
- Ukkonens algorithm: approximate pattern matching with dynamic programming.
- Myers algorithm: linear-time approximate pattern matching with edit distance for small patterns
Another fast pattern matching algorithm is available in the twoway crate: https://crates.io/crates/twoway
Modules
bndm | Backward nondeterministic DAWG matching (BNDM). Best-case complexity: O(n / m) with pattern of length m <= 64 and text of length n. Worst case complexity: O(n * m). |
bom | Backward oracle matching algorithm. Best-case complexity: O(n / m) with pattern of length m and text of length n. Worst case complexity: O(n * m). |
horspool | Algorithm of Horspool. Window-based, similar to but faster than Boyer-Moore. |
kmp | Algorithm of Knuth Morris and Pratt. Constructs an automaton recognizing the pattern, and scans linearly over a text of length n. Complexity: O(n). The transition function delta is simulated via the lps-function, that assigns to each position q in the pattern the longest prefix of the pattern that is suffix of pattern[..q+1]. Then, in the NFA for the pattern, active states after reading position q are {q, lps(q), lps(lps(q)), ... 0}. |
myers | Myers bit-parallel approximate pattern matching algorithm. Finds all matches up to a given edit distance. The pattern has to fit into a bitvector, and is thus limited to 64 or (since stable Rust version 1.26) to 128 symbols. Complexity: O(n) |
pssm | Create a weight matrix representing a set of aligned reference sequences that constitute a motif, and use this matrix to scan query sequences for occurrences of this motif. Complexity: O(n*m) for motif length n and query length m |
shift_and |
|
ukkonen | Bounded version of Ukkonens DP algorithm for approximate pattern matching. Complexity: O(n * k) on random texts. |