Expand description
String Algorithms Module for NumRS2
This module provides efficient implementations of fundamental string algorithms suitable for bioinformatics, text mining, information retrieval, and data compression.
§Overview
The module is organized into five main areas:
- Pattern Matching: KMP, Rabin-Karp, and Boyer-Moore-Horspool search
- String Distances: Levenshtein edit distance, longest common subsequence
- Suffix Structures: Suffix array with LCP array and binary search
- Text Transforms: Burrows-Wheeler Transform (encode/decode)
- Hashing: FNV-1a and djb2 non-cryptographic hash functions
§Design Philosophy
All algorithms are implemented natively in pure Rust following the approach of
scirs2-core 0.3.0’s string algorithms infrastructure. No unwrap() calls
appear in production paths. All functions operate on byte slices (&[u8]) for
maximum flexibility, with &str convenience wrappers where appropriate.
§Algorithmic References
- Knuth, D. E., Morris, J. H., Pratt, V. R. (1977). Fast pattern matching in strings.
- Karp, R. M., Rabin, M. O. (1987). Efficient randomized pattern-matching algorithms.
- Boyer, R. S., Moore, J. S. (1977). A fast string searching algorithm.
- Horspool, R. N. (1980). Practical fast searching in strings.
- Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals.
- Burrows, M., Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm.
- Kasai, T. et al. (2001). Linear-time longest-common-prefix computation in suffix arrays.
Structs§
- Suffix
Array - Suffix array with LCP (Longest Common Prefix) array support.
Functions§
- boyer_
moore_ search - Boyer-Moore-Horspool single-pattern search.
- bwt_
decode - Decode a Burrows-Wheeler Transform.
- bwt_
encode - Encode
textwith the Burrows-Wheeler Transform. - djb2_
hash - djb2 64-bit hash of a byte slice.
- fnv1a_
hash - FNV-1a 64-bit hash of a byte slice.
- kmp_
search - Knuth-Morris-Pratt pattern search.
- lcs_
length - Compute the length of the longest common subsequence (LCS) of
aandb. - lcs_
string - Reconstruct one actual longest common subsequence from
aandb. - levenshtein_
distance - Compute the Levenshtein edit distance between strings
aandb. - rabin_
karp_ search - Rabin-Karp single-pattern search using a polynomial rolling hash.