Skip to main content

Module string_algorithms

Module string_algorithms 

Source
Expand description

String Algorithms Module for NumRS2

This module provides efficient implementations of fundamental string algorithms suitable for bioinformatics, text mining, information retrieval, and data compression.

§Overview

The module is organized into five main areas:

  • Pattern Matching: KMP, Rabin-Karp, and Boyer-Moore-Horspool search
  • String Distances: Levenshtein edit distance, longest common subsequence
  • Suffix Structures: Suffix array with LCP array and binary search
  • Text Transforms: Burrows-Wheeler Transform (encode/decode)
  • Hashing: FNV-1a and djb2 non-cryptographic hash functions

§Design Philosophy

All algorithms are implemented natively in pure Rust following the approach of scirs2-core 0.3.0’s string algorithms infrastructure. No unwrap() calls appear in production paths. All functions operate on byte slices (&[u8]) for maximum flexibility, with &str convenience wrappers where appropriate.

§Algorithmic References

  • Knuth, D. E., Morris, J. H., Pratt, V. R. (1977). Fast pattern matching in strings.
  • Karp, R. M., Rabin, M. O. (1987). Efficient randomized pattern-matching algorithms.
  • Boyer, R. S., Moore, J. S. (1977). A fast string searching algorithm.
  • Horspool, R. N. (1980). Practical fast searching in strings.
  • Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals.
  • Burrows, M., Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm.
  • Kasai, T. et al. (2001). Linear-time longest-common-prefix computation in suffix arrays.

Structs§

SuffixArray
Suffix array with LCP (Longest Common Prefix) array support.

Functions§

boyer_moore_search
Boyer-Moore-Horspool single-pattern search.
bwt_decode
Decode a Burrows-Wheeler Transform.
bwt_encode
Encode text with the Burrows-Wheeler Transform.
djb2_hash
djb2 64-bit hash of a byte slice.
fnv1a_hash
FNV-1a 64-bit hash of a byte slice.
kmp_search
Knuth-Morris-Pratt pattern search.
lcs_length
Compute the length of the longest common subsequence (LCS) of a and b.
lcs_string
Reconstruct one actual longest common subsequence from a and b.
levenshtein_distance
Compute the Levenshtein edit distance between strings a and b.
rabin_karp_search
Rabin-Karp single-pattern search using a polynomial rolling hash.