Module string_ops

String processing utilities for scientific computing

This module provides common string distance metrics, similarity measures, tokenization, n-gram generation, and case conversion utilities used across the SciRS2 ecosystem.

§Distance Metrics
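
As a rough illustration of the edit-distance metrics listed under Functions, the sketch below computes a Levenshtein distance with a two-row dynamic program and derives a normalized distance from it. The names `levenshtein_sketch` and `normalized_sketch` are hypothetical, the code is standalone rather than this module's implementation, and dividing by the longer string's length is an assumption about how the normalization might be defined.

```rust
/// Hypothetical standalone helper; not this module's `levenshtein_distance`.
/// Counts the minimum number of single-character insertions, deletions,
/// and substitutions needed to turn `a` into `b`.
fn levenshtein_sketch(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        // curr[0] = i + 1: deleting the first i + 1 chars of `a` yields the empty string.
        let mut curr = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // match or substitution
                .min(prev[j + 1] + 1) // deletion from `a`
                .min(curr[j] + 1); // insertion into `a`
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical normalization (an assumption): distance divided by the
/// length of the longer string, so 0.0 = identical and 1.0 = completely different.
fn normalized_sketch(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 0.0; // two empty strings are treated as identical
    }
    levenshtein_sketch(a, b) as f64 / max_len as f64
}
```

Under this assumed normalization, a similarity score in the opposite direction (1.0 = identical) would simply be `1.0 - normalized_sketch(a, b)`.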

§Subsequences
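
To make the longest-common-subsequence entries under Functions more concrete, here is a minimal sketch of the classic dynamic-programming table for the LCS length. The name `lcs_length_sketch` is hypothetical and the code is not taken from this module; it compares Unicode scalar values (`char`), which is an assumption.

```rust
/// Hypothetical standalone helper; not this module's `longest_common_subsequence`.
/// Returns the length of the longest sequence of characters that appears
/// in both strings in the same order (not necessarily contiguously).
fn lcs_length_sketch(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // dp[i][j] = LCS length of the first i chars of `a` and the first j chars of `b`.
    let mut dp = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in 1..=a.len() {
        for j in 1..=b.len() {
            dp[i][j] = if a[i - 1] == b[j - 1] {
                dp[i - 1][j - 1] + 1 // extend the common subsequence
            } else {
                dp[i - 1][j].max(dp[i][j - 1]) // drop one char from either side
            };
        }
    }
    dp[a.len()][b.len()]
}
```

The full table uses memory proportional to the product of the two lengths; keeping it around is what allows an actual subsequence string to be recovered by backtracking.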

§Tokenization & N-grams
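
The character-level and word-level n-gram generators listed under Functions can be pictured as sliding a window of size `n` over a sequence. The sketch below uses the standard library's `slice::windows` for both cases; `char_ngrams_sketch` and `word_ngrams_sketch` are hypothetical names, and the return types and the handling of `n == 0` are assumptions rather than this module's documented behavior.

```rust
/// Hypothetical standalone helper; not this module's `char_ngrams`.
/// Slides a window of `n` characters over the string.
fn char_ngrams_sketch(s: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    // `windows(0)` panics, so an empty result is returned instead (an assumption).
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars.windows(n).map(|w| w.iter().collect()).collect()
}

/// Hypothetical standalone helper; not this module's `ngrams`.
/// Slides a window of `n` tokens over an already-tokenized input.
fn word_ngrams_sketch(tokens: &[&str], n: usize) -> Vec<Vec<String>> {
    if n == 0 || tokens.len() < n {
        return Vec::new();
    }
    tokens
        .windows(n)
        .map(|w| w.iter().map(|t| t.to_string()).collect())
        .collect()
}
```

Collecting into `Vec<char>` first keeps the windows aligned to Unicode scalar values rather than bytes, at the cost of one extra allocation.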

§Case Conversions
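
Case conversions of the kind listed under Functions usually share one step: splitting the input into words, then rejoining them with a different separator and capitalization. The sketch below shows one possible word splitter plus snake_case and camelCase joins. All three function names are hypothetical, and the splitting rules (break on non-alphanumeric characters and on a lowercase-or-digit to uppercase transition) are assumptions; this module's actual rules may differ, for example in how it treats acronyms.

```rust
/// Hypothetical word splitter: breaks on non-alphanumeric characters and
/// before an uppercase letter that follows a lowercase letter or digit.
/// Every word is returned in lowercase.
fn split_words_sketch(s: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for c in s.chars() {
        if !c.is_alphanumeric() {
            // Separator: close the current word, if any.
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        if c.is_uppercase() && prev_lower {
            // camelCase boundary such as the "oW" in "helloWorld".
            words.push(std::mem::take(&mut current));
        }
        current.extend(c.to_lowercase());
        prev_lower = c.is_lowercase() || c.is_numeric();
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

/// Hypothetical helper: joins the lowercase words with underscores.
fn to_snake_sketch(s: &str) -> String {
    split_words_sketch(s).join("_")
}

/// Hypothetical helper: first word lowercase, later words capitalized.
fn to_camel_sketch(s: &str) -> String {
    let mut out = String::new();
    for (i, word) in split_words_sketch(s).iter().enumerate() {
        let mut chars = word.chars();
        match chars.next() {
            Some(first) if i > 0 => {
                out.extend(first.to_uppercase());
                out.push_str(chars.as_str());
            }
            _ => out.push_str(word),
        }
    }
    out
}
```

With these rules a run of capitals is kept as one word, so `"HTTPServer"` becomes `"httpserver"`; a production converter would likely need an extra rule for acronym boundaries.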

Functions§

center
Center a string within a given width.
char_ngrams
Generate character-level n-grams from a string.
count_occurrences
Count occurrences of a substring.
dice_coefficient
Compute the Dice coefficient (bigram overlap) between two strings.
hamming_distance
Compute the Hamming distance between two strings of equal length.
is_palindrome
Check if a string is a palindrome (case-insensitive, alphanumeric only).
jaro_similarity
Compute the Jaro similarity between two strings.
jaro_winkler_similarity
Compute the Jaro-Winkler similarity between two strings.
lcs_similarity
Compute the LCS similarity (0.0 = no common subsequence, 1.0 = identical).
lcs_string
Return the actual longest common subsequence string.
levenshtein_distance
Compute the Levenshtein edit distance between two strings.
levenshtein_similarity
Compute the Levenshtein similarity (1.0 = identical, 0.0 = completely different).
longest_common_subsequence
Compute the length of the longest common subsequence (LCS) of two strings.
ngrams
Generate word-level n-grams from a list of tokens.
normalized_levenshtein
Compute the normalized Levenshtein distance (0.0 = identical, 1.0 = completely different).
pad_left
Pad a string on the left to a given width.
pad_right
Pad a string on the right to a given width.
reverse
Reverse a string (Unicode-aware).
skip_bigrams
Generate skip-grams (n-grams with gaps).
to_camel_case
Convert a string to camelCase.
to_kebab_case
Convert a string to kebab-case.
to_pascal_case
Convert a string to PascalCase.
to_screaming_snake_case
Convert a string to SCREAMING_SNAKE_CASE.
to_snake_case
Convert a string to snake_case.
to_title_case
Convert a string to Title Case.
tokenize_char
Tokenize a string by splitting on a delimiter character.
tokenize_pattern
Tokenize by splitting on a string pattern.
tokenize_predicate
Tokenize by splitting on any character matching a predicate.
tokenize_sentences
Tokenize into sentences (split on ‘.’, ‘!’, ‘?’).
tokenize_whitespace
Tokenize a string by splitting on whitespace.
tokenize_words
Simple word tokenizer that splits on non-alphanumeric characters.
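
Two of the simpler entries above, the Hamming distance and the palindrome check, can be sketched in a few lines each. The names `hamming_sketch` and `is_palindrome_sketch` are hypothetical and the code is standalone rather than this module's implementation. Returning `None` for strings of unequal length is an assumption; the listing only states that the inputs are expected to have equal length.

```rust
/// Hypothetical standalone helper; not this module's `hamming_distance`.
/// Counts positions at which the two strings differ, comparing by `char`.
/// Returns `None` when the lengths differ (an assumed error convention).
fn hamming_sketch(a: &str, b: &str) -> Option<usize> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(&b).filter(|(x, y)| x != y).count())
}

/// Hypothetical standalone helper; not this module's `is_palindrome`.
/// Ignores case and everything that is not alphanumeric, as the listing describes.
fn is_palindrome_sketch(s: &str) -> bool {
    let cleaned: Vec<char> = s
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect();
    // A palindrome reads the same forwards and backwards.
    cleaned.iter().eq(cleaned.iter().rev())
}
```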