Module string


§String Operations Kernels Module - High-Performance String Processing and Text Analysis

String processing kernels for text manipulation, pattern matching, and string analysis, with UTF-8 awareness and null-safe semantics. They provide the infrastructure for text analytics, data cleansing, and string-heavy analytical workloads.
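
To make "UTF-8 awareness and null-safe semantics" concrete, here is a minimal sketch of a character-length kernel. It is not this module's implementation: `char_lengths` is a hypothetical name, and a slice of `Option<&str>` stands in for a nullable string array (an assumption made for illustration only).

```rust
/// Hypothetical stand-in for a nullable string column:
/// `None` marks a null slot, `Some(s)` a valid value.
///
/// Returns the length of each string in Unicode scalar values
/// (not bytes). Null inputs stay null in the output.
fn char_lengths(values: &[Option<&str>]) -> Vec<Option<usize>> {
    values
        .iter()
        // `chars().count()` walks the UTF-8 bytes and counts scalar values,
        // so multi-byte characters count once.
        .map(|v| v.map(|s| s.chars().count()))
        .collect()
}
```

With this sketch, `"héllo"` has a character length of 5 even though `"héllo".len()` reports 6 bytes, and a null input produces a null output rather than a zero.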

§Core Operations

  • String transformations: Case conversion, trimming, padding, and substring operations
  • Pattern matching: Regular expression support with compiled pattern caching
  • String comparison: Lexicographic ordering with UTF-8 aware collation
  • Text analysis: Length calculation, character counting, and encoding detection
  • String aggregation: Concatenation with configurable delimiters and null handling
  • Search operations: contains, starts-with, and ends-with predicates with optimised implementations
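
The search predicates above come in `dict` and `str` variants. The sketch below shows one plausible shape for a mixed categorical/string predicate. All names (`DictColumn`, `Predicate`, `predicate_dict_str`) are hypothetical. It also assumes, without confirmation from this page, that the left operand is the text being searched, the right operand is the pattern, and a null on either side yields a null result.

```rust
/// Hypothetical dictionary-encoded column: `codes[i]` indexes into `dict`,
/// and `None` marks a null slot.
struct DictColumn<'a> {
    dict: Vec<&'a str>,
    codes: Vec<Option<u32>>,
}

/// The three search predicates named in the list above.
#[derive(Clone, Copy)]
enum Predicate {
    Contains,
    StartsWith,
    EndsWith,
}

/// Evaluates one predicate on a single (haystack, needle) pair.
fn apply(pred: Predicate, haystack: &str, needle: &str) -> bool {
    match pred {
        Predicate::Contains => haystack.contains(needle),
        Predicate::StartsWith => haystack.starts_with(needle),
        Predicate::EndsWith => haystack.ends_with(needle),
    }
}

/// Element-wise predicate between a categorical column (assumed haystack)
/// and a plain string column (assumed needle).
fn predicate_dict_str(
    pred: Predicate,
    lhs: &DictColumn<'_>,
    rhs: &[Option<&str>],
) -> Vec<Option<bool>> {
    assert_eq!(lhs.codes.len(), rhs.len(), "columns must have equal length");
    lhs.codes
        .iter()
        .zip(rhs)
        .map(|(code, needle)| match (code, needle) {
            // Resolve the dictionary code to its string, then test it.
            (Some(c), Some(n)) => Some(apply(pred, lhs.dict[*c as usize], n)),
            // Any null input produces a null output (assumed semantics).
            _ => None,
        })
        .collect()
}
```

Resolving codes through the dictionary avoids materialising the categorical column as owned strings before the comparison, which is the usual reason for keeping separate `dict` and `str` kernel variants.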

Functions§

concat_dict_dict
Concatenates corresponding string pairs from two categorical arrays element-wise.
concat_dict_str
Concatenates dictionary values from a categorical array with strings from a string array.
concat_str_dict
Concatenates strings from a string array with dictionary values from a categorical array.
concat_str_str
Concatenates corresponding string pairs from two string arrays element-wise.
contains_dict_dict
Evaluates the contains predicate between two categorical arrays.
contains_dict_str
Evaluates the contains predicate between a categorical array and a string array.
contains_str_dict
Evaluates the contains predicate between a string array and a categorical array.
contains_str_str
Evaluates the contains predicate between two string arrays.
count_distinct_string
Counts the number of distinct string values in a string array window.
ends_with_dict_dict
Evaluates the ends-with predicate between two categorical arrays.
ends_with_dict_str
Evaluates the ends-with predicate between a categorical array and a string array.
ends_with_str_dict
Evaluates the ends-with predicate between a string array and a categorical array.
ends_with_str_str
Evaluates the ends-with predicate between two string arrays.
len_dict
Computes the character length of each string in a CategoricalArray<T> slice, returning an IntegerArray<T> with the same length and null semantics.
len_str
Computes the character length of each string in a StringArray<T> slice, returning an IntegerArray<T> with the same length and null semantics.
max_categorical_array
Finds the lexicographically maximum dictionary string in a categorical array window.
max_string_array
Finds the lexicographically maximum string in a string array window.
min_categorical_array
Finds the lexicographically minimum dictionary string in a categorical array window.
min_string_array
Finds the lexicographically minimum string in a string array window.
regex_dict_dict
Applies regular expression matching between two categorical arrays via dictionary lookup.
regex_dict_str
Matches categorical array values against regular expression patterns from a string array.
regex_str_dict
Matches string array values against regular expression patterns from a categorical dictionary.
regex_str_str
Applies regular expression pattern matching between two string arrays.
starts_with_dict_dict
Evaluates the starts-with predicate between two categorical arrays.
starts_with_dict_str
Evaluates the starts-with predicate between a categorical array and a string array.
starts_with_str_dict
Evaluates the starts-with predicate between a string array and a categorical array.
starts_with_str_str
Evaluates the starts-with predicate between two string arrays.
string_predicate_masks
Helper for predicate kernels: produces optional input masks and a fresh output mask.
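
The mask helper's signature is not shown on this page, so the following is only a sketch of what such a helper might look like. `Mask` and `predicate_masks` are hypothetical names, and a `Vec<bool>` stands in for a packed validity bitmap. Pre-filling the output mask with the intersection of the input masks is an assumption; the real helper may instead return an empty or all-false mask for the kernel to fill in.

```rust
/// Hypothetical validity mask: `true` means the slot holds a value.
type Mask = Vec<bool>;

/// Passes both optional input masks through unchanged and allocates a new
/// output mask of `len` slots. A slot is valid in the output only if it is
/// valid in every input mask that is present; a missing mask means
/// "all slots valid".
fn predicate_masks<'a>(
    lhs: Option<&'a Mask>,
    rhs: Option<&'a Mask>,
    len: usize,
) -> (Option<&'a Mask>, Option<&'a Mask>, Mask) {
    let out = (0..len)
        .map(|i| lhs.map_or(true, |m| m[i]) && rhs.map_or(true, |m| m[i]))
        .collect();
    (lhs, rhs, out)
}
```

A predicate kernel could call a helper of this kind once up front, then consult the output mask per slot to decide whether to evaluate the predicate or emit a null.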