String Operations Kernels Module - High-Performance String Processing and Text Analysis
This module provides string processing kernels for text manipulation, pattern matching, and string analysis, with UTF-8 awareness and null-safe semantics. These kernels underpin text analytics, data cleansing, and other string-heavy analytical workloads.
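To make "UTF-8 awareness and null-safe semantics" concrete, here is a minimal sketch of a per-element character-length kernel in the spirit of len_str. It assumes a nullable column can be modelled as a slice of Option<&str> (None marks a null). The function name char_lengths is hypothetical, and this is not the crate's StringArray<T> or IntegerArray<T> API.

```rust
/// Character length of each element in a nullable string column.
/// Sketch only: a column is modelled as `&[Option<&str>]`, with `None` as null.
fn char_lengths(values: &[Option<&str>]) -> Vec<Option<usize>> {
    values
        .iter()
        // Null in, null out: the output keeps the input's length and null positions.
        // `chars().count()` counts Unicode scalar values, not UTF-8 bytes.
        .map(|v| v.map(|s| s.chars().count()))
        .collect()
}

fn main() {
    let column = [Some("abc"), None, Some("h\u{e9}llo"), Some("日本")];
    // "h\u{e9}llo" (héllo) occupies 6 bytes in UTF-8 but contains 5 characters.
    assert_eq!("h\u{e9}llo".len(), 6);
    println!("{:?}", char_lengths(&column));
}
```

Counting characters rather than bytes is what makes the result meaningful for non-ASCII text; a byte-length kernel would report 6 for "héllo".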
Core Operations
- String transformations: Case conversion, trimming, padding, and substring operations
- Pattern matching: Regular expression support with compiled pattern caching
- String comparison: Lexicographic ordering with UTF-8 aware collation
- Text analysis: Length calculation, character counting, and encoding detection
- String aggregation: Concatenation with configurable delimiters and null handling
- Search operations: contains, starts-with, and ends-with predicates with optimised implementations
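The search predicates can be pictured as a kernel that walks two nullable columns in lockstep. The sketch below assumes null propagation (a null on either side yields a null result) and models columns as slices of Option<&str>. Predicate and predicate_str_str are hypothetical names and do not reproduce the signatures of contains_str_str, starts_with_str_str, or ends_with_str_str.

```rust
/// Which search predicate to evaluate (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug)]
enum Predicate {
    Contains,
    StartsWith,
    EndsWith,
}

/// Element-wise string predicate over two nullable string columns.
/// Assumed null semantics: a null on either side yields a null result.
fn predicate_str_str(
    lhs: &[Option<&str>],
    rhs: &[Option<&str>],
    pred: Predicate,
) -> Vec<Option<bool>> {
    assert_eq!(lhs.len(), rhs.len(), "inputs must have equal length");
    lhs.iter()
        .zip(rhs.iter())
        .map(|(l, r)| match (*l, *r) {
            // Both present: evaluate the predicate on the UTF-8 string slices.
            (Some(haystack), Some(needle)) => Some(match pred {
                Predicate::Contains => haystack.contains(needle),
                Predicate::StartsWith => haystack.starts_with(needle),
                Predicate::EndsWith => haystack.ends_with(needle),
            }),
            // Either side null: propagate the null.
            _ => None,
        })
        .collect()
}

fn main() {
    let text = [Some("apple pie"), Some("banana"), None, Some("cherry")];
    let needles = [Some("pie"), Some("ban"), Some("x"), None];
    for pred in [Predicate::Contains, Predicate::StartsWith, Predicate::EndsWith] {
        println!("{:?}: {:?}", pred, predicate_str_str(&text, &needles, pred));
    }
}
```

Returning Option<bool> keeps "false" and "unknown" distinct, which is the usual reason to carry an output null mask alongside the boolean result.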
Functions
- concat_dict_dict - Concatenates corresponding string pairs from two categorical arrays element-wise.
- concat_dict_str - Concatenates dictionary values from a categorical array with strings from a string array.
- concat_str_dict - Concatenates strings from a string array with dictionary values from a categorical array.
- concat_str_str - Concatenates corresponding string pairs from two string arrays element-wise.
- contains_dict_dict - Evaluates the contains predicate between two categorical arrays.
- contains_dict_str - Evaluates the contains predicate between a categorical array and a string array.
- contains_str_dict - Evaluates the contains predicate between a string array and a categorical array.
- contains_str_str - Evaluates the contains predicate between two string arrays.
- count_distinct_string - Counts the number of distinct string values in a string array window.
- ends_with_dict_dict - Evaluates the ends-with predicate between two categorical arrays.
- ends_with_dict_str - Evaluates the ends-with predicate between a categorical array and a string array.
- ends_with_str_dict - Evaluates the ends-with predicate between a string array and a categorical array.
- ends_with_str_str - Evaluates the ends-with predicate between two string arrays.
- len_dict - Computes the character length of each string in a CategoricalArray<T> slice, returning an IntegerArray<T> with the same length and null semantics.
- len_str - Computes the character length of each string in a StringArray<T> slice, returning an IntegerArray<T> with the same length and null semantics.
- max_categorical_array - Finds the lexicographically maximum dictionary string in a categorical array window.
- max_string_array - Finds the lexicographically maximum string in a string array window.
- min_categorical_array - Finds the lexicographically minimum dictionary string in a categorical array window.
- min_string_array - Finds the lexicographically minimum string in a string array window.
- regex_dict_dict - Applies regular expression patterns between two categorical arrays via dictionary lookup.
- regex_dict_str - Matches categorical array values against regular expression patterns from a string array.
- regex_str_dict - Matches string array values against regular expression patterns taken from a categorical array's dictionary.
- regex_str_str - Applies regular expression pattern matching between two string arrays.
- starts_with_dict_dict - Evaluates the starts-with predicate between two categorical arrays.
- starts_with_dict_str - Evaluates the starts-with predicate between a categorical array and a string array.
- starts_with_str_dict - Evaluates the starts-with predicate between a string array and a categorical array.
- starts_with_str_str - Evaluates the starts-with predicate between two string arrays.
- string_predicate_masks - Helper for predicate kernels: produces optional input masks and a fresh output mask.
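The _dict variants operate on dictionary-encoded (categorical) data, where each row holds an integer code into a table of unique strings. The following is a minimal sketch of how windowed kernels such as min_categorical_array and max_categorical_array, plus a categorical analogue of count_distinct_string, might resolve codes through the dictionary. It assumes a simplified Categorical struct (hypothetical, not CategoricalArray<T>) whose codes are Option<u32>, with None marking a null row.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified categorical column: each row stores a code into
/// `dictionary`, and `None` marks a null row. Not the crate's CategoricalArray<T>.
struct Categorical {
    dictionary: Vec<String>,
    codes: Vec<Option<u32>>,
}

impl Categorical {
    /// Non-null dictionary strings for rows `offset..offset + len`.
    /// Panics if the window runs past the end of the column.
    fn window_values(&self, offset: usize, len: usize) -> impl Iterator<Item = &str> + '_ {
        self.codes[offset..offset + len]
            .iter()
            .flatten() // skip null rows
            .map(move |&code| self.dictionary[code as usize].as_str())
    }

    /// Lexicographic minimum of the non-null strings in the window.
    fn min_window(&self, offset: usize, len: usize) -> Option<&str> {
        self.window_values(offset, len).min()
    }

    /// Lexicographic maximum of the non-null strings in the window.
    fn max_window(&self, offset: usize, len: usize) -> Option<&str> {
        self.window_values(offset, len).max()
    }

    /// Distinct non-null values, counted by code. Assumes dictionary entries
    /// are unique; otherwise two codes could map to the same string.
    fn count_distinct_window(&self, offset: usize, len: usize) -> usize {
        self.codes[offset..offset + len]
            .iter()
            .flatten()
            .collect::<HashSet<&u32>>()
            .len()
    }
}

fn main() {
    let col = Categorical {
        dictionary: vec!["pear".to_string(), "apple".to_string(), "fig".to_string()],
        // Rows: pear, apple, null, pear, fig
        codes: vec![Some(0), Some(1), None, Some(0), Some(2)],
    };
    println!("min = {:?}", col.min_window(0, 5));
    println!("max = {:?}", col.max_window(0, 5));
    println!("distinct = {}", col.count_distinct_window(0, 5));
}
```

Comparing the dictionary strings rather than the raw codes matters because codes reflect insertion order, not lexicographic order: here code 0 is "pear", yet the minimum over the column is "apple".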