pub fn rank_str<T: Integer>(
arr: StringAVT<'_, T>,
) -> Result<IntegerArray<i32>, KernelError>
Computes standard SQL ROW_NUMBER() ranking for string data with lexicographic ordering.
Assigns sequential rank values based on lexicographic string comparison, implementing ROW_NUMBER() semantics for textual data. Useful for alphabetical ranking and other string-based analytical operations.
§Parameters
- arr - String array view containing the textual values to rank
§Returns
Returns Result<IntegerArray<i32>, KernelError> containing:
- Success: Rank values from 1 to n for valid string elements
- Error: KernelError if capacity validation fails
- Zero values for null string elements
- Null mask indicating positions with valid ranks
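The null handling described above can be sketched with the standard library alone. This is an illustration, not the kernel's implementation: `row_number_str_nullable` is a hypothetical helper, `Option<&str>` stands in for a nullable string array, a `Vec<bool>` stands in for the null mask, and it assumes that ranks are numbered consecutively over the valid elements only.

```rust
/// Hypothetical helper (not part of simd_kernels): ranks the non-null
/// strings and reports which positions hold a valid rank.
fn row_number_str_nullable(values: &[Option<&str>]) -> (Vec<i32>, Vec<bool>) {
    // Collect the indices of the non-null elements only.
    let mut order: Vec<usize> = (0..values.len())
        .filter(|&i| values[i].is_some())
        .collect();
    // Stable sort: equal strings keep their input order.
    order.sort_by(|&a, &b| values[a].cmp(&values[b]));

    // Null positions keep rank 0 and a `false` validity flag.
    let mut ranks = vec![0i32; values.len()];
    let mut valid = vec![false; values.len()];
    for (pos, &idx) in order.iter().enumerate() {
        ranks[idx] = (pos + 1) as i32; // assumption: ranks count valid elements only
        valid[idx] = true;
    }
    (ranks, valid)
}
```

Under these assumptions, `row_number_str_nullable(&[Some("pear"), None, Some("apple")])` yields ranks `[2, 0, 1]` with validity `[true, false, true]`.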
§String Ranking Semantics
- Lexicographic order: Uses standard string comparison (dictionary order)
- Case sensitivity: Comparisons are case-sensitive (“A” < “a”)
- Unicode support: Proper handling of UTF-8 encoded string data
- ROW_NUMBER() behaviour: Tied strings receive different ranks by position
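These semantics can be reproduced in a few lines of plain Rust. The sketch below is not the kernel's code: `row_number_str_sketch` is a hypothetical name, it works on a `&[&str]` slice rather than a `StringAVT`, and it assumes that among tied strings the earlier position receives the lower rank.

```rust
/// Hypothetical helper (not part of simd_kernels): assigns 1-based
/// ROW_NUMBER()-style ranks to a slice of strings.
fn row_number_str_sketch(values: &[&str]) -> Vec<i32> {
    // Sort indices rather than values so ranks can be written back in input order.
    let mut order: Vec<usize> = (0..values.len()).collect();
    // `str` ordering is byte-wise lexicographic, hence case-sensitive ("A" < "a").
    // `sort_by` is stable, so tied strings keep their input order.
    order.sort_by(|&a, &b| values[a].cmp(values[b]));

    let mut ranks = vec![0i32; values.len()];
    for (pos, &idx) in order.iter().enumerate() {
        ranks[idx] = (pos + 1) as i32;
    }
    ranks
}
```

For `["b", "a", "b", "A"]` this returns `[3, 2, 4, 1]`: `"A"` sorts before `"a"`, and the two `"b"` entries receive distinct ranks 3 and 4 in input order.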
§Error Conditions
- Capacity errors: Returns KernelError if mask capacity validation fails
- Memory allocation: May fail with insufficient memory for large datasets
§Use Cases
- Alphabetical ranking: Creating alphabetically ordered rankings
- Text analysis: Establishing lexicographic ordinality in textual data
- Database operations: SQL ROW_NUMBER() implementation for string columns
- Sorting applications: Providing ranking information for string sorting
§Examples
use minarrow::StringArray;
use simd_kernels::kernels::window::rank_str;
let arr = StringArray::<u32>::from_slice(&["zebra", "apple", "banana"]);
let result = rank_str((&arr, 0, arr.len())).unwrap();
// Output: [3, 1, 2] - lexicographic ranking