rank_str

Function rank_str 

pub fn rank_str<T: Integer>(
    arr: StringAVT<'_, T>,
) -> Result<IntegerArray<i32>, KernelError>

Computes standard SQL ROW_NUMBER() ranking for string data with lexicographic ordering.

Assigns sequential rank values based on lexicographic string comparison. Following ROW_NUMBER() semantics, every ranked element receives a distinct rank, even when two strings compare equal. Useful for alphabetical ranking and string-based analytical operations.

§Parameters

  • arr - String array view containing textual values for ranking

§Returns

Returns Result<IntegerArray<i32>, KernelError>:

  • Success: An IntegerArray<i32> holding rank values from 1 to n for valid string elements
  • Null string elements hold a zero value in the returned array
  • The returned array's null mask indicates which positions hold valid ranks
  • Error: KernelError if capacity validation fails
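
The output layout described above can be pictured with a standard-library-only sketch. This is not the crate's implementation: `rank_with_nulls` is a hypothetical helper, `Option<&str>` stands in for a nullable string array, and a `Vec<bool>` stands in for the null mask. It also assumes that valid elements are ranked among themselves (1 to k for k non-null values), which the documentation does not state explicitly.

```rust
/// Hypothetical helper, not part of simd_kernels: ranks the non-null
/// strings and reports a validity flag per position.
/// Assumption: non-null values are ranked among themselves, 1..=k.
fn rank_with_nulls(values: &[Option<&str>]) -> (Vec<i32>, Vec<bool>) {
    // Collect the positions of non-null elements only.
    let mut order: Vec<usize> = (0..values.len())
        .filter(|&i| values[i].is_some())
        .collect();
    // Stable sort of positions by the string they point at.
    order.sort_by_key(|&i| values[i]);

    // Null slots keep rank 0 and a `false` validity flag.
    let mut ranks = vec![0i32; values.len()];
    let mut valid = vec![false; values.len()];
    for (pos, &i) in order.iter().enumerate() {
        ranks[i] = (pos + 1) as i32; // ranks are 1-based
        valid[i] = true;
    }
    (ranks, valid)
}
```

Under these assumptions, `rank_with_nulls(&[Some("pear"), None, Some("apple")])` yields ranks `[2, 0, 1]` with validity flags `[true, false, true]`.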

§String Ranking Semantics

  • Lexicographic order: Uses standard string comparison (dictionary order)
  • Case sensitivity: Comparisons are case-sensitive (“A” < “a”)
  • Unicode support: Proper handling of UTF-8 encoded string data
  • ROW_NUMBER() behaviour: Tied strings receive different ranks by position
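
To make these semantics concrete, here is a minimal sketch using only the Rust standard library. `row_number_ranks` is a hypothetical helper, not the kernel itself, and it assumes the kernel orders strings the way Rust's built-in `str` comparison does (byte-wise, so uppercase ASCII letters sort before lowercase ones). It also assumes ties are broken by input position, which a stable sort provides.

```rust
/// Hypothetical helper, not part of simd_kernels: ROW_NUMBER()-style
/// ranks for a slice of strings, using Rust's built-in `str` ordering.
fn row_number_ranks(values: &[&str]) -> Vec<i32> {
    // Sort positions rather than the strings themselves, so each rank
    // can be written back to the slot of the original element.
    let mut order: Vec<usize> = (0..values.len()).collect();
    // `sort_by_key` is stable: equal strings keep their input order,
    // so ties receive distinct ranks ordered by position.
    order.sort_by_key(|&i| values[i]);

    let mut ranks = vec![0i32; values.len()];
    for (pos, &i) in order.iter().enumerate() {
        ranks[i] = (pos + 1) as i32; // ranks are 1-based
    }
    ranks
}
```

With this sketch, `row_number_ranks(&["b", "a", "b"])` gives `[2, 1, 3]` (the two equal strings get different ranks), and `row_number_ranks(&["a", "A"])` gives `[2, 1]` because "A" sorts before "a".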

§Error Conditions

  • Capacity errors: Returns KernelError if mask capacity validation fails
  • Memory allocation: May fail with insufficient memory for large datasets

§Use Cases

  • Alphabetical ranking: Creating alphabetically ordered rankings
  • Text analysis: Establishing lexicographic ordinality in textual data
  • Database operations: SQL ROW_NUMBER() implementation for string columns
  • Sorting applications: Providing ranking information for string sorting

§Examples

use minarrow::StringArray;
use simd_kernels::kernels::window::rank_str;

let arr = StringArray::<u32>::from_slice(&["zebra", "apple", "banana"]);
let result = rank_str((&arr, 0, arr.len())).unwrap();
// Output: [3, 1, 2] - lexicographic ranking