pub fn argsort_radix_u32(data: &[u32], descending: bool) -> Vec<usize>
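LSD radix argsort for u32: returns the indices that sort `data`. Runs in O(n·k) time, where k = 4 is the number of byte-wide passes over a 32-bit key.
Typically faster than an O(n log n) comparison sort on large integer inputs. Uses an 8-bit radix (256 buckets), so each key is processed in 4 passes, least significant byte first.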
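
The four passes can be illustrated with a minimal sketch. This is not the library's actual implementation: the name `argsort_radix_u32_sketch` is hypothetical, and the handling of `descending` (inverting each key's bits, which leaves equal keys in their original index order) is an assumption, since the source does not say how ties or descending order are handled.

```rust
/// Hypothetical sketch of an LSD radix argsort over u32 keys.
/// Illustrative only; not the library's actual implementation.
fn argsort_radix_u32_sketch(data: &[u32], descending: bool) -> Vec<usize> {
    let n = data.len();
    let mut idx: Vec<usize> = (0..n).collect(); // current index order
    let mut tmp: Vec<usize> = vec![0; n];       // scatter target for each pass

    // One pass per byte, least significant byte first.
    for pass in 0..4u32 {
        let shift = pass * 8;

        // Assumption: descending order is obtained by bitwise-inverting the key,
        // so the same ascending, stable passes can be reused.
        let digit = |i: usize| -> usize {
            let key = if descending { !data[i] } else { data[i] };
            ((key >> shift) & 0xFF) as usize
        };

        // Count how many keys fall into each of the 256 buckets.
        let mut counts = [0usize; 256];
        for &i in idx.iter() {
            counts[digit(i)] += 1;
        }

        // Exclusive prefix sum: starting offset of each bucket in the output.
        let mut offsets = [0usize; 256];
        let mut sum = 0usize;
        for b in 0..256 {
            offsets[b] = sum;
            sum += counts[b];
        }

        // Stable scatter: indices keep their relative order within a bucket.
        for &i in idx.iter() {
            let d = digit(i);
            tmp[offsets[d]] = i;
            offsets[d] += 1;
        }

        std::mem::swap(&mut idx, &mut tmp);
    }

    idx
}
```

Under these assumptions, calling the sketch on `[30, 10, 20]` with `descending = false` yields `[1, 2, 0]`, and with `descending = true` yields `[0, 2, 1]`. Each pass is stable, which is what allows later (more significant) bytes to refine the order established by earlier ones.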