Crate rdxsort

Source
Expand description

§RdxSort

This crate implements Radix Sort for slices of different data types, either directly or by exploiting the implementation of other data types.

The main reason for implementing yet another sorting algorithm is that most sorting algorithms are comparative methods. To sort data, they rely on a function that compares data elements. It can be proven that this leads to a runtime complexity of O(n*log(n)) in average and O(n^2) in the worst case. In contrast to that, Radix Sort exploits that fact that many data types have a limited range of possible values and within that range a limited resolution (that also holds for floating point numbers). For a detailed explanation see the Wikipedia article. The result of this special treatment is a lowered and constant complexity of O(n*k) where k is the number of fixed rounds required to sort a specific data type.

§Supported Data Types

Currently, the following data types are supported:

  • bool: simple split into 2 junks
  • char: behaves like u32
  • unsigned integers: native implementation, depending on the width
  • signed integers: splitting into positive and negative parts and using the unsigned implementation
  • floats: splits data into -∞, (-∞,-0), -0, +0, (+0,+∞), +∞ and treating the two ranges as unsigned integer values. Subnormals and NaNs are not supported!
  • arrays, tuples: use the implementation of the inner data types
  • custom data types…: fill in the provided template trait

§Example

use rdxsort::*;

fn main() {
    let mut data = vec![2, 10, 0, 1];
    data.rdxsort();
    assert!(data == vec![0, 1, 2, 10]);
}

§Performance

Of course the lower runtime complexity of Radix Sort shows its power when sorting certain data types. The advantage depends on the size and complexity of the type. While short unsigned integers benefit the most, long types do not show that huge improvements. The following listing shows the runtime in ns required for sorting data sets of different sizes. The data sets are sampled using an uniform distribution. The best algorithm out of the following is marked:

Keep in mind that the results may vary depending on the hardware, compiler version, operating system version and configuration and the weather.

§Small (1’000 elements)

For small data sets Radix Sort can be an advantage for data types with up to 32 bits size. For 64 bits, standard library sorting should be preferred.

typequicksortrdxsortstd
bool4,0702,24626,068
char31,12120,20434,051
f3279,71425,82577,774
f6480,95452,26279,431
i1632,89612,49631,167
i3232,85422,00930,713
i6433,06453,36631,669
i824,8198,19046,281
u1635,2529,93733,946
u3233,00219,20233,627
u6432,98647,73933,204
u825,4255,17047,369

§Medium (10’000 elements)

For medium data sets Radix Sort could be blindly used for all data types since the disadvantage for types with 64 bits is quite small.

typequicksortrdxsortstd
bool52,21122,083400,111
char655,553192,328508,557
f321,086,882230,6701,117,565
f641,095,529417,1041,152,340
i16665,131108,128455,047
i32650,501202,533460,097
i64669,643378,480470,572
i8383,54565,521617,405
u16675,06078,424508,264
u32670,348177,068511,134
u64664,549342,935517,176
u8386,57237,012655,377

§Large (100’000 elements)

For large data sets, Radix Sort is great for all data types.

typequicksortrdxsortstd
bool815,653216,8094,900,426
char8,131,4022,538,0646,333,865
f3213,554,2913,264,65714,340,082
f6413,746,1907,122,33415,563,206
i168,235,6421,248,2895,575,196
i328,184,9022,494,8825,852,913
i648,222,4825,446,5076,561,292
i83,664,532703,2887,118,816
u168,272,903866,8336,291,101
u328,193,4082,051,4136,395,163
u648,179,0784,393,5797,216,868
u83,681,361367,2407,816,775

§Implementing New Types

This crate enables you to add support for new types by implementing RdxSortTemplate. It describes how data is sorted into buckets and how many rounds of sorting are scheduled.

use rdxsort::*;

// `Clone` is required for `RdxSort`
// `PartialEq` is only required for the equality assert, not for the actual sorting
#[derive(Clone, PartialEq)]
struct Foo {
    a: u8,
    b: u8,
}

impl RdxSortTemplate for Foo {
    // using `#[inline]` is generally recommended since it helps
    // the compiler to optimize the sorting algorithm
    #[inline]
    fn cfg_nbuckets() -> usize {
        // usually too high, but works as a simple demonstration
        // `256 = 2^8`
        256
    }

    #[inline]
    fn cfg_nrounds() -> usize {
        // one per sub-type
        2
    }

    #[inline]
    fn get_bucket(&self, round: usize) -> usize {
        // return the least significant digit first
        if round == 0 {
            self.b as usize
        } else {
            self.a as usize
        }
    }

    #[inline]
    fn reverse(_round: usize, _bucket: usize) -> bool {
        // not required in our case
        false
    }
}

fn main() {
    let mut data = vec![
        Foo{a: 5, b: 2},
        Foo{a: 0, b: 1},
        Foo{a: 5, b: 1},
        Foo{a: 0, b: 2},
    ];
    data.rdxsort();

    let reference = vec![
        Foo{a: 0, b: 1},
        Foo{a: 0, b: 2},
        Foo{a: 5, b: 1},
        Foo{a: 5, b: 2},
    ];
    assert!(data == reference);
}

Macros§

rdxsort_template_alias
Implements t1 as alias of t2, e.g. usize = u64 on platforms that have 64 bit pointers.

Traits§

RdxSort
Radix Sort implementation for some type
RdxSortTemplate
Generic Radix Sort implementation