Module bwt_sort

Module bwt_sort 

Source
Expand description

This is the entry point for the main bwt_sort algorithm and contains the main sorting algorithm.

The main sorting algorithm is currently based on the standard Rust sort_unstable algorithm. When data is larger than 5k bytes, we use a multi-threaded approach based on Rayon’s par_sort_unstable algorithm.

Since different sorting algorithms are better suited for different kinds of data, this module contains a test to determine whether the data would be better suited to the main algorithm or the fallback algorithm.

NOTE:

  • During the earlier development and testing phases, I ported Julian Seward’s sort algorithm into Rust. However the built-in sort_unstable algorithm performed equally well as my port. Using the built-in algorithm avoids a lot of complexity. (I do welcome suggestions for improved sorting algorithms.)
  • I have tried multiple different algorithms to test data entropy in order to choose when to select the fallback algorithm. The final method I finally implemented is based on the “left most smaller” (LMS) concept in suffix array algorithms.

Functions§

bwt_decode
Decode a Burrows-Wheeler-Transform. Requires a key, a u8 slice containing the BWT data, and an array of the u8 frequencies found in the data. Returns the decoded data as a u8 vec.
bwt_encode
Encode data using the Burrows-Wheeler-Transform. Requires a u8 slice of data to be sorted. This returns a u32 key and a u8 vec of the BWT data.