Expand description
This is the entry point for the main bwt_sort algorithm and contains the main sorting algorithm.
The main sorting algorithm is currently based on the standard Rust sort_unstable algorithm. When data is larger than 5k bytes, we use a multi-threaded approach based on Rayon’s par_sort_unstable algorithm.
Since different sorting algorithms are better suited for different kinds of data, this module contains a test to determine whether the data would be better suited to the main algorithm or the fallback algorithm.
NOTE:
- During the earlier development and testing phases, I ported Julian Seward’s sort algorithm into Rust. However the built-in sort_unstable algorithm performed equally well as my port. Using the built-in algorithm avoids a lot of complexity. (I do welcome suggestions for improved sorting algorithms.)
- I have tried multiple different algorithms to test data entropy in order to choose when to select the fallback algorithm. The final method I finally implemented is based on the “left most smaller” (LMS) concept in suffix array algorithms.
Functions§
- bwt_
decode - Decode a Burrows-Wheeler-Transform. Requires a key, a u8 slice containing the BWT data, and an array of the u8 frequencies found in the data. Returns the decoded data as a u8 vec.
- bwt_
encode - Encode data using the Burrows-Wheeler-Transform. Requires a u8 slice of data to be sorted. This returns a u32 key and a u8 vec of the BWT data.