Suffix array
Suffix array construction and searching algorithms for in-memory binary data.
For text indexing, burntsushi's suffix featuring utf-8 support is preferred.
Suffix array construction algorithm (SACA) in this crate uses the C bindings by Amos Wenger to Yuta Mori's dissufsort, which is the fastest known SACA running in single thread with only O(1) additional workspace.
In addition, I have implemented a space-efficient parallel SACA named pSACAK in a separate crate. For now, it has not been thoroughly benchmarked as well as optimized. If it is proved to be valuable, I would like to make it an optional SACA for this crate in future.
TODO
- Benchmark using Pizza&Chili Corpus.
- Compare LMS substrings in parallel.
- Speed up searching by LCP array (enhanced suffix array).
- Add compressed suffix array support.
- Serialization/deserialization.
- Rewrite suffix array construction algorithm (try to parallelize SACA-K according to the recent parallelization efforts on SAIS and SACK-K ([1], [2]) by Bin Lao. Or simply switch the suffix array construction slgorithm to Amos Wenger's hand-ported divsufsort or the C binding.