suffix_array 0.4.1

Suffix array construction and searching algorithms for in-memory binary data.
Documentation

Suffix array

Suffix array construction and searching algorithms for in-memory binary data.

For text indexing, burntsushi's suffix featuring utf-8 support is preferred.

Suffix array construction algorithm (SACA) in this crate uses the C bindings by Amos Wenger to Yuta Mori's dissufsort, which is the fastest known SACA running in single thread with only O(1) additional workspace.

In addition, I have implemented a space-efficient parallel SACA named pSACAK in a separate crate. For now, it has not been thoroughly benchmarked as well as optimized. If it is proved to be valuable, I would like to make it an optional SACA for this crate in future.

TODO

  • Benchmark using Pizza&Chili Corpus.
  • Compare LMS substrings in parallel.
  • Speed up searching by LCP array (enhanced suffix array).
  • Add compressed suffix array support.
  • Serialization/deserialization.
  • Rewrite suffix array construction algorithm (try to parallelize SACA-K according to the recent parallelization efforts on SAIS and SACK-K ([1], [2]) by Bin Lao. Or simply switch the suffix array construction slgorithm to Amos Wenger's hand-ported divsufsort or the C binding.