Expand description
Construct the Burrows-Wheeler-Transform (BWT) for a text using BwtConstruction.
To construct the BWT, a temporary (suffix) array is constructed by libsais. However,
this temporary array does not contain the suffix array after the procedure.
The entry point to the API is the BwtConstruction builder-like struct. It is always required to
pass the input text, register the output element of the temporary array and make a decision about
parallelisation. Further configuration options include supplying an output and/or temporary array buffer,
context, metadata about the input text, usage of auxiliary indices, usage of additional memory
in the algorithm and instructing the library to replace the text by the BWT.
The following is an example of the BWT construction that contains some of its unique configuration
options. See suffix_array for an example of most of the other options.
use libsais::{BwtConstruction, bwt, suffix_array::ExtraSpace};
let mut text = vec![b'a'; 10];
let res = BwtConstruction::replace_text(&mut text)
.with_owned_temporary_array_buffer_and_extra_space32(ExtraSpace::Fixed { value: 10 })
.single_threaded()
.with_aux_indices(bwt::AuxIndicesSamplingRate::from(4))
.run()
.unwrap();
println!("{:?}", res.bwt());
println!("{:?}", res.aux_indices());§Output Conventions
In the literature, the text input for the BWT construction is typically assumed to be terminated by a unique, lexicographically smallest character. This character is called sentinel and denoted by $.
libsais does not require the text to be terminated by a sentinel, but it behaves as if a sentinel
were present. In the output BWT, the sentinel is not included. Therefore, the BWT has the same length
as the input text.
§Primary Index and Auxiliary Indices
To recover the original text from a BWT, the primary index i0 of the BWT is needed. It is defined as the
index for which SUF[i0 - 1] = 0, where SUF is the suffix array in libsais convention. The -1
is applied, because the suffix array also does not contain an entry for the sentinel. Such an entry
would always be at the first position.
The primary index is part of the return type of BwtConstruction::run. It is 0 for the empty input text.
libsais allows creating a subsampled array of auxiliary indices. These indices have the same role as the
primary index and are defined as follows: AUX[i] == k => SUF[k - 1] = i * r, where r is the sampling rate.
In particular, AUX[0] is th primary index. The auxiliary indices can be used in different ways.
For example, they allow significantly faster BWT reversal, both single and multi threaded. Benchmark
§Return Type and Reversal
The read-only return type of BwtConstruction::run bundles the BWT with either the primary index or the
auxiliary indices and their sampling rate. It is generic over whether an owned or borrowed output buffer is used.
The object can be destructured into parts or used to safely reverse the BWT and obtain the text again.
An example of constructing and reversing the BWT can be found here.
Structs§
- AuxIndices
Sampling Rate - The sampling rate for auxiliary indices of the BWT construction
- Bwt
- The read-only return type of a BWT construction without auxiliary indices.
- BwtConstruction
- One of the two main entry points of this library, for constructing BWTs.
- BwtWith
AuxIndices - The read-only return type of a BWT construction with auxiliary indices.