Module bwt

Module bwt 

Source
Expand description

Construct the Burrows-Wheeler-Transform (BWT) for a text using BwtConstruction.

To construct the BWT, a temporary (suffix) array is constructed by libsais. However, this temporary array does not contain the suffix array after the procedure.

The entry point to the API is the BwtConstruction builder-like struct. It is always required to pass the input text, register the output element of the temporary array and make a decision about parallelisation. Further configuration options include supplying an output and/or temporary array buffer, context, metadata about the input text, usage of auxiliary indices, usage of additional memory in the algorithm and instructing the library to replace the text by the BWT.

The following is an example of the BWT construction that contains some of its unique configuration options. See suffix_array for an example of most of the other options.

use libsais::{BwtConstruction, bwt, suffix_array::ExtraSpace};

let mut text = vec![b'a'; 10];

let res = BwtConstruction::replace_text(&mut text)
    .with_owned_temporary_array_buffer_and_extra_space32(ExtraSpace::Fixed { value: 10 })
    .single_threaded()
    .with_aux_indices(bwt::AuxIndicesSamplingRate::from(4))
    .run()
    .unwrap();

println!("{:?}", res.bwt());
println!("{:?}", res.aux_indices());

§Output Conventions

In the literature, the text input for the BWT construction is typically assumed to be terminated by a unique, lexicographically smallest character. This character is called sentinel and denoted by $.

libsais does not require the text to be terminated by a sentinel, but it behaves as if a sentinel were present. In the output BWT, the sentinel is not included. Therefore, the BWT has the same length as the input text.

§Primary Index and Auxiliary Indices

To recover the original text from a BWT, the primary index i0 of the BWT is needed. It is defined as the index for which SUF[i0 - 1] = 0, where SUF is the suffix array in libsais convention. The -1 is applied, because the suffix array also does not contain an entry for the sentinel. Such an entry would always be at the first position.

The primary index is part of the return type of BwtConstruction::run. It is 0 for the empty input text.

libsais allows creating a subsampled array of auxiliary indices. These indices have the same role as the primary index and are defined as follows: AUX[i] == k => SUF[k - 1] = i * r, where r is the sampling rate. In particular, AUX[0] is th primary index. The auxiliary indices can be used in different ways. For example, they allow significantly faster BWT reversal, both single and multi threaded. Benchmark

§Return Type and Reversal

The read-only return type of BwtConstruction::run bundles the BWT with either the primary index or the auxiliary indices and their sampling rate. It is generic over whether an owned or borrowed output buffer is used. The object can be destructured into parts or used to safely reverse the BWT and obtain the text again.

An example of constructing and reversing the BWT can be found here.

Structs§

AuxIndicesSamplingRate
The sampling rate for auxiliary indices of the BWT construction
Bwt
The read-only return type of a BWT construction without auxiliary indices.
BwtConstruction
One of the two main entry points of this library, for constructing BWTs.
BwtWithAuxIndices
The read-only return type of a BWT construction with auxiliary indices.