Struct rscompress_transformation::BurrowWheeler[][src]

pub struct BurrowWheeler { /* fields omitted */ }
Expand description

Burrow-Wheeler transformation

Implementation of the Burrow-Wheeler Transformation as described here.

Algorithm

In the following is a rough sketch of the algorithm and its inner workings.

▶ Transformation

The transformation process involves three steps:

  1. Create a N x N matrix with rotations of data d, where N is the length of d
  2. Sort the rows of this matrix
  3. Output the last characters of each row + the final row index of d

The last characters and the row index is enough information to reproduce d.

Implementation

Most implementations of the Burrow-Wheeler Transform use Suffix Arrays for generating the transformations. A suffix appears often in concatenation with strings and substrings. A substring is a string appearing in the original string. While pp, ppl, or le are all a substring of apple; ale does not appear in apple. If the substring appears at the beginning of the word, it is called a prefix e.g. a, ap, or app. If it appears at the end of the word, it is called a suffix e.g. e, le, or ple. This definition of prefix and suffix are also being used in context of byte vectors.

Suffix Arrays are sorted matrices of all suffixes of the original data. Often they are represented by their row indices vector of the length of the original data. The following example shows the sorted suffix indices.

Example

apple$ > $, apple, e, le, ple, pple > [5, 0, 4, 3, 2, 1]

Some implementations of suffix arrays also include an end character often symbolized using $. This can, but must not be included for the Burrow Wheeler Transform. From this array the last chars and the index position need to be calculated.

The latter is simply the position of 0 in the suffix indices vector. In this it is at index position 1. The former is a bit trickier. What is needed is the last character in a rotated suffix list. The sorted suffix list is given above. For the BWT algorithm these words need to be filled with the actual word, to the length of the original data.

apple, e, le, ple, pple > apple, eappl, leapp, pleap, pleap, pplea > [e, l, p, p, a]

The kean reader might have observed that the last letter is always the one previous in the original word. Therefore the last characters of [5, 4, 4, 3, 2, 1] are [4, -1, 3, 2, 1, 0]. Since the special character $ is not important the final vector information is [e, l, p, p, a] and 1 for the index position.

◀ Reverse

TODO: Improve explanation The reverse algorithm of BWT uses three vectors of length N with N being the number of data vector. The first vector is the actual vector to be reversed i.e. last characters of the suffix array. The second vector is the first vector sorted. The third and last vector is a count of the second vector. The third vector saves the information on the position of the byte. The following is an example for the apple string as above.

[e, l, p, p, a] > [1, 1, 1, 2, 1]

[e, l, p, p, a] [a, e, l, p, p] [1, 1, 1, 2, 1] i: 0, The first output is the letter at the index position of the second vector. In the above example this is pos 1 in [$, a, e, l, p, p] and therefore a. The second output is at the index position of the nth occurence of the letter just output in the first vector. The nth occurence is described by the third vector. In this case it is the first occurence of a in vector 1. This is at the last position. Therefore the output is p. After that is the first occurence of the letter p in vector 1. This is at position 2.

Example

use rscompress_transformation::BurrowWheeler;

Implementations

impl BurrowWheeler[src]

pub fn new() -> Self[src]

pub fn reset(&mut self)[src]

pub fn with_ix_and_size(ix: usize, size: usize) -> Self[src]

Trait Implementations

impl Debug for BurrowWheeler[src]

fn fmt(&self, f: &mut Formatter<'_>) -> Result[src]

Formats the value using the given formatter. Read more

impl Default for BurrowWheeler[src]

fn default() -> Self[src]

Returns the “default value” for a type. Read more

impl Transform for BurrowWheeler[src]

fn transform(&mut self, source: &[u8]) -> Result<Vec<u8>, TransformError>[src]

Transformation of the initial source data

fn reverse(&mut self, source: &[u8]) -> Result<Vec<u8>, TransformError>[src]

Reversing the initial transformation

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized
[src]

pub fn type_id(&self) -> TypeId[src]

Gets the TypeId of self. Read more

impl<T> Borrow<T> for T where
    T: ?Sized
[src]

pub fn borrow(&self) -> &T[src]

Immutably borrows from an owned value. Read more

impl<T> BorrowMut<T> for T where
    T: ?Sized
[src]

pub fn borrow_mut(&mut self) -> &mut T[src]

Mutably borrows from an owned value. Read more

impl<T> From<T> for T[src]

pub fn from(t: T) -> T[src]

Performs the conversion.

impl<T, U> Into<U> for T where
    U: From<T>, 
[src]

pub fn into(self) -> U[src]

Performs the conversion.

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>[src]

Performs the conversion.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

pub fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>[src]

Performs the conversion.