Module compress::bwt [] [src]

BWT (Burrows-Wheeler Transform) forward and backward transformation. Requires bwt feature, enabled by default

This module contains a bruteforce implementation of BWT encoding in Rust as well as standard decoding. These are exposed as a standard Reader and Writer interfaces wrapping an underlying stream.

BWT output stream places together symbols with similar leading contexts. This reshaping of the entropy allows further stages to deal with repeated sequences of symbols for better compression.

Typical compression schemes are: BWT + RLE (+ EC) RLE + BWT + MTF + RLE + EC : bzip2 BWT + DC + EC : ybs

Where the stage families are: BWT: BWT (Burrows-Wheeler Transform), ST (Shindler transform) RLE: RLE (Run-Length Encoding) MTF: MTF (Move-To-Front), WFC (Weighted Frequency Coding) DC: DC (Distance Coding), IF (Inverse Frequencies) EC (Entropy Coder): Huffman, Arithmetic, RC (Range Coder)

Example

use std::io::{BufWriter, BufReader, Read, Write};
use compress::bwt;

// Encode some text
let text = "some text";
let mut e = bwt::Encoder::new(BufWriter::new(Vec::new()), 4 << 20);
e.write(text.as_bytes()).unwrap();
let (encoded, _) = e.finish();
let inner = encoded.into_inner().unwrap();

// Decode the encoded text
let mut d = bwt::Decoder::new(BufReader::new(&inner[..]), true);
let mut decoded = Vec::new();
d.read_to_end(&mut decoded).unwrap();

assert_eq!(&decoded[..], text.as_bytes());

Credit

This is an original (mostly trivial) implementation.

Modules

dc

DC (Distance Coding) forward and backward transformation. Designed to be used on BWT block output for compression.

mtf

MTF (Move To Front) encoder/decoder Produces a rank for each input character based on when it was seen last time. Useful for BWT output encoding, which produces a lot of zeroes and low ranks.

Structs

Decoder

This structure is used to decode a stream of BWT blocks. This wraps an internal reader which is read from when this decoder's read method is called.

Encoder

This structure is used to compress a stream of bytes using the BWT. This is a wrapper around an internal writer which bytes will be written to.

InverseIterator

An iterator over inverse BWT Run time: O(N), memory: N words (table)

Radix

Radix sorting primitive

TransformIterator

An iterator over BWT output

Constants

ALPHABET_SIZE

Functions

compute_inversion_table

Compute an inversion jump table, needed for BWT decoding

compute_suffixes

Compute a suffix array from a given input string Resulting suffixes are guaranteed to be alphabetically sorted Run time: O(N3), memory: N words (suf_array) + ALPHABET_SIZE words (Radix)

decode

Decode a BWT block, given it's origin, and using 'table' temporarily

decode_simple

A simplified BWT decode function, which allocates a temporary suffix array

encode

Encode BWT of a given input, using the 'suf_array'

encode_simple

Transform an input block into the output slice, all-inclusive version. Returns the index of the original string in the output matrix.

Type Definitions

Symbol

A base element for the transformation