Crate cshannon
A library of some early compression algorithms based on replacement schemes.
This library implements the standard Huffman coding scheme and two of its precursors, Shannon coding and Fano coding, which are often jointly referred to as Shannon-Fano coding.
Usage
cshannon provides a binary that can be used for compression and decompression at the command line, and a library that can be integrated into other projects.
Run cshannon --help to see the command-line options for the binary.
The easiest way to use the cshannon library is:

    use cshannon::{Args, run};

    run(Args{
        command: "compress",
        input_file: "/path/to/input_file",
        output_file: "/path/to/output_file",
        tokenizer: "byte",
        encoding: "fano",
    });
Crate layout
The abstract steps in compression are as follows:
    Input --> Tokens --> Model --> Encoding -+
      |                                      |
      +-----> Tokens ------------------------+--> Compressed
                                                  Output
Different modules in the crate correspond to each of these steps.
- The tokens module provides traits for tokenizing text. Three concrete tokenization schemes are implemented: tokens::bytes, tokens::graphemes and tokens::words.
- The model module provides a way to compute a zeroth-order model from a stream of tokens.
- The encoding module provides traits for creating an encoding scheme from a model. Four concrete encoding schemes are implemented: encoding::balanced_tree, encoding::shannon, encoding::fano and encoding::huffman.
- Finally, the code module provides methods to encode a token stream given an encoding. The encoding itself is also included in the compressed output.
The abstract steps for decompression are as follows:
    Compressed --> extract prefix --> Encoding
      Input                               |
        |                                 |
        +--> remaining data --------------+--> Output
Decompression is conceptually simpler because there are no choices to make: the tokenizer and encoding are fixed by the input, since the encoding is included in-band as a prefix of the compressed data. Most of the decompression logic resides in the code module.
Modules
code     | Provides facilities to read & write data encoded with a prefix code.
encoding |
model    | Exports
tokens   | This module provides traits for tokenizing text.
Structs
Args
Functions
compress   | Document me. TODO: Convert to use AsRef.
decompress | Document me. TODO: Convert to use AsRef.
run |