Crate cshannon
A library of some early compression algorithms based on replacement schemes.
This library implements the standard Huffman coding scheme and two of its precursors, Shannon coding and Fano coding, which are often jointly referred to as Shannon-Fano coding.
Usage
cshannon provides a binary that can be used for compression and decompression at the command line, and a library that can be integrated into other projects.
Run cshannon --help to see the command-line options for the binary.
The easiest way to use the cshannon library is:

    use cshannon::{Args, run};

    run(Args{
        command: "compress",
        input_file: "/path/to/input_file",
        output_file: "/path/to/output_file",
        tokenizer: "byte",
        encoding: "fano",
    });
Crate layout
The abstract steps in compression are as follows:
    Input --> Tokens --> Model --> Encoding -+
      |                                      |
      +-----> Tokens ------------------------+--> Compressed
                                                  Output
Different modules in the crate correspond to each of these steps.
- The tokens module provides traits for tokenizing text. Three concrete tokenization schemes are implemented: tokens::bytes, tokens::graphemes and tokens::words.
- The model module provides a way to compute a zeroth-order model from a stream of tokens.
- The encoding module provides traits for creating an encoding scheme from a model. Four concrete encoding schemes are implemented: encoding::balanced_tree, encoding::shannon, encoding::fano and encoding::huffman.
- Finally, the code module provides methods to encode a token stream given an encoding. The encoding itself is also included in the compressed output.
The abstract steps for decompression are as follows:
    Compressed --> extract prefix --> Encoding
      Input                               |
        |                                 |
        +--> remaining data --------------+--> Output
Decompression is conceptually simpler because there are no choices to make: the tokenizer and encoding are fixed by the input, since the encoding is included in-band as a prefix of the compressed data. Most of the decompression logic resides in the code module.
Modules
code     | Provides facilities to read & write data encoded with a prefix code.
encoding |
model    | Exports
tokens   | This module provides traits for tokenizing text.
Structs
Args
Functions
compress   | Document me. TODO: Convert to use AsRef.
decompress | Document me. TODO: Convert to use AsRef.
run |