Crate cshannon

A library of some early compression algorithms based on replacement schemes.

⚠️ WARNING ⚠️

This is a pet project and is not intended to be a production-ready library for data compression (e.g. no attempt is made to keep the at-rest data format compatible across library versions).

This library implements the standard Huffman coding scheme, two precursors to the Huffman scheme often called Shannon-Fano coding, and a simple fixed-width encoding that is the easiest to understand but compresses poorly.
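To make the difference between these schemes concrete, here is a minimal sketch of Fano's top-down splitting method, one of the two schemes commonly called Shannon-Fano coding. It is an illustration only, not this crate's implementation: the function name `fano` and the string-based representation of codes are assumptions made for readability.

```rust
use std::collections::BTreeMap;

/// Assign Fano codes to `symbols` (hypothetical helper, not part of cshannon).
/// `symbols` must already be sorted by descending weight.
fn fano(symbols: &[(char, u32)], prefix: String, codes: &mut BTreeMap<char, String>) {
    match symbols {
        [] => {}
        [(sym, _)] => {
            // A lone symbol at the top level still needs at least one bit.
            let code = if prefix.is_empty() { "0".to_string() } else { prefix };
            codes.insert(*sym, code);
        }
        _ => {
            let total: u32 = symbols.iter().map(|(_, w)| w).sum();
            // Find the split that makes the two groups' weights as equal as possible.
            let mut running = 0u32;
            let mut best_split = 1;
            let mut best_diff = u32::MAX;
            for (i, (_, w)) in symbols.iter().enumerate().take(symbols.len() - 1) {
                running += w;
                // |left - right| where left = running and right = total - running.
                let diff = (2 * running).abs_diff(total);
                if diff < best_diff {
                    best_diff = diff;
                    best_split = i + 1;
                }
            }
            let (left, right) = symbols.split_at(best_split);
            // The heavier-first group gets a 0 bit, the other a 1 bit; then recurse.
            fano(left, format!("{prefix}0"), codes);
            fano(right, format!("{prefix}1"), codes);
        }
    }
}
```

With the weights A:15, B:7, C:6, D:6, E:5, this sketch assigns the codes 00, 01, 10, 110 and 111, for a total of 89 bits. A fixed-width code for five symbols needs 3 bits per symbol, or 117 bits for the same input.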

§Usage

cshannon provides a binary that can be used for compression / decompression at the command line and a library that can be integrated into other projects.

Run cshannon --help to see the command-line options for the binary.

The library exposes the same functionality via the run function:

use cshannon::{Args, Command, CompressArgs, EncodingScheme, TokenizationScheme, run};
use std::path::Path;

run(Args{
    command: Command::Compress(CompressArgs{
        tokenization_scheme: TokenizationScheme::Byte,
        encoding_scheme: EncodingScheme::Fano
    }),
    input_file: &Path::new("/path/to/input_file"),
    output_file: &Path::new("/path/to/output_file"),
});

This library uses the logging facade from the log crate. You must set up an appropriate logger in your binary’s entry point for the library to use. As an example, the primary command-line binary in this package uses env_logger (see src/bin/cli.rs).

§Abstract operation description

The abstract steps in compression are as follows:

Input --> Tokens --> Model --> Encoding -+
  |                                      |
  +-----> Tokens ------------------------+--> Compressed
                                                Output
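The compression steps in the diagram can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's internals: the function names are hypothetical, tokens are single bytes, the encoding is the fixed-width scheme (which ignores the frequencies in the model), and bits are kept as a string of '0' and '1' characters rather than packed.

```rust
use std::collections::BTreeMap;

/// Input --> Tokens: here, one token per byte.
fn tokenize(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

/// Tokens --> Model: count how often each token occurs.
fn build_model(tokens: &[u8]) -> BTreeMap<u8, usize> {
    let mut model = BTreeMap::new();
    for &t in tokens {
        *model.entry(t).or_insert(0) += 1;
    }
    model
}

/// Model --> Encoding: number the distinct tokens using a fixed bit width.
fn build_encoding(model: &BTreeMap<u8, usize>) -> BTreeMap<u8, String> {
    let n = model.len();
    // Bits needed to number n distinct tokens (at least one bit).
    let width = (usize::BITS - n.saturating_sub(1).leading_zeros()).max(1) as usize;
    model
        .keys()
        .enumerate()
        .map(|(i, &t)| (t, format!("{i:0width$b}")))
        .collect()
}

/// Tokens + Encoding --> encoded bits.
fn encode(tokens: &[u8], encoding: &BTreeMap<u8, String>) -> String {
    tokens.iter().map(|t| encoding[t].as_str()).collect()
}

/// Run all steps; the encoding is returned alongside the bits because a
/// decompressor needs both.
fn compress(input: &[u8]) -> (BTreeMap<u8, String>, String) {
    let tokens = tokenize(input);
    let model = build_model(&tokens);
    let encoding = build_encoding(&model);
    let bits = encode(&tokens, &encoding);
    (encoding, bits)
}
```

Note that the tokens are used twice, once to build the model and once to produce the output, which matches the two paths from Input in the diagram.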

The abstract steps for decompression are as follows:

Compressed --> extract prefix --> Encoding
  Input                              |
   |                                 |
   +--> remaining data --------------+--> Output

Decompression is conceptually simpler because there are no choices to make (of tokenizer and encoding): the encoding is included in-band as a prefix of the compressed data.
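The sketch below illustrates the idea of an in-band encoding prefix. The byte layout is hypothetical and chosen only for readability; it is not the format this crate writes. It assumes one count byte, then one entry per token (token byte, code length, code as ASCII '0'/'1'), followed by the payload bits as ASCII '0'/'1'. The function names are also assumptions.

```rust
use std::collections::HashMap;

/// Compressed input --> (Encoding, remaining data), using the hypothetical layout.
fn extract_prefix(data: &[u8]) -> Option<(HashMap<String, u8>, &[u8])> {
    let (&count, mut rest) = data.split_first()?;
    let mut table = HashMap::new();
    for _ in 0..count {
        if rest.len() < 2 {
            return None; // truncated entry header
        }
        let (token, len) = (rest[0], rest[1] as usize);
        let tail = &rest[2..];
        if tail.len() < len {
            return None; // truncated code
        }
        let (code, after) = tail.split_at(len);
        table.insert(String::from_utf8(code.to_vec()).ok()?, token);
        rest = after;
    }
    Some((table, rest))
}

/// Encoding + remaining data --> Output.
fn decompress(data: &[u8]) -> Option<Vec<u8>> {
    let (table, payload) = extract_prefix(data)?;
    let mut output = Vec::new();
    let mut current = String::new();
    for &bit in payload {
        current.push(bit as char);
        // Codes are prefix-free, so the first match is the right token.
        if let Some(&token) = table.get(&current) {
            output.push(token);
            current.clear();
        }
    }
    // Leftover bits that match no code indicate corrupt or truncated input.
    current.is_empty().then_some(output)
}
```

Because the table travels with the data, the caller of `decompress` supplies nothing but the compressed bytes.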

Structs§

Args
Arguments for the run entry-point of this library.
CompressArgs
Arguments specific to the compression operation.
DecompressArgs
Placeholder for (future) arguments specific to the decompression operation.

Enums§

Command
The command to invoke via the run entry-point.
EncodingScheme
Encoding schemes (i.e. the compression algorithms) supported by this library.
TokenizationScheme
Source text needs to be split into tokens that are then compressed using one of the supported algorithms. This enum lists all the supported tokenization schemes.

Functions§

run
Invoke this library to compress or decompress data.