A library of some early compression algorithms based on replacement schemes.
⚠️ WARNING ⚠️
This is a pet project and is not intended to be a production-ready data-compression library (e.g. no attempt is made to keep the at-rest data format compatible across library versions).
This library implements the standard Huffman coding scheme, two precursors to the Huffman scheme that are both often called Shannon-Fano coding, and a simple fixed-width encoding that is the easiest to understand but compresses poorly.
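To make the difference between the variable-length schemes concrete, here is a minimal standalone sketch (not cshannon's own code) that computes per-symbol code lengths with Fano's splitting variant of Shannon-Fano coding and with Huffman's merging method. The functions fano_lengths, fano_split, huffman_lengths and total_bits are hypothetical names chosen for this illustration; the other precursor (Shannon's method based on cumulative probabilities) is not shown.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper. Fano's splitting method: sort symbols by descending
/// frequency, split where the two halves' totals are closest, give the halves
/// a 0 and a 1 bit respectively, and recurse. Returns lengths in input order.
fn fano_lengths(freqs: &[u64]) -> Vec<u32> {
    if freqs.len() == 1 {
        return vec![1]; // a lone symbol still needs one bit
    }
    let mut order: Vec<usize> = (0..freqs.len()).collect();
    order.sort_by(|&a, &b| freqs[b].cmp(&freqs[a])); // stable, descending
    let mut lengths = vec![0u32; freqs.len()];
    fano_split(freqs, &order, &mut lengths);
    lengths
}

fn fano_split(freqs: &[u64], group: &[usize], lengths: &mut [u32]) {
    if group.len() < 2 {
        return;
    }
    let total: u64 = group.iter().map(|&i| freqs[i]).sum();
    // Find the split point k (1..len) minimising |left total - right total|.
    let (mut left, mut best_diff, mut best_k) = (0u64, u64::MAX, 1);
    for k in 1..group.len() {
        left += freqs[group[k - 1]];
        let diff = left.abs_diff(total - left);
        if diff < best_diff {
            best_diff = diff;
            best_k = k;
        }
    }
    // Every symbol in this group gains one bit: 0 on the left, 1 on the right.
    for &i in group {
        lengths[i] += 1;
    }
    fano_split(freqs, &group[..best_k], lengths);
    fano_split(freqs, &group[best_k..], lengths);
}

/// Hypothetical helper. Huffman's method: repeatedly merge the two lightest
/// subtrees, then read each code length off as the leaf's depth in the tree.
fn huffman_lengths(freqs: &[u64]) -> Vec<u32> {
    let n = freqs.len();
    if n == 1 {
        return vec![1];
    }
    // Nodes 0..n are leaves; merged nodes are appended after them.
    let mut parent = vec![usize::MAX; n];
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        freqs.iter().enumerate().map(|(i, &f)| Reverse((f, i))).collect();
    while heap.len() > 1 {
        let Reverse((w1, a)) = heap.pop().unwrap();
        let Reverse((w2, b)) = heap.pop().unwrap();
        let id = parent.len();
        parent.push(usize::MAX); // the new node has no parent yet
        parent[a] = id;
        parent[b] = id;
        heap.push(Reverse((w1 + w2, id)));
    }
    (0..n)
        .map(|leaf| {
            // Code length = number of edges from the leaf up to the root.
            let (mut node, mut depth) = (leaf, 0u32);
            while parent[node] != usize::MAX {
                node = parent[node];
                depth += 1;
            }
            depth
        })
        .collect()
}

/// Total encoded size in bits: sum of frequency * code length.
fn total_bits(freqs: &[u64], lengths: &[u32]) -> u64 {
    freqs.iter().zip(lengths).map(|(&f, &l)| f * l as u64).sum()
}

fn main() {
    // Symbols A..E with these counts (a common textbook example).
    let freqs: [u64; 5] = [15, 7, 6, 6, 5];
    let (fano, huff) = (fano_lengths(&freqs), huffman_lengths(&freqs));
    println!("Fano:    {:?} -> {} bits", fano, total_bits(&freqs, &fano));
    println!("Huffman: {:?} -> {} bits", huff, total_bits(&freqs, &huff));
}
```

For the counts 15, 7, 6, 6, 5 the splitting method assigns lengths [2, 2, 2, 3, 3] (89 bits in total) while Huffman assigns [1, 3, 3, 3, 3] (87 bits). Huffman's bottom-up merging is optimal among symbol-by-symbol prefix codes, whereas a top-down split can come out slightly worse.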
§Usage
cshannon provides a binary that can be used for compression / decompression at the command line and a library that can be integrated into other projects.
Run cshannon --help to see the command-line options for the binary.
The library exposes the same functionality via the run function:
```rust
use cshannon::{Args, Command, CompressArgs, EncodingScheme, TokenizationScheme, run};
use std::path::Path;

run(Args {
    command: Command::Compress(CompressArgs {
        tokenization_scheme: TokenizationScheme::Byte,
        encoding_scheme: EncodingScheme::Fano,
    }),
    // Path::new already returns a &Path, so no extra borrow is needed.
    input_file: Path::new("/path/to/input_file"),
    output_file: Path::new("/path/to/output_file"),
});
```
This library uses the logging facade from the log crate.
You must set up an appropriate logger in your binary’s entry-point for the
library's log messages to be emitted. As an example, the primary command-line
binary in this package uses env_logger (see src/bin/cli.rs).
§Abstract operation description
The abstract steps in compression are as follows:
```text
Input --> Tokens --> Model --> Encoding -+
  |                                      |
  +-----> Tokens ------------------------+--> Compressed
                                              Output
```
The abstract steps for decompression are as follows:
```text
Compressed --> extract prefix --> Encoding
  Input                              |
    |                                |
    +--> remaining data -------------+--> Output
```
Decompression is conceptually simpler because there are no choices (of tokenizer and encoding). The encoding is included as a prefix in-band in the compressed data.
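The compression and decompression steps above can be walked through end to end with a toy, self-contained sketch. It is not cshannon's implementation: the byte-level tokens, the fixed-width encoding, the hypothetical helpers compress, decompress and width, and the in-band prefix layout (a 2-byte symbol count, the symbol table, then an 8-byte token count) are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Bits needed to give each of `k` (>= 1) symbols its own fixed-width code.
fn width(k: usize) -> u32 {
    (usize::BITS - (k - 1).leading_zeros()).max(1)
}

/// Hypothetical compressor: Input -> Tokens -> Model -> Encoding -> Output.
fn compress(input: &[u8]) -> Vec<u8> {
    // Tokens: one token per input byte. Model: token frequencies (a
    // fixed-width code only uses the set of symbols, not the counts).
    let mut model: BTreeMap<u8, u64> = BTreeMap::new();
    for &t in input {
        *model.entry(t).or_insert(0) += 1;
    }
    // Encoding: the i-th distinct symbol (in sorted order) gets code i.
    let table: Vec<u8> = model.keys().copied().collect();
    // In-band prefix: symbol count, symbol table, token count.
    let mut out = Vec::new();
    out.extend_from_slice(&(table.len() as u16).to_le_bytes());
    out.extend_from_slice(&table);
    out.extend_from_slice(&(input.len() as u64).to_le_bytes());
    if table.is_empty() {
        return out;
    }
    // Body: each token's code written MSB-first in `w` bits.
    let w = width(table.len());
    let (mut acc, mut nbits) = (0u64, 0u32);
    for &t in input {
        let code = table.binary_search(&t).unwrap() as u64;
        acc = (acc << w) | code;
        nbits += w;
        while nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
        }
    }
    if nbits > 0 {
        out.push((acc << (8 - nbits)) as u8); // zero-pad the final byte
    }
    out
}

/// Hypothetical decompressor: extract the prefix, then decode the rest.
fn decompress(data: &[u8]) -> Vec<u8> {
    let k = u16::from_le_bytes([data[0], data[1]]) as usize;
    let table = &data[2..2 + k];
    let mut count = [0u8; 8];
    count.copy_from_slice(&data[2 + k..10 + k]);
    let n = u64::from_le_bytes(count) as usize;
    let mut out = Vec::with_capacity(n);
    if k == 0 {
        return out;
    }
    let w = width(k);
    let mask = (1u64 << w) - 1;
    let (mut acc, mut nbits) = (0u64, 0u32);
    let mut bytes = data[10 + k..].iter();
    // Stop after n tokens so the padding bits are never decoded.
    while out.len() < n {
        while nbits < w {
            acc = (acc << 8) | *bytes.next().unwrap() as u64;
            nbits += 8;
        }
        nbits -= w;
        out.push(table[((acc >> nbits) & mask) as usize]);
    }
    out
}

fn main() {
    let input = b"abracadabra";
    let packed = compress(input);
    assert_eq!(decompress(&packed), input.to_vec());
    println!("{} bytes in, {} bytes out", input.len(), packed.len());
}
```

On a short input such as "abracadabra" the 15-byte prefix outweighs the 5-byte body (11 bytes in, 20 out), which illustrates why a fixed-width code is the weakest scheme. A Huffman or Fano encoding would instead use the counts in the model to give frequent tokens shorter codes.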
Structs§
- Args - Arguments for the run entry-point of this library.
- CompressArgs - Arguments specific to the compression operation.
- DecompressArgs - Placeholder for (future) arguments specific to the decompression operation.
Enums§
- Command - The command to invoke via the run entry-point.
- EncodingScheme - Encoding schemes (i.e. the compression algorithms) supported by this library.
- TokenizationScheme - Source text needs to be split into tokens that are then compressed using one of the supported algorithms. This enum lists all the supported tokenization schemes.
Functions§
- run - Invoke this library to compress or decompress data.