Crate minimum_redundancy

Source
Expand description

minimum_redundancy is the Rust library by Piotr Beling to encode and decode data with binary or non-binary minimum-redundancy (Huffman) coding.

The library is fast and consumes low memory both to construct (which is done without explicitly building a tree) and store the coding dictionary (it only stores frequency-sorted symbols and the numbers of non-leaf nodes at successive levels of the canonical Huffman tree). In addition, the implementation is generic. It can construct not only binary codes (obtained via binary Huffman trees), but of any arity (trees of any degree).

The high efficiency of minimum_redundancy is confirmed by benchmarks included in (please cite this paper if you are using minimum_redundancy for research purposes):

The library uses improved Huffman algorithm, with ideas from the following papers:

  • A. Brodnik, S. Carlsson, Sub-linear decoding of Huffman Codes Almost In-Place, 1998
  • A. Moffat, J. Katajainen, In-Place Calculation of Minimum-Redundancy Codes. In: Akl S.G., Dehne F., Sack JR., Santoro N. (eds) Algorithms and Data Structures. WADS 1995. Lecture Notes in Computer Science, vol 955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60220-8_79

§Example

use minimum_redundancy::{Coding, Code, DecodingResult, BitsPerFragment};
use maplit::hashmap;

// Construct coding with 1 bit per fragment for values 'a', 'b', 'c',
// whose frequencies of occurrence are 100, 50, 10 times, respectively.
let huffman = Coding::from_frequencies(BitsPerFragment(1), hashmap!('a' => 100u32, 'b' => 50, 'c' => 10));
// We expected the following Huffman tree:
//  /  \
// /\  a
// bc
// and the following code assignment: a -> 1, b -> 00, c -> 01
assert_eq!(huffman.codes_for_values(), hashmap!(
                'a' => Code{ content: 0b1, len: 1 },
                'b' => Code{ content: 0b00, len: 2 },
                'c' => Code{ content: 0b01, len: 2 }
               ));
// reverse codes encode the first levels of the tree on the least significant bits (e.g., c -> 10):
assert_eq!(huffman.reversed_codes_for_values(), hashmap!(
                'a' => Code{ content: 0b1, len: 1 },
                'b' => Code{ content: 0b00, len: 2 },
                'c' => Code{ content: 0b10, len: 2 }
               ));
let mut decoder_for_a = huffman.decoder();
assert_eq!(decoder_for_a.consume(1), DecodingResult::Value(&'a'));
let mut decoder_for_b = huffman.decoder();
assert_eq!(decoder_for_b.consume(0), DecodingResult::Incomplete);
assert_eq!(decoder_for_b.consume(0), DecodingResult::Value(&'b'));
let mut decoder_for_c = huffman.decoder();
assert_eq!(decoder_for_c.consume(0), DecodingResult::Incomplete);
assert_eq!(decoder_for_c.consume(1), DecodingResult::Value(&'c'));
assert_eq!(huffman.total_fragments_count(), 5); // 1+2+2
assert_eq!(huffman.values.as_ref(), ['a', 'b', 'c']); // sorted by frequencies

Structs§

BitsPerFragment
BitsPerFragment represents the Huffman’s tree degree that is the power of two. It represents number of bits needed to store the degree. It can be used to construct minimum-redundancy coding whose codeword lengths are a multiple of this number of bits. It is faster than Degree and should be preferred for degrees that are the powers of two.
Code
Represents a codeword.
CodeIterator
Iterator over the fragments of (unreversed) code.
CodesIterator
Iterator over value-codeword pairs.
Coding
Succinct representation of minimum-redundancy coding (huffman tree of some degree in the canonical form).
Decoder
Decoder that decodes a value for given code, consuming one codeword fragment (and going one level down the huffman tree) at a time.
Degree
Degree represents the degree of the Huffman tree. It is slower than BitsPerFragment and should be avoided when the degree is the power of two.
LevelIterator
Iterator over the levels of the huffman tree.
ReversedCodeIterator
Iterator over the fragments of reversed code.
ReversedCodesIterator
Iterator over value / reversed codeword pairs.

Enums§

DecodingResult
Result of fragment decoding returned be consume method of Decoder.
ValueSize
Points the number of bytes needed to store value of the type ValueType. This number of bytes can be constant or can depend on the value.

Traits§

Frequencies
Types that implement this trait can count number of occurrences of values.
TreeDegree
Represents the degree of the Huffman tree, which is equal to the number of different values of a single codeword fragment.

Functions§

entropy_to_bpf
Heuristically calculates bits per fragment that gives about constant length average code size. entropy should equals to entropy or a bit less, e.g. entropy minus 0.2