[][src]Module httlib_huffman::flatten

This module provides features for flattening Huffman tree and generating translation matrix.

HPACK documentation provides an already prepared and for the web optimized Huffman code for all ASCII characters. To implement the Huffman algorithm for HPACK we have to first flatten this table to a two-dimensional matrix.

Let’s learn how to do this on a very simple example. Our algorithm will enable the conversion of letters A, B, C, D, and E into a Huffman sequence. The Huffman code for each letter is shown in the table below.

CharacterHuffman code
A00
B01
C100
D101
E11

We have decided to flatten the Huffman table into a matrix, enabling the decoder to read Huffman bit-sequence 2-bits at a time. The illustration below shows the table structure we need to fill in.

PATHIDSYMLFT00011011
//0------

The first column PATH will serve for our notes in which we’ll store read bits so we will know what sequence refers to what table row. Reading of each character’s code always starts in the root row marked with //. The column ID will store the unique name of the row. The first row is marked with 0. The column SYM will store characters (e.g. A). Field LFT will store the information about the leftover bits. A leftover bit is a number of bits, missing to reach the full bit chunk (in our case 2 bits). For example, letter C and D have a leftover of 1, because to reach a round number of bits, which is in our case 2 bits * N, 1 bit remains. Letters A, B, and E have no leftover. The remaining columns represent the read chunk of 2 bits for all its possible values ranging from 00 (0) to 11 (3).

The table above will now be filled with data of sample Huffman coding. As mentioned previously, we are reading the Hoffman code 2-bits at a time. Let’s see how to insert data to the table for the first letter A.

Letter A is represented with code 00. Since there is no path //00 for this code in the first column, we create a new line with a new ID. There is no leftover, and in the root line to column 00 we write the ID of the newly established line. Since we read all the bits for the letter A, we also write character A in the SYM column.

PathIDSYMLFT00011011
//0--1---
//001A0----

We then repeat this process for the letter B. The letter B is represented with code 01. Since there is no path //01 for this code, we create a new line with a new ID. There is no leftover, and in the root line in column 01 we write the ID of the newly established line. Since we read all the bits for the letter B, we also write character B to the SYM column.

PathIDSYMLFT00011011
//0--12--
//001A0----
//012B0----

The process for the letter C is somewhat different since its number of bits doesn’t correspond to 2-bits * N. The final bit is therefore missing, so we claim that it has a leftover of 1. First, we read the first 2 bits and insert them in the table following the same process as before. After that, we read the remaining bit, while assuming that all the possible variations of the missing bit exist. This is marked with X. Since one bit is missing, we note this in the column LFT.

PathIDSYMLFT00011011
//0--123-
//001A0----
//012B0----
//102B0----
//103--44--
//100X4C1----

We repeat the process for letters D and E. The final table should look like this:

PathIDSYMLFT00011011
//0--1236
//001A0----
//012B0----
//102B0----
//103--4455
//100X4C1----
//101X5D1----
//116E0----

Note that it would be correct to replace the variants marked with X with actual possible paths.

PathIDSYMLFT00011011
//0--1236
//001A0----
//012B0----
//102B0----
//103--4455
//10004C1----
//10014C1----
//10105D1----
//10115D1----
//116E0----

The flattened form of the Huffman tree] in the form of a matrix plays a crucial role in the process of decoding. We now have an idea of what the process of decoding looks like, using this matrix. This module uses this exact technic for creating N-bits translation matrix.

Functions

flatten

Generates a translation matrix that can be used to decode an encoded content. The function expects the speed attribute which represents the number of bits that the decoder will read at a time when processing bytes. The speed attribute can be between 1 and 5 bits. The higher number will have a positive effect on performance but possibly a higher memory usage.