[−][src]Module httlib_huffman::decoder
Provides an implementation of the canonical Huffman decoder.
When an entity receives a header, for which it determines that it contains content encoded with the canonical Huffman algorithm, it has to decode this content in the shortest possible time with as few resources as possible. The execution speed of this “simple“ task will contribute significantly to the response time, and this time must be as short as possible.
Reading and decoding a sequence bit by bit appears to be inadequate
performance-wise. Reading in buffered chunks outperforms reading bit by bit.
The trick of fast Huffman decoding is reading N-bits at a time. Since we
cannot determine how the seemingly “random” Huffman sequence corresponds to
actual data, we use flattened Huffman tree matrix that enables decoding of
N-bits at a time. This matrix is created by the flatten
module.
Let's learn the decoded mechanics based on a simple matrix that allowsfor reading 2 bits at a time.
Path | ID | SYM | LFT | 00 | 01 | 10 | 11 |
---|---|---|---|---|---|---|---|
// | 0 | - | - | 1 | 2 | 3 | 6 |
//00 | 1 | A | 0 | - | - | - | - |
//01 | 2 | B | 0 | - | - | - | - |
//10 | 2 | B | 0 | - | - | - | - |
//10 | 3 | - | - | 4 | 4 | 5 | 5 |
//100X | 4 | C | 1 | - | - | - | - |
//101X | 5 | D | 1 | - | - | - | - |
//11 | 6 | E | 0 | - | - | - | - |
We'll be using a sequence of characters: A, D, and B.
ADE = 0010101
The Huffman sequence will be decoded by reading 2 bits at a time. Every
reading begins at the root symbol //
. First, we read the first two bits
00
. In line one of the matrix at ID=0
, we need to check where this code
leads, or if it corresponds to any of the characters. Read bits lead to the
second line with ID=1
and they represent the letter A.
The process is repeated for the next 2 bits 10
. This code leads us to the
line with ID=3
which doesn’t represent a character, so we continue the
process for the next 2 bits 10
. This code then leads us to line 5,
representing the letter D. Here we can see that the value of the column
LFT=1
, meaning that there is a leftover 1. This means that in order to
continue reading bits we have to shift to one bit back and continue the
process there.
We position ourselves back to the root position while keeping the last bit
0
, and keep reading until we reach the sum of 2 bits. This means that we
need to read only 1 bit 1
. Code 01
corresponds with character B and with
this we conclude the decoding process.
00XXXXX => A
XX10XXX => continue
XXXX10X => D
XXXXX01 => B
With the use of the translation matrix we successfully decoded the Huffman sequence back into readable characters. This is how the decoder decodes literal values. The process is optimal, while it is best for web servers to read more bits at a time. Considering that the shortest Huffman code for an individual character is 5 bits long, it’s optimal, for the best ratio between speed and used resources, to read 4 bits at a time. More bits at a time mean faster decoding but at the same time a larger translation table and with it a higher memory footprint.
Modules
table1 | |
table2 | |
table3 | |
table4 | |
table5 |
Enums
DecoderError | Contains error options that can be encountered while performing the decoding operations. |
DecoderSpeed | Provides available decoding speed options which represent the number of bits that the decoder can read at a time. |
Functions
decode | Decodes Huffman's |