Module httlib_huffman::encoder

Provides an implementation of the canonical Huffman encoder.

Encoding is relatively straightforward: we replace each character with its Huffman code and append the EOS sign at the end so that the final octet is always completely filled.

The Huffman encoder implementation illustration:

[add "!"]     1111111000
[add "$"]     11111110001111111111001
[add "%"]     11111110001111111111001010101 (fix length)
[add "&"]     1111111000111111111100101010111111000
[add "A"]     1111111000111111111100101010111111000100001
[add EOS]     1111111000111111111100101010111111000100001111111111111111111111111111111
 
[result]      [254   ][63    ][242   ][175   ][196   ][63    ]
              111111100011111111110010101011111100010000111111

The illustration shows how the encoder iterates through the ASCII characters and replaces each of them with its Huffman code. The encoded sequence ends with the EOS character, which serves as padding (up to 7 bits) so that the last octet is filled.

While adding a Huffman code to the sequence, the length of the added code must exactly match the number of bits specified in the documentation. Working with Huffman codes as numbers and then converting them to other types, such as strings, can drop the leading zeros. In such cases we have to do some plumbing to make sure all the bits are there (the character "%" in the example above is such a case).
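For illustration, here is a small Rust snippet (not part of the crate's API) showing how the leading zero of the "%" code disappears when the code is handled as a bare number, and how padding to the declared length restores it:

let code: u32 = 0b010101; // the Huffman code for "%" from the illustration
let len: usize = 6;       // its declared length in bits

// Formatting the bare number drops the leading zero and yields "10101".
assert_eq!(format!("{:b}", code), "10101");

// Padding to the declared length restores all 6 bits: "010101".
assert_eq!(format!("{:0width$b}", code, width = len), "010101");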

A working encoder could be built by manipulating a string of ones and zeros. However, for demanding systems such as high-performance web servers, this would not be sustainable from a performance perspective, so we need a more resource-conscious approach.
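To make the later optimization easier to follow, here is a naive, string-based sketch of the idea. It is not the crate's implementation; the lookup closure contains only a hand-copied subset of the codes used in the illustration above, and unknown characters are simply rejected:

/// Naive sketch: look up each character's code as a ready-made "0"/"1"
/// string and concatenate the pieces, then pad with ones up to a full octet.
fn encode_as_bit_string(src: &str) -> Option<String> {
    // Only the characters from the illustration are included here.
    let code_for = |c: char| -> Option<&'static str> {
        match c {
            '!' => Some("1111111000"),
            '$' => Some("1111111111001"),
            '%' => Some("010101"),
            '&' => Some("11111000"),
            'A' => Some("100001"),
            _ => None,
        }
    };

    let mut bits = String::new();
    for c in src.chars() {
        bits.push_str(code_for(c)?);
    }
    // Pad with the EOS prefix (all ones) up to the next full octet.
    while bits.len() % 8 != 0 {
        bits.push('1');
    }
    Some(bits)
}

Calling encode_as_bit_string("!$%&A") produces the same 48-bit string as in the illustration, but it spends one text character per bit, which is exactly the overhead the rest of this module avoids.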

Replacing the string with numbers, which are a far more natural representation for a computer, and using bitwise operators gives a significant increase in performance. Before this can be done, we need to understand how numbers are added at the bit level. We all know what "1+2=3" means, or what the concatenation of strings such as "aa+bb=aabb" looks like, but with bit operations these rules are not quite so obvious. Let's look at an example of addition performed directly on bits:

       1 +        2 =        3
00000001 | 00000010 = 00000011

For the sum of the two bit numbers we used the bitwise OR operator, denoted by the "|" symbol, which plays the role of the "+" sign in our example. Its rule is to walk through the bits of both numbers and, wherever at least one of them has a 1 at a given position, set the resulting bit to 1; everywhere else the resulting bit stays 0. Note that OR behaves like addition only when the two numbers have no overlapping 1 bits, which is exactly the situation the encoder arranges by shifting first. This understanding now enables us to re-implement the example above.
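A couple of assertions, purely as a sanity check of the rule described above:

let a: u8 = 0b0000_0001; // 1
let b: u8 = 0b0000_0010; // 2

// OR merges the bits of both numbers, so 1 | 2 behaves like 1 + 2 here.
assert_eq!(a | b, 0b0000_0011); // 3

// When the set bits overlap, OR no longer matches ordinary addition.
assert_eq!(0b0000_0011u8 | 0b0000_0001, 0b0000_0011); // 3, not 4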

Instead of a string, we will use the u64 data type, which stores a sequence of 64 bits. We could also use a type with a larger capacity (such as u128), but u64 is sufficient: we need room for the longest individual Huffman code (32 bits) plus an extra byte (8 bits) as a surplus cell, meaning 40 bits of storage altogether.
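Expressed as a quick calculation (the constant names here are ours, purely for illustration):

const MAX_CODE_LEN: u32 = 32; // longest Huffman code, as stated above
const SURPLUS: u32 = 8;       // one extra byte of working space
const REQUIRED: u32 = MAX_CODE_LEN + SURPLUS; // 40 bits

// A u64 offers 64 bits, which comfortably covers the 40 bits we need.
assert!(REQUIRED <= u64::BITS);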

The illustration below shows individual steps for encoding a string of characters as in the example above, while the encoding is carried out with the use of numbers and bitwise operators.

[add "!"]     111111100000000000000000000000000000000000000000
[add "$"]     11111110001111111111001000000000000000000000000000000000
[add "%"]     1111111000111111111100101010100000000000000000000000000000000000 (fix length)
[add "&"]               11111111110010101011111100000000000000000000000000000000000000
[add "A"]                     1111001010101111110001000010000000000000000000000000000000000000
[add EOS]                     1111001010101111110001000011111111111111111111111111111110000000
 
[result]      [254   ][63    ][242   ][175   ][196   ][63    ]
              111111100011111111110010101011111100010000111111

Although the illustration is quite similar to the previous one, there is an important difference: the string of bits keeps getting shorter on the left end, where completed bytes are flushed to the output, and longer on the right end, where new codes are appended.

When the Huffman code for an individual character is added to the sequence, the algorithm immediately ensures 32 free bit positions where the next code can be added. This is achieved by shifting bits with the "<<" bitwise operator. Since the output is produced in whole bytes, we always shift by 1 or more bytes, depending on the required capacity, meaning by 8*N bits. It might not be obvious, but by shifting the bits and then merging in the new Huffman code we are adding numbers in exactly the same way as in the simple example presented earlier.
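The sketch below puts these pieces together. It is a simplified illustration, not the crate's actual code: buf keeps the pending bits left-aligned in a u64, used counts how many of them are valid, and every completed byte is flushed to the output before the next code arrives. The codes and lengths are copied from the illustration above.

/// Appends one Huffman code to the numeric buffer and flushes whole bytes.
/// Codes are assumed to be at most 32 bits long, so after flushing (used < 8)
/// a 64-bit buffer always has room for the next code.
fn append_code(buf: &mut u64, used: &mut u32, code: u32, len: u32, dst: &mut Vec<u8>) {
    // Shift the new code so it lands right after the bits already in the
    // buffer, then merge it in with OR (the "addition" from the example).
    *buf |= (code as u64) << (64 - len - *used);
    *used += len;

    // Flush every complete byte from the left end of the buffer into dst.
    while *used >= 8 {
        dst.push((*buf >> 56) as u8);
        *buf <<= 8;
        *used -= 8;
    }
}

fn main() {
    let mut dst = Vec::new();
    let mut buf = 0u64;
    let mut used = 0u32;

    // Codes and lengths for "!", "$", "%", "&", "A" from the illustration.
    for &(code, len) in &[
        (0b1111111000u32, 10u32),
        (0b1111111111001, 13),
        (0b010101, 6),
        (0b11111000, 8),
        (0b100001, 6),
    ] {
        append_code(&mut buf, &mut used, code, len, &mut dst);
    }

    // Pad the final partial byte with the EOS prefix (all ones).
    if used > 0 {
        let pad = 8 - used;
        append_code(&mut buf, &mut used, (1 << pad) - 1, pad, &mut dst);
    }

    assert_eq!(dst, vec![254u8, 63, 242, 175, 196, 63]);
}

Flushing after every code keeps used below 8, so the 64-bit buffer always has room for another code of up to 32 bits, and the output bytes match the result in the illustration.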

Looked at in isolation, the Huffman algorithm is quite simple. But when the goal is not merely to implement it, but to maximize performance and minimize the resources used, things get more complicated. The performance and quality of this implementation are therefore comparable to those of some well-known web servers.

Modules

table

Enums

EncoderError

Contains error options that can be encountered while performing the encoding operations.

Functions

encode

Encodes the provided src bytes and populates the dst with the sequence of Huffman codes.
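A minimal usage sketch follows. The exact signature and error type should be checked against the encode function's own documentation; this assumes it takes the source bytes and a mutable destination vector and returns a Result.

use httlib_huffman::encoder::encode;

let text = "Hello world!".as_bytes();
let mut dst = Vec::new();

// Replace every input byte with its Huffman code and write the result,
// padded with the EOS prefix, into dst.
encode(text, &mut dst).unwrap();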