httlib_huffman/flattener/
mod.rs

1//! Provides features for flattening Huffman tree and generating translation
2//! matrix.
3//! 
4//! [HPACK] documentation provides an already prepared and for the web optimized
5//! Huffman code for all [ASCII] characters. To implement the Huffman algorithm
6//! for [HPACK] we have to first flatten this table to a two-dimensional matrix.
7//! 
8//! Let’s learn how to do this on a very simple example. Our algorithm will
9//! enable the conversion of letters A, B, C, D, and E into a Huffman sequence.
10//! The Huffman code for each letter is shown in the table below.
11//! 
12//! | Character | Huffman code
13//! |-|-
14//! | A | 00
15//! | B | 01
16//! | C | 100
17//! | D | 101
18//! | E | 11
19//! 
20//! We have decided to flatten the Huffman table into a matrix, enabling the
21//! decoder to read Huffman bit-sequence 2-bits at a time. The illustration
22//! below shows the table structure we need to fill in. 
23//! 
24//! | PATH | ID | SYM | LFT | 00 | 01 | 10 | 11
25//! |-|-|-|-|-|-|-|-
26//! | // | 0 | - | - | - | - | - | -
27//! 
28//! The first column PATH will serve for our notes in which we’ll store read
29//! bits so we will know what sequence refers to what table row. Reading of each
30//! character’s code always starts in the root row marked with `//`. The column
31//! `ID` will store the unique name of the row. The first row is marked with
32//! `0`. The column `SYM` will store characters (e.g. A). Field `LFT` will store
33//! the information about the leftover bits. A leftover bit is a number of bits,
34//! missing to reach the full bit chunk (in our case 2 bits). For example,
35//! letter C and D have a leftover of 1, because to reach a round number of
36//! bits, which is in our case 2 bits * N, 1 bit remains. Letters A, B, and E
37//! have no leftover. The remaining columns represent the read chunk of 2 bits
38//! for all its possible values ranging from `00` (0) to `11` (3). 
39//! 
40//! The table above will now be filled with data of sample Huffman coding. As
41//! mentioned previously, we are reading the Hoffman code 2-bits at a time.
42//! Let’s see how to insert data to the table for the first letter A.  
43//! 
44//! Letter A is represented with code `00`. Since there is no path `//00` for
45//! this code in the first column, we create a new line with a new `ID`. There
46//! is no leftover, and in the root line to column `00` we write the `ID` of the
47//! newly established line. Since we read all the bits for the letter A, we also
48//! write character A in the `SYM` column. 
49//! 
50//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
51//! |-|-|-|-|-|-|-|-
52//! | // | 0 | - | - | 1 | - | - | -
53//! | //00 | 1 | A | 0 | - | - | - | -
54//! 
55//! We then repeat this process for the letter B. The letter B is represented
56//! with code `01`. Since there is no path `//01` for this code, we create a
57//! new line with a new `ID`. There is no leftover, and in the root line in
58//! column `01` we write the `ID` of the newly established line. Since we read
59//! all the bits for the letter B, we also write character B to the `SYM`
60//! column.
61//! 
62//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
63//! |-|-|-|-|-|-|-|-
64//! | // | 0 | - | - | 1 | 2 | - | -
65//! | //00 | 1 | A | 0 | - | - | - | -
66//! | //01 | 2 | B | 0 | - | - | - | -
67//! 
68//! The process for the letter C is somewhat different since its number of bits
69//! doesn’t correspond to 2-bits * N. The final bit is therefore missing, so we
70//! claim that it has a leftover of 1. First, we read the first 2 bits and
71//! insert them in the table following the same process as before. After that,
72//! we read the remaining bit, while assuming that all the possible variations
73//! of the missing bit exist. This is marked with `X`. Since one bit is missing,
74//! we note this in the column `LFT`. 
75//! 
76//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
77//! |-|-|-|-|-|-|-|-
78//! | // | 0 | - | - | 1 | 2 | 3 | -
79//! | //00 | 1 | A | 0 | - | - | - | -
80//! | //01 | 2 | B | 0 | - | - | - | -
81//! | //10 | 2 | B | 0 | - | - | - | -
82//! | //10 | 3 | - | - | 4 | 4 | - | -
83//! | //100X | 4 | C | 1 | - | - | - | -
84//! 
85//! We repeat the process for letters D and E. The final table should look like
86//! this: 
87//! 
88//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
89//! |-|-|-|-|-|-|-|-
90//! | // | 0 | - | - | 1 | 2 | 3 | 6
91//! | //00 | 1 | A | 0 | - | - | - | -
92//! | //01 | 2 | B | 0 | - | - | - | -
93//! | //10 | 2 | B | 0 | - | - | - | -
94//! | //10 | 3 | - | - | 4 | 4 | 5 | 5
95//! | //100X | 4 | C | 1 | - | - | - | -
96//! | //101X | 5 | D | 1 | - | - | - | -
97//! | //11 | 6 | E | 0 | - | - | - | -
98//! 
99//! Note that it would be correct to replace the variants marked with X with
100//! actual possible paths.
101//! 
102//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
103//! |-|-|-|-|-|-|-|-
104//! | // | 0 | - | - | 1 | 2 | 3 | 6
105//! | //00 | 1 | A | 0 | - | - | - | -
106//! | //01 | 2 | B | 0 | - | - | - | -
107//! | //10 | 2 | B | 0 | - | - | - | -
108//! | //10 | 3 | - | - | 4 | 4 | 5 | 5
109//! | //1000 | 4 | C | 1 | - | - | - | -
110//! | //1001 | 4 | C | 1 | - | - | - | -
111//! | //1010 | 5 | D | 1 | - | - | - | -
112//! | //1011 | 5 | D | 1 | - | - | - | -
113//! | //11 | 6 | E | 0 | - | - | - | -
114//! 
115//! The flattened form of the Huffman tree] in the form of a matrix plays a
116//! crucial role in the process of decoding. We now have an idea of what the
117//! process of decoding looks like, using this matrix. This module uses this
118//! exact technic for creating N-bits translation matrix. 
119//! 
120//! [HPACK]: https://tools.ietf.org/html/rfc7541
121//! [ASCII]: https://en.wikipedia.org/wiki/ASCII
122
123use crate::DecoderSpeed;
124
125/// Generates a translation matrix that can be used to decode an encoded
126/// content. The function expects the `speed` attribute which represents the
127/// number of bits that the decoder will read at a time when processing bytes.
128/// The speed attribute can be between 1 and 5 bits. The higher number will
129/// have a positive effect on performance but possibly a higher memory usage.
130/// 
131/// **Example:**
132/// 
133/// ```rust
134/// use httlib_huffman::DecoderSpeed;
135/// use httlib_huffman::flattener::flatten;
136/// use httlib_huffman::encoder::table::ENCODE_TABLE;
137/// 
138/// let speed = DecoderSpeed::FourBits; // decoder will read 4 bits at a time
139/// let table = flatten(&ENCODE_TABLE, speed);
140/// ```
141pub fn flatten(
142    codings: &[(u8, u32)],
143    speed: DecoderSpeed,
144) -> Vec<Vec<(Option<u8>, Option<u16>, u8)>> { // next_id, ascii, leftover
145    let speed = speed as usize;
146    let blank_transition = generate_blank_transition(speed);
147
148    let mut table: Vec<Vec<(Option<u8>, Option<u16>, u8)>> = Vec::new();
149    table.push(blank_transition.clone());
150
151    for (ascii, coding) in codings.iter().enumerate() {
152        let leftover = (coding.0 as f32 / speed as f32).ceil() as usize * speed - coding.0 as usize;
153        let mut id = 0; // current walk index in table
154        
155        for (path_index, keys) in generate_coding_paths(coding, speed).iter().enumerate() {
156
157            if path_index == 0 { // create IDs for original path
158                for key_index in 0..keys.len() - 1 { // the last key will be handled afterward
159                    let key = keys[key_index];
160                    let target = &table[id][key]; // should always exist
161    
162                    let next_id = if let Some(next_id) = target.0 {
163                        next_id
164                    } else {
165                        table.push(blank_transition.clone());
166
167                        let next_id = (table.len() - 1) as u8;
168                        let transitions = table.get_mut(id).unwrap();
169                        let target = transitions.get_mut(key).unwrap();
170                        target.0 = Some(next_id);
171
172                        next_id
173                    };
174    
175                    id = next_id as usize;
176                }
177            } 
178            
179            let key = keys.last().unwrap(); // handle the last key of all path variants
180            let transitions = table.get_mut(id).unwrap();
181            let target = transitions.get_mut(*key).unwrap();
182            target.1 = Some(ascii as u16);
183            target.2 = leftover as u8;
184        }
185    }
186
187    table
188}
189
190/// Generates a black transition object based on the provided speed attribute.
191fn generate_blank_transition(speed: usize) -> Vec<(Option<u8>, Option<u16>, u8)> {
192    let mut transition = Vec::new();
193
194    for _ in 0..2u32.pow(speed as u32) {
195        transition.push((None, None, 0));
196    }
197
198    transition
199}
200
201/// Generates all key paths for the particular coding. If the key has leftovers
202/// the function will fill that gap with all possible variants.
203/// 
204/// ```tst
205/// Code:       [1100, 0000, 1110, 011X]
206/// Variants:   [1100, 0000, 1110, 0110]
207///             [1100, 0000, 1110, 0111]
208/// ```
209fn generate_coding_paths(coding: &(u8, u32), speed: usize) -> Vec<Vec<usize>> {
210    let mut bits: u32 = 0; // HPACK value can be up to 32 bits
211    let chunks_len = (coding.0 as f32 / speed as f32).ceil() as usize;
212    let chunk_max = 2u32.pow(speed as u32) as usize - 1;
213    let leftover = chunks_len * speed - coding.0 as usize;
214    bits |= coding.1;
215    bits <<= leftover;
216
217    let mut chunks = vec![];
218    for i in 0..chunks_len {
219        let mut chunk = (bits >> (chunks_len - i - 1) * speed) as usize;
220        chunk &= chunk_max;
221        chunks.push(chunk as usize);
222    }
223    
224    let mut variants: Vec<Vec<usize>> = vec![];
225    let chunk_last = chunks.pop().unwrap();
226    let variants_len = 2u32.pow(leftover as u32) as usize;
227    for i in 0..variants_len {
228        let mut chunks = chunks.clone();
229        chunks.push(chunk_last + i);
230        variants.push(chunks);
231    }
232
233    variants
234}
235
236#[cfg(test)]
237mod test {
238    use super::*;
239
240    fn sample_encoding_table() -> &'static [(u8, u32)] {
241        &[
242            (13, 0x1ff8),
243            (23, 0x7fffd8),
244            (28, 0xfffffe2),
245            (28, 0xfffffe3),
246            (28, 0xfffffe4),
247            (28, 0xfffffe5),
248            (28, 0xfffffe6),
249            (28, 0xfffffe7),
250            (28, 0xfffffe8),
251        ]
252    }
253    
254    /// Should generate a translation matrix that allows for decoding Huffman
255    /// sequence by reading 1 bit at a time.
256    #[test]
257    fn flattens_1bits() { 
258        let table = flatten(&sample_encoding_table(), DecoderSpeed::OneBit);
259        assert_eq!(table.len(), 41);
260
261        let target = &table[2][1];
262        assert_eq!(target.0, Some(3));
263        assert_eq!(target.1, None);
264        assert_eq!(target.2, 0);
265    }
266
267    /// Should generate a translation matrix that allows for decoding Huffman
268    /// sequence by reading 2 bits at a time.
269    #[test]
270    fn flattens_2bits() { 
271        let table = flatten(&sample_encoding_table(), DecoderSpeed::TwoBits);
272        assert_eq!(table.len(), 20);
273
274        let target = &table[1][3];
275        assert_eq!(target.0, Some(2));
276        assert_eq!(target.1, None);
277        assert_eq!(target.2, 0);
278    }
279
280    /// Should generate a translation matrix that allows for decoding Huffman
281    /// sequence by reading 3 bits at a time.
282    #[test]
283    fn flattens_3bits() { 
284        let table = flatten(&sample_encoding_table(), DecoderSpeed::ThreeBits);
285        assert_eq!(table.len(), 16);
286
287        let target = &table[1][7];
288        assert_eq!(target.0, Some(2));
289        assert_eq!(target.1, None);
290        assert_eq!(target.2, 0);
291    }
292
293    /// Should generate a translation matrix that allows for decoding Huffman
294    /// sequence by reading 4 bits at a time.
295    #[test]
296    fn flattens_4bits() { 
297        let table = flatten(&sample_encoding_table(), DecoderSpeed::FourBits);
298        assert_eq!(table.len(), 9);
299
300        let target = &table[1][15];
301        assert_eq!(target.0, Some(2));
302        assert_eq!(target.1, None);
303        assert_eq!(target.2, 0);
304    }
305
306    /// Should generate a translation matrix that allows for decoding Huffman
307    /// sequence by reading 5 bits at a time.
308    #[test]
309    fn flattens_5bits() { 
310        let table = flatten(&sample_encoding_table(), DecoderSpeed::FiveBits);
311        assert_eq!(table.len(), 8);
312
313        let target = &table[1][31];
314        assert_eq!(target.0, Some(2));
315        assert_eq!(target.1, None);
316        assert_eq!(target.2, 0);
317    }
318
319    /// Should generate all key paths variants for the codings with leftover.
320    #[test]
321    fn generates_coding_paths() {
322        assert_eq!(generate_coding_paths(&(14, 12345), 4), vec![ // code=11000000|111001XX, len=14
323            vec![12, 0, 14, 4], // [1100, 0000, 1110, 0100]
324            vec![12, 0, 14, 5], // [1100, 0000, 1110, 0101]
325            vec![12, 0, 14, 6], // [1100, 0000, 1110, 0110]
326            vec![12, 0, 14, 7], // [1100, 0000, 1110, 0111]
327        ]);
328        assert_eq!(generate_coding_paths(&(13, 2616), 4), vec![ // code=01010001|11000XXX, let=13
329            vec![5, 1, 12, 0], // [0101, 0000, 1110, 0000]
330            vec![5, 1, 12, 1], // [0101, 0000, 1110, 0001]
331            vec![5, 1, 12, 2], // [0101, 0000, 1110, 0010]
332            vec![5, 1, 12, 3], // [0101, 0000, 1110, 0011]
333            vec![5, 1, 12, 4], // [0101, 0000, 1110, 0100]
334            vec![5, 1, 12, 5], // [0101, 0000, 1110, 0101]
335            vec![5, 1, 12, 6], // [0101, 0000, 1110, 0110]
336            vec![5, 1, 12, 7], // [0101, 0000, 1110, 0111]
337        ]);
338    }
339}