httlib_huffman/flattener/mod.rs
1//! Provides features for flattening Huffman tree and generating translation
2//! matrix.
3//!
4//! [HPACK] documentation provides an already prepared and for the web optimized
5//! Huffman code for all [ASCII] characters. To implement the Huffman algorithm
6//! for [HPACK] we have to first flatten this table to a two-dimensional matrix.
7//!
8//! Let’s learn how to do this on a very simple example. Our algorithm will
9//! enable the conversion of letters A, B, C, D, and E into a Huffman sequence.
10//! The Huffman code for each letter is shown in the table below.
11//!
12//! | Character | Huffman code
13//! |-|-
14//! | A | 00
15//! | B | 01
16//! | C | 100
17//! | D | 101
18//! | E | 11
19//!
20//! We have decided to flatten the Huffman table into a matrix, enabling the
21//! decoder to read Huffman bit-sequence 2-bits at a time. The illustration
22//! below shows the table structure we need to fill in.
23//!
24//! | PATH | ID | SYM | LFT | 00 | 01 | 10 | 11
25//! |-|-|-|-|-|-|-|-
26//! | // | 0 | - | - | - | - | - | -
27//!
28//! The first column PATH will serve for our notes in which we’ll store read
29//! bits so we will know what sequence refers to what table row. Reading of each
30//! character’s code always starts in the root row marked with `//`. The column
31//! `ID` will store the unique name of the row. The first row is marked with
32//! `0`. The column `SYM` will store characters (e.g. A). Field `LFT` will store
33//! the information about the leftover bits. A leftover bit is a number of bits,
34//! missing to reach the full bit chunk (in our case 2 bits). For example,
35//! letter C and D have a leftover of 1, because to reach a round number of
36//! bits, which is in our case 2 bits * N, 1 bit remains. Letters A, B, and E
37//! have no leftover. The remaining columns represent the read chunk of 2 bits
38//! for all its possible values ranging from `00` (0) to `11` (3).
39//!
40//! The table above will now be filled with data of sample Huffman coding. As
41//! mentioned previously, we are reading the Hoffman code 2-bits at a time.
42//! Let’s see how to insert data to the table for the first letter A.
43//!
44//! Letter A is represented with code `00`. Since there is no path `//00` for
45//! this code in the first column, we create a new line with a new `ID`. There
46//! is no leftover, and in the root line to column `00` we write the `ID` of the
47//! newly established line. Since we read all the bits for the letter A, we also
48//! write character A in the `SYM` column.
49//!
50//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
51//! |-|-|-|-|-|-|-|-
52//! | // | 0 | - | - | 1 | - | - | -
53//! | //00 | 1 | A | 0 | - | - | - | -
54//!
55//! We then repeat this process for the letter B. The letter B is represented
56//! with code `01`. Since there is no path `//01` for this code, we create a
57//! new line with a new `ID`. There is no leftover, and in the root line in
58//! column `01` we write the `ID` of the newly established line. Since we read
59//! all the bits for the letter B, we also write character B to the `SYM`
60//! column.
61//!
62//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
63//! |-|-|-|-|-|-|-|-
64//! | // | 0 | - | - | 1 | 2 | - | -
65//! | //00 | 1 | A | 0 | - | - | - | -
66//! | //01 | 2 | B | 0 | - | - | - | -
67//!
68//! The process for the letter C is somewhat different since its number of bits
69//! doesn’t correspond to 2-bits * N. The final bit is therefore missing, so we
70//! claim that it has a leftover of 1. First, we read the first 2 bits and
71//! insert them in the table following the same process as before. After that,
72//! we read the remaining bit, while assuming that all the possible variations
73//! of the missing bit exist. This is marked with `X`. Since one bit is missing,
74//! we note this in the column `LFT`.
75//!
76//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
77//! |-|-|-|-|-|-|-|-
78//! | // | 0 | - | - | 1 | 2 | 3 | -
79//! | //00 | 1 | A | 0 | - | - | - | -
80//! | //01 | 2 | B | 0 | - | - | - | -
81//! | //10 | 2 | B | 0 | - | - | - | -
82//! | //10 | 3 | - | - | 4 | 4 | - | -
83//! | //100X | 4 | C | 1 | - | - | - | -
84//!
85//! We repeat the process for letters D and E. The final table should look like
86//! this:
87//!
88//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
89//! |-|-|-|-|-|-|-|-
90//! | // | 0 | - | - | 1 | 2 | 3 | 6
91//! | //00 | 1 | A | 0 | - | - | - | -
92//! | //01 | 2 | B | 0 | - | - | - | -
93//! | //10 | 2 | B | 0 | - | - | - | -
94//! | //10 | 3 | - | - | 4 | 4 | 5 | 5
95//! | //100X | 4 | C | 1 | - | - | - | -
96//! | //101X | 5 | D | 1 | - | - | - | -
97//! | //11 | 6 | E | 0 | - | - | - | -
98//!
99//! Note that it would be correct to replace the variants marked with X with
100//! actual possible paths.
101//!
102//! | Path | ID | SYM | LFT | 00 | 01 | 10 | 11
103//! |-|-|-|-|-|-|-|-
104//! | // | 0 | - | - | 1 | 2 | 3 | 6
105//! | //00 | 1 | A | 0 | - | - | - | -
106//! | //01 | 2 | B | 0 | - | - | - | -
107//! | //10 | 2 | B | 0 | - | - | - | -
108//! | //10 | 3 | - | - | 4 | 4 | 5 | 5
109//! | //1000 | 4 | C | 1 | - | - | - | -
110//! | //1001 | 4 | C | 1 | - | - | - | -
111//! | //1010 | 5 | D | 1 | - | - | - | -
112//! | //1011 | 5 | D | 1 | - | - | - | -
113//! | //11 | 6 | E | 0 | - | - | - | -
114//!
115//! The flattened form of the Huffman tree] in the form of a matrix plays a
116//! crucial role in the process of decoding. We now have an idea of what the
117//! process of decoding looks like, using this matrix. This module uses this
118//! exact technic for creating N-bits translation matrix.
119//!
120//! [HPACK]: https://tools.ietf.org/html/rfc7541
121//! [ASCII]: https://en.wikipedia.org/wiki/ASCII
122
123use crate::DecoderSpeed;
124
125/// Generates a translation matrix that can be used to decode an encoded
126/// content. The function expects the `speed` attribute which represents the
127/// number of bits that the decoder will read at a time when processing bytes.
128/// The speed attribute can be between 1 and 5 bits. The higher number will
129/// have a positive effect on performance but possibly a higher memory usage.
130///
131/// **Example:**
132///
133/// ```rust
134/// use httlib_huffman::DecoderSpeed;
135/// use httlib_huffman::flattener::flatten;
136/// use httlib_huffman::encoder::table::ENCODE_TABLE;
137///
138/// let speed = DecoderSpeed::FourBits; // decoder will read 4 bits at a time
139/// let table = flatten(&ENCODE_TABLE, speed);
140/// ```
141pub fn flatten(
142 codings: &[(u8, u32)],
143 speed: DecoderSpeed,
144) -> Vec<Vec<(Option<u8>, Option<u16>, u8)>> { // next_id, ascii, leftover
145 let speed = speed as usize;
146 let blank_transition = generate_blank_transition(speed);
147
148 let mut table: Vec<Vec<(Option<u8>, Option<u16>, u8)>> = Vec::new();
149 table.push(blank_transition.clone());
150
151 for (ascii, coding) in codings.iter().enumerate() {
152 let leftover = (coding.0 as f32 / speed as f32).ceil() as usize * speed - coding.0 as usize;
153 let mut id = 0; // current walk index in table
154
155 for (path_index, keys) in generate_coding_paths(coding, speed).iter().enumerate() {
156
157 if path_index == 0 { // create IDs for original path
158 for key_index in 0..keys.len() - 1 { // the last key will be handled afterward
159 let key = keys[key_index];
160 let target = &table[id][key]; // should always exist
161
162 let next_id = if let Some(next_id) = target.0 {
163 next_id
164 } else {
165 table.push(blank_transition.clone());
166
167 let next_id = (table.len() - 1) as u8;
168 let transitions = table.get_mut(id).unwrap();
169 let target = transitions.get_mut(key).unwrap();
170 target.0 = Some(next_id);
171
172 next_id
173 };
174
175 id = next_id as usize;
176 }
177 }
178
179 let key = keys.last().unwrap(); // handle the last key of all path variants
180 let transitions = table.get_mut(id).unwrap();
181 let target = transitions.get_mut(*key).unwrap();
182 target.1 = Some(ascii as u16);
183 target.2 = leftover as u8;
184 }
185 }
186
187 table
188}
189
190/// Generates a black transition object based on the provided speed attribute.
191fn generate_blank_transition(speed: usize) -> Vec<(Option<u8>, Option<u16>, u8)> {
192 let mut transition = Vec::new();
193
194 for _ in 0..2u32.pow(speed as u32) {
195 transition.push((None, None, 0));
196 }
197
198 transition
199}
200
201/// Generates all key paths for the particular coding. If the key has leftovers
202/// the function will fill that gap with all possible variants.
203///
204/// ```tst
205/// Code: [1100, 0000, 1110, 011X]
206/// Variants: [1100, 0000, 1110, 0110]
207/// [1100, 0000, 1110, 0111]
208/// ```
209fn generate_coding_paths(coding: &(u8, u32), speed: usize) -> Vec<Vec<usize>> {
210 let mut bits: u32 = 0; // HPACK value can be up to 32 bits
211 let chunks_len = (coding.0 as f32 / speed as f32).ceil() as usize;
212 let chunk_max = 2u32.pow(speed as u32) as usize - 1;
213 let leftover = chunks_len * speed - coding.0 as usize;
214 bits |= coding.1;
215 bits <<= leftover;
216
217 let mut chunks = vec![];
218 for i in 0..chunks_len {
219 let mut chunk = (bits >> (chunks_len - i - 1) * speed) as usize;
220 chunk &= chunk_max;
221 chunks.push(chunk as usize);
222 }
223
224 let mut variants: Vec<Vec<usize>> = vec![];
225 let chunk_last = chunks.pop().unwrap();
226 let variants_len = 2u32.pow(leftover as u32) as usize;
227 for i in 0..variants_len {
228 let mut chunks = chunks.clone();
229 chunks.push(chunk_last + i);
230 variants.push(chunks);
231 }
232
233 variants
234}
235
236#[cfg(test)]
237mod test {
238 use super::*;
239
240 fn sample_encoding_table() -> &'static [(u8, u32)] {
241 &[
242 (13, 0x1ff8),
243 (23, 0x7fffd8),
244 (28, 0xfffffe2),
245 (28, 0xfffffe3),
246 (28, 0xfffffe4),
247 (28, 0xfffffe5),
248 (28, 0xfffffe6),
249 (28, 0xfffffe7),
250 (28, 0xfffffe8),
251 ]
252 }
253
254 /// Should generate a translation matrix that allows for decoding Huffman
255 /// sequence by reading 1 bit at a time.
256 #[test]
257 fn flattens_1bits() {
258 let table = flatten(&sample_encoding_table(), DecoderSpeed::OneBit);
259 assert_eq!(table.len(), 41);
260
261 let target = &table[2][1];
262 assert_eq!(target.0, Some(3));
263 assert_eq!(target.1, None);
264 assert_eq!(target.2, 0);
265 }
266
267 /// Should generate a translation matrix that allows for decoding Huffman
268 /// sequence by reading 2 bits at a time.
269 #[test]
270 fn flattens_2bits() {
271 let table = flatten(&sample_encoding_table(), DecoderSpeed::TwoBits);
272 assert_eq!(table.len(), 20);
273
274 let target = &table[1][3];
275 assert_eq!(target.0, Some(2));
276 assert_eq!(target.1, None);
277 assert_eq!(target.2, 0);
278 }
279
280 /// Should generate a translation matrix that allows for decoding Huffman
281 /// sequence by reading 3 bits at a time.
282 #[test]
283 fn flattens_3bits() {
284 let table = flatten(&sample_encoding_table(), DecoderSpeed::ThreeBits);
285 assert_eq!(table.len(), 16);
286
287 let target = &table[1][7];
288 assert_eq!(target.0, Some(2));
289 assert_eq!(target.1, None);
290 assert_eq!(target.2, 0);
291 }
292
293 /// Should generate a translation matrix that allows for decoding Huffman
294 /// sequence by reading 4 bits at a time.
295 #[test]
296 fn flattens_4bits() {
297 let table = flatten(&sample_encoding_table(), DecoderSpeed::FourBits);
298 assert_eq!(table.len(), 9);
299
300 let target = &table[1][15];
301 assert_eq!(target.0, Some(2));
302 assert_eq!(target.1, None);
303 assert_eq!(target.2, 0);
304 }
305
306 /// Should generate a translation matrix that allows for decoding Huffman
307 /// sequence by reading 5 bits at a time.
308 #[test]
309 fn flattens_5bits() {
310 let table = flatten(&sample_encoding_table(), DecoderSpeed::FiveBits);
311 assert_eq!(table.len(), 8);
312
313 let target = &table[1][31];
314 assert_eq!(target.0, Some(2));
315 assert_eq!(target.1, None);
316 assert_eq!(target.2, 0);
317 }
318
319 /// Should generate all key paths variants for the codings with leftover.
320 #[test]
321 fn generates_coding_paths() {
322 assert_eq!(generate_coding_paths(&(14, 12345), 4), vec![ // code=11000000|111001XX, len=14
323 vec![12, 0, 14, 4], // [1100, 0000, 1110, 0100]
324 vec![12, 0, 14, 5], // [1100, 0000, 1110, 0101]
325 vec![12, 0, 14, 6], // [1100, 0000, 1110, 0110]
326 vec![12, 0, 14, 7], // [1100, 0000, 1110, 0111]
327 ]);
328 assert_eq!(generate_coding_paths(&(13, 2616), 4), vec![ // code=01010001|11000XXX, let=13
329 vec![5, 1, 12, 0], // [0101, 0000, 1110, 0000]
330 vec![5, 1, 12, 1], // [0101, 0000, 1110, 0001]
331 vec![5, 1, 12, 2], // [0101, 0000, 1110, 0010]
332 vec![5, 1, 12, 3], // [0101, 0000, 1110, 0011]
333 vec![5, 1, 12, 4], // [0101, 0000, 1110, 0100]
334 vec![5, 1, 12, 5], // [0101, 0000, 1110, 0101]
335 vec![5, 1, 12, 6], // [0101, 0000, 1110, 0110]
336 vec![5, 1, 12, 7], // [0101, 0000, 1110, 0111]
337 ]);
338 }
339}