Module refpack::data::compression

The compression scheme is heavily based on LZ77. The exact compression algorithm may be subject to change.

The basic concept is to track literal bytes as they are encountered, and to have some way of identifying when the current bytes match a previously encountered sequence.

The currently tracked literal bytes must be written before a back-reference copy command is written.
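The flow described above can be sketched with a deliberately naive LZ77-style tokenizer. This is only an illustration of the ordering rule (pending literals are flushed before a copy is emitted), not the crate's actual algorithm: the `Token` type, the `tokenize` function, and the `MIN_MATCH` threshold are all hypothetical names and values chosen for this example, and the search ignores any window or length limits a real encoder would have.

```rust
/// Illustrative token stream; a stand-in, not the crate's actual `Command` type.
#[derive(Debug, PartialEq)]
enum Token {
    Literal(Vec<u8>),
    Copy { offset: usize, length: usize },
}

/// Shortest match worth encoding as a back-reference (an assumed value).
const MIN_MATCH: usize = 3;

/// Naive LZ77-style tokenizer: quadratic search, no window or length limits.
fn tokenize(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pending: Vec<u8> = Vec::new(); // literal bytes tracked so far
    let mut pos = 0;

    while pos < input.len() {
        // Find the longest earlier sequence matching the bytes at `pos`.
        let mut best_len = 0;
        let mut best_offset = 0;
        for start in 0..pos {
            let len = input[start..]
                .iter()
                .zip(&input[pos..])
                .take_while(|(a, b)| a == b)
                .count();
            if len > best_len {
                best_len = len;
                best_offset = pos - start;
            }
        }

        if best_len >= MIN_MATCH {
            // Tracked literals must be written before the copy command.
            if !pending.is_empty() {
                tokens.push(Token::Literal(std::mem::take(&mut pending)));
            }
            tokens.push(Token::Copy { offset: best_offset, length: best_len });
            pos += best_len;
        } else {
            // No usable match: keep tracking this byte as a literal.
            pending.push(input[pos]);
            pos += 1;
        }
    }

    // Flush whatever literals remain at the end of the input.
    if !pending.is_empty() {
        tokens.push(Token::Literal(pending));
    }
    tokens
}
```

For the input `b"abcabcabc"` this sketch emits one literal token holding `abc`, followed by a single copy with offset 3 and length 6. The copy overlaps its own output, which is normal for LZ77-style back-references.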

Literal blocks have a maximum length of 112 bytes. If this limit is reached, the literal sequence must be split into two or more blocks to encode the literals properly.
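As a minimal sketch of that splitting rule, a long run of tracked literals can be cut into blocks of at most 112 bytes with the standard `chunks` method on slices. The names `MAX_LITERAL_LEN` and `split_literal_blocks` are hypothetical and are not taken from the crate.

```rust
/// Maximum number of bytes one literal block can carry, per the text above.
/// The constant name is an assumption made for this example.
const MAX_LITERAL_LEN: usize = 112;

/// Splits a run of literal bytes into blocks of at most `MAX_LITERAL_LEN` bytes.
/// Only the final block may be shorter than the maximum.
fn split_literal_blocks(literals: &[u8]) -> Vec<&[u8]> {
    literals.chunks(MAX_LITERAL_LEN).collect()
}
```

A 300-byte literal run would come back as three blocks of 112, 112, and 76 bytes.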

Due to the limited precision of literal block lengths, special handling is required when writing a literal block before a copy or stop control. The literal block needs to be “split” so that its length is an exact multiple of 4 bytes.

This is done by taking the number of tracked bytes modulo 4 and subtracting that remainder from the total length. The remainder is the number of bytes left for the following copy or stop control to carry.

Simple pseudo-rust:

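let tracked_bytes_length = 117;
let num_bytes_in_copy = tracked_bytes_length % 4; // 1
let num_bytes_in_literal = tracked_bytes_length - num_bytes_in_copy; // 116; divisible by 4
// 116 exceeds the 112-byte literal block maximum, so it is written as blocks of 112 and 4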
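The same arithmetic can be written as a small runnable helper. This is a sketch under the assumption that the trailing 0 to 3 bytes travel with the next copy or stop control, as the variable name `num_bytes_in_copy` suggests. The function name `split_pending_literals` is hypothetical.

```rust
/// Hypothetical helper: splits the tracked literal bytes into
/// (bytes for standalone literal blocks, trailing bytes for the next control).
/// The first part always has a length divisible by 4; the second has 0..=3 bytes.
fn split_pending_literals(pending: &[u8]) -> (&[u8], &[u8]) {
    let remainder = pending.len() % 4;
    pending.split_at(pending.len() - remainder)
}
```

With 117 tracked bytes, the first slice holds 116 bytes and the second holds the single trailing byte. The 116-byte part is still subject to the 112-byte maximum and would be written as more than one literal block.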

See Command for a specification of the control codes.

Functions

  • Compress a data stream from a Reader into refpack format and write it to a Writer.
  • A wrapper around the compress function with a simpler, cleaner API. Takes a &[u8] slice of uncompressed bytes and returns a Vec<u8> of compressed bytes.