Skip to main content

Module arc_squeeze

Module arc_squeeze 

Source
Available on crate feature arc_squeeze only.
Expand description

ARC “Squeeze” — the CP/M SQ/USQ codec, ARC method 4.

Squeeze is a two-stage codec from circa 1981–1985:

  1. A run-length pre-pass that collapses runs of a repeated byte. The flag byte is 0x90 (“DLE”): the sequence b 0x90 n means “byte b repeated n times” (the first b is emitted literally before the flag, so the run length on the wire is n and n total copies of b appear in the output, with the leading literal counting as one). A literal 0x90 is encoded as 0x90 0x00.
  2. Static Huffman coding of the RLE output. The Huffman tree is built over the byte frequencies plus a distinguished end-of-stream symbol (SPEOF = 256) and is serialised into the stream header as a node table.

§Wire format (raw method payload)

This module implements the codec payload only — no ARC archive header, no filename, no checksum (those live in the ARC container, exactly like the zip-method codecs in this crate). The payload is:

+-----------+========================+======================+
| numnodes  | node table (4·N bytes) | Huffman bitstream     |
| (u16 LE)  | 2× i16 LE per node     | (LSB-first)           |
+-----------+========================+======================+

Each node is a pair of i16 LE children (left, right). A non-negative child is the index of another node; a negative child c is a leaf for the value -(c) - 1 (values 0..=255 are bytes, value 256 is SPEOF, end-of-stream). Decoding walks from node 0: a 0 bit takes the left child, a 1 bit the right. numnodes == 0 denotes the empty stream.

§Scope

Both directions are implemented and validated by round-trip. The decoder reverses the Huffman stage then the RLE stage; the encoder runs RLE then builds and serialises a canonical Huffman tree.

§DoS hygiene

Crafted streams never panic. A malformed node table (out-of-range child index, a cycle, or a leaf value > 256) returns Error::InvalidHuffmanTree; a truncated bitstream that cannot reach a leaf returns Error::UnexpectedEnd; the node count is bounded (<= MAX_NODES); RLE output growth uses checked arithmetic and the run expansion is bounded.

§References

  • Richard Greenlaw’s SQ/USQ (1981) and the ARC method-4 description, widely archived alongside nomarch and the CP/M utilities. Used as a format reference only — no GPL/BSD source copied.

Structs§

ArcSqueeze
Zero-sized marker type implementing Algorithm for ARC Squeeze.
Decoder
Streaming ARC Squeeze decoder.
Encoder
Streaming ARC Squeeze encoder.