
Module lznt1

Available on crate feature lznt1 only.

LZNT1 — NTFS native file compression.

Block-structured LZ77 with no entropy coding, documented in Microsoft [MS-XCA] section 2.5. The stream is a sequence of independent chunks, each covering at most 4 KiB (4096 bytes) of uncompressed data. Each chunk carries a 2-byte little-endian header followed by either the chunk’s raw bytes (uncompressed chunks) or a sequence of “flag groups” of literals and back-references (compressed chunks).

§Chunk header

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+--+--------+-----------------------------------+
|C | sig=011|          chunk_size - 1           |
+--+--------+-----------------------------------+
  • Bit 15 = compressed flag (C).
  • Bits 14..=12 = block signature, fixed to 0b011 = 3.
  • Bits 11..=0 = chunk_size - 1, where chunk_size counts only the body bytes that follow the header (uncompressed chunks always carry exactly 4096 body bytes except for the final tail chunk).
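As a quick check on the bit layout, the following sketch packs and unpacks header words and uses them to build a store-only stream in which every chunk is uncompressed. The names `pack_header`, `unpack_header` and `store_uncompressed` are hypothetical and are not part of this module's API; the writer only follows the header rules listed above and is not claimed to match what `Encoder` emits.

```rust
/// Hypothetical helper: pack a chunk header word from the three fields.
/// `body_len` must be in 1..=4096 because the header stores `body_len - 1` in 12 bits.
fn pack_header(compressed: bool, body_len: usize) -> u16 {
    assert!((1..=4096).contains(&body_len), "chunk body must be 1..=4096 bytes");
    let c: u16 = if compressed { 0x8000 } else { 0 };
    // Signature 0b011 sits in bits 14..=12; the size field fills bits 11..=0.
    c | (0b011 << 12) | (body_len as u16 - 1)
}

/// Hypothetical helper: split a header word into (compressed, body_len, signature).
fn unpack_header(word: u16) -> (bool, usize, u16) {
    let compressed = word & 0x8000 != 0;
    let signature = (word >> 12) & 0b111;
    let body_len = (word & 0x0FFF) as usize + 1;
    (compressed, body_len, signature)
}

/// Hypothetical store-only writer: emits every chunk uncompressed (C = 0),
/// 4096 body bytes per chunk except possibly the last.
fn store_uncompressed(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() + 2 * (data.len() / 4096 + 1));
    for chunk in data.chunks(4096) {
        out.extend_from_slice(&pack_header(false, chunk.len()).to_le_bytes());
        out.extend_from_slice(chunk);
    }
    out
}
```

For example, a full 4096-byte uncompressed chunk gets the header 0x3FFF (C = 0, signature 3, field 4095), and a compressed chunk with a 100-byte body gets 0xB063 (field 99 = 0x063).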

An all-zero 2-byte word (or end-of-input) terminates the stream.
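Walking a stream therefore only needs the headers: read a little-endian word, stop on zero or when input runs out, and otherwise slice off `chunk_size` body bytes. A minimal sketch, assuming a hypothetical `split_chunks` function and `RawChunk` type (neither is part of this module). As defensive choices of this sketch, it rejects words whose signature is not 0b011 and treats a single trailing byte as end of input.

```rust
/// One chunk as seen by a hypothetical stream walker.
#[derive(Debug, PartialEq)]
struct RawChunk<'a> {
    compressed: bool,
    body: &'a [u8],
}

/// Split an LZNT1 stream into chunks, stopping at an all-zero header word
/// or end of input. Malformed headers and truncated bodies are errors.
fn split_chunks(mut input: &[u8]) -> Result<Vec<RawChunk<'_>>, &'static str> {
    let mut chunks = Vec::new();
    while input.len() >= 2 {
        let word = u16::from_le_bytes([input[0], input[1]]);
        if word == 0 {
            break; // explicit terminator
        }
        if (word >> 12) & 0b111 != 0b011 {
            return Err("bad chunk signature");
        }
        let len = (word & 0x0FFF) as usize + 1; // body bytes after the header
        let rest = &input[2..];
        if rest.len() < len {
            return Err("truncated chunk body");
        }
        chunks.push(RawChunk { compressed: word & 0x8000 != 0, body: &rest[..len] });
        input = &rest[len..];
    }
    Ok(chunks)
}
```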

§Compressed chunk body

A compressed chunk body is a sequence of “flag groups”. Each group is a single flag byte followed by up to 8 tokens. Bit i of the flag byte selects token type: 0 = 1-byte literal, 1 = 2-byte little-endian match. Token order is LSB-first (bit 0 = first token).
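The flag-byte rule can be sketched as a parser for a single group. `Token` and `parse_flag_group` are hypothetical names. Match tokens are kept as raw 16-bit words here, because splitting them into offset and length depends on the position within the chunk, as described under Match encoding.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Literal(u8),
    Match(u16), // raw little-endian match word, not yet split
}

/// Parse one flag group from the front of `body`, returning its tokens and the
/// number of bytes consumed. The group may hold fewer than 8 tokens if `body`
/// ends first; returns None for an empty body or a match cut off mid-token.
fn parse_flag_group(body: &[u8]) -> Option<(Vec<Token>, usize)> {
    let (&flags, mut rest) = body.split_first()?;
    let mut used = 1;
    let mut tokens = Vec::new();
    for bit in 0..8 {
        if rest.is_empty() {
            break;
        }
        // LSB-first: bit 0 describes the first token after the flag byte.
        if flags & (1 << bit) == 0 {
            tokens.push(Token::Literal(rest[0]));
            rest = &rest[1..];
            used += 1;
        } else {
            if rest.len() < 2 {
                return None;
            }
            tokens.push(Token::Match(u16::from_le_bytes([rest[0], rest[1]])));
            rest = &rest[2..];
            used += 2;
        }
    }
    Some((tokens, used))
}
```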

§Match encoding

Each match is 16 bits little-endian. The split between offset and length bits varies with the number of bytes emitted so far in the current chunk, growing the offset field as more history is available:

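| bytes emitted | offset bits | length bits | offset range | length range |
|---|---|---|---|---|
| 1..=16 | 4 | 12 | 1..=16 | 3..=4098 |
| 17..=32 | 5 | 11 | 1..=32 | 3..=2050 |
| 33..=64 | 6 | 10 | 1..=64 | 3..=1026 |
| 65..=128 | 7 | 9 | 1..=128 | 3..=514 |
| 129..=256 | 8 | 8 | 1..=256 | 3..=258 |
| 257..=512 | 9 | 7 | 1..=512 | 3..=130 |
| 513..=1024 | 10 | 6 | 1..=1024 | 3..=66 |
| 1025..=2048 | 11 | 5 | 1..=2048 | 3..=34 |
| 2049..=4096 | 12 | 4 | 1..=4096 | 3..=18 |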
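The table follows a simple rule: 4 offset bits while at most 16 bytes of history exist, then one more bit each time the history size doubles, with the remaining bits of the 16-bit word going to the length. A sketch of that rule, using hypothetical `offset_bits`/`length_bits` helpers:

```rust
/// Offset-field width for a match that starts after `emitted` bytes of the
/// current chunk: 4 bits up to 16 bytes, +1 bit per doubling, 12 at most.
fn offset_bits(emitted: usize) -> u32 {
    assert!((1..=4096).contains(&emitted), "a match needs at least one byte of history");
    let mut bits = 4;
    let mut limit = 16;
    while emitted > limit {
        bits += 1;
        limit *= 2;
    }
    bits
}

/// The length field takes whatever the offset field leaves of the 16 bits.
fn length_bits(emitted: usize) -> u32 {
    16 - offset_bits(emitted)
}
```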

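The encoded value is ((offset - 1) << length_bits) | (length - 3), where length_bits = 16 - offset bits from the table above. Decoding inverts this with length_mask = (1 << length_bits) - 1: length = (token & length_mask) + 3, offset = (token >> length_bits) + 1.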
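A round-trip sketch of the two formulas, parameterised by `length_bits` for the current position. `encode_match` and `decode_match` are hypothetical helpers; the range assertions mirror the offset and length ranges in the table.

```rust
/// Pack a match into its 16-bit token for the given `length_bits` (4..=12).
fn encode_match(offset: usize, length: usize, length_bits: u32) -> u16 {
    let max_offset = 1usize << (16 - length_bits);
    let max_length = (1usize << length_bits) + 2; // (2^bits - 1) + 3
    assert!((1..=max_offset).contains(&offset), "offset out of range");
    assert!((3..=max_length).contains(&length), "length out of range");
    (((offset - 1) << length_bits) | (length - 3)) as u16
}

/// Inverse of `encode_match`: returns (offset, length).
fn decode_match(token: u16, length_bits: u32) -> (usize, usize) {
    let length_mask = (1u16 << length_bits) - 1;
    let length = (token & length_mask) as usize + 3;
    let offset = (token >> length_bits) as usize + 1;
    (offset, length)
}
```

For example, with 3 bytes emitted (length_bits = 12), a match of offset 3 and length 5 encodes as (2 << 12) | 2 = 0x2002; near the end of a chunk (length_bits = 4), offset 4096 and length 18 fill all 16 bits as 0xFFFF.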

§Sliding window

Per-chunk: each chunk is encoded and decoded independently with a fresh history. Back-references cannot cross chunk boundaries.
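Putting the pieces together, a minimal decoder for a single compressed chunk body might look like the following. It is a hypothetical sketch, not this module's `Decoder`: the function name `decompress_chunk` and its error strings are invented here. The point relevant to this section is that `out` starts empty for every chunk, so a back-reference can only resolve inside the current chunk, and an offset reaching further back is rejected.

```rust
/// Hypothetical sketch: decompress one compressed chunk body into a fresh buffer.
/// Back-references resolve only against `out`, so history never crosses chunks.
fn decompress_chunk(body: &[u8]) -> Result<Vec<u8>, &'static str> {
    const CHUNK: usize = 4096;
    let mut out: Vec<u8> = Vec::with_capacity(CHUNK);
    let mut pos = 0;
    while pos < body.len() {
        let flags = body[pos];
        pos += 1;
        for bit in 0..8 {
            if pos >= body.len() {
                break; // the last flag group may hold fewer than 8 tokens
            }
            if flags & (1 << bit) == 0 {
                if out.len() == CHUNK {
                    return Err("chunk exceeds 4096 bytes");
                }
                out.push(body[pos]); // literal
                pos += 1;
                continue;
            }
            if pos + 1 >= body.len() {
                return Err("truncated match token");
            }
            let token = u16::from_le_bytes([body[pos], body[pos + 1]]);
            pos += 2;
            let emitted = out.len();
            // Offset field: 4 bits up to 16 bytes of history, +1 bit per doubling.
            let mut offset_bits: u32 = 4;
            while emitted > (1usize << offset_bits) {
                offset_bits += 1;
            }
            let length_bits = 16 - offset_bits;
            let length = (token & ((1u16 << length_bits) - 1)) as usize + 3;
            let offset = (token >> length_bits) as usize + 1;
            if offset > emitted {
                return Err("offset reaches before chunk start");
            }
            if emitted + length > CHUNK {
                return Err("chunk exceeds 4096 bytes");
            }
            // Byte-by-byte copy so overlapping matches (offset < length) repeat bytes.
            let start = emitted - offset;
            for i in 0..length {
                let b = out[start + i];
                out.push(b);
            }
        }
    }
    Ok(out)
}
```

A stream decoder built on this sketch would call it once per chunk whose C bit is set, copy uncompressed bodies through unchanged, and concatenate the results.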

§Structs

Decoder
Encoder
EncoderConfig
Per-encoder configuration. LZNT1 has no compression level knob in the MS-XCA format; this is a unit type today and exists so the public Encoder signature can grow knobs (e.g. a “fast vs. best” match strategy) without a breaking change.
Lznt1
Zero-sized marker type implementing Algorithm for LZNT1.