Available on crate feature lznt1 only.
LZNT1 — NTFS native file compression.
Block-structured LZ77 with no entropy coding. Documented in Microsoft [MS-XCA] section 2.5. The stream is a sequence of independent 4 KiB chunks; each chunk carries a 2-byte little-endian header followed by either the chunk’s raw bytes (uncompressed chunks) or a sequence of “flag groups” of literals and back-references (compressed chunks).
§Chunk header
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+--+--------+-----------------------------------+
|C |sig=011 |          chunk_size - 1           |
+--+--------+-----------------------------------+
- Bit 15 = compressed flag (C).
- Bits 14..=12 = block signature, fixed to 0b011 = 3.
- Bits 11..=0 = chunk_size - 1, where chunk_size counts only the body bytes that follow the header (uncompressed chunks always carry exactly 4096 body bytes except for the final tail chunk).
An all-zero 2-byte word (or end-of-input) terminates the stream.
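The header layout can be unpacked with a few shifts and masks. The following is a minimal sketch rather than this module's implementation: ChunkHeader and parse_chunk_header are hypothetical names introduced for illustration, and validating that the signature equals 3 is left to the caller.

```rust
/// Decoded view of a 2-byte LZNT1 chunk header.
/// (Hypothetical type for illustration; not part of this module's API.)
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkHeader {
    /// Bit 15: true when the body is a sequence of flag groups.
    pub compressed: bool,
    /// Bits 14..=12: expected to be 0b011.
    pub signature: u8,
    /// Bits 11..=0 plus one: number of body bytes after the header.
    pub body_len: usize,
}

/// Parses a little-endian header word.
/// Returns `None` for the all-zero word that terminates the stream.
pub fn parse_chunk_header(bytes: [u8; 2]) -> Option<ChunkHeader> {
    let word = u16::from_le_bytes(bytes);
    if word == 0 {
        return None;
    }
    Some(ChunkHeader {
        compressed: (word & 0x8000) != 0,
        signature: ((word >> 12) & 0b111) as u8,
        body_len: (word & 0x0FFF) as usize + 1,
    })
}
```

Under these assumptions, the bytes FF 3F (word 0x3FFF) describe a full uncompressed chunk with 4096 body bytes, and 02 B0 (word 0xB002) describes a compressed chunk with a 3-byte body.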
§Compressed chunk body
A compressed chunk body is a sequence of “flag groups”. Each group is
a single flag byte followed by up to 8 tokens. Bit i of the flag byte
selects the type of token i: 0 = 1-byte literal, 1 = 2-byte
little-endian match. Token order is LSB-first (bit 0 = first token).
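To make the flag-group layout concrete, here is a small sketch that splits a compressed chunk body into tokens without interpreting the matches. Token and split_tokens are hypothetical names, and the sketch assumes the final group may simply stop when the body runs out.

```rust
/// One token of a compressed chunk body.
/// (Hypothetical type for illustration; not part of this module's API.)
#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    /// Flag bit 0: a single literal byte.
    Literal(u8),
    /// Flag bit 1: a raw 16-bit match word, not yet split into offset/length.
    Match(u16),
}

/// Walks the flag groups of `body` and returns the tokens in stream order.
/// Returns `None` if a 2-byte match token is cut off by the end of the body.
pub fn split_tokens(body: &[u8]) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < body.len() {
        let flags = body[pos];
        pos += 1;
        // Bit 0 describes the first token, bit 7 the eighth.
        for bit in 0..8u32 {
            if pos == body.len() {
                break; // the last group may carry fewer than 8 tokens
            }
            if (flags >> bit) & 1 == 0 {
                tokens.push(Token::Literal(body[pos]));
                pos += 1;
            } else {
                let raw = body.get(pos..pos + 2)?;
                tokens.push(Token::Match(u16::from_le_bytes([raw[0], raw[1]])));
                pos += 2;
            }
        }
    }
    Some(tokens)
}
```

With a flag byte of 0b0000_0100, the first two tokens and the fourth are literals and the third is a match.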
§Match encoding
Each match is 16 bits little-endian. The split between offset and length bits varies with the number of bytes emitted so far in the current chunk, growing the offset field as more history is available:
| bytes emitted | offset bits | length bits | offset range | length range |
|---|---|---|---|---|
| 1..=16 | 4 | 12 | 1..=16 | 3..=4098 |
| 17..=32 | 5 | 11 | 1..=32 | 3..=2050 |
| 33..=64 | 6 | 10 | 1..=64 | 3..=1026 |
| 65..=128 | 7 | 9 | 1..=128 | 3..=514 |
| 129..=256 | 8 | 8 | 1..=256 | 3..=258 |
| 257..=512 | 9 | 7 | 1..=512 | 3..=130 |
| 513..=1024 | 10 | 6 | 1..=1024 | 3..=66 |
| 1025..=2048 | 11 | 5 | 1..=2048 | 3..=34 |
| 2049..=4096 | 12 | 4 | 1..=4096 | 3..=18 |
The encoded value is ((offset - 1) << length_bits) | (length - 3).
Decoding inverts: length = (token & length_mask) + 3,
offset = (token >> length_bits) + 1.
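The position-dependent split can be computed with a short loop: the offset field starts at 4 bits and gains one bit each time the number of emitted bytes doubles past 16, and the length field takes whatever remains of the 16 bits. The sketch below is one possible way to express this. match_split, encode_match and decode_match are hypothetical helper names, and the caller is assumed to pass an offset no larger than the bytes emitted and a length that fits the current length field.

```rust
/// Returns `(offset_bits, length_bits)` for a match that starts after
/// `emitted` bytes (1..=4096) of the current chunk have been produced.
pub fn match_split(emitted: usize) -> (u32, u32) {
    debug_assert!((1..=4096).contains(&emitted));
    let mut offset_bits = 4u32;
    let mut span = 16usize; // largest `emitted` that `offset_bits` can address
    while span < emitted {
        span <<= 1;
        offset_bits += 1;
    }
    (offset_bits, 16 - offset_bits)
}

/// Packs a match as `((offset - 1) << length_bits) | (length - 3)`.
/// Assumes `1 <= offset <= emitted` and `length >= 3` within the field range.
pub fn encode_match(emitted: usize, offset: usize, length: usize) -> u16 {
    let (_, length_bits) = match_split(emitted);
    (((offset - 1) << length_bits) | (length - 3)) as u16
}

/// Inverse of `encode_match`: returns `(offset, length)`.
pub fn decode_match(emitted: usize, token: u16) -> (usize, usize) {
    let (_, length_bits) = match_split(emitted);
    let length_mask = (1u16 << length_bits) - 1;
    let length = (token & length_mask) as usize + 3;
    let offset = (token >> length_bits) as usize + 1;
    (offset, length)
}
```

For instance, after a single byte has been emitted, a run of ten more copies of that byte is a match with offset 1 and length 10, which packs to the word 0x0007.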
§Sliding window
Per-chunk: each chunk is encoded and decoded independently with a fresh history. Back-references cannot cross chunk boundaries.
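Putting the pieces together, a single compressed chunk body can be expanded using only the bytes that chunk has already produced. The following is a rough sketch under stated assumptions, not the Decoder provided by this module: decode_chunk_body and CHUNK_SIZE are hypothetical names, and malformed input is reported as None rather than a detailed error.

```rust
/// Largest number of uncompressed bytes one chunk may hold.
const CHUNK_SIZE: usize = 4096;

/// Expands one compressed chunk body (the bytes after the 2-byte header).
/// History starts empty, so a back-reference can never reach a prior chunk.
pub fn decode_chunk_body(body: &[u8]) -> Option<Vec<u8>> {
    let mut out: Vec<u8> = Vec::with_capacity(CHUNK_SIZE);
    let mut pos = 0;
    while pos < body.len() {
        let flags = body[pos];
        pos += 1;
        for bit in 0..8u32 {
            if pos == body.len() {
                break; // final group may hold fewer than 8 tokens
            }
            if out.len() >= CHUNK_SIZE {
                return None; // more tokens than the chunk can hold
            }
            if (flags >> bit) & 1 == 0 {
                out.push(body[pos]); // literal
                pos += 1;
                continue;
            }
            let raw = body.get(pos..pos + 2)?;
            let token = u16::from_le_bytes([raw[0], raw[1]]);
            pos += 2;
            // The length field shrinks as more bytes have been emitted.
            let mut length_bits = 12u32;
            let mut span = 16usize;
            while span < out.len() {
                span <<= 1;
                length_bits -= 1;
            }
            let length = (token & ((1u16 << length_bits) - 1)) as usize + 3;
            let offset = (token >> length_bits) as usize + 1;
            if offset > out.len() || out.len() + length > CHUNK_SIZE {
                return None; // reference outside this chunk's history
            }
            // Copy byte by byte so overlapping matches (offset < length) work.
            for _ in 0..length {
                let byte = out[out.len() - offset];
                out.push(byte);
            }
        }
    }
    Some(out)
}
```

Because the output buffer is created inside the function, a match token at the very start of a body has nothing to refer to and is rejected, which is the per-chunk history rule in practice.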
Structs§
- Decoder
- Encoder
- EncoderConfig - Per-encoder configuration. LZNT1 has no compression level knob in the MS-XCA format; this is a unit type today and exists so the public Encoder signature can grow knobs (e.g. a “fast vs. best” match strategy) without a breaking change.
- Lznt1 - Zero-sized marker type implementing Algorithm for LZNT1.