1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
//! LZNT1 — NTFS native file compression.
//!
//! Block-structured LZ77 with no entropy coding. Documented in Microsoft
//! [MS-XCA] section 2.5. The stream is a sequence of independent 4 KiB
//! chunks; each chunk carries a 2-byte little-endian header followed by
//! either the chunk's raw bytes (uncompressed chunks) or a sequence of
//! "flag groups" of literals and back-references (compressed chunks).
//!
//! ## Chunk header
//!
//! ```text
//! 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
//! +--+-----------+--------------------------------+
//! |C | sig=011 | chunk_size - 1 |
//! +--+-----------+--------------------------------+
//! ```
//!
//! - Bit 15 = compressed flag (`C`).
//! - Bits 14..=12 = block signature, fixed to `0b011 = 3`.
//! - Bits 11..=0 = `chunk_size - 1`, where `chunk_size` counts only the
//! body bytes that follow the header (uncompressed chunks always carry
//! exactly 4096 body bytes except for the final tail chunk).
//!
//! An all-zero 2-byte word (or end-of-input) terminates the stream.
//!
//! ## Compressed chunk body
//!
//! A compressed chunk body is a sequence of "flag groups". Each group is
//! a single flag byte followed by up to 8 tokens. Bit `i` of the flag byte
//! selects token type: `0` = 1-byte literal, `1` = 2-byte little-endian
//! match. Token order is LSB-first (bit 0 = first token).
//!
//! ## Match encoding
//!
//! Each match is 16 bits little-endian. The split between offset and
//! length bits varies with the number of bytes emitted so far in the
//! current chunk, growing the offset field as more history is available:
//!
//! | bytes emitted | offset bits | length bits | offset range | length range |
//! |---------------|-------------|-------------|--------------|--------------|
//! | 1..=16 | 12 | 4 | 1..=4096 | 3..=18 |
//! | 17..=32 | 11 | 5 | 1..=2048 | 3..=34 |
//! | 33..=64 | 10 | 6 | 1..=1024 | 3..=66 |
//! | 65..=128 | 9 | 7 | 1..=512 | 3..=130 |
//! | 129..=256 | 8 | 8 | 1..=256 | 3..=258 |
//! | 257..=512 | 7 | 9 | 1..=128 | 3..=514 |
//! | 513..=1024 | 6 | 10 | 1..=64 | 3..=1026 |
//! | 1025..=2048 | 5 | 11 | 1..=32 | 3..=2050 |
//! | 2049..=4096 | 4 | 12 | 1..=16 | 3..=4098 |
//!
//! The encoded value is `((offset - 1) << length_bits) | (length - 3)`.
//! Decoding inverts: `length = (token & length_mask) + 3`,
//! `offset = (token >> length_bits) + 1`.
//!
//! ## Sliding window
//!
//! Per-chunk: each chunk is encoded and decoded independently with a
//! fresh history. Back-references cannot cross chunk boundaries.
use crateAlgorithm;
pub use Decoder;
pub use ;
/// Maximum uncompressed bytes per chunk (4096 = 2^12).
pub const CHUNK_SIZE: usize = 4096;
/// Pick the (offset_bits, length_bits) split for a position. `pos` is
/// the number of bytes already emitted into the current chunk before
/// the match. Implements the MS-XCA 2.5 table:
///
/// | pos | offset_bits | length_bits |
/// |-----------|-------------|-------------|
/// | 1..=15 | 12 | 4 |
/// | 16..=31 | 11 | 5 |
/// | 32..=63 | 10 | 6 |
/// | 64..=127 | 9 | 7 |
/// | 128..=255 | 8 | 8 |
/// | 256..=511 | 7 | 9 |
/// | 512..=1023| 6 | 10 |
/// | 1024..=2047| 5 | 11 |
/// | 2048.. | 4 | 12 |
pub
/// Zero-sized marker type implementing [`Algorithm`] for LZNT1.
;