oxideav_webp/vp8l_encode.rs
1//! VP8L (WebP-Lossless) §3.8 / §3.7 *encoder*.
2//!
3//! This is the writer counterpart of the round-99..111 decoder stack. The
4//! decoder ([`crate::vp8l_transform::decode_lossless`]) walks a VP8L chunk
5//! payload — §3.4 image-header, §3.8.2 transform list, §3.8.3 image data
6//! (color-cache-info, meta-prefix, prefix-codes, LZ77-coded image) — and
7//! produces ARGB pixels. This module produces a VP8L chunk payload from
8//! ARGB pixels, taking the simplest end-to-end path the spec admits:
9//!
10//! * **§3.8.2 optional subtract-green transform** — as of round 120 the
11//! encoder evaluates both the no-transform and subtract-green paths and
12//! emits whichever is smaller. The subtract-green transform (`%b1 %b10`
13//! in the §3.8.2 grammar; transform type 2 per §3.5 Table 1) carries
14//! no body bits and subtracts the green channel from red and blue
15//! before the entropy stage, lowering per-pixel red/blue entropy on
16//! natural images (the spec's §3.5.3 motivation: "this transform is
17//! redundant, as it can be modeled using the color transform, but since
18//! there is no additional data here, the subtract green transform can
19//! be coded using fewer bits"). The other three transforms (predictor
20//! / color / color-indexing) get their own forward passes in later
21//! rounds.
22//! * **§5.2.1 / §5.2.3 color cache** — as of round 121 the encoder
23//! evaluates a color cache alongside the no-cache path and emits
24//! whichever is smaller. As of round 148 the chooser sweeps every
25//! §5.2.3 `cache_code_bits ∈ [1..11]` per the spec's allowed range
26//! (2..=2048-entry caches) and picks the smallest stream, rather
27//! than the round-121 fixed 256-entry choice. When the cache is
28//! enabled, the §3.8.3 `color-cache-info` field becomes
29//! `%b1 code_bits` (1-bit flag + 4-bit `code_bits`), the GREEN
30//! alphabet grows to `256 + 24 + (1 << code_bits)` symbols, and
31//! each repeat of a previously-inserted ARGB literal is emitted as
32//! a §5.2.3 color-cache code `256 + 24 + index` instead of four
33//! separate ARGB-channel literals.
34//! Cache state is maintained per §5.2.3: every emitted pixel — literal
35//! *and* every pixel covered by a §5.2.2 backward-reference copy — is
36//! re-inserted at its hashed slot
37//! (`(0x1e35a7bd * argb) >> (32 - code_bits)`). The chooser cross-
38//! products with subtract-green so the encoder picks the best of
39//! `(no-tx | subtract-green) × (no-cache | cache)`; on uncorrelated /
40//! non-repeating content the no-cache no-tx path wins and is kept.
41//! * **Single §3.7.2.2 meta-prefix code** — `meta-prefix` is `%b0`, so one
42//! [`crate::meta_prefix::PrefixCodeGroup`] of five prefix codes applies
43//! to the whole image.
44//! * **Literal-only §3.8.3 image data** — every pixel is a §3.7.3 ARGB
45//! literal (green via prefix code #1, red/blue/alpha via #2/#3/#4). No
46//! LZ77 backward references are emitted by [`encode_argb_literals`], so
47//! the distance prefix code (#5) is the single-symbol-0 form the §3.7.2.1.1
48//! note sanctions ("empty prefix codes can be coded as those containing a
49//! single symbol 0").
50//!
51//! The result, wrapped by [`encode_webp_lossless`] in the §2.4 RIFF/WEBP
52//! framing (via [`crate::build`]), decodes back to the exact input pixels
53//! through [`crate::decode_webp`] — a pixel-exact round trip.
54//!
55//! ## §3.7.2 prefix-code construction
56//!
57//! For each of the five symbol alphabets the encoder:
58//!
59//! 1. counts symbol frequencies over the data it will emit;
60//! 2. builds a length-limited (≤ [`MAX_CODE_LENGTH`]) canonical
61//! Huffman code-length assignment from those frequencies
62//! ([`build_code_lengths`]);
63//! 3. writes the code lengths to the stream with the §3.7.2.1.2 *normal
64//! code length code* (or the trivial single-symbol form), then writes
65//! each symbol with the canonical code derived from the lengths.
66//!
67//! The canonical code assignment ([`canonical_codes`]) is the identical
68//! `(length, value)`-ordered rule the decoder's
69//! [`crate::vp8l_prefix::PrefixCode`] reads, so a code emitted here
70//! decodes there bit-for-bit.
71//!
72//! ## §5.2.2 LZ77 backward-reference matching
73//!
74//! As of round 119, [`encode_argb_literals`] runs an optional §5.2.2
75//! backward-reference pass before emitting the image data. A hash-chain
76//! matcher ([`Lz77Matcher`]) finds repeated pixel runs; each run of
77//! `length >= MIN_MATCH` pixels at scan-line distance `D` is emitted as a
78//! §5.2.2 *length + distance code* pair instead of `length` separate ARGB
79//! literals, compressing repetitive images. The match's length is encoded
80//! via the GREEN alphabet's length-prefix symbols (`256 + prefix_code`).
81//!
82//! As of round 130 the encoder picks the **smaller** of two distance-code
83//! forms per backward reference:
84//!
85//! 1. The *scan-line* encoding `distance_code = D + NUM_DISTANCE_MAP_CODES`
86//! (always valid, was the round-119 default).
87//! 2. Any §5.2.2 *distance map* code `c ∈ 1..=120` whose
88//! `(xi, yi) = DISTANCE_MAP[c-1]` satisfies `max(xi + yi*W, 1) == D` for
89//! the image width `W`. These small codes feed the §5.2.2 distance
90//! prefix code through low-prefix slots (codes `1..=4` use 0 extra bits,
91//! code `5` uses 1 extra bit) instead of the high-prefix slots that
92//! `D + 120` for typical row distances would fall into.
93//!
94//! The reconstruction in
95//! [`crate::vp8l_decode::distance_code_to_pixel_distance`] is identical for
96//! both forms (`xi + yi*W` clamped to 1), so round-trips remain bit-exact.
97//! Photo-like content with vertical correlation (every scan-line referring
98//! to the row above) sees a dramatic improvement: a row-distance match on
99//! a 256-wide image goes from prefix 16 (8-ish bits Huffman + 7 extra) to
100//! prefix 0 (1–4 bits Huffman + 0 extra), shrinking the per-match cost by
101//! ~10 bits. The width-aware helper is
102//! [`pixel_distance_to_distance_code`]; the round-119 scan-line-only
103//! form is still used as the chooser's fallback whenever no distance-map
104//! code matches.
105//!
106//! The inverse of the §5.2.2 prefix-value transform ([`value_to_prefix`])
107//! splits a length/distance into its prefix code and extra bits, the exact
108//! counterpart of the decoder's [`crate::vp8l_decode::read_lz77_value`].
109//!
110//! The literal-only path is still available via [`encode_argb_literals_only`]
111//! (used by the size-reduction comparison test); the default
112//! [`encode_argb_literals`] entry point chooses the LZ77 path.
113//!
114//! As of round 163 the matcher applies **four-position lazy matching
115//! with a diminishing-returns guard**: after finding a match
116//! `(L_a, _)` at `pos`, the encoder also probes `pos + 1`, `pos + 2`,
117//! and `pos + 3` (the round-158 depth-3 contract), and then — only
118//! when the running best across those four positions is still shorter
119//! than [`DEPTH4_GUARD_THRESHOLD`] — also probes `pos + 4`. Whichever
120//! of the candidate start positions yields the strictly longest match
121//! wins; the pixels skipped to reach the chosen start are emitted as
122//! literals. The depth-4 guard captures the empirical observation
123//! that once the depth-3 best already covers a length-`THRESHOLD` run,
124//! a fourth-order swap is almost never able to amortise the four
125//! literals it would cost — the depth-4 probe is gated to avoid
126//! spending hash-chain inserts and a `find` call when its expected
127//! marginal payoff is small. This still recovers fourth-order traps
128//! where the leading match at `pos..=pos + 3` is short. The decoder
129//! output is bit-identical for any input — only the token *partition*
130//! shifts (by up to four pixels) — so round-trips remain bit-exact
131//! under any input. See [`tokenize_lz77_inner`] for the shared
132//! `lazy_depth: u32`-toggled implementation (`0` strict-greedy r155
133//! baseline, `1` r156 depth-1, `2` r157 depth-2, `3` r158 depth-3,
134//! `4` r163 guarded depth-4, now the production default).
135//!
136//! ## §4.1 spatial-predictor forward transform
137//!
138//! The encoder also evaluates the §4.1 predictor transform path: the
139//! image is divided into `(1 << DEFAULT_PREDICTOR_SIZE_BITS)`-pixel
140//! square blocks; each block picks the prediction mode `0..=13` that
141//! minimises a residual-magnitude proxy (sum of per-channel
142//! `|residual|` folded onto `[-128, 127]`) over the block's pixels.
143//! As of round 159, the chooser also threads an
144//! **entropy-image-aware tie-break** through the per-block walk:
145//! when multiple modes tie on residual cost, the chooser prefers
146//! the mode chosen by the *previous neighbour* block (left-of in
147//! the current row, or top-of for the left-column blocks). The
148//! predictor sub-image is written as a §7.2 `entropy-coded-image`,
149//! so adjacent blocks carrying the same mode value reduce that
150//! sub-image's symbol entropy and the bytes the writer emits for
151//! it; this matches RFC 9649 §3.5's "transform data can be decided
152//! based on entropy minimization" note. The residuals themselves
153//! are unchanged on tie-equal swaps (the cost was already minimal),
154//! so decoded pixels stay bit-identical. As of round 160 the
155//! chooser also evaluates a **slack-cost variant** of the
156//! tie-break — see [`pick_block_mode_with_hint_slack`] — that
157//! accepts the preferred neighbour mode at a small additive
158//! `slack` budget above the otherwise-best cost, trading a small
159//! residual increase for a strict drop in the sub-image's symbol
160//! entropy. The slack variant is one of four predictor candidates
161//! the production chooser builds per `size_bits` (slack ∈
162//! `{0, block_pixels, 2·block_pixels, 4·block_pixels}`), and the
163//! byte-shortest stream wins — so the slack candidates can only
164//! add options to the chooser's selection set and never regress.
165//! The sub-resolution predictor image is written as a §7.2
166//! `predictor-image = 3BIT entropy-coded-image` and the per-pixel
167//! residuals are then handed to the standard
168//! `spatially-coded-image` writer. As of round 155 the chooser
169//! sweeps two `size_bits` values for the §4.1 predictor: the
170//! default 16×16-pixel blocks (per-region predictor-mode
171//! granularity, good for images whose best-mode varies spatially)
172//! and a maximal single-block transform whose `size_bits` is large
173//! enough that the entire image collapses to one mode (`1 << size`
174//! ≥ max(width, height), so the sub-image is at most 1×1 — the
175//! cheapest possible §4.1 header). Each predictor `size_bits`
176//! candidate uses the round-148 cache-bits sweep (§5.2.3
177//! `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline)
178//! and is cross-compared against the no-tx / subtract-green
179//! candidates; the smallest stream wins. On smooth gradients with
180//! strong spatial correlation, the predictor path's per-pixel
181//! residual entropy is much lower than the raw pixels' entropy,
182//! more than paying for the predictor-image overhead.
183//!
184//! ## §3.5.2 / §4.2 color-transform forward pass
185//!
186//! As of round 147 the encoder also evaluates the §3.5.2 / §4.2
187//! color transform: the image is divided into
188//! `(1 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS)`-pixel square blocks; each
189//! block picks a `(green_to_red, green_to_blue, red_to_blue)` triple
190//! that minimises a residual-magnitude proxy on the red and blue
191//! channels (the green channel is untouched per §3.5.2). The
192//! per-axis sweep is exact because the cost decomposes additively
193//! across channels: `red_residual` depends only on `green_to_red`,
194//! `blue_residual` depends additively on `(green_to_blue,
195//! red_to_blue)`, so the three axes can be optimised independently
196//! over a small candidate grid (see [`CTE_AXIS_CANDIDATES`]). The
197//! sub-resolution color image is written as a §7.2
198//! `color-image = 3BIT entropy-coded-image` (re-using
199//! `write_entropy_coded_image_literals`) and the per-pixel residuals
200//! are then handed to the standard `spatially-coded-image` writer.
201//! Each color-transform `size_bits` candidate uses the round-148
202//! cache-bits sweep (§5.2.3 `cache_code_bits ∈ [1..11]` plus the
203//! disabled-cache baseline) and is cross-compared against the no-tx,
204//! subtract-green, and §4.1 predictor candidates; the smallest stream
205//! wins. On natural images with red/green and blue/green correlation,
206//! the color-transform path concentrates the red/blue residuals near
207//! zero, shrinking the per-channel Huffman codes and further reducing
208//! the chosen stream's size on top of the §4.1 predictor pass.
209//!
210//! ## §4.4 color-indexing transform encoder
211//!
212//! As of round 150 the encoder also evaluates the §4.4 color-indexing
213//! transform: an O(N) palette probe walks `pixels` and bails out
214//! early at >256 unique ARGB values; below that threshold a sorted
215//! palette is built (sorted ARGB-numerically so the §4.4
216//! subtraction-coded color-table deltas concentrate near zero), each
217//! pixel is replaced by its palette index, and indices are bundled
218//! into one byte per the §4.4 table (`width_bits = 3 / 2 / 1 / 0`
219//! for palettes of 1..=2 / 3..=4 / 5..=16 / 17..=256 entries —
220//! packing 8 / 4 / 2 / 1 indices into each green byte respectively).
221//! The bundled image is then handed to the standard
222//! `spatially-coded-image` writer at the subsampled `packed_width =
223//! DIV_ROUND_UP(width, 1 << width_bits)`. The color-indexing
224//! candidate uses the round-148 cache-bits sweep (§5.2.3
225//! `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline) and
226//! is cross-compared against every other candidate; the smallest
227//! stream wins. On palette-ish content (icons, line art, screen
228//! captures) the index-bundling drops the entropy stage's symbol
229//! count by 2..8×, more than paying for the small subtraction-coded
230//! palette-write overhead.
231//!
232//! ## What this module does NOT do
233//!
234//! * No multi-meta-prefix (§6.2.2 entropy image). All candidates use
235//! a single prefix-code group for the entire image.
236//! * No `oxideav-core` runtime dependency — this module compiles under
237//! `--no-default-features`.
238
239use crate::build::{self, ImageKind};
240
241/// The largest code length a VP8L canonical prefix code may use (§3.7.2.1.2
242/// stores literal code lengths in `[0..15]`). Mirrors
243/// [`crate::vp8l_prefix::MAX_CODE_LENGTH`].
244pub const MAX_CODE_LENGTH: usize = 15;
245
246/// §3.7.2.1.2 `kCodeLengthCodes`: the 19-symbol code-length-code alphabet.
247pub const NUM_CODE_LENGTH_CODES: usize = 19;
248
249/// §3.7.2.1.2 `kCodeLengthCodeOrder`: the order the (up to 19)
250/// code-length-code lengths are transmitted in. Identical to the decoder's
251/// [`crate::vp8l_prefix::CODE_LENGTH_CODE_ORDER`].
252pub const CODE_LENGTH_CODE_ORDER: [usize; NUM_CODE_LENGTH_CODES] = [
253 17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
254];
255
256/// Errors raised while encoding a VP8L image.
257#[derive(Debug, Clone, PartialEq, Eq)]
258pub enum EncodeError {
259 /// The caller passed an empty pixel buffer, or one whose length does
260 /// not match `width * height * 4`.
261 PixelBufferMismatch {
262 /// Bytes the caller supplied.
263 got: usize,
264 /// Bytes expected (`width * height * 4`).
265 expected: usize,
266 },
267 /// `width` or `height` was zero, or exceeded the §3.4 14-bit field
268 /// maximum of 16384.
269 InvalidDimensions {
270 /// The offending width.
271 width: u32,
272 /// The offending height.
273 height: u32,
274 },
275 /// The RIFF/WEBP framing builder rejected the assembled payload.
276 Build(build::BuildError),
277}
278
279impl From<build::BuildError> for EncodeError {
280 fn from(e: build::BuildError) -> Self {
281 Self::Build(e)
282 }
283}
284
285impl core::fmt::Display for EncodeError {
286 fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
287 match self {
288 Self::PixelBufferMismatch { got, expected } => write!(
289 f,
290 "VP8L encode: pixel buffer is {got} bytes, expected {expected} (width*height*4)"
291 ),
292 Self::InvalidDimensions { width, height } => write!(
293 f,
294 "VP8L encode: invalid dimensions {width}x{height} (must be 1..=16384)"
295 ),
296 Self::Build(e) => write!(f, "VP8L encode: RIFF/WEBP framing: {e}"),
297 }
298 }
299}
300
301impl std::error::Error for EncodeError {}
302
303/// §3.4 14-bit `width - 1` / `height - 1` field maximum (1-based 16384).
304const MAX_DIMENSION: u32 = 1 << 14;
305
306/// Least-significant-bit-first bit writer over a growing byte buffer.
307///
308/// The exact inverse of [`crate::vp8l_stream::BitReader`]: bits are packed
309/// LSB-first within each byte and bytes accumulate in stream order. A
310/// multi-bit write lays the value's bit 0 down first, so a subsequent
311/// `read_bits(n)` returns it unchanged.
312#[derive(Debug, Default, Clone)]
313pub struct BitWriter {
314 bytes: Vec<u8>,
315 bit_pos: usize,
316}
317
318impl BitWriter {
319 /// Create an empty bit writer positioned at bit 0.
320 pub fn new() -> Self {
321 Self::default()
322 }
323
324 /// The number of bits written so far.
325 pub fn bit_position(&self) -> usize {
326 self.bit_pos
327 }
328
329 /// Write the low `n` bits of `value` (0 ≤ `n` ≤ 32) LSB-first.
330 ///
331 /// Writing 0 bits is a no-op (mirrors the reader's `read_bits(0)`).
332 pub fn write_bits(&mut self, value: u32, n: usize) {
333 debug_assert!(n <= 32, "write_bits supports up to 32 bits");
334 let mut value = value;
335 for _ in 0..n {
336 let byte_idx = self.bit_pos >> 3;
337 if byte_idx >= self.bytes.len() {
338 self.bytes.push(0);
339 }
340 let bit = (value & 1) as u8;
341 self.bytes[byte_idx] |= bit << (self.bit_pos & 7);
342 self.bit_pos += 1;
343 value >>= 1;
344 }
345 }
346
347 /// Write a single bit.
348 pub fn write_bit(&mut self, bit: bool) {
349 self.write_bits(bit as u32, 1);
350 }
351
352 /// Consume the writer and return the packed bytes (the final partial
353 /// byte is zero-padded in its high bits).
354 pub fn into_bytes(self) -> Vec<u8> {
355 self.bytes
356 }
357}
358
359/// Build a length-limited (≤ [`MAX_CODE_LENGTH`]) canonical Huffman
360/// code-length assignment for an alphabet of `freqs.len()` symbols.
361///
362/// Returns a `Vec<u8>` of code lengths, one per symbol (0 = symbol unused).
363/// The construction guarantees the §3.7.2 completeness invariant the
364/// decoder enforces — the Kraft sum of `2^-len` over used symbols equals
365/// exactly one — for every input with at least two used symbols, and it
366/// produces the §3.7.2.1.2 single-leaf form (one symbol at length 1) for an
367/// input with exactly one used symbol.
368///
369/// The algorithm is a textbook Huffman build, followed by a
370/// length-limiting pass that caps any over-long code at
371/// [`MAX_CODE_LENGTH`] while re-balancing so the Kraft sum stays at
372/// exactly one. For the small alphabets and pixel counts this encoder
373/// targets, the cap is rarely hit; the pass is correctness insurance,
374/// not an optimization.
375///
376/// The merge loop exploits the classic two-queue property instead of a
377/// heap: with the leaves sorted ascending by `(frequency, symbol)` once
378/// up front, every internal node is created with a frequency no smaller
379/// than any previously created one, so a plain FIFO of internal nodes
380/// stays sorted by `(frequency, creation order)` for free. Each merge
381/// step then takes the two smallest nodes by comparing the two queue
382/// fronts in O(1) — preferring the leaf on a frequency tie, because the
383/// tie-break order ranks every leaf (ascending symbol) before every
384/// internal node (creation order). This reproduces, merge for merge, the
385/// exact `(freq, order)` pop sequence the previous min-heap build used,
386/// so the emitted length tables are bit-identical; only the cost drops
387/// (O(n log n) sort + O(n) merge, versus 3(n-1) heap operations of
388/// O(log n) swaps each).
389pub fn build_code_lengths(freqs: &[u32]) -> Vec<u8> {
390 let n = freqs.len();
391 let mut lengths = vec![0u8; n];
392
393 // Collect used symbols.
394 let used: Vec<usize> = (0..n).filter(|&s| freqs[s] > 0).collect();
395 match used.len() {
396 0 => return lengths, // empty code; caller encodes single-symbol-0.
397 1 => {
398 // §3.7.2.1.2 single-leaf: one symbol marked length 1.
399 lengths[used[0]] = 1;
400 return lengths;
401 }
402 _ => {}
403 }
404
405 // Huffman build. Nodes 0..n are leaves; internal nodes n.. are
406 // appended in creation order. A parent array recovers the depth
407 // (= code length) of each leaf afterwards.
408 let m = used.len();
409
410 // Leaf queue: `(freq << 32) | symbol` keys, sorted ascending. The
411 // packed key makes the sort a single-u64 comparison while encoding
412 // exactly the `(freq, ascending symbol)` tie-break the merge needs
413 // (`freq` is `u32`, so the shift is exact, and a symbol index always
414 // fits the low half).
415 let mut leaves: Vec<u64> = used
416 .iter()
417 .map(|&s| ((freqs[s] as u64) << 32) | s as u64)
418 .collect();
419 leaves.sort_unstable();
420
421 // Internal-node FIFO: frequencies only; internal node `i` has node
422 // index `n + i`. `u32::MAX` marks "no parent yet" (only the root
423 // keeps it, and the root's slot is never read back).
424 let mut inode_freq: Vec<u64> = Vec::with_capacity(m - 1);
425 let mut parent: Vec<u32> = vec![u32::MAX; n + m - 1];
426
427 /// Take the smallest remaining node by `(freq, tie-break order)`:
428 /// the front leaf wins ties because leaves rank before internal
429 /// nodes in the tie-break order.
430 fn take_min(
431 leaves: &[u64],
432 li: &mut usize,
433 inode_freq: &[u64],
434 ii: &mut usize,
435 n: usize,
436 ) -> (usize, u64) {
437 let use_leaf = if *li < leaves.len() {
438 *ii >= inode_freq.len() || (leaves[*li] >> 32) <= inode_freq[*ii]
439 } else {
440 false
441 };
442 if use_leaf {
443 let key = leaves[*li];
444 *li += 1;
445 ((key & 0xffff_ffff) as usize, key >> 32)
446 } else {
447 let node = n + *ii;
448 let freq = inode_freq[*ii];
449 *ii += 1;
450 (node, freq)
451 }
452 }
453
454 let mut li = 0usize; // leaf cursor
455 let mut ii = 0usize; // internal-node cursor
456 for _ in 0..m - 1 {
457 let (a_node, a_freq) = take_min(&leaves, &mut li, &inode_freq, &mut ii, n);
458 let (b_node, b_freq) = take_min(&leaves, &mut li, &inode_freq, &mut ii, n);
459 let new_node = n + inode_freq.len();
460 parent[a_node] = new_node as u32;
461 parent[b_node] = new_node as u32;
462 inode_freq.push(a_freq + b_freq);
463 }
464
465 // Recover each leaf's depth top-down: an internal node's parent is
466 // always created later (larger index), so a single reverse pass over
467 // the internal nodes settles every internal depth, and each leaf is
468 // then one deeper than its (always internal) parent.
469 let mut internal_depth = vec![0u32; m - 1];
470 for i in (0..m - 1).rev() {
471 let p = parent[n + i];
472 if p != u32::MAX {
473 internal_depth[i] = internal_depth[p as usize - n] + 1;
474 }
475 }
476 let mut max_len = 0usize;
477 for &s in &used {
478 let depth = internal_depth[parent[s] as usize - n] as usize + 1;
479 // A single internal-node tree (two leaves) gives depth 1; never 0
480 // here because used.len() >= 2.
481 lengths[s] = depth as u8;
482 max_len = max_len.max(depth);
483 }
484
485 if max_len > MAX_CODE_LENGTH {
486 limit_code_lengths(&mut lengths, &used);
487 }
488
489 lengths
490}
491
492/// Cap every code length at [`MAX_CODE_LENGTH`] while keeping the Kraft sum
493/// exactly 1, using the standard "move a too-long leaf up and lengthen a
494/// short leaf to compensate" rebalancing pass.
495///
496/// This is the approach a length-limited Huffman post-pass uses when a
497/// pathological frequency distribution would otherwise need codes longer
498/// than the format allows. It produces a *valid* (complete) code that is at
499/// most marginally sub-optimal; exactness of the round trip is unaffected
500/// because the decoder reconstructs pixels from whatever complete code the
501/// lengths describe.
502fn limit_code_lengths(lengths: &mut [u8], used: &[usize]) {
503 limit_code_lengths_to(lengths, used, MAX_CODE_LENGTH);
504}
505
506/// As [`limit_code_lengths`], but caps every code length at the
507/// caller-supplied `max_len` rather than [`MAX_CODE_LENGTH`].
508///
509/// The §3.7.2.1.2 *code-length-code* (the meta-code that transmits the
510/// literal length table) writes each of its own lengths in a **3-bit**
511/// on-wire field, so its lengths must not exceed `7` — a constraint
512/// tighter than the 15-bit `MAX_CODE_LENGTH` ceiling that applies to the
513/// literal codes themselves. A skewed enough CLC frequency histogram
514/// (one length value vastly more common than the rest) makes the plain
515/// Huffman build assign a length-8-or-more code to a rare CLC symbol;
516/// without this cap the 3-bit field silently truncates it, corrupting the
517/// table into an incomplete (Kraft < 1) code the decoder rejects. Capping
518/// the CLC at 7 with a Kraft re-balance keeps the on-wire table valid.
519///
520/// `max_len <= MAX_CODE_LENGTH` is required (the Kraft arithmetic uses
521/// `2^max_len` as the common denominator).
522fn limit_code_lengths_to(lengths: &mut [u8], used: &[usize], max_len: usize) {
523 debug_assert!((1..=MAX_CODE_LENGTH).contains(&max_len));
524 // Clamp.
525 for &s in used {
526 if lengths[s] as usize > max_len {
527 lengths[s] = max_len as u8;
528 }
529 }
530 // Kraft sum over denominator 2^max_len.
531 let full: i64 = 1i64 << max_len;
532 let kraft = |lengths: &[u8]| -> i64 {
533 let mut k = 0i64;
534 for &s in used {
535 let l = lengths[s] as usize;
536 if l > 0 {
537 k += 1i64 << (max_len - l);
538 }
539 }
540 k
541 };
542 // If over-subscribed (sum > 1), lengthen the deepest (largest-length,
543 // i.e. cheapest-to-lengthen) leaves until the sum drops to 1.
544 //
545 // Selection rule being reproduced: the historical per-step rescan
546 // walked all of `used` and kept the LAST `used`-order symbol among
547 // those sharing the largest current length below the cap (the
548 // `l >= best_len` comparison kept updating on ties). Two facts turn
549 // that O(n)-per-adjustment rescan into an O(1)-per-adjustment bucket
550 // drain with the identical pick sequence:
551 //
552 // 1. A bucket per length, filled in one pass over `used`, holds each
553 // bucket's symbols in `used` order — so the back of the highest
554 // non-empty bucket IS the rescan's pick.
555 // 2. Once a pick is lengthened from `l` to `l + 1 < MAX`, it is
556 // strictly the unique deepest eligible leaf (everything else is
557 // `<= l`), so the rescan re-picks the same symbol every step
558 // until it reaches MAX (leaving the eligible set) or the sum
559 // reaches 1. Driving the popped symbol upward in place therefore
560 // replays the original step sequence exactly; no eligible bucket
561 // ever gains a member while the pass is still running.
562 let mut k = kraft(lengths);
563 if k > full {
564 let mut buckets: Vec<Vec<usize>> = vec![Vec::new(); max_len];
565 for &s in used {
566 let l = lengths[s] as usize;
567 if l < max_len {
568 buckets[l].push(s);
569 }
570 }
571 // Bucket 0 is included for parity with the historical rescan,
572 // which treated a (theoretical) zero-length used symbol as
573 // eligible; the §3.7.2 build never produces one for a used
574 // symbol, so the bucket is empty in practice.
575 'over: for l0 in (0..max_len).rev() {
576 while k > full {
577 let Some(s) = buckets[l0].pop() else { break };
578 // Lengthening from `l` to `l + 1` swaps the Kraft term
579 // `2^(max-l)` for `2^(max-l-1)`, i.e. removes exactly
580 // `2^(max-l-1)` — same integer a full recompute would
581 // give.
582 let mut l = l0;
583 while k > full && l < max_len {
584 l += 1;
585 lengths[s] = l as u8;
586 k -= 1i64 << (max_len - l);
587 }
588 }
589 if k <= full {
590 break 'over;
591 }
592 }
593 }
594 // If under-subscribed (sum < 1), shorten the deepest leaves until the
595 // sum reaches 1.
596 while k < full {
597 let mut target: Option<usize> = None;
598 let mut best_len = 0u8;
599 for &s in used {
600 let l = lengths[s];
601 if l > 1 && l >= best_len {
602 best_len = l;
603 target = Some(s);
604 }
605 }
606 match target {
607 Some(s) => {
608 // Shortening `s` from `l` to `l - 1` swaps `2^(max-l)` for
609 // `2^(max-l+1)`, i.e. adds exactly `2^(max-l)` — again the
610 // same integer a full recompute would give.
611 let l = lengths[s] as usize;
612 lengths[s] -= 1;
613 k += 1i64 << (max_len - l);
614 }
615 None => break,
616 }
617 }
618}
619
620/// Maximum on-wire length for a §3.7.2.1.2 code-length-code symbol: the
621/// CLC lengths are each written in a 3-bit field, so they range `[0..7]`.
622const MAX_CLC_CODE_LENGTH: usize = 7;
623
624/// Build the §3.7.2.1.2 code-length-code (CLC) lengths for a literal
625/// length table, capped at [`MAX_CLC_CODE_LENGTH`] so every length fits
626/// the 3-bit on-wire field. The plain Huffman build can assign a CLC
627/// symbol a length of 8 or more on a skewed histogram; this wrapper
628/// re-balances any such over-long code back under 7 while keeping the
629/// table complete, so both [`write_normal_code_lengths`] and
630/// [`normal_form_bits`] see the same valid lengths.
631fn build_clc_code_lengths(clc_freq: &[u32]) -> Vec<u8> {
632 let mut clc_lengths = build_code_lengths(clc_freq);
633 if clc_lengths
634 .iter()
635 .any(|&l| l as usize > MAX_CLC_CODE_LENGTH)
636 {
637 let used: Vec<usize> = (0..clc_freq.len()).filter(|&s| clc_freq[s] > 0).collect();
638 limit_code_lengths_to(&mut clc_lengths, &used, MAX_CLC_CODE_LENGTH);
639 }
640 clc_lengths
641}
642
643/// Build the canonical code values for a per-symbol length table.
644///
645/// Returns `codes[s]` = the canonical code value for symbol `s` (only
646/// meaningful where `lengths[s] > 0`). The assignment is the same DEFLATE
647/// canonical rule the decoder's [`crate::vp8l_prefix::PrefixCode`] reads:
648/// symbols ordered by `(length, value)`, codes assigned sequentially, read
649/// most-significant-bit-first within a code.
650pub fn canonical_codes(lengths: &[u8]) -> Vec<u32> {
651 let mut bl_count = [0u32; MAX_CODE_LENGTH + 1];
652 for &l in lengths {
653 if l > 0 {
654 bl_count[l as usize] += 1;
655 }
656 }
657 let mut next_code = [0u32; MAX_CODE_LENGTH + 2];
658 let mut code = 0u32;
659 for len in 1..=MAX_CODE_LENGTH {
660 code = (code + bl_count[len - 1]) << 1;
661 next_code[len] = code;
662 }
663 let mut codes = vec![0u32; lengths.len()];
664 let mut assign = next_code;
665 // Indexed by code length to assign sequential canonical codes; mirrors
666 // the decoder's `(length, value)`-ordered assignment.
667 #[allow(clippy::needless_range_loop)]
668 for len in 1..=MAX_CODE_LENGTH {
669 for (sym, &l) in lengths.iter().enumerate() {
670 if l as usize == len {
671 codes[sym] = assign[len];
672 assign[len] += 1;
673 }
674 }
675 }
676 codes
677}
678
679/// §5.2.2: split a length/distance `value` (≥ 1) into its *prefix code* and
680/// *extra bits*, the exact inverse of the decoder's
681/// [`crate::vp8l_decode::read_lz77_value`].
682///
683/// Returns `(prefix_code, extra_bits, extra_value)` where:
684///
685/// * `prefix_code` is the entropy-coded symbol (a GREEN length symbol is
686/// `256 + prefix_code`; a distance symbol is `prefix_code` directly),
687/// * `extra_bits` is how many raw bits follow the prefix code,
688/// * `extra_value` is the value those `extra_bits` carry (LSB-first, as the
689/// decoder's `ReadBits` consumes them).
690///
691/// The decoder reconstructs `value` as:
692///
693/// ```text
694/// if prefix_code < 4 { value = prefix_code + 1 }
695/// else {
696/// extra_bits = (prefix_code - 2) >> 1
697/// offset = (2 + (prefix_code & 1)) << extra_bits
698/// value = offset + extra_value + 1
699/// }
700/// ```
701///
702/// so feeding `extra_value` back through that formula yields `value`.
703pub fn value_to_prefix(value: u32) -> (u32, u32, u32) {
704 debug_assert!(value >= 1, "LZ77 length/distance values are 1-based");
705 if value <= 4 {
706 // prefix_code = value - 1; no extra bits (the `< 4` decoder branch).
707 return (value - 1, 0, 0);
708 }
709 // value >= 5. Find the prefix code p (>= 4) whose range
710 // [offset+1, offset + 2^extra_bits] contains `value`, where
711 // extra_bits = (p - 2) >> 1 and offset = (2 + (p & 1)) << extra_bits.
712 //
713 // Equivalently: let v0 = value - 1 (>= 4). The high bit of v0 selects
714 // the magnitude; the next bit selects the (p & 1) parity sub-band.
715 let v0 = value - 1; // >= 4
716 // `msb` = floor(log2(v0)) >= 2.
717 let msb = 31 - v0.leading_zeros();
718 let extra_bits = msb - 1;
719 // Parity bit: the bit just below the MSB distinguishes the two
720 // sub-bands offset = 2<<e (parity 0) vs offset = 3<<e (parity 1).
721 let parity = (v0 >> (msb - 1)) & 1;
722 let prefix_code = 2 * extra_bits + 2 + parity;
723 let offset = (2 + parity) << extra_bits;
724 let extra_value = value - offset - 1;
725 debug_assert!(extra_value < (1u32 << extra_bits));
726 (prefix_code, extra_bits, extra_value)
727}
728
729/// A built prefix code ready for symbol emission: per-symbol length + code.
730#[derive(Debug, Clone)]
731struct WriteCode {
732 lengths: Vec<u8>,
733 codes: Vec<u32>,
734 /// `Some(sym)` when this is the single-leaf form (one symbol, length 1).
735 single: Option<usize>,
736}
737
738impl WriteCode {
739 /// Build a [`WriteCode`] from symbol frequencies over an alphabet of
740 /// `alphabet_size` symbols.
741 fn from_freqs(freqs: &[u32]) -> Self {
742 let used: Vec<usize> = (0..freqs.len()).filter(|&s| freqs[s] > 0).collect();
743 let single = if used.len() == 1 { Some(used[0]) } else { None };
744 let lengths = build_code_lengths(freqs);
745 let codes = canonical_codes(&lengths);
746 Self {
747 lengths,
748 codes,
749 single,
750 }
751 }
752
753 /// An *empty* code: encoded per §3.7.2.1.1's note as a single symbol 0.
754 /// Used for the distance code when no backward references are emitted.
755 fn empty(alphabet_size: usize) -> Self {
756 let mut freqs = vec![0u32; alphabet_size];
757 freqs[0] = 1;
758 Self::from_freqs(&freqs)
759 }
760
761 /// Emit one symbol's code to `w` (MSB-first within the code, matching
762 /// the canonical assignment the decoder reads). For the single-leaf
763 /// form this writes nothing (reading consumes no bits).
764 fn write_symbol(&self, w: &mut BitWriter, symbol: usize) {
765 if self.single.is_some() {
766 return; // single-leaf code: 0 bits.
767 }
768 let len = self.lengths[symbol] as usize;
769 let code = self.codes[symbol];
770 // The decoder reads MSB-first within the code, so emit the high bit
771 // first. write_bits is LSB-first, so reverse the `len` low bits.
772 for i in 0..len {
773 let bit = (code >> (len - 1 - i)) & 1;
774 w.write_bits(bit, 1);
775 }
776 }
777
778 /// Write this code's per-symbol lengths to `w`, picking the cheaper
779 /// of the two §3.7.2.1 forms.
780 ///
781 /// The §3.7.2.1.1 *simple code length code* can only represent length
782 /// tables with 1 or 2 symbols at length 1 (every other symbol
783 /// implicitly absent). When that constraint holds, `write_code_lengths`
784 /// computes the precise bit-cost of both forms and picks the smaller.
785 /// Otherwise it falls back to the §3.7.2.1.2 *normal code length code*.
786 fn write_code_lengths(&self, w: &mut BitWriter) {
787 if let Some(simple) = self.as_simple_form() {
788 // Two trivial cases the simple form can carry — compare
789 // bit-costs and pick the cheaper.
790 let simple_bits = simple_form_bits(&simple);
791 let normal_bits = normal_form_bits(&self.lengths);
792 if simple_bits <= normal_bits {
793 write_simple_code_lengths(w, &simple);
794 return;
795 }
796 }
797 write_normal_code_lengths(w, &self.lengths);
798 }
799
800 /// If this code's length table is encodable with the §3.7.2.1.1 simple
801 /// form (1 or 2 symbols at length 1, all others 0), return the symbol
802 /// list `[symbol0]` or `[symbol0, symbol1]`. Otherwise return `None`.
803 fn as_simple_form(&self) -> Option<Vec<usize>> {
804 let used: Vec<(usize, u8)> = self
805 .lengths
806 .iter()
807 .enumerate()
808 .filter_map(|(s, &l)| if l != 0 { Some((s, l)) } else { None })
809 .collect();
810 // Simple form requires 1 or 2 used symbols, each at length 1.
811 // §3.7.2.1.1: "code length 1. All other prefix code lengths are
812 // implicitly zeros."
813 if used.is_empty() || used.len() > 2 {
814 return None;
815 }
816 if used.iter().any(|&(_, l)| l != 1) {
817 return None;
818 }
819 // §3.7.2.1.1 first symbol is coded with 1 or 8 bits, so it must
820 // fit in [0..255]; second symbol always 8 bits, [0..255]. Anything
821 // beyond 255 can only be sent via the normal form.
822 if used.iter().any(|&(s, _)| s > 255) {
823 return None;
824 }
825 Some(used.iter().map(|&(s, _)| s).collect())
826 }
827}
828
829/// Precise bit-cost of the §3.7.2.1.1 *simple code length code* for the
830/// given symbol list (1 or 2 entries, each in `[0..255]`).
831///
832/// Layout per §3.7.2.1.1:
833/// * 1 flag bit (`1` = simple)
834/// * 1 bit `num_symbols - 1`
835/// * 1 bit `is_first_8bits` (chooses 1-bit vs 8-bit width for symbol0)
836/// * `1 + 7 * is_first_8bits` bits for `symbol0`
837/// * if `num_symbols == 2`: 8 bits for `symbol1`
838fn simple_form_bits(symbols: &[usize]) -> usize {
839 debug_assert!(symbols.len() == 1 || symbols.len() == 2);
840 let is_first_8bits = symbols[0] > 1;
841 // Per spec: the second symbol, when present, is always 8 bits.
842 let s0_width = if is_first_8bits { 8 } else { 1 };
843 let s1_width = if symbols.len() == 2 { 8 } else { 0 };
844 // 1 (flag) + 1 (num_symbols-1) + 1 (is_first_8bits) + s0 + s1.
845 3 + s0_width + s1_width
846}
847
848/// Precise bit-cost of [`write_normal_code_lengths`] for `lengths`.
849///
850/// Mirrors `write_normal_code_lengths` exactly so the chooser is
851/// self-consistent: any change in normal-form layout there must reflect
852/// here.
853fn normal_form_bits(lengths: &[u8]) -> usize {
854 // CLC frequencies are the histogram of length values 0..=15 in the
855 // literal length table.
856 let mut clc_freq = [0u32; NUM_CODE_LENGTH_CODES];
857 for &l in lengths {
858 clc_freq[l as usize] += 1;
859 }
860 let clc_lengths = build_clc_code_lengths(&clc_freq);
861
862 // Locate the highest-ordered CLC symbol that has a non-zero length.
863 let mut max_order_used = 0usize;
864 for (order_idx, &pos) in CODE_LENGTH_CODE_ORDER.iter().enumerate() {
865 if clc_lengths[pos] != 0 {
866 max_order_used = order_idx;
867 }
868 }
869 let num_code_lengths = (max_order_used + 1).max(4);
870
871 // §3.7.2.1.2 header tax: 1 flag + 4 num_code_lengths + 3*num_code_lengths
872 // CLC lengths + 1 max_symbol gate.
873 let mut bits = 1 + 4 + 3 * num_code_lengths + 1;
874
875 // Per-symbol body: when the CLC collapses to a single non-zero
876 // length (single-leaf CLC), the decoder consumes 0 bits per symbol
877 // and the writer emits nothing. Otherwise emit the canonical code for
878 // each literal length value.
879 let used_clc: Vec<usize> = (0..NUM_CODE_LENGTH_CODES)
880 .filter(|&s| clc_freq[s] > 0)
881 .collect();
882 if used_clc.len() > 1 {
883 for &l in lengths {
884 bits += clc_lengths[l as usize] as usize;
885 }
886 }
887 bits
888}
889
890/// Write a per-symbol length table with the §3.7.2.1.1 *simple code
891/// length code*.
892///
893/// Only valid for `symbols.len()` in `[1, 2]`, each symbol in `[0..255]`,
894/// each implicitly at code length 1. The caller is responsible for
895/// checking applicability via [`WriteCode::as_simple_form`].
896fn write_simple_code_lengths(w: &mut BitWriter, symbols: &[usize]) {
897 debug_assert!(symbols.len() == 1 || symbols.len() == 2);
898 debug_assert!(symbols.iter().all(|&s| s <= 255));
899
900 // §3.7.2.1.1 flag: 1 selects the simple form.
901 w.write_bit(true);
902 // num_symbols = ReadBits(1) + 1, so write `num_symbols - 1`.
903 w.write_bits((symbols.len() as u32) - 1, 1);
904 // §3.7.2.1.1: "is_first_8bits ... range [0..1] or [0..255]". Choose
905 // the 1-bit form when symbol0 fits in [0..1], else the 8-bit form.
906 let is_first_8bits = symbols[0] > 1;
907 w.write_bits(if is_first_8bits { 1 } else { 0 }, 1);
908 let s0_width = if is_first_8bits { 8 } else { 1 };
909 w.write_bits(symbols[0] as u32, s0_width);
910 if symbols.len() == 2 {
911 // §3.7.2.1.1: "The second symbol, if present, is always assumed
912 // to be in the range [0..255] and coded using 8 bits."
913 w.write_bits(symbols[1] as u32, 8);
914 }
915}
916
917/// Write a per-symbol length table with the §3.7.2.1.2 *normal code length
918/// code*.
919///
920/// The encoder uses the general (non-run-length) form: it transmits one
921/// code-length-code symbol per literal length. To keep the code-length-code
922/// itself trivially decodable, every length value `0..=15` that actually
923/// occurs is given a code-length-code symbol; the CLC is built from the
924/// frequencies of those length values. Runs (codes 16/17/18) are not
925/// emitted — the literal length sequence is sent verbatim, which the
926/// decoder's `read_normal_code_lengths` handles as the `0..=15` literal
927/// branch.
928fn write_normal_code_lengths(w: &mut BitWriter, lengths: &[u8]) {
929 // §3.7.2.1.2: the code-length-code is itself a prefix code over the
930 // 19-symbol alphabet {0..15 literal lengths, 16 repeat, 17/18 zero
931 // runs}. We only emit symbols 0..=15 (no runs), so the CLC alphabet is
932 // those length values that occur in `lengths`.
933 let mut clc_freq = [0u32; NUM_CODE_LENGTH_CODES];
934 for &l in lengths {
935 clc_freq[l as usize] += 1;
936 }
937 let clc_lengths = build_clc_code_lengths(&clc_freq);
938 let clc_codes = canonical_codes(&clc_lengths);
939
940 // num_code_lengths: how many CLC lengths we transmit, in
941 // kCodeLengthCodeOrder. We must transmit enough leading entries to
942 // cover the highest-ordered CLC symbol that has a non-zero length.
943 let mut max_order_used = 0usize;
944 for (order_idx, &pos) in CODE_LENGTH_CODE_ORDER.iter().enumerate() {
945 if clc_lengths[pos] != 0 {
946 max_order_used = order_idx;
947 }
948 }
949 // §3.7.2.1.2: num_code_lengths = 4 + ReadBits(4), range [4..19].
950 let num_code_lengths = (max_order_used + 1).max(4);
951
952 // normal flag bit.
953 w.write_bit(false);
954 // num_code_lengths - 4 in 4 bits.
955 w.write_bits((num_code_lengths - 4) as u32, 4);
956 // The CLC lengths, 3 bits each, in kCodeLengthCodeOrder.
957 for &pos in CODE_LENGTH_CODE_ORDER.iter().take(num_code_lengths) {
958 w.write_bits(clc_lengths[pos] as u32, 3);
959 }
960 // max_symbol gate: ReadBits(1) == 0 → max_symbol = alphabet_size, i.e.
961 // read all `lengths.len()` entries. We always emit the full table.
962 w.write_bit(false);
963
964 // Whether the CLC is a single-leaf code (one length value occurs):
965 // write_symbol then emits 0 bits, and the decoder's CLC reader returns
966 // that lone symbol for every read — which is exactly the literal length
967 // we want, repeated for every symbol. Build a tiny symbol writer.
968 let clc_single = {
969 let used: Vec<usize> = (0..NUM_CODE_LENGTH_CODES)
970 .filter(|&s| clc_freq[s] > 0)
971 .collect();
972 if used.len() == 1 {
973 Some(used[0])
974 } else {
975 None
976 }
977 };
978
979 // Emit one CLC symbol per literal length (the `0..=15` branch).
980 for &l in lengths {
981 let sym = l as usize;
982 if clc_single.is_some() {
983 continue; // single-leaf CLC: 0 bits per symbol.
984 }
985 let code = clc_codes[sym];
986 let len = clc_lengths[sym] as usize;
987 for i in 0..len {
988 let bit = (code >> (len - 1 - i)) & 1;
989 w.write_bits(bit, 1);
990 }
991 }
992}
993
994/// Smallest backward-reference run (in pixels) the matcher will emit. A
995/// match of fewer than this many pixels rarely pays for the length +
996/// distance prefix codes versus emitting the pixels as literals, so short
997/// runs stay literal.
998pub const MIN_MATCH: usize = 3;
999
1000/// Largest backward-reference run the §5.2.2 length prefix coding admits
1001/// (the spec note: "The maximum backward reference length is limited to
1002/// 4096."). A longer repeat is split into consecutive matches.
1003pub const MAX_MATCH: usize = 4096;
1004
1005/// Number of low bits of the rolling pixel hash → hash-chain head buckets.
1006/// `1 << HASH_BITS` heads; collisions are resolved by walking the chain.
1007const HASH_BITS: usize = 14;
1008/// Cap on chain steps walked per position, bounding the matcher's worst
1009/// case on adversarial inputs while keeping the common-case match quality.
1010const MAX_CHAIN: usize = 64;
1011
1012/// A single emitted token in the §5.2.2 LZ77 stream: either a raw ARGB
1013/// pixel (a §5.2.1 literal), a §5.2.3 color-cache reference, or a
1014/// §5.2.2 backward-reference copy.
1015#[derive(Debug, Clone, Copy, PartialEq, Eq)]
1016enum Token {
1017 /// A §5.2.1 ARGB literal pixel (encoded as four channel symbols).
1018 Literal(u32),
1019 /// A §5.2.3 color-cache reference. `index` is the resolved
1020 /// cache slot (the green symbol on the wire is
1021 /// `256 + 24 + index`).
1022 CacheRef {
1023 /// The hashed cache index (`0..color_cache_size`).
1024 index: u32,
1025 },
1026 /// A §5.2.2 backward reference: copy `length` pixels from `distance`
1027 /// pixels back in scan-line order.
1028 Copy {
1029 /// Copy length in pixels (`MIN_MATCH..=MAX_MATCH`).
1030 length: usize,
1031 /// Scan-line pixel distance back to the copy source (`>= 1`).
1032 distance: usize,
1033 },
1034}
1035
1036/// §5.2.2 hash-chain matcher over a scan-line ARGB pixel buffer.
1037///
1038/// Hashes 4-pixel windows into `1 << HASH_BITS` buckets and chains every
1039/// position sharing a hash, so a match search at position `p` walks only
1040/// positions that begin with the same 4-pixel hash. This is the standard
1041/// LZ77 greedy match structure; it finds repeated pixel runs without ever
1042/// consulting any external implementation — the only correctness contract
1043/// is that an emitted `Copy { length, distance }` is reproducible by the
1044/// decoder's §5.2.2 copy loop, which it is for any `1 <= distance <= p` and
1045/// `length <= remaining`.
1046struct Lz77Matcher<'a> {
1047 pixels: &'a [u32],
1048 head: Vec<i32>,
1049 prev: Vec<i32>,
1050}
1051
1052impl<'a> Lz77Matcher<'a> {
1053 /// Build a matcher over `pixels` with empty hash chains.
1054 fn new(pixels: &'a [u32]) -> Self {
1055 Self {
1056 pixels,
1057 head: vec![-1; 1 << HASH_BITS],
1058 prev: vec![-1; pixels.len()],
1059 }
1060 }
1061
1062 /// Hash the 4-pixel window starting at `pos` (callers guarantee
1063 /// `pos + 4 <= pixels.len()`). A simple multiplicative mix over the
1064 /// four ARGB words, folded into `HASH_BITS` bits.
1065 fn hash(&self, pos: usize) -> usize {
1066 let p = self.pixels;
1067 let mut h = 0u32;
1068 for k in 0..4 {
1069 h = h.wrapping_mul(0x9e37_79b1).wrapping_add(p[pos + k]);
1070 }
1071 (h >> (32 - HASH_BITS)) as usize
1072 }
1073
1074 /// Insert `pos` at the head of its hash bucket's chain.
1075 fn insert(&mut self, pos: usize) {
1076 if pos + 4 > self.pixels.len() {
1077 return;
1078 }
1079 let h = self.hash(pos);
1080 self.prev[pos] = self.head[h];
1081 self.head[h] = pos as i32;
1082 }
1083
1084 /// Find the longest match for the window at `pos`, returning
1085 /// `Some((length, distance))` when a run of `>= MIN_MATCH` pixels is
1086 /// found. Walks at most [`MAX_CHAIN`] chain links.
1087 ///
1088 /// The matcher hashes 4-pixel windows, so a match search requires
1089 /// `pos + 4 <= pixels.len()`. The tail of the image (fewer than 4
1090 /// pixels remaining) is always emitted as literals.
1091 fn find(&self, pos: usize) -> Option<(usize, usize)> {
1092 let p = self.pixels;
1093 let n = p.len();
1094 if pos + 4 > n {
1095 return None;
1096 }
1097 let max_len = (n - pos).min(MAX_MATCH);
1098 let h = self.hash(pos);
1099 let mut cand = self.head[h];
1100 let mut best_len = 0usize;
1101 let mut best_dist = 0usize;
1102 let mut steps = 0usize;
1103 while cand >= 0 && steps < MAX_CHAIN {
1104 let c = cand as usize;
1105 // Candidates were all inserted at positions < pos.
1106 let mut len = 0usize;
1107 while len < max_len && p[c + len] == p[pos + len] {
1108 len += 1;
1109 }
1110 if len > best_len {
1111 best_len = len;
1112 best_dist = pos - c;
1113 if len >= max_len {
1114 break;
1115 }
1116 }
1117 cand = self.prev[c];
1118 steps += 1;
1119 }
1120 if best_len >= MIN_MATCH {
1121 Some((best_len, best_dist))
1122 } else {
1123 None
1124 }
1125 }
1126}
1127
1128/// Run the §5.2.2 hash-chain matcher over `pixels`, producing the
1129/// token stream (literals + backward-reference copies) the entropy
1130/// stage emits. Every `Copy` token has `1 <= distance <= position` and
1131/// `MIN_MATCH <= length <= MAX_MATCH`, so the decoder's §5.2.2 copy
1132/// loop reproduces the exact pixels.
1133///
1134/// As of round 158 the matcher applies **three-position lazy matching**:
1135/// when the matcher finds a match `(len_a, _)` at `pos`, the encoder
1136/// also probes `pos + 1` (depth-1), `pos + 2` (depth-2), and `pos + 3`
1137/// (depth-3). The longest of `(len_a, len_b, len_c, len_d)` wins; ties
1138/// resolve to the earliest position (preserving the strict-greater
1139/// semantics introduced in round 156). When the depth-3 match `len_d`
1140/// is the unique longest, the encoder emits *three* literals (at
1141/// `pos`, `pos + 1`, `pos + 2`) and takes the longer match starting
1142/// at `pos + 3`. This costs at most three extra hash-chain walks per
1143/// match attempt and extends the round-157 two-position lazy recovery
1144/// to the third-order trap: a short match at each of `pos`, `pos + 1`,
1145/// `pos + 2` together blocking a strictly longer match at `pos + 3`.
1146/// The reconstructed pixels are bit-identical to the strict-greedy,
1147/// depth-1, and depth-2 partitions for any input — only the token
1148/// *partition* shifts by up to three pixels — so round-trips remain
1149/// bit-exact and the existing test suite continues to pass.
1150fn tokenize_lz77(pixels: &[u32]) -> Vec<Token> {
1151 tokenize_lz77_inner(pixels, LAZY_DEPTH_DEFAULT)
1152}
1153
1154/// Production lazy-match depth used by [`tokenize_lz77`]. Round 156
1155/// set this to 1 (single-position look-ahead); round 157 bumped it to
1156/// 2 (two-position look-ahead); round 158 bumped it to 3 (three-
1157/// position look-ahead); round 163 bumps it to 4 (four-position look-
1158/// ahead with a [`DEPTH4_GUARD_THRESHOLD`] diminishing-returns guard).
1159/// A value of 0 reproduces the r155 strict-greedy partition.
1160const LAZY_DEPTH_DEFAULT: u32 = 4;
1161
1162/// Round-163 diminishing-returns guard for the depth-4 probe. The
1163/// depth-4 `find(pos + 4)` call (plus the `matcher.insert(pos + 3)`
1164/// bookkeeping that gives it a fair shot at including `pos..=pos + 3`
1165/// in its window) is only executed when the running best length
1166/// across the depth-1/2/3 probes is strictly less than this
1167/// threshold. Once the depth-3 best already covers a length-
1168/// `THRESHOLD` run, swapping to a depth-4 alternative would have to
1169/// strictly exceed that length while paying for four literals
1170/// (`pixels[pos..pos + 4]`); the empirical pay-off shrinks rapidly
1171/// past the threshold and is rarely big enough to recover the
1172/// literal-emission cost in the entropy stage. Tuned to a conservative
1173/// value (`6`) so the guard only suppresses depth-4 work when the
1174/// running best is already comfortably above the four-literal break-
1175/// even line. At `THRESHOLD = u32::MAX` the depth-4 probe still
1176/// honours the `best_len > MIN_MATCH` floor (see
1177/// [`tokenize_lz77_inner`]); at `THRESHOLD = 0` (or below
1178/// `MIN_MATCH + 1 = 4`) the depth-4 probe never fires. The A/B
1179/// regression test [`round_163_depth4_guard_suppresses_long_run_swap`]
1180/// exercises the guard's switching boundary.
1181const DEPTH4_GUARD_THRESHOLD: u32 = 6;
1182
1183/// Implementation of [`tokenize_lz77`] with an explicit `lazy_depth`
1184/// toggle. Values:
1185///
1186/// * `0` — strict-greedy r155 partition (no look-ahead). Always emits
1187/// the match found at `pos`.
1188/// * `1` — round-156 single-position lazy partition: probe `pos + 1`,
1189/// swap to a strictly-longer match starting there.
1190/// * `2` — round-157 two-position lazy partition: also probe
1191/// `pos + 2`, swap to a strictly-longer match starting there (the
1192/// `pos + 2` match must strictly beat both `pos` and `pos + 1`).
1193/// * `3` — round-158 three-position lazy partition: also probe
1194/// `pos + 3`, swap to a strictly-longer match starting there (the
1195/// `pos + 3` match must strictly beat the running best across
1196/// `pos`, `pos + 1`, and `pos + 2`).
1197/// * `4` — round-163 guarded four-position lazy partition: also
1198/// probes `pos + 4`, but **only when** the running best across the
1199/// first four positions is strictly greater than [`MIN_MATCH`]
1200/// (`MIN_MATCH = 3`, so `best_len >= 4`) AND strictly less than
1201/// [`DEPTH4_GUARD_THRESHOLD`]. The `> MIN_MATCH` floor ensures the
1202/// pre-inserted `pos + 3` position is always covered by the chosen
1203/// match's range, so the next iteration's `find` never sees its
1204/// own position in the chain. When the guard fires, the `pos + 4`
1205/// match must strictly beat the running best.
1206///
1207/// Values `>= 4` are clamped to `4`. The A/B regression tests
1208/// in this module use `0`, `1`, `2`, and `3` to compare against the
1209/// r155, r156, r157, and r158 baselines.
1210fn tokenize_lz77_inner(pixels: &[u32], lazy_depth: u32) -> Vec<Token> {
1211 let n = pixels.len();
1212 let mut matcher = Lz77Matcher::new(pixels);
1213 let mut tokens = Vec::new();
1214 let mut pos = 0usize;
1215 let depth = lazy_depth.min(4);
1216 while pos < n {
1217 if let Some((len_a, dist_a)) = matcher.find(pos) {
1218 // Lazy lookahead. The matcher's hash chains do not yet
1219 // include `pos` (matches at `pos` only reference positions
1220 // strictly before `pos`), so to give the `pos + 1` probe a
1221 // fair shot at a match that *includes* the pixel at `pos`
1222 // we insert `pos` into the chains before the look-ahead
1223 // `find`. Likewise, the `pos + 2` probe needs both `pos`
1224 // and `pos + 1` in the chains, and the `pos + 3` probe
1225 // needs `pos`, `pos + 1`, and `pos + 2` all in. The
1226 // bookkeeping at the tail of each branch skips
1227 // re-inserting any positions that the lookahead probes
1228 // already inserted.
1229 let mut best_len = len_a;
1230 let mut best_dist = dist_a;
1231 let mut best_start = pos; // pixel index where the match begins
1232 let inserted_pos = depth >= 1 && len_a < MAX_MATCH && pos + 1 < n;
1233 if inserted_pos {
1234 matcher.insert(pos);
1235 if let Some((len_b, dist_b)) = matcher.find(pos + 1) {
1236 if len_b > best_len {
1237 best_len = len_b;
1238 best_dist = dist_b;
1239 best_start = pos + 1;
1240 }
1241 }
1242 }
1243 // Depth-2 probe: only meaningful if depth allows it, the
1244 // current best match is short enough to be worth
1245 // attempting to displace, and `pos + 2` is in range. We
1246 // also require `pos + 1` to be inserted so the `pos + 2`
1247 // window can reference it; the depth-1 probe already
1248 // inserted `pos`.
1249 let inserted_pos1 = depth >= 2 && best_len < MAX_MATCH && pos + 2 < n;
1250 if inserted_pos1 {
1251 matcher.insert(pos + 1);
1252 if let Some((len_c, dist_c)) = matcher.find(pos + 2) {
1253 if len_c > best_len {
1254 best_len = len_c;
1255 best_dist = dist_c;
1256 best_start = pos + 2;
1257 }
1258 }
1259 }
1260 // Depth-3 probe: only meaningful if depth allows it, the
1261 // running best match is short enough to be worth
1262 // attempting to displace, and `pos + 3` is in range. We
1263 // also require `pos + 2` to be inserted so the `pos + 3`
1264 // window can reference it; the depth-1 / depth-2 probes
1265 // already inserted `pos` and `pos + 1`.
1266 let inserted_pos2 = depth >= 3 && best_len < MAX_MATCH && pos + 3 < n;
1267 if inserted_pos2 {
1268 matcher.insert(pos + 2);
1269 if let Some((len_d, dist_d)) = matcher.find(pos + 3) {
1270 if len_d > best_len {
1271 best_len = len_d;
1272 best_dist = dist_d;
1273 best_start = pos + 3;
1274 }
1275 }
1276 }
1277 // Depth-4 probe (round 163): only meaningful if depth
1278 // allows it, the running best match is short enough to be
1279 // worth attempting to displace, `pos + 4` is in range,
1280 // AND the round-163 diminishing-returns guard fires
1281 // (`best_len < DEPTH4_GUARD_THRESHOLD`). The guard skips
1282 // the depth-4 work when the depth-3 best is already
1283 // comfortably above the four-literal break-even line.
1284 //
1285 // Additional **lower-bound** floor: the depth-4 probe pre-
1286 // inserts `pos + 3` into the matcher chain so the `find(pos
1287 // + 4)` window can reference it. That pre-insert must be
1288 // covered by the chosen match's range `[best_start,
1289 // best_start + best_len)` — otherwise the next iteration's
1290 // `pos` (= `best_start + best_len`) could equal `pos + 3`,
1291 // and `find(pos + 3)` would see itself in the chain and
1292 // return distance `0`. We avoid that corner by gating on
1293 // `best_len > MIN_MATCH` (i.e., `best_len >= 4`): with
1294 // `best_start == pos` the match end is at least `pos + 4 >
1295 // pos + 3`, covering the pre-insert. The depth-3 best of
1296 // exactly 3 pixels (`= MIN_MATCH`) is short enough that
1297 // the depth-4 probe is rarely worth it anyway, so the
1298 // floor costs almost nothing on the matcher's behaviour.
1299 //
1300 // We also require `pos + 3` to be inserted so the `pos + 4`
1301 // window can reference it; the depth-1 / depth-2 / depth-3
1302 // probes already inserted `pos`, `pos + 1`, and `pos + 2`.
1303 let inserted_pos3 = depth >= 4
1304 && best_len > MIN_MATCH
1305 && best_len < MAX_MATCH
1306 && (best_len as u32) < DEPTH4_GUARD_THRESHOLD
1307 && pos + 4 < n;
1308 if inserted_pos3 {
1309 matcher.insert(pos + 3);
1310 if let Some((len_e, dist_e)) = matcher.find(pos + 4) {
1311 if len_e > best_len {
1312 best_len = len_e;
1313 best_dist = dist_e;
1314 best_start = pos + 4;
1315 }
1316 }
1317 }
1318
1319 // Emit literals for any pixels skipped by the chosen
1320 // lazy starting position, then the chosen match.
1321 for &skipped in &pixels[pos..best_start] {
1322 tokens.push(Token::Literal(skipped));
1323 }
1324 tokens.push(Token::Copy {
1325 length: best_len,
1326 distance: best_dist,
1327 });
1328
1329 // Hash-chain bookkeeping. Insert every covered position
1330 // into the chains so later matches can reference inside
1331 // the just-copied run; skip positions that the lookahead
1332 // probes already inserted.
1333 //
1334 // Pre-inserted positions (so far): `pos` if `inserted_pos`,
1335 // `pos + 1` if `inserted_pos1`, `pos + 2` if `inserted_pos2`,
1336 // `pos + 3` if `inserted_pos3` (round 163). The chosen
1337 // match covers `[best_start, best_start + best_len)`. Walk
1338 // that range and only `insert` the positions that are not
1339 // already in the chains.
1340 let end = best_start + best_len;
1341 let mut q = pos;
1342 while q < end {
1343 let already_in = (q == pos && inserted_pos)
1344 || (q == pos + 1 && inserted_pos1)
1345 || (q == pos + 2 && inserted_pos2)
1346 || (q == pos + 3 && inserted_pos3);
1347 if q >= best_start && !already_in {
1348 matcher.insert(q);
1349 }
1350 q += 1;
1351 }
1352 pos = end;
1353 } else {
1354 tokens.push(Token::Literal(pixels[pos]));
1355 matcher.insert(pos);
1356 pos += 1;
1357 }
1358 }
1359 tokens
1360}
1361
1362/// Allowed range for the §5.2.3 `color_cache_code_bits` field: an
1363/// enabled cache has `code_bits ∈ [1, 11]`, giving a cache size of
1364/// `2..=2048` entries. Mirrors
1365/// [`crate::meta_prefix::COLOR_CACHE_BITS_MIN`] /
1366/// [`crate::meta_prefix::COLOR_CACHE_BITS_MAX`].
1367pub const COLOR_CACHE_BITS_MIN: u32 = 1;
1368/// See [`COLOR_CACHE_BITS_MIN`].
1369pub const COLOR_CACHE_BITS_MAX: u32 = 11;
1370
1371/// The default `color_cache_code_bits` the chooser evaluates when a
1372/// caller asks for a single representative cache size (e.g. test
1373/// fixtures, the `encode_argb_literals_color_cache` direct entry).
1374/// Eight bits gives a 256-entry cache — a middle-of-range value that
1375/// works reasonably well across the §5.2.3 `[1..11]` range.
1376///
1377/// The production chooser ([`encode_argb_literals_with_width`] and
1378/// [`encode_argb_with_predictor_chooser`]) no longer uses this single
1379/// value: as of round 148 it sweeps every `cache_code_bits ∈ [1..11]`
1380/// per the §5.2.3 range and emits the smallest stream. See
1381/// [`select_best_cache_bits`].
1382pub const DEFAULT_COLOR_CACHE_BITS: u32 = 8;
1383
1384/// §5.2.3 color-cache helper used by the encoder. Mirrors the decoder's
1385/// [`crate::vp8l_decode::ColorCache`] semantics: an array of
1386/// `1 << code_bits` ARGB entries, all initialized to zero, with a
1387/// hashed lookup `(0x1e35a7bd * argb) >> (32 - code_bits)`.
1388///
1389/// The encoder maintains the cache in stream order — exactly as the
1390/// decoder will when re-walking the emitted symbols — so a slot's
1391/// state matches between writer and reader at every bit position. A
1392/// §5.2.3 `CacheRef { index }` token is emitted *only* when
1393/// `lookup(index) == Some(argb)` at the moment the token is produced;
1394/// the decoder will read the same index and produce the same ARGB.
1395#[derive(Debug, Clone)]
1396struct EncoderColorCache {
1397 code_bits: u32,
1398 entries: Vec<u32>,
1399}
1400
1401impl EncoderColorCache {
1402 /// Allocate a fresh `1 << code_bits`-entry cache. `code_bits` must
1403 /// be in `[COLOR_CACHE_BITS_MIN, COLOR_CACHE_BITS_MAX]`; debug
1404 /// builds assert.
1405 fn new(code_bits: u32) -> Self {
1406 debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&code_bits));
1407 Self {
1408 code_bits,
1409 entries: vec![0u32; 1usize << code_bits],
1410 }
1411 }
1412
1413 /// `1 << code_bits` — the §5.2.3 cache size.
1414 #[cfg(test)]
1415 fn size(&self) -> usize {
1416 self.entries.len()
1417 }
1418
1419 /// §5.2.3: `(0x1e35a7bd * argb) >> (32 - code_bits)`. Identical to
1420 /// the decoder's [`crate::vp8l_decode::ColorCache::hash`].
1421 fn hash(&self, argb: u32) -> usize {
1422 (crate::vp8l_decode::COLOR_CACHE_HASH_MULTIPLIER.wrapping_mul(argb)
1423 >> (32 - self.code_bits)) as usize
1424 }
1425
1426 /// `true` when the slot for `argb`'s hash currently holds `argb`
1427 /// itself — i.e. emitting a `CacheRef { index: hash(argb) }`
1428 /// token would round-trip to the same pixel on decode.
1429 fn contains(&self, argb: u32) -> Option<usize> {
1430 let idx = self.hash(argb);
1431 if self.entries[idx] == argb {
1432 Some(idx)
1433 } else {
1434 None
1435 }
1436 }
1437
1438 /// Insert `argb` at its hashed slot (§5.2.3: every emitted pixel,
1439 /// literal or covered by a backward reference, is re-inserted).
1440 fn insert(&mut self, argb: u32) {
1441 let idx = self.hash(argb);
1442 self.entries[idx] = argb;
1443 }
1444}
1445
1446/// Second-pass §5.2.3 cache-aware token rewrite.
1447///
1448/// Walks `tokens` in stream order, maintaining the cache exactly as
1449/// the decoder will. When a `Literal(argb)` matches the cache's
1450/// current slot for `argb`, the literal is rewritten to a
1451/// `CacheRef { index }` token so the decoder can re-read it from the
1452/// cache. Backward-reference copies pass through unchanged; the
1453/// covered pixels are inserted into the cache (spec §5.2.3) so later
1454/// repeats can refer back to them via cache codes.
1455///
1456/// `pixels` provides the underlying pixel sequence for backward
1457/// references (needed to know which colors a `Copy` token covers so
1458/// the cache state stays in sync).
1459fn cacheify_tokens(tokens: &[Token], pixels: &[u32], code_bits: u32) -> Vec<Token> {
1460 let mut cache = EncoderColorCache::new(code_bits);
1461 let mut out = Vec::with_capacity(tokens.len());
1462 let mut pos = 0usize;
1463 for &tok in tokens {
1464 match tok {
1465 Token::Literal(argb) => {
1466 if let Some(idx) = cache.contains(argb) {
1467 out.push(Token::CacheRef { index: idx as u32 });
1468 } else {
1469 out.push(Token::Literal(argb));
1470 }
1471 cache.insert(argb);
1472 pos += 1;
1473 }
1474 Token::CacheRef { .. } => {
1475 // Caller should not pre-emit cache refs into the
1476 // input stream; keep tokens we don't recognise as
1477 // literals from the matcher's output verbatim.
1478 out.push(tok);
1479 pos += 1;
1480 }
1481 Token::Copy { length, distance } => {
1482 out.push(tok);
1483 // Mirror the decoder's §5.2.3 invariant: every pixel
1484 // covered by a backward-reference copy is inserted in
1485 // stream order. The source pixels live at
1486 // `pos - distance .. pos - distance + length` in
1487 // `pixels`; the destination at `pos .. pos + length`
1488 // would be identical (copies always reproduce source
1489 // bytes), so we read directly off the source slice.
1490 let src_start = pos - distance;
1491 for i in 0..length {
1492 let argb = pixels[src_start + i];
1493 cache.insert(argb);
1494 }
1495 pos += length;
1496 }
1497 }
1498 }
1499 debug_assert_eq!(
1500 pos,
1501 pixels.len(),
1502 "cacheify_tokens: token stream covered {pos} of {} pixels",
1503 pixels.len()
1504 );
1505 out
1506}
1507
1508/// The five per-symbol frequency tables for one prefix-code group: green
1509/// (literals + §5.2.2 length symbols + §5.2.3 cache indices), red, blue,
1510/// alpha, and distance.
1511struct Frequencies {
1512 green: Vec<u32>,
1513 red: Vec<u32>,
1514 blue: Vec<u32>,
1515 alpha: Vec<u32>,
1516 distance: Vec<u32>,
1517}
1518
1519/// Legacy §5.2.2 *scan-line* distance encoding (`distance_code = D + 120`).
1520///
1521/// The decoder's [`crate::vp8l_decode::distance_code_to_pixel_distance`]
1522/// maps any `distance_code > 120` straight back to `distance_code - 120 == D`,
1523/// so this is always a valid round-trip. Retained as the unit-test reference
1524/// (so the round-130 chooser can be measured against the round-119 baseline)
1525/// — production paths use [`pixel_distance_to_distance_code`], which picks
1526/// the smaller of the scan-line code and any matching distance-map code.
1527#[cfg(test)]
1528fn distance_to_code(distance: usize) -> u32 {
1529 distance as u32 + crate::vp8l_decode::NUM_DISTANCE_MAP_CODES as u32
1530}
1531
1532/// §5.2.2 distance-code chooser: pick the smaller of the scan-line code
1533/// (`D + 120`) and any §5.2.2 distance-map code `c ∈ 1..=120` that
1534/// reconstructs `D` for the given `image_width`.
1535///
1536/// A distance-map entry `(xi, yi)` at index `c-1` reconstructs to
1537/// `max(xi + yi * image_width, 1)` per the decoder's
1538/// [`crate::vp8l_decode::distance_code_to_pixel_distance`]. The chooser
1539/// returns the **smallest** raw code that reconstructs to `distance` —
1540/// smaller raw codes feed [`value_to_prefix`] through low-prefix slots
1541/// (codes `1..=4` use 0 extra bits; code `5` uses 1 extra bit; …), which
1542/// then enter the distance prefix-code's Huffman tree with the highest
1543/// frequencies and the shortest emitted lengths.
1544///
1545/// # Smallest-code early-out
1546///
1547/// Map codes occupy `1..=120` and the scan-line fallback is
1548/// `distance + 120 >= 121`, so **any** matching map entry is strictly
1549/// smaller than the fallback. Because the entries are visited in
1550/// ascending code order (`idx + 1`), the *first* entry whose
1551/// reconstruction equals `distance` is, by construction, the smallest
1552/// valid code — no later entry (higher code) and not the fallback can
1553/// beat it. The scan therefore returns on the first match instead of
1554/// continuing through all 120 entries. When no entry matches it falls
1555/// through to the scan-line code. This preserves the exact same chosen
1556/// code as a full scan with a smallest-code tie-break, so the emitted
1557/// bytes are unchanged.
1558///
1559/// The reconstruction is identical to the legacy scan-line form, so the
1560/// decoder produces the exact same pixel distance and the round-trip
1561/// stays bit-exact.
1562///
1563/// Panics in debug builds when `distance == 0` (callers guarantee
1564/// `1 <= distance <= position` per §5.2.2's backward-reference invariant).
1565pub fn pixel_distance_to_distance_code(distance: usize, image_width: u32) -> u32 {
1566 debug_assert!(distance >= 1, "§5.2.2 distance must be >= 1");
1567 let width_i32 = image_width as i32;
1568 for (idx, &(xi, yi)) in crate::vp8l_decode::DISTANCE_MAP.iter().enumerate() {
1569 // The decoder computes `xi + yi * W` and clamps to 1. Match the
1570 // exact reconstruction so we never emit a code that would resolve
1571 // to a different distance.
1572 let raw = xi + yi * width_i32;
1573 let mapped = if raw < 1 { 1 } else { raw as usize };
1574 if mapped == distance {
1575 // First match is the smallest code (entries are in ascending
1576 // code order) and always < the scan-line fallback (>= 121),
1577 // so return immediately.
1578 return (idx + 1) as u32;
1579 }
1580 }
1581 distance as u32 + crate::vp8l_decode::NUM_DISTANCE_MAP_CODES as u32
1582}
1583
1584/// Accumulate the per-symbol frequencies for a token stream so the entropy
1585/// stage can build length-optimal prefix codes before emitting.
1586///
1587/// `color_cache_size` is `1 << color_cache_code_bits` (0 when the cache
1588/// is disabled). It extends the GREEN alphabet to
1589/// `256 + 24 + color_cache_size` per §6.2.3 so a `CacheRef { index }`
1590/// token's wire symbol `256 + 24 + index` is in range.
1591///
1592/// `image_width` is needed to feed [`pixel_distance_to_distance_code`] so
1593/// the frequency table matches the prefix codes the emit loop will choose
1594/// for each backward reference. Passing `1` (the legacy width-less form)
1595/// disables the §5.2.2 distance-map optimisation — only codes 1..=8 can
1596/// possibly match at width 1, so all row-style matches fall back to the
1597/// scan-line `D + 120` form.
1598fn count_frequencies(tokens: &[Token], color_cache_size: usize, image_width: u32) -> Frequencies {
1599 let green_alphabet = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + color_cache_size;
1600 let mut freqs = Frequencies {
1601 green: vec![0u32; green_alphabet],
1602 red: vec![0u32; 256],
1603 blue: vec![0u32; 256],
1604 alpha: vec![0u32; 256],
1605 distance: vec![0u32; 40],
1606 };
1607 for &tok in tokens {
1608 match tok {
1609 Token::Literal(p) => {
1610 let a = ((p >> 24) & 0xff) as usize;
1611 let r = ((p >> 16) & 0xff) as usize;
1612 let g = ((p >> 8) & 0xff) as usize;
1613 let b = (p & 0xff) as usize;
1614 freqs.green[g] += 1;
1615 freqs.red[r] += 1;
1616 freqs.blue[b] += 1;
1617 freqs.alpha[a] += 1;
1618 }
1619 Token::CacheRef { index } => {
1620 // §5.2.3: GREEN symbol is `256 + 24 + index`.
1621 let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + index as usize;
1622 debug_assert!(sym < green_alphabet);
1623 freqs.green[sym] += 1;
1624 }
1625 Token::Copy { length, distance } => {
1626 // §5.2.2: length is a GREEN symbol `256 + length_prefix`.
1627 let (len_prefix, _, _) = value_to_prefix(length as u32);
1628 freqs.green[256 + len_prefix as usize] += 1;
1629 // Distance prefix code (#5). Width-aware chooser picks the
1630 // smaller of scan-line `D + 120` and any §5.2.2 distance-map
1631 // code reconstructing to `D` for `image_width`.
1632 let raw_code = pixel_distance_to_distance_code(distance, image_width);
1633 let (dist_prefix, _, _) = value_to_prefix(raw_code);
1634 freqs.distance[dist_prefix as usize] += 1;
1635 }
1636 }
1637 }
1638 freqs
1639}
1640
1641/// Emit a length/distance `value` to `w`: the entropy-coded prefix symbol
1642/// via `code`, then its `extra_bits` raw bits LSB-first (matching the
1643/// decoder's `ReadBits`). `symbol_base` is added to the prefix code before
1644/// the entropy lookup (256 for GREEN length symbols, 0 for distances).
1645fn write_lz77_value(w: &mut BitWriter, code: &WriteCode, symbol_base: usize, value: u32) {
1646 let (prefix, extra_bits, extra_value) = value_to_prefix(value);
1647 code.write_symbol(w, symbol_base + prefix as usize);
1648 if extra_bits > 0 {
1649 w.write_bits(extra_value, extra_bits as usize);
1650 }
1651}
1652
1653/// §3.5.3 / §3.8.2 *forward* subtract-green transform: subtract the green
1654/// channel from red and blue per pixel, in place. The exact inverse of
1655/// [`crate::vp8l_transform::inverse_subtract_green`], so re-applying the
1656/// decoder's inverse pass after entropy decode restores the original
1657/// pixels byte-for-byte.
1658///
1659/// Spec arithmetic: `red := (red - green) & 0xff`,
1660/// `blue := (blue - green) & 0xff` (the §3.5.3 inverse is `+ green & 0xff`,
1661/// so subtracting on the encode side and adding back on the decode side is
1662/// a perfect round trip modulo 256).
1663pub fn apply_subtract_green(pixels: &mut [u32]) {
1664 for px in pixels.iter_mut() {
1665 let a = (*px >> 24) & 0xff;
1666 let r = (*px >> 16) & 0xff;
1667 let g = (*px >> 8) & 0xff;
1668 let b = *px & 0xff;
1669 let r_new = r.wrapping_sub(g) & 0xff;
1670 let b_new = b.wrapping_sub(g) & 0xff;
1671 *px = (a << 24) | (r_new << 16) | (g << 8) | b_new;
1672 }
1673}
1674
1675// ---- §4.1 spatial-predictor forward transform (encoder side) ----
1676
1677/// `DIV_ROUND_UP(num, den)` from §4.1 (`((num) + (den) - 1) / (den)`).
1678#[inline]
1679fn predictor_div_round_up(num: u32, den: u32) -> u32 {
1680 num.div_ceil(den)
1681}
1682
1683/// Per-channel `(a + b) / 2` (`Average2` from §4.1).
1684#[inline]
1685fn predictor_average2(a: u32, b: u32) -> u32 {
1686 let f = |sh: u32| -> u32 {
1687 let ca = (a >> sh) & 0xff;
1688 let cb = (b >> sh) & 0xff;
1689 (ca + cb) / 2
1690 };
1691 (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1692}
1693
1694/// `Clamp(a)` from §4.1: saturate `a` to `[0, 255]`.
1695#[inline]
1696fn predictor_clamp(a: i32) -> i32 {
1697 a.clamp(0, 255)
1698}
1699
1700/// §4.1 `ClampAddSubtractFull(a, b, c)` = `Clamp(a + b - c)` per channel.
1701#[inline]
1702fn predictor_clamp_add_subtract_full(a: u32, b: u32, c: u32) -> u32 {
1703 let f = |sh: u32| -> u32 {
1704 let ca = ((a >> sh) & 0xff) as i32;
1705 let cb = ((b >> sh) & 0xff) as i32;
1706 let cc = ((c >> sh) & 0xff) as i32;
1707 predictor_clamp(ca + cb - cc) as u32
1708 };
1709 (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1710}
1711
1712/// §4.1 `ClampAddSubtractHalf(a, b)` = `Clamp(a + (a - b) / 2)` per
1713/// channel.
1714#[inline]
1715fn predictor_clamp_add_subtract_half(a: u32, b: u32) -> u32 {
1716 let f = |sh: u32| -> u32 {
1717 let ca = ((a >> sh) & 0xff) as i32;
1718 let cb = ((b >> sh) & 0xff) as i32;
1719 predictor_clamp(ca + (ca - cb) / 2) as u32
1720 };
1721 (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1722}
1723
1724/// §4.1 `Select(L, T, TL)` — whichever of `L` / `T` is closer
1725/// (per-channel Manhattan distance) to the `L + T - TL` estimate.
1726#[inline]
1727fn predictor_select(l: u32, t: u32, tl: u32) -> u32 {
1728 let ach = |x: u32| ((x >> 24) & 0xff) as i32;
1729 let rch = |x: u32| ((x >> 16) & 0xff) as i32;
1730 let gch = |x: u32| ((x >> 8) & 0xff) as i32;
1731 let bch = |x: u32| (x & 0xff) as i32;
1732
1733 let p_a = ach(l) + ach(t) - ach(tl);
1734 let p_r = rch(l) + rch(t) - rch(tl);
1735 let p_g = gch(l) + gch(t) - gch(tl);
1736 let p_b = bch(l) + bch(t) - bch(tl);
1737
1738 let p_l =
1739 (p_a - ach(l)).abs() + (p_r - rch(l)).abs() + (p_g - gch(l)).abs() + (p_b - bch(l)).abs();
1740 let p_t =
1741 (p_a - ach(t)).abs() + (p_r - rch(t)).abs() + (p_g - gch(t)).abs() + (p_b - bch(t)).abs();
1742
1743 if p_l < p_t {
1744 l
1745 } else {
1746 t
1747 }
1748}
1749
1750/// Compute the §4.1 prediction for `mode ∈ 0..=13` given the four
1751/// reconstructed-pixel neighbours.
1752///
1753/// Identical formula to the decoder's
1754/// `crate::vp8l_transform::inverse_predictor` `predict` helper — kept
1755/// as a separate copy here because the encoder is built (and tested)
1756/// independently of the decoder's transform module.
1757fn predictor_predict(mode: u8, l: u32, t: u32, tr: u32, tl: u32) -> u32 {
1758 match mode {
1759 0 => 0xff00_0000,
1760 1 => l,
1761 2 => t,
1762 3 => tr,
1763 4 => tl,
1764 5 => predictor_average2(predictor_average2(l, tr), t),
1765 6 => predictor_average2(l, tl),
1766 7 => predictor_average2(l, t),
1767 8 => predictor_average2(tl, t),
1768 9 => predictor_average2(t, tr),
1769 10 => predictor_average2(predictor_average2(l, tl), predictor_average2(t, tr)),
1770 11 => predictor_select(l, t, tl),
1771 12 => predictor_clamp_add_subtract_full(l, t, tl),
1772 13 => predictor_clamp_add_subtract_half(predictor_average2(l, t), tl),
1773 // §4.1 only defines [0..13]. An out-of-range mode produces the
1774 // top-left's solid-black prediction, matching the decoder.
1775 _ => 0xff00_0000,
1776 }
1777}
1778
1779/// Per-channel residual `(original - pred) mod 256`. The inverse of
1780/// the decoder's `add_pred` (`residual + pred mod 256 = original`),
1781/// so re-applying the §4.1 inverse predictor recovers `original`
1782/// exactly.
1783///
1784/// **Round-224 SWAR experiment — closure-of-four body retained.**
1785/// The decoder-side `add_pred` was rewritten in round 170 as a
1786/// two-pair SWAR (`(x & 0x00ff_00ff).wrapping_add(...)` /
1787/// `(x & 0xff00_ff00).wrapping_add(...)`) because addition does not
1788/// propagate carry across the zero "guard" bytes when the summand has
1789/// its high bit masked out. Subtraction is asymmetric: a borrow at the
1790/// low byte of a lane DOES propagate through the zero guard byte and
1791/// corrupts the adjacent lane, so the mirror rewrite needs to bias
1792/// the minuend with a `0x0100` guard per lane (`(orig & 0x00ff_00ff)
1793/// | 0x0100_0100`) to suppress underflow before the subtract, with a
1794/// final `& 0x00ff_00ff` mask to clear the guard. We measured both
1795/// forms in round 224 against the new `predictor_subtract_256x256`
1796/// bench: **34.1 µs (closure-of-four) → 40.5 µs (biased SWAR), a
1797/// +18.4% regression.** AArch64 NEON auto-vectorisation of the four
1798/// sequential per-byte `wrapping_sub` calls is tighter than the
1799/// explicit biased-SWAR pattern at this call site. Same shape as the
1800/// round-194 BENCHMARKS footnote that recorded a regression for a
1801/// `clamp_add_subtract_*` (mode 12) per-channel `to_le_bytes()` +
1802/// `i16` byte-loop attempt — the closure-of-four `i32` body remains
1803/// the right starting point on this target until a true 16-byte
1804/// `std::simd` formulation can amortise the lane-bias cost across
1805/// multiple pixels per iteration (mirroring the `to_rgba_simd`
1806/// precedent under the `simd` feature).
1807#[inline]
1808pub fn predictor_subtract(original: u32, pred: u32) -> u32 {
1809 let a = ((original >> 24) & 0xff).wrapping_sub((pred >> 24) & 0xff) & 0xff;
1810 let r = ((original >> 16) & 0xff).wrapping_sub((pred >> 16) & 0xff) & 0xff;
1811 let g = ((original >> 8) & 0xff).wrapping_sub((pred >> 8) & 0xff) & 0xff;
1812 let b = (original & 0xff).wrapping_sub(pred & 0xff) & 0xff;
1813 (a << 24) | (r << 16) | (g << 8) | b
1814}
1815
1816/// Cost proxy used to pick a block's predictor mode: the sum of
1817/// per-pixel per-channel `|residual|` over the block, where `|x|`
1818/// folds the mod-256 residual onto `[-128, 127]` (a value `x ∈ [0,
1819/// 255]` representing `(original - pred) mod 256` has true magnitude
1820/// `min(x, 256 - x)`).
1821///
1822/// Sum-of-magnitudes is a standard zero-cost proxy for the entropy
1823/// of the residual histogram: lower magnitudes peak the histogram
1824/// near zero, which a Huffman code over the residual symbols
1825/// compresses well. Using the folded magnitude correctly rewards
1826/// modes that produce both small-positive and small-negative
1827/// residuals (e.g. `0xff` = `-1 mod 256`, magnitude 1).
1828#[inline]
1829fn residual_magnitude(residual: u32) -> u32 {
1830 let fold = |v: u32| -> u32 {
1831 let v = v & 0xff;
1832 if v <= 128 {
1833 v
1834 } else {
1835 256 - v
1836 }
1837 };
1838 fold(residual >> 24) + fold(residual >> 16) + fold(residual >> 8) + fold(residual)
1839}
1840
1841/// §4.1 border-aware prediction at `(x, y)`. Mirrors
1842/// [`crate::vp8l_transform::inverse_predictor`]: top-left is solid
1843/// black `0xff000000`; top row predicts L; left column predicts T;
1844/// rightmost column uses the row's leftmost pixel as TR; otherwise
1845/// `predictor_predict(mode, L, T, TR, TL)`.
1846///
1847/// `pixels` is the `width × height` ARGB source (read-only — the
1848/// encoder predicts against the *originals*, since the decoder
1849/// reconstructs pixels equal to those originals).
1850fn predictor_at(pixels: &[u32], width: usize, x: usize, y: usize, mode: u8) -> u32 {
1851 if x == 0 && y == 0 {
1852 return 0xff00_0000;
1853 }
1854 let idx = y * width + x;
1855 if y == 0 {
1856 return pixels[idx - 1];
1857 }
1858 if x == 0 {
1859 return pixels[idx - width];
1860 }
1861 let l = pixels[idx - 1];
1862 let t = pixels[idx - width];
1863 let tl = pixels[idx - width - 1];
1864 let tr = if x == width - 1 {
1865 pixels[idx - width - (width - 1)]
1866 } else {
1867 pixels[idx - width + 1]
1868 };
1869 predictor_predict(mode, l, t, tr, tl)
1870}
1871
1872/// Per-pixel residual consumer for [`for_each_block_residual`].
1873///
1874/// `pixel` receives each in-bounds block pixel's §4.1 mod-256
1875/// residual in raster order; `row_end` runs after the last pixel of
1876/// each block row and returns whether the walk should continue.
1877///
1878/// Pruning at row granularity (instead of per pixel, as the
1879/// pre-round-280 chooser loops did) is pick-identical: a pruned
1880/// walk's partial cost is only ever compared `>= cap` by the caller,
1881/// and per-pixel contributions are non-negative, so any partial sum
1882/// that prunes implies the full sum would also have compared
1883/// `>= cap`. Coarsening the prune lets the interior pixel loop run
1884/// branch-free (auto-vectorisable) at the cost of at most one block
1885/// row of extra work on a pruned mode.
1886trait ResidualSink {
1887 fn pixel(&mut self, residual: u32);
1888 fn row_end(&mut self) -> bool;
1889}
1890
1891/// Walk every in-bounds pixel of the block `[x0, x0+bw) × [y0,
1892/// y0+bh)` of the `width × height` image in raster order, feeding
1893/// each pixel's §4.1 residual (`predictor_subtract(original,
1894/// prediction)`) to `sink`. Interior predictions come from
1895/// `predict(l, t, tr, tl)`; border pixels follow the §4.1 border
1896/// rules (top-left → solid black, top row → L, left column → T,
1897/// rightmost column → the §4.1 TR wraparound) — pixel-for-pixel
1898/// identical to a [`predictor_at`] + [`predictor_subtract`] walk,
1899/// but with the border branch chain hoisted out of the inner loop
1900/// (round-180 decoder precedent) and the predictor monomorphised in
1901/// by the caller so the per-pixel 14-way mode dispatch disappears.
1902#[inline]
1903#[allow(clippy::too_many_arguments)]
1904fn walk_block_residuals<P, S>(
1905 pixels: &[u32],
1906 width: usize,
1907 height: usize,
1908 x0: usize,
1909 y0: usize,
1910 bw: usize,
1911 bh: usize,
1912 predict: P,
1913 sink: &mut S,
1914) where
1915 P: Fn(u32, u32, u32, u32) -> u32,
1916 S: ResidualSink,
1917{
1918 let y_end = (y0 + bh).min(height);
1919 let x_end = (x0 + bw).min(width);
1920 if x0 >= x_end || y0 >= y_end {
1921 return;
1922 }
1923 let mut y = y0;
1924 if y == 0 {
1925 // Top row: (0, 0) predicts solid black, the rest predict L.
1926 let mut x = x0;
1927 if x == 0 {
1928 sink.pixel(predictor_subtract(pixels[0], 0xff00_0000));
1929 x = 1;
1930 }
1931 while x < x_end {
1932 sink.pixel(predictor_subtract(pixels[x], pixels[x - 1]));
1933 x += 1;
1934 }
1935 if !sink.row_end() {
1936 return;
1937 }
1938 y = 1;
1939 }
1940 // The §4.1 right-column TR wraparound only applies when the block
1941 // reaches the image's right edge.
1942 let interior_end = if x_end == width { width - 1 } else { x_end };
1943 while y < y_end {
1944 let row = y * width;
1945 let mut x = x0;
1946 if x == 0 {
1947 // Left column predicts T.
1948 sink.pixel(predictor_subtract(pixels[row], pixels[row - width]));
1949 x = 1;
1950 }
1951 while x < interior_end {
1952 let idx = row + x;
1953 let l = pixels[idx - 1];
1954 let t = pixels[idx - width];
1955 let tl = pixels[idx - width - 1];
1956 let tr = pixels[idx - width + 1];
1957 sink.pixel(predictor_subtract(pixels[idx], predict(l, t, tr, tl)));
1958 x += 1;
1959 }
1960 if x < x_end {
1961 // x == width - 1: §4.1 TR wraparound.
1962 let idx = row + x;
1963 let l = pixels[idx - 1];
1964 let t = pixels[idx - width];
1965 let tl = pixels[idx - width - 1];
1966 let tr = pixels[idx - width - (width - 1)];
1967 sink.pixel(predictor_subtract(pixels[idx], predict(l, t, tr, tl)));
1968 }
1969 if !sink.row_end() {
1970 return;
1971 }
1972 y += 1;
1973 }
1974}
1975
1976/// Run [`walk_block_residuals`] with the §4.1 predictor for `mode`
1977/// monomorphised into the walk, so the mode dispatch runs once per
1978/// block instead of once per pixel. Out-of-range modes predict solid
1979/// black, matching [`predictor_predict`].
1980#[inline]
1981#[allow(clippy::too_many_arguments)]
1982fn for_each_block_residual<S: ResidualSink>(
1983 pixels: &[u32],
1984 width: usize,
1985 height: usize,
1986 x0: usize,
1987 y0: usize,
1988 bw: usize,
1989 bh: usize,
1990 mode: u8,
1991 sink: &mut S,
1992) {
1993 macro_rules! walk {
1994 ($p:expr) => {
1995 walk_block_residuals(pixels, width, height, x0, y0, bw, bh, $p, sink)
1996 };
1997 }
1998 match mode {
1999 1 => walk!(|l, _, _, _| l),
2000 2 => walk!(|_, t, _, _| t),
2001 3 => walk!(|_, _, tr, _| tr),
2002 4 => walk!(|_, _, _, tl| tl),
2003 5 => walk!(|l, t, tr, _| predictor_average2(predictor_average2(l, tr), t)),
2004 6 => walk!(|l, _, _, tl| predictor_average2(l, tl)),
2005 7 => walk!(|l, t, _, _| predictor_average2(l, t)),
2006 8 => walk!(|_, t, _, tl| predictor_average2(tl, t)),
2007 9 => walk!(|_, t, tr, _| predictor_average2(t, tr)),
2008 10 => walk!(|l, t, tr, tl| predictor_average2(
2009 predictor_average2(l, tl),
2010 predictor_average2(t, tr)
2011 )),
2012 11 => walk!(|l, t, _, tl| predictor_select(l, t, tl)),
2013 12 => walk!(|l, t, _, tl| predictor_clamp_add_subtract_full(l, t, tl)),
2014 13 => walk!(|l, t, _, tl| predictor_clamp_add_subtract_half(predictor_average2(l, t), tl)),
2015 // Mode 0 and §4.1-undefined modes both predict solid black.
2016 _ => walk!(|_, _, _, _| 0xff00_0000),
2017 }
2018}
2019
2020/// [`ResidualSink`] accumulating the folded-L1 [`residual_magnitude`]
2021/// cost proxy, pruning at row granularity once the running sum
2022/// reaches `cap` (see the trait docs for why row-granular pruning is
2023/// pick-identical to the pre-round-280 per-pixel early-out).
2024struct MagnitudeCostSink {
2025 cost: u64,
2026 cap: u64,
2027}
2028
2029impl ResidualSink for MagnitudeCostSink {
2030 #[inline]
2031 fn pixel(&mut self, residual: u32) {
2032 self.cost += residual_magnitude(residual) as u64;
2033 }
2034 #[inline]
2035 fn row_end(&mut self) -> bool {
2036 self.cost < self.cap
2037 }
2038}
2039
2040/// [`ResidualSink`] filling the per-channel residual byte histograms
2041/// [`block_mode_entropy_cost`] feeds its Shannon sum. Never prunes:
2042/// the histograms must be complete before the entropy is meaningful.
2043struct ResidualHistogramSink {
2044 hist: [[u32; 256]; 4],
2045 n: u32,
2046}
2047
2048impl ResidualSink for ResidualHistogramSink {
2049 #[inline]
2050 fn pixel(&mut self, residual: u32) {
2051 self.hist[0][((residual >> 24) & 0xff) as usize] += 1;
2052 self.hist[1][((residual >> 16) & 0xff) as usize] += 1;
2053 self.hist[2][((residual >> 8) & 0xff) as usize] += 1;
2054 self.hist[3][(residual & 0xff) as usize] += 1;
2055 self.n += 1;
2056 }
2057 #[inline]
2058 fn row_end(&mut self) -> bool {
2059 true
2060 }
2061}
2062
2063/// Pick the §4.1 mode `0..=13` that minimises the residual cost
2064/// proxy over the rectangular block `[x0, x0+bw) × [y0, y0+bh)` of
2065/// the `width × height` image. Border rules per
2066/// [`predictor_at`].
2067///
2068/// On ties (multiple modes producing equal magnitude sums) the
2069/// lowest mode wins, which makes the chooser deterministic.
2070///
2071/// This is the no-hint entry point — equivalent to calling
2072/// [`pick_block_mode_with_hint`] with `prefer_mode = None`. The
2073/// production caller [`build_predictor_image`] uses the
2074/// hint-aware variant; the no-hint form is retained for the
2075/// in-module tie-breaker tests.
2076#[cfg(test)]
2077fn pick_block_mode(
2078 pixels: &[u32],
2079 width: usize,
2080 height: usize,
2081 x0: usize,
2082 y0: usize,
2083 bw: usize,
2084 bh: usize,
2085) -> u8 {
2086 pick_block_mode_with_hint(pixels, width, height, x0, y0, bw, bh, None)
2087}
2088
2089/// Compute the §4.1 residual-cost proxy for a single mode over
2090/// the rectangular block `[x0, x0+bw) × [y0, y0+bh)`. Walks every
2091/// in-bounds pixel without an early-out so the caller can use the
2092/// result as an authoritative tie-break reference.
2093///
2094/// This is the same per-mode sum the main chooser computes inside
2095/// [`pick_block_mode_with_hint`], factored out so the entropy-
2096/// image-aware tie-breaker can evaluate the preferred neighbour
2097/// mode exactly once and re-use the value to decide whether a
2098/// post-walk swap is allowed.
2099#[allow(clippy::too_many_arguments)]
2100fn block_mode_cost(
2101 pixels: &[u32],
2102 width: usize,
2103 height: usize,
2104 x0: usize,
2105 y0: usize,
2106 bw: usize,
2107 bh: usize,
2108 mode: u8,
2109) -> u64 {
2110 block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, u64::MAX)
2111}
2112
2113/// [`block_mode_cost`] with a pruning `cap`: once the running cost
2114/// reaches `cap` at a block-row boundary the walk stops and the
2115/// partial sum is returned. Callers only compare a pruned return
2116/// value `>= cap` (the residual magnitudes are non-negative, so a
2117/// partial sum at or above `cap` proves the full sum is too), which
2118/// keeps mode picks identical to an uncapped walk.
2119#[allow(clippy::too_many_arguments)]
2120fn block_mode_cost_capped(
2121 pixels: &[u32],
2122 width: usize,
2123 height: usize,
2124 x0: usize,
2125 y0: usize,
2126 bw: usize,
2127 bh: usize,
2128 mode: u8,
2129 cap: u64,
2130) -> u64 {
2131 let mut sink = MagnitudeCostSink { cost: 0, cap };
2132 for_each_block_residual(pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
2133 sink.cost
2134}
2135
2136/// Hint-aware variant of [`pick_block_mode`]: picks the §4.1 mode
2137/// minimising the residual cost proxy, and on ties prefers
2138/// `prefer_mode` over the otherwise-lowest tied mode.
2139///
2140/// `prefer_mode = Some(m)` directs the tie-break: when `m`'s cost
2141/// equals the lowest cost found across all 14 modes, the chooser
2142/// returns `m` instead of the lowest-indexed tied mode. When
2143/// `prefer_mode = None` (or `prefer_mode = Some(m)` with `m`
2144/// strictly worse than another mode), the lowest-tied-mode behaviour
2145/// is preserved exactly.
2146///
2147/// Round 159: [`build_predictor_image`] passes the left neighbour
2148/// block's chosen mode (or the top neighbour at the left edge of
2149/// the predictor image) as the hint. The §3.5 RFC 9649 note
2150/// "transform data can be decided based on entropy minimization"
2151/// motivates this: residual-cost-equal modes encode different
2152/// values into the predictor sub-image, and the sub-image is
2153/// written as an `entropy-coded-image` (§7.2) so reducing its
2154/// symbol entropy directly shrinks the output stream. The
2155/// residuals themselves do not change (this is a strict tie-break),
2156/// so decode round-trips are unaffected.
2157#[allow(clippy::too_many_arguments)]
2158fn pick_block_mode_with_hint(
2159 pixels: &[u32],
2160 width: usize,
2161 height: usize,
2162 x0: usize,
2163 y0: usize,
2164 bw: usize,
2165 bh: usize,
2166 prefer_mode: Option<u8>,
2167) -> u8 {
2168 let mut best_mode: u8 = 0;
2169 let mut best_cost = u64::MAX;
2170 for mode in 0u8..=13 {
2171 // The cap prunes modes already worse than the current best at
2172 // block-row granularity; a pruned partial sum is `>= best_cost`
2173 // so the `cost < best_cost` update below stays pick-identical
2174 // to a full walk.
2175 let cost = block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, best_cost);
2176 if cost < best_cost {
2177 best_cost = cost;
2178 best_mode = mode;
2179 }
2180 }
2181 // Round 159 entropy-image-aware tie-breaker. If the caller
2182 // supplied a preferred mode (typically the left or top neighbour
2183 // block's chosen mode) and the preferred mode's full cost ties
2184 // with `best_cost`, swap to the preferred mode so the predictor
2185 // sub-image carries a longer run of identical mode values. The
2186 // residual stream produced by the main image's forward transform
2187 // is unchanged (the cost is equal), so decode round-trips are
2188 // bit-identical.
2189 if let Some(m) = prefer_mode {
2190 if m != best_mode {
2191 let cost = block_mode_cost(pixels, width, height, x0, y0, bw, bh, m);
2192 if cost == best_cost {
2193 best_mode = m;
2194 }
2195 }
2196 }
2197 best_mode
2198}
2199
2200/// Round 160 *slack-cost* variant of [`pick_block_mode_with_hint`].
2201///
2202/// Where the round-159 strict tie-break only swaps to the preferred
2203/// mode when its residual cost is **exactly equal** to the best,
2204/// this variant also accepts the preferred mode when its cost is
2205/// within an additive `slack` budget of the best. RFC 9649 §3.5
2206/// authorises the encoder to "decide \[transform data\] based on
2207/// entropy minimization", and the slack budget formalises the
2208/// trade-off: a small per-pixel-magnitude increase in the §4.1
2209/// residual stream may be acceptable when it strictly reduces the
2210/// entropy of the §7.2 predictor sub-image (longer run of identical
2211/// mode values → fewer distinct prefix-code symbols → fewer bytes
2212/// emitted for the sub-image).
2213///
2214/// This is no longer a residual-cost-neutral swap: the residuals
2215/// produced by the main image's forward transform **do change** on
2216/// a slack-accepted swap. Decode round-trips are still bit-correct
2217/// (the residuals are recomputed against the chosen mode at
2218/// `apply_forward_predictor` time, and the decoder applies the same
2219/// mode in reverse), but pixel-level decode equivalence between two
2220/// encoder runs at different slack budgets is **not** preserved —
2221/// only end-to-end image round-trip equivalence is.
2222///
2223/// The encoder protects itself from regressions by building both the
2224/// `slack = 0` (strict, round-159 baseline) and `slack > 0`
2225/// predictor candidates and keeping the strictly-smaller encoded
2226/// stream — so a slack candidate that hurts overall byte cost on
2227/// some input is simply not chosen.
2228#[allow(clippy::too_many_arguments)]
2229fn pick_block_mode_with_hint_slack(
2230 pixels: &[u32],
2231 width: usize,
2232 height: usize,
2233 x0: usize,
2234 y0: usize,
2235 bw: usize,
2236 bh: usize,
2237 prefer_mode: Option<u8>,
2238 slack: u64,
2239) -> u8 {
2240 let mut best_mode: u8 = 0;
2241 let mut best_cost = u64::MAX;
2242 for mode in 0u8..=13 {
2243 // Row-granular prune against the current best; pick-identical
2244 // to a full walk (see `block_mode_cost_capped`).
2245 let cost = block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, best_cost);
2246 if cost < best_cost {
2247 best_cost = cost;
2248 best_mode = mode;
2249 }
2250 }
2251 // Round-160 slack-cost tie-break: accept the preferred neighbour
2252 // mode when its cost is within `slack` of the best cost. The
2253 // slack budget lets the encoder trade a small residual increase
2254 // for a predictor-sub-image entropy drop. `slack == 0` recovers
2255 // the round-159 strict tie-break behaviour exactly.
2256 if let Some(m) = prefer_mode {
2257 if m != best_mode {
2258 let cost = block_mode_cost(pixels, width, height, x0, y0, bw, bh, m);
2259 if cost <= best_cost.saturating_add(slack) {
2260 best_mode = m;
2261 }
2262 }
2263 }
2264 best_mode
2265}
2266
2267/// Build the §4.1 sub-resolution *predictor image*: one ARGB pixel
2268/// per `(1 << size_bits)`-pixel-square block of the main image, with
2269/// the chosen mode stored in the green channel (alpha/red/blue
2270/// fixed at 0xff / 0 / 0 — the decoder only reads the green channel
2271/// via `inverse_predictor`'s `green(predictor_image[...])`).
2272///
2273/// Returns `(predictor_image, transform_width, transform_height)`.
2274/// `transform_width = DIV_ROUND_UP(width, 1 << size_bits)` and
2275/// `transform_height = DIV_ROUND_UP(height, 1 << size_bits)`, per
2276/// §4.1.
2277///
2278/// Round 159: each block consults
2279/// [`pick_block_mode_with_hint`] with the immediately-prior
2280/// block's chosen mode as the preferred tie-break — left neighbour
2281/// in the current row, or the top neighbour for blocks in the left
2282/// column (no neighbour for the top-left block). This is a strict
2283/// tie-break: when the preferred mode's residual cost equals the
2284/// otherwise-lowest cost, the neighbour's value is chosen so the
2285/// predictor sub-image carries longer runs of identical modes,
2286/// dropping the sub-image's entropy and the bytes the
2287/// `entropy-coded-image` writer emits for it. Residual values are
2288/// unchanged on cost-equal swaps, so decoded pixels are
2289/// bit-identical to the round-158 baseline.
2290fn build_predictor_image(
2291 pixels: &[u32],
2292 width: u32,
2293 height: u32,
2294 size_bits: u8,
2295) -> (Vec<u32>, u32, u32) {
2296 let block = 1u32 << size_bits;
2297 let tw = predictor_div_round_up(width, block);
2298 let th = predictor_div_round_up(height, block);
2299 let mut img = Vec::with_capacity((tw * th) as usize);
2300 let w = width as usize;
2301 let h = height as usize;
2302 let bsz = block as usize;
2303 // Track the previous row's chosen modes so the left-column
2304 // blocks can fall back to a top neighbour. Each slot is `None`
2305 // while building the very first row.
2306 let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2307 for by in 0..th as usize {
2308 let mut left_mode: Option<u8> = None;
2309 for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2310 let x0 = bx * bsz;
2311 let y0 = by * bsz;
2312 // Preferred tie-break: left neighbour (current row) if
2313 // present, else top neighbour (previous row). The
2314 // top-left block (by == 0 && bx == 0) gets no hint and
2315 // falls back to the lowest-tied-mode default.
2316 let prefer = left_mode.or(*top_slot);
2317 let mode = pick_block_mode_with_hint(pixels, w, h, x0, y0, bsz, bsz, prefer);
2318 // Pack mode into the green channel; opaque alpha and
2319 // zeroed red/blue keep the sub-image visually inert and
2320 // match the channel the decoder reads.
2321 img.push(0xff00_0000 | ((mode as u32) << 8));
2322 left_mode = Some(mode);
2323 *top_slot = Some(mode);
2324 }
2325 }
2326 (img, tw, th)
2327}
2328
2329/// Round-160 *slack-cost* variant of [`build_predictor_image`].
2330///
2331/// Identical structure to `build_predictor_image`, but routes every
2332/// per-block mode choice through [`pick_block_mode_with_hint_slack`]
2333/// with the caller-supplied `slack` budget. `slack == 0` recovers
2334/// `build_predictor_image` exactly. Larger `slack` values let the
2335/// preferred neighbour mode win even at a small residual-cost
2336/// increase, trading per-pixel residual mass against the §7.2
2337/// predictor-sub-image's symbol entropy.
2338///
2339/// Round-trip correctness is unaffected by `slack`: the forward
2340/// transform later re-derives residuals against the chosen modes,
2341/// and the decoder's inverse pass uses the same modes from the
2342/// sub-image, so the decoded image always equals the input.
2343///
2344/// The encoder chooser builds both `slack == 0` and `slack > 0`
2345/// candidates and keeps the shortest, so a slack candidate that
2346/// hurts overall byte cost on a given input is simply not chosen.
2347fn build_predictor_image_with_slack(
2348 pixels: &[u32],
2349 width: u32,
2350 height: u32,
2351 size_bits: u8,
2352 slack: u64,
2353) -> (Vec<u32>, u32, u32) {
2354 let block = 1u32 << size_bits;
2355 let tw = predictor_div_round_up(width, block);
2356 let th = predictor_div_round_up(height, block);
2357 let mut img = Vec::with_capacity((tw * th) as usize);
2358 let w = width as usize;
2359 let h = height as usize;
2360 let bsz = block as usize;
2361 let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2362 for by in 0..th as usize {
2363 let mut left_mode: Option<u8> = None;
2364 for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2365 let x0 = bx * bsz;
2366 let y0 = by * bsz;
2367 let prefer = left_mode.or(*top_slot);
2368 let mode =
2369 pick_block_mode_with_hint_slack(pixels, w, h, x0, y0, bsz, bsz, prefer, slack);
2370 img.push(0xff00_0000 | ((mode as u32) << 8));
2371 left_mode = Some(mode);
2372 *top_slot = Some(mode);
2373 }
2374 }
2375 (img, tw, th)
2376}
2377
2378/// Round 161 *Shannon-entropy bit-cost* per-mode cost function.
2379///
2380/// Where [`block_mode_cost`] sums the folded L1 magnitude of the
2381/// per-pixel residual as a *proxy* for Huffman bit cost, this
2382/// function computes the actual lower-bound bit cost a Huffman code
2383/// over the residual byte distribution would emit:
2384///
2385/// 1. Build the per-channel `[u32; 256]` histogram of the block's
2386/// mod-256 residuals against the candidate `mode`.
2387/// 2. Compute the Shannon entropy `H = -Σ (c/N) · log2(c/N)` over
2388/// each channel's histogram (zero-count bins contribute zero).
2389/// 3. Sum `N · H` across channels — this is the lower-bound bit
2390/// count a per-symbol Huffman code over those residuals would
2391/// emit (the encoder's actual prefix coder is within ~1 bit of
2392/// this bound per symbol, so the bit-count *ordering* between
2393/// modes is faithful even though absolute counts differ by O(1)
2394/// per symbol).
2395///
2396/// The cost is returned as a fixed-point u64 in units of
2397/// **milli-bits** (1 bit = 1000 units) so comparisons stay exact
2398/// without floats leaking into the chooser's tie-break logic. The
2399/// quantisation rounds to the nearest milli-bit which is finer
2400/// than any Huffman code's per-symbol cost, so two modes that
2401/// would tie in floating-point also tie in the quantised cost.
2402///
2403/// Walks every in-bounds pixel without an early-out (unlike
2404/// [`block_mode_cost`]'s magnitude proxy which can prune): the
2405/// per-channel histograms must be complete before the entropy
2406/// sum is meaningful.
2407#[allow(clippy::too_many_arguments)]
2408fn block_mode_entropy_cost(
2409 pixels: &[u32],
2410 width: usize,
2411 height: usize,
2412 x0: usize,
2413 y0: usize,
2414 bw: usize,
2415 bh: usize,
2416 mode: u8,
2417) -> u64 {
2418 let mut sink = ResidualHistogramSink {
2419 hist: [[0u32; 256]; 4],
2420 n: 0,
2421 };
2422 for_each_block_residual(pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
2423 let hist = sink.hist;
2424 let n = sink.n;
2425 if n == 0 {
2426 return 0;
2427 }
2428 // Σ_channels Σ_b c·log2(N/c) milli-bits, with c·log2(N/c) =
2429 // c·(log2(N) − log2(c)). Float arithmetic is fine here: the
2430 // result is rounded to nearest milli-bit before u64 cast, so
2431 // bit-for-bit determinism holds across platforms with IEEE-754
2432 // ln(). The Shannon expansion picks `log2(N/c)` rather than
2433 // `−log2(c/N)` to keep the per-bin operand non-negative (zero
2434 // when c = N, growing as c shrinks) which is friendly to the
2435 // accumulator.
2436 let n_f = n as f64;
2437 let log2_n = n_f.log2();
2438 let mut milli_bits: f64 = 0.0;
2439 for channel_hist in &hist {
2440 for &count in channel_hist.iter() {
2441 if count == 0 {
2442 continue;
2443 }
2444 let c_f = count as f64;
2445 // Per-bin contribution to N·H: c·log2(N/c).
2446 milli_bits += c_f * (log2_n - c_f.log2());
2447 }
2448 }
2449 // Scale to milli-bits and round to nearest.
2450 (milli_bits * 1000.0 + 0.5) as u64
2451}
2452
2453/// Round 161 *Shannon-entropy bit-cost* variant of
2454/// [`pick_block_mode_with_hint`].
2455///
2456/// Picks the §4.1 mode minimising [`block_mode_entropy_cost`] — a
2457/// true Huffman lower-bound bit cost rather than the L1 magnitude
2458/// proxy the round-159/160 chooser uses. The entropy bit-cost
2459/// correctly distinguishes a "near-zero with two outliers"
2460/// residual distribution (low L1, but the outliers force long
2461/// Huffman codes for the two distinct outlier values) from a
2462/// "spread of small values" distribution (slightly higher L1, but
2463/// more concentrated histogram → lower Huffman cost). The L1
2464/// proxy treats them as comparable; the entropy cost reflects
2465/// what the §5.x prefix-code writer will actually emit.
2466///
2467/// The hint mechanism mirrors [`pick_block_mode_with_hint`]: when
2468/// `prefer_mode = Some(m)` and `m`'s entropy cost equals the
2469/// chooser's best, the chooser returns `m` so the predictor sub-
2470/// image carries longer runs of identical mode values (§7.2
2471/// `entropy-coded-image` shrinks).
2472///
2473/// This is a strict tie-break: residual values are unchanged on
2474/// cost-equal swaps, so decode round-trips are bit-identical
2475/// across `prefer_mode` choices. End-to-end the encoder builds
2476/// both the L1-proxy and entropy-cost candidates and keeps the
2477/// shortest stream, so the entropy candidate cannot regress
2478/// against the L1 path — see [`encode_argb_with_predictor_chooser`].
2479#[allow(clippy::too_many_arguments)]
2480fn pick_block_mode_with_hint_entropy(
2481 pixels: &[u32],
2482 width: usize,
2483 height: usize,
2484 x0: usize,
2485 y0: usize,
2486 bw: usize,
2487 bh: usize,
2488 prefer_mode: Option<u8>,
2489) -> u8 {
2490 let mut best_mode: u8 = 0;
2491 let mut best_cost = u64::MAX;
2492 for mode in 0u8..=13 {
2493 let cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, mode);
2494 if cost < best_cost {
2495 best_cost = cost;
2496 best_mode = mode;
2497 }
2498 }
2499 // Round-159-style strict tie-break under the entropy cost.
2500 if let Some(m) = prefer_mode {
2501 if m != best_mode {
2502 let cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, m);
2503 if cost == best_cost {
2504 best_mode = m;
2505 }
2506 }
2507 }
2508 best_mode
2509}
2510
2511/// Round 161 *Shannon-entropy bit-cost* variant of
2512/// [`build_predictor_image`].
2513///
2514/// Identical structure to `build_predictor_image`, but routes every
2515/// per-block mode choice through [`pick_block_mode_with_hint_entropy`]
2516/// — replacing the round-159 L1-magnitude proxy with a true Huffman
2517/// lower-bound bit cost. The strict-tie-break hint mechanism is
2518/// preserved: the left neighbour (or top neighbour at the left
2519/// edge) is the preferred mode on cost-equal swaps.
2520///
2521/// Round-trip correctness is unaffected by the cost model choice:
2522/// the forward transform later re-derives residuals against the
2523/// chosen modes, and the decoder's inverse pass uses the same modes
2524/// from the sub-image, so the decoded image always equals the input.
2525///
2526/// The encoder chooser keeps both the L1-proxy candidates (round-
2527/// 159/160) and the entropy candidate and emits the shortest
2528/// stream, so a fixture on which the L1 proxy is genuinely better
2529/// is simply not regressed against.
2530fn build_predictor_image_entropy(
2531 pixels: &[u32],
2532 width: u32,
2533 height: u32,
2534 size_bits: u8,
2535) -> (Vec<u32>, u32, u32) {
2536 let block = 1u32 << size_bits;
2537 let tw = predictor_div_round_up(width, block);
2538 let th = predictor_div_round_up(height, block);
2539 let mut img = Vec::with_capacity((tw * th) as usize);
2540 let w = width as usize;
2541 let h = height as usize;
2542 let bsz = block as usize;
2543 let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2544 for by in 0..th as usize {
2545 let mut left_mode: Option<u8> = None;
2546 for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2547 let x0 = bx * bsz;
2548 let y0 = by * bsz;
2549 let prefer = left_mode.or(*top_slot);
2550 let mode = pick_block_mode_with_hint_entropy(pixels, w, h, x0, y0, bsz, bsz, prefer);
2551 img.push(0xff00_0000 | ((mode as u32) << 8));
2552 left_mode = Some(mode);
2553 *top_slot = Some(mode);
2554 }
2555 }
2556 (img, tw, th)
2557}
2558
2559/// Round 162 — milli-bit Shannon delta for adding one occurrence of
2560/// `mode` to a running sub-image mode histogram with current counts
2561/// `hist[0..14]` and total `total`.
2562///
2563/// Returns `(N_new · H_new − N_old · H_old)` in milli-bits, where
2564/// `H = −Σ p·log2(p)` over the 14-bin mode distribution. This is the
2565/// **exact** marginal Shannon contribution of one extra `mode`
2566/// occurrence to the sub-image's symbol entropy mass — the same
2567/// `Σ c·log2(N/c)` form [`block_mode_entropy_cost`] uses, applied to
2568/// the sub-image's green-channel mode distribution rather than the
2569/// per-block residual byte histogram.
2570///
2571/// At the floor (`hist` all zero, `total == 0`) the delta is zero:
2572/// adding the first symbol moves the system from a degenerate
2573/// no-symbol state to a single-symbol histogram with `H = 0`. The
2574/// first **subsequent** occurrence of a *different* mode does grow
2575/// the mass (now two distinct symbols, total = 2 → `N·H = 2`). The
2576/// formula stays well-defined at every step because the post-add
2577/// histogram always has `total + 1 ≥ 1` and all bins with `c == 0`
2578/// are skipped from the sum.
2579///
2580/// Used by [`pick_block_mode_with_hint_entropy_subaware`] to charge a
2581/// per-block mode candidate not only for its own residual entropy
2582/// but also for its marginal contribution to the §7.2 predictor
2583/// sub-image's prefix-code mass — making the chooser sub-image-
2584/// aware in a way the round-159 hint and round-160 slack budget were
2585/// not (those mechanisms only acted on local neighbour identity,
2586/// without any global accounting of the sub-image's distribution
2587/// shape).
2588fn sub_image_mode_cost_delta_milli(hist: &[u32; 14], total: u32, mode: u8) -> u64 {
2589 debug_assert!(mode < 14);
2590 // Compute Σ c·log2(N/c) before and after; the delta is the
2591 // marginal Shannon mass in bits, scaled to milli-bits and
2592 // rounded to nearest u64. Float arithmetic is fine here for the
2593 // same reason as `block_mode_entropy_cost`: the rounding step
2594 // makes the result bit-for-bit deterministic across IEEE-754
2595 // log2 implementations to within ±1 milli-bit, which is finer
2596 // than any per-symbol cost ordering.
2597 let n_old = total as f64;
2598 let n_new = (total + 1) as f64;
2599 let log2_n_old = if total > 0 { n_old.log2() } else { 0.0 };
2600 let log2_n_new = n_new.log2();
2601 let mut mass_old: f64 = 0.0;
2602 let mut mass_new: f64 = 0.0;
2603 for (m, &c) in hist.iter().enumerate() {
2604 let c_after = if m == mode as usize { c + 1 } else { c };
2605 if c > 0 {
2606 let c_f = c as f64;
2607 mass_old += c_f * (log2_n_old - c_f.log2());
2608 }
2609 if c_after > 0 {
2610 let c_f = c_after as f64;
2611 mass_new += c_f * (log2_n_new - c_f.log2());
2612 }
2613 }
2614 let delta = (mass_new - mass_old).max(0.0);
2615 (delta * 1000.0 + 0.5) as u64
2616}
2617
2618/// Round 162 — *sub-image-aware* Shannon-entropy bit-cost variant of
2619/// [`pick_block_mode_with_hint_entropy`].
2620///
2621/// Picks the §4.1 mode minimising the **joint** cost
2622///
2623/// ```text
2624/// cost(m) = block_mode_entropy_cost(..., m)
2625/// + (lambda_milli * sub_image_mode_cost_delta_milli(hist, total, m)) / 1000
2626/// ```
2627///
2628/// where the first term is the per-block residual entropy (same
2629/// metric the round-161 chooser uses) and the second term is the
2630/// marginal §7.2 predictor sub-image cost — the bits the
2631/// `entropy-coded-image` writer will emit for this mode value given
2632/// the sub-image's running distribution shape. `lambda_milli` is the
2633/// per-sub-image-bit weight, in milli-units (so `lambda_milli = 1000`
2634/// weights one sub-image bit equal to one residual bit). Larger
2635/// lambda biases the chooser toward modes that reuse already-popular
2636/// values in the sub-image; `lambda_milli == 0` recovers the round-
2637/// 161 entropy-only chooser exactly (no sub-image weighting at all).
2638///
2639/// The round-159 strict tie-break hint is preserved: when
2640/// `prefer_mode = Some(m)` and `m`'s joint cost equals the chooser's
2641/// best, the chooser returns `m` so the sub-image keeps the longer
2642/// run of identical mode values. The hint check uses the same joint
2643/// cost (residual + lambda · sub-image delta) the main sweep uses,
2644/// so the tie semantics stay self-consistent.
2645///
2646/// Round-trip correctness is unaffected by the cost model choice:
2647/// the forward transform later re-derives residuals against the
2648/// chosen modes, and the decoder's inverse pass uses the same modes
2649/// from the sub-image, so the decoded image always equals the input.
2650///
2651/// The encoder protects itself from regressions by building both the
2652/// round-161 (sub-image-unaware) and round-162 (sub-image-aware at
2653/// multiple lambda values) predictor candidates and keeping the
2654/// shortest stream — so a fixture on which the sub-image weighting
2655/// hurts overall byte cost is simply not chosen.
2656#[allow(clippy::too_many_arguments)]
2657fn pick_block_mode_with_hint_entropy_subaware(
2658 pixels: &[u32],
2659 width: usize,
2660 height: usize,
2661 x0: usize,
2662 y0: usize,
2663 bw: usize,
2664 bh: usize,
2665 prefer_mode: Option<u8>,
2666 sub_image_hist: &[u32; 14],
2667 sub_image_total: u32,
2668 lambda_milli: u64,
2669) -> u8 {
2670 let mut best_mode: u8 = 0;
2671 let mut best_cost = u64::MAX;
2672 for mode in 0u8..=13 {
2673 let residual_cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, mode);
2674 let sub_delta = sub_image_mode_cost_delta_milli(sub_image_hist, sub_image_total, mode);
2675 // lambda_milli is "per-sub-image-bit weight in milli-units".
2676 // sub_delta is already in milli-bits. Multiply and divide by
2677 // 1000 to keep the whole expression in milli-bit units.
2678 let weighted_sub = sub_delta.saturating_mul(lambda_milli) / 1000;
2679 let cost = residual_cost.saturating_add(weighted_sub);
2680 if cost < best_cost {
2681 best_cost = cost;
2682 best_mode = mode;
2683 }
2684 }
2685 if let Some(m) = prefer_mode {
2686 if m != best_mode {
2687 let residual_cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, m);
2688 let sub_delta = sub_image_mode_cost_delta_milli(sub_image_hist, sub_image_total, m);
2689 let weighted_sub = sub_delta.saturating_mul(lambda_milli) / 1000;
2690 let cost = residual_cost.saturating_add(weighted_sub);
2691 if cost == best_cost {
2692 best_mode = m;
2693 }
2694 }
2695 }
2696 best_mode
2697}
2698
2699/// Round 162 *sub-image-aware* variant of
2700/// [`build_predictor_image_entropy`].
2701///
2702/// Identical structure to `build_predictor_image_entropy`, but routes
2703/// every per-block mode choice through
2704/// [`pick_block_mode_with_hint_entropy_subaware`] with a running
2705/// histogram of the sub-image's mode values chosen so far. `lambda_milli`
2706/// is the per-sub-image-bit weight (see
2707/// [`pick_block_mode_with_hint_entropy_subaware`] for the unit). The
2708/// round-159 strict-tie-break hint mechanism is preserved: the left
2709/// neighbour (or top neighbour at the left edge) is the preferred
2710/// mode on joint-cost-equal swaps.
2711///
2712/// `lambda_milli == 0` is byte-identical to
2713/// `build_predictor_image_entropy` (the sub-image term contributes
2714/// zero to every candidate). Larger `lambda_milli` biases the
2715/// chooser toward modes that reuse already-popular values in the
2716/// sub-image.
2717///
2718/// Round-trip correctness is unaffected: the decoder reads the
2719/// chosen modes from the sub-image; the forward transform recomputes
2720/// residuals against them. The chooser's joint-cost choice only
2721/// shifts which mode is recorded per block — never the decode
2722/// reconstruction path.
2723fn build_predictor_image_entropy_subaware(
2724 pixels: &[u32],
2725 width: u32,
2726 height: u32,
2727 size_bits: u8,
2728 lambda_milli: u64,
2729) -> (Vec<u32>, u32, u32) {
2730 let block = 1u32 << size_bits;
2731 let tw = predictor_div_round_up(width, block);
2732 let th = predictor_div_round_up(height, block);
2733 let mut img = Vec::with_capacity((tw * th) as usize);
2734 let w = width as usize;
2735 let h = height as usize;
2736 let bsz = block as usize;
2737 let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2738 let mut hist = [0u32; 14];
2739 let mut total: u32 = 0;
2740 for by in 0..th as usize {
2741 let mut left_mode: Option<u8> = None;
2742 for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2743 let x0 = bx * bsz;
2744 let y0 = by * bsz;
2745 let prefer = left_mode.or(*top_slot);
2746 let mode = pick_block_mode_with_hint_entropy_subaware(
2747 pixels,
2748 w,
2749 h,
2750 x0,
2751 y0,
2752 bsz,
2753 bsz,
2754 prefer,
2755 &hist,
2756 total,
2757 lambda_milli,
2758 );
2759 img.push(0xff00_0000 | ((mode as u32) << 8));
2760 left_mode = Some(mode);
2761 *top_slot = Some(mode);
2762 hist[mode as usize] += 1;
2763 total += 1;
2764 }
2765 }
2766 (img, tw, th)
2767}
2768
2769/// Round 305 — per-block predictor-mode selection strategy for the §4.1
2770/// predictor sub-image, used to parameterise which cost model the stacked
2771/// §3.5 transform chains build their predictor sub-image with.
2772///
2773/// The single-transform predictor path
2774/// ([`encode_argb_with_predictor_chooser`]) already sweeps every one of
2775/// these strategies and keeps the byte-shortest stream (rounds 159–162).
2776/// The stacked chains added in rounds 302–304
2777/// ([`encode_with_color_transform_predictor`],
2778/// [`encode_with_color_transform_subtract_green_predictor`],
2779/// [`encode_with_color_indexing_predictor`]) were bootstrapped with only
2780/// the [`PredictorSubImageStrategy::L1`] proxy chooser. Threading this
2781/// strategy through them lets the chooser try the entropy and
2782/// sub-image-aware entropy cost models over the *transform-decorrelated*
2783/// image those chains feed the predictor — exactly the residual the
2784/// predictor sub-image actually sees — and keep whichever is smallest.
2785///
2786/// Round-trip correctness is independent of the strategy: every variant
2787/// only changes *which §4.1 mode is recorded per block* in the sub-image;
2788/// the forward transform recomputes residuals against the chosen modes and
2789/// the decoder reads the same modes back, so the reconstruction is
2790/// bit-identical regardless of strategy. The chooser keeps the strategy
2791/// solely on a byte-cost basis, so a strategy that hurts on a given input
2792/// is simply not selected — the path strictly extends the encoder's option
2793/// set without ever regressing the L1 baseline.
2794#[derive(Clone, Copy, Debug, PartialEq, Eq)]
2795enum PredictorSubImageStrategy {
2796 /// Round-159 folded-L1 magnitude proxy chooser
2797 /// ([`build_predictor_image`]).
2798 L1,
2799 /// Round-161 Shannon-entropy bit-cost chooser
2800 /// ([`build_predictor_image_entropy`]).
2801 Entropy,
2802 /// Round-162 sub-image-aware Shannon-entropy chooser
2803 /// ([`build_predictor_image_entropy_subaware`]) at the given
2804 /// `lambda_milli` per-sub-image-bit weight.
2805 EntropySubaware { lambda_milli: u64 },
2806}
2807
2808/// Round 305 — build a §4.1 predictor sub-image under the given
2809/// [`PredictorSubImageStrategy`]. Dispatches to the round-159 / round-161 /
2810/// round-162 builders, all of which share the
2811/// `(pixels, width, height, size_bits) -> (sub_image, tw, th)` shape, so
2812/// the stacked chains can pick a cost model uniformly. See
2813/// [`PredictorSubImageStrategy`] for the round-trip invariance argument.
2814fn build_predictor_image_strategy(
2815 pixels: &[u32],
2816 width: u32,
2817 height: u32,
2818 size_bits: u8,
2819 strategy: PredictorSubImageStrategy,
2820) -> (Vec<u32>, u32, u32) {
2821 match strategy {
2822 PredictorSubImageStrategy::L1 => build_predictor_image(pixels, width, height, size_bits),
2823 PredictorSubImageStrategy::Entropy => {
2824 build_predictor_image_entropy(pixels, width, height, size_bits)
2825 }
2826 PredictorSubImageStrategy::EntropySubaware { lambda_milli } => {
2827 build_predictor_image_entropy_subaware(pixels, width, height, size_bits, lambda_milli)
2828 }
2829 }
2830}
2831
2832/// Round 306 — the predictor-sub-image strategies the stacked §3.5 chains
2833/// sweep. The L1 proxy (the rounds 302–304 baseline) leads so a tie keeps
2834/// the historical choice; the round-161 plain entropy chooser follows; then
2835/// the round-162 sub-image-aware entropy chooser across the **full lambda
2836/// sweep** the single-transform predictor path
2837/// ([`encode_argb_with_predictor_chooser`]) has carried since round 162.
2838///
2839/// Round 305 bootstrapped the stacked chains with only a single mid-range
2840/// sub-image-aware lambda (`16_000`); the single-transform path instead
2841/// sweeps four weights — `4_000` / `16_000` / `64_000` / `256_000`
2842/// milli-per-bit — straddling the empirically-observed residual-vs-
2843/// sub-image cost crossover (~`64_000`) on smooth transform-decorrelated
2844/// content. Below the crossover the residual cost dominates and a low
2845/// lambda barely perturbs the round-161 choice; above it the §7.2
2846/// sub-image's prefix-code mass dominates and a high lambda converges the
2847/// mode set into longer runs, shrinking the sub-image header. Threading the
2848/// same four weights through the stacked chains lets each chain land on the
2849/// crossover its own *transform-decorrelated* residual exhibits rather than
2850/// the one fixed mid-range guess.
2851///
2852/// The chooser keeps the byte-shortest stream across all six strategies, so
2853/// the wider sweep is strictly non-regressing against both the L1 baseline
2854/// and the round-305 single-lambda setting. Round-trip output is unchanged
2855/// by the strategy: lambda only biases *which §4.1 mode is recorded* per
2856/// block; the forward transform recomputes residuals against the chosen
2857/// modes and the decoder reads the same modes back.
2858const STACKED_PREDICTOR_STRATEGIES: [PredictorSubImageStrategy; 6] = [
2859 PredictorSubImageStrategy::L1,
2860 PredictorSubImageStrategy::Entropy,
2861 PredictorSubImageStrategy::EntropySubaware {
2862 lambda_milli: 4_000,
2863 },
2864 PredictorSubImageStrategy::EntropySubaware {
2865 lambda_milli: 16_000,
2866 },
2867 PredictorSubImageStrategy::EntropySubaware {
2868 lambda_milli: 64_000,
2869 },
2870 PredictorSubImageStrategy::EntropySubaware {
2871 lambda_milli: 256_000,
2872 },
2873];
2874
2875/// Apply the §4.1 *forward* predictor transform: for each pixel,
2876/// replace it with the per-channel mod-256 residual `(original -
2877/// pred)`. `pred` is computed from the **source** (un-modified)
2878/// pixels — see [`predictor_at`] — so the decoder's inverse pass
2879/// (which uses already-reconstructed pixels equal to those source
2880/// pixels) recovers the originals exactly.
2881///
2882/// Writes residuals into `dst` (`width * height` long). `src` is
2883/// the un-modified source. `predictor_image` / `transform_width` /
2884/// `size_bits` describe the sub-resolution mode image. Per §4.1's
2885/// border rules the top-left predicts solid black, the top row
2886/// predicts L, the left column predicts T, the rightmost column
2887/// uses the row's leftmost pixel as TR; interior pixels read their
2888/// mode from the predictor image's green channel.
2889fn apply_forward_predictor(
2890 src: &[u32],
2891 dst: &mut [u32],
2892 width: u32,
2893 height: u32,
2894 predictor_image: &[u32],
2895 transform_width: u32,
2896 size_bits: u8,
2897) {
2898 if width == 0 || height == 0 {
2899 return;
2900 }
2901 let w = width as usize;
2902 let h = height as usize;
2903 for y in 0..h {
2904 for x in 0..w {
2905 let idx = y * w + x;
2906 // Interior pixels read their block mode from the
2907 // sub-resolution predictor image; border rules in
2908 // `predictor_at` ignore the mode for top-row /
2909 // left-column / top-left pixels.
2910 let mode = if x == 0 || y == 0 {
2911 0
2912 } else {
2913 let bx = (x as u32) >> size_bits;
2914 let by = (y as u32) >> size_bits;
2915 let block_index = (by * transform_width + bx) as usize;
2916 ((predictor_image[block_index] >> 8) & 0xff) as u8
2917 };
2918 let pred = predictor_at(src, w, x, y, mode);
2919 dst[idx] = predictor_subtract(src[idx], pred);
2920 }
2921 }
2922}
2923
2924/// Default §4.1 `size_bits` value the encoder picks for the
2925/// predictor sub-image: `4` → 16×16 pixel blocks. Smaller blocks
2926/// give finer mode granularity (better residual savings) at the
2927/// cost of a larger predictor sub-image (4× the entries for each
2928/// `size_bits` decrement). 16×16 is a reasonable middle ground for
2929/// the typical encoder workloads here; the spec admits `2..=9`
2930/// (`block` sizes 4..=512). As of round 155 the chooser also
2931/// evaluates a maximal single-block candidate by promoting
2932/// `size_bits` until `1 << size_bits ≥ max(width, height)`, so the
2933/// default value here only sets the per-region granularity floor;
2934/// see [`encode_argb_with_predictor_chooser`].
2935const DEFAULT_PREDICTOR_SIZE_BITS: u8 = 4;
2936
2937/// Encode `pixels` taking the §4.1 spatial predictor path: pick a
2938/// per-block predictor mode minimising the residual magnitude,
2939/// transform the pixels to residuals, then encode the residuals via
2940/// the standard `spatially-coded-image` shape — wrapped by an
2941/// `optional-transform` whose first entry is the §4.1 predictor
2942/// transform (header bit `%b1` + transform type `Predictor = 0` +
2943/// 3-bit `size_bits - 2` + the sub-resolution predictor image as an
2944/// `entropy-coded-image`).
2945///
2946/// The chooser composes with `cache_code_bits`: when `Some(bits)` a
2947/// §5.2.3 color cache of that size is built over the residual
2948/// stream's literal tokens.
2949///
2950/// **NB:** the predictor transform requires at least a 2-pixel
2951/// dimension on the side being predicted (a 1-pixel image triggers
2952/// the §4.1 top-left-only border rule, so the transform body cannot
2953/// produce a meaningful residual). The caller should fall back to
2954/// the no-transform candidate for trivially small images.
2955fn encode_with_predictor(
2956 pixels: &[u32],
2957 width: u32,
2958 height: u32,
2959 size_bits: u8,
2960 cache_code_bits: Option<u32>,
2961 image_width: u32,
2962) -> Vec<u8> {
2963 let mut w = BitWriter::new();
2964
2965 // ---- §3.8.2 / §7.2 optional-transform: predictor-tx ----
2966 // present bit `%b1`.
2967 w.write_bit(true);
2968 // transform type `Predictor = 0`, 2 bits.
2969 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
2970 // 3-bit `size_bits - 2` (decoder adds 2 back per §4.1).
2971 debug_assert!((2..=9).contains(&size_bits));
2972 w.write_bits((size_bits - 2) as u32, 3);
2973
2974 // Build the sub-resolution predictor image then write it as an
2975 // entropy-coded-image per §7.2 `predictor-image = 3BIT
2976 // entropy-coded-image`.
2977 let (predictor_image, tw, _th) = build_predictor_image(pixels, width, height, size_bits);
2978 write_entropy_coded_image_literals(&mut w, &predictor_image);
2979
2980 // End of optional-transform list (`%b0`).
2981 w.write_bit(false);
2982
2983 // ---- Forward-transform the main image into residuals ----
2984 let mut residuals = vec![0u32; pixels.len()];
2985 apply_forward_predictor(
2986 pixels,
2987 &mut residuals,
2988 width,
2989 height,
2990 &predictor_image,
2991 tw,
2992 size_bits,
2993 );
2994
2995 // ---- Tokenise + emit the residual spatially-coded-image ----
2996 let mut tokens = tokenize_lz77(&residuals);
2997 if let Some(bits) = cache_code_bits {
2998 tokens = cacheify_tokens(&tokens, &residuals, bits);
2999 }
3000 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3001
3002 w.into_bytes()
3003}
3004
3005/// Round-160 *slack-cost* variant of [`encode_with_predictor`].
3006///
3007/// Same wire shape as `encode_with_predictor`, but the §4.1
3008/// predictor sub-image is built via
3009/// [`build_predictor_image_with_slack`] with the caller-supplied
3010/// `slack` budget. `slack == 0` produces a byte-identical stream
3011/// to `encode_with_predictor`.
3012///
3013/// `slack > 0` permits the chooser to swap to the preferred
3014/// neighbour mode at a small residual-cost increase, with the goal
3015/// of dropping the predictor sub-image's symbol entropy. The
3016/// chooser at [`encode_argb_with_predictor_chooser`] always
3017/// compares the slack candidates against `slack == 0`, so a slack
3018/// budget that hurts overall byte cost on a given input is
3019/// non-selecting (the strict candidate wins on byte length).
3020fn encode_with_predictor_slack(
3021 pixels: &[u32],
3022 width: u32,
3023 height: u32,
3024 size_bits: u8,
3025 cache_code_bits: Option<u32>,
3026 image_width: u32,
3027 slack: u64,
3028) -> Vec<u8> {
3029 let mut w = BitWriter::new();
3030
3031 w.write_bit(true);
3032 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3033 debug_assert!((2..=9).contains(&size_bits));
3034 w.write_bits((size_bits - 2) as u32, 3);
3035
3036 let (predictor_image, tw, _th) =
3037 build_predictor_image_with_slack(pixels, width, height, size_bits, slack);
3038 write_entropy_coded_image_literals(&mut w, &predictor_image);
3039
3040 w.write_bit(false);
3041
3042 let mut residuals = vec![0u32; pixels.len()];
3043 apply_forward_predictor(
3044 pixels,
3045 &mut residuals,
3046 width,
3047 height,
3048 &predictor_image,
3049 tw,
3050 size_bits,
3051 );
3052
3053 let mut tokens = tokenize_lz77(&residuals);
3054 if let Some(bits) = cache_code_bits {
3055 tokens = cacheify_tokens(&tokens, &residuals, bits);
3056 }
3057 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3058
3059 w.into_bytes()
3060}
3061
3062/// Round-161 *Shannon-entropy bit-cost* variant of
3063/// [`encode_with_predictor`].
3064///
3065/// Same wire shape as `encode_with_predictor`, but the §4.1
3066/// predictor sub-image is built via [`build_predictor_image_entropy`]
3067/// — replacing the per-block L1-magnitude proxy with a true Huffman
3068/// lower-bound bit cost on the per-channel residual histogram. The
3069/// chooser hint mechanism (strict tie-break favouring the
3070/// neighbour's mode) is preserved.
3071///
3072/// `encode_argb_with_predictor_chooser` always compares this
3073/// candidate against the L1-proxy candidates (round-159 strict tie-
3074/// break and round-160 slack variants), so on fixtures where the L1
3075/// proxy genuinely wins, the entropy candidate is non-selecting.
3076fn encode_with_predictor_entropy(
3077 pixels: &[u32],
3078 width: u32,
3079 height: u32,
3080 size_bits: u8,
3081 cache_code_bits: Option<u32>,
3082 image_width: u32,
3083) -> Vec<u8> {
3084 let mut w = BitWriter::new();
3085
3086 w.write_bit(true);
3087 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3088 debug_assert!((2..=9).contains(&size_bits));
3089 w.write_bits((size_bits - 2) as u32, 3);
3090
3091 let (predictor_image, tw, _th) =
3092 build_predictor_image_entropy(pixels, width, height, size_bits);
3093 write_entropy_coded_image_literals(&mut w, &predictor_image);
3094
3095 w.write_bit(false);
3096
3097 let mut residuals = vec![0u32; pixels.len()];
3098 apply_forward_predictor(
3099 pixels,
3100 &mut residuals,
3101 width,
3102 height,
3103 &predictor_image,
3104 tw,
3105 size_bits,
3106 );
3107
3108 let mut tokens = tokenize_lz77(&residuals);
3109 if let Some(bits) = cache_code_bits {
3110 tokens = cacheify_tokens(&tokens, &residuals, bits);
3111 }
3112 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3113
3114 w.into_bytes()
3115}
3116
3117/// Round 162 — *sub-image-aware* Shannon-entropy bit-cost predictor
3118/// path. Identical to [`encode_with_predictor_entropy`] but routes
3119/// the sub-image construction through
3120/// [`build_predictor_image_entropy_subaware`] with `lambda_milli` as
3121/// the per-sub-image-bit weight for the joint cost.
3122///
3123/// `lambda_milli == 0` is byte-identical to
3124/// [`encode_with_predictor_entropy`] (the sub-image term contributes
3125/// zero to every per-block choice, so the chooser falls back to the
3126/// round-161 entropy chooser).
3127///
3128/// `encode_argb_with_predictor_chooser` always compares the round-
3129/// 162 candidates (multiple lambda settings) against every round-159
3130/// / round-160 / round-161 candidate, so on fixtures where sub-
3131/// image weighting hurts overall byte cost, the round-162 candidate
3132/// is non-selecting and the path strictly extends the encoder's
3133/// option set rather than redirecting it.
3134fn encode_with_predictor_entropy_subaware(
3135 pixels: &[u32],
3136 width: u32,
3137 height: u32,
3138 size_bits: u8,
3139 cache_code_bits: Option<u32>,
3140 image_width: u32,
3141 lambda_milli: u64,
3142) -> Vec<u8> {
3143 let mut w = BitWriter::new();
3144
3145 w.write_bit(true);
3146 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3147 debug_assert!((2..=9).contains(&size_bits));
3148 w.write_bits((size_bits - 2) as u32, 3);
3149
3150 let (predictor_image, tw, _th) =
3151 build_predictor_image_entropy_subaware(pixels, width, height, size_bits, lambda_milli);
3152 write_entropy_coded_image_literals(&mut w, &predictor_image);
3153
3154 w.write_bit(false);
3155
3156 let mut residuals = vec![0u32; pixels.len()];
3157 apply_forward_predictor(
3158 pixels,
3159 &mut residuals,
3160 width,
3161 height,
3162 &predictor_image,
3163 tw,
3164 size_bits,
3165 );
3166
3167 let mut tokens = tokenize_lz77(&residuals);
3168 if let Some(bits) = cache_code_bits {
3169 tokens = cacheify_tokens(&tokens, &residuals, bits);
3170 }
3171 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3172
3173 w.into_bytes()
3174}
3175
3176// ---- §3.5.2 / §4.2 forward color-transform encoder ------------------
3177
3178/// §3.5.2 `ColorTransformDelta(t, c)` = `(int8(t) * int8(c)) >> 5`,
3179/// with `t` and `c` interpreted as signed 8-bit two's-complement values.
3180/// Identical formula to the decoder's
3181/// [`crate::vp8l_transform::color_transform_delta`] — kept local so this
3182/// module compiles under `--no-default-features` (which the decoder also
3183/// satisfies, but the helper is `pub(crate)`-private to that file).
3184///
3185/// Only the low 8 bits of the result are meaningful per §3.5.2
3186/// ("only the lowest 8 bits are used from the result"); the wider `i32`
3187/// return type lets callers fold it into a signed pixel computation
3188/// before masking.
3189#[inline]
3190fn color_xfrm_delta(t: u8, c: u8) -> i32 {
3191 let ts = t as i8 as i32;
3192 let cs = c as i8 as i32;
3193 (ts * cs) >> 5
3194}
3195
3196/// §3.5.2 *forward* color-transform on one pixel.
3197///
3198/// Subtracts the three color-transform deltas from `red` and `blue`
3199/// (green is untouched per §3.5.2). The arguments mirror the §3.5.2
3200/// `ColorTransform()` C signature: the per-block element is unpacked
3201/// into `(green_to_red, green_to_blue, red_to_blue)`. Returns the
3202/// encoded `(new_red, new_blue)` as low 8-bit residuals. The §3.5.2
3203/// red argument to the third delta is the *original* `red` (not the
3204/// post-green-to-red residual), matching the spec's encoder pseudo-
3205/// code; the decoder's inverse adds the same delta back using its
3206/// reconstructed `tmp_red & 0xff`, which by symmetry equals the
3207/// original red, so the round-trip is bit-exact.
3208#[inline]
3209fn forward_color_pixel(
3210 r: u8,
3211 g: u8,
3212 b: u8,
3213 green_to_red: u8,
3214 green_to_blue: u8,
3215 red_to_blue: u8,
3216) -> (u8, u8) {
3217 let mut tmp_red = r as i32;
3218 let mut tmp_blue = b as i32;
3219 tmp_red -= color_xfrm_delta(green_to_red, g);
3220 tmp_blue -= color_xfrm_delta(green_to_blue, g);
3221 tmp_blue -= color_xfrm_delta(red_to_blue, r);
3222 ((tmp_red & 0xff) as u8, (tmp_blue & 0xff) as u8)
3223}
3224
3225/// §3.5.2 color-transform candidate values swept by [`pick_block_cte`]
3226/// for each of the three `(green_to_red, green_to_blue, red_to_blue)`
3227/// axes.
3228///
3229/// Each value is an 8-bit two's-complement integer. With the §3.5.2
3230/// fixed-point interpretation (`>> 5` divides by 32), a value of 32
3231/// corresponds to a slope of 1 in the corresponding channel; the
3232/// listed entries span `[-96, 96]` with fine resolution `±4` near
3233/// zero (where most natural-image channel correlations sit, e.g. a
3234/// slope of 1/3 ≈ 10.7 fixed-point) coarsening to `±16` further out.
3235/// Including 0 ("no transform") guarantees the per-axis chooser never
3236/// picks a CTE worse than the no-correlation baseline on that axis.
3237///
3238/// 25 candidates × 3 axes = 75 cost evaluations per block (with the
3239/// per-axis greedy in `pick_block_cte` being exact because the cost
3240/// decomposes additively across the red and blue channels — green is
3241/// untouched, the red channel depends only on `green_to_red`, and the
3242/// blue channel depends additively on `(green_to_blue, red_to_blue)`).
3243const CTE_AXIS_CANDIDATES: [u8; 25] = [
3244 0xa0, // -96
3245 0xb0, // -80
3246 0xc0, // -64
3247 0xd0, // -48
3248 0xe0, // -32
3249 0xe8, // -24
3250 0xec, // -20
3251 0xf0, // -16
3252 0xf4, // -12
3253 0xf8, // -8
3254 0xfc, // -4
3255 0xfe, // -2
3256 0x00, // 0
3257 0x02, // 2
3258 0x04, // 4
3259 0x08, // 8
3260 0x0c, // 12
3261 0x10, // 16
3262 0x14, // 20
3263 0x18, // 24
3264 0x20, // 32
3265 0x30, // 48
3266 0x40, // 64
3267 0x50, // 80
3268 0x60, // 96
3269];
3270
3271/// Per-channel folded-magnitude cost: same residual-magnitude proxy
3272/// [`residual_magnitude`] uses for the §4.1 predictor, but on a single
3273/// 8-bit channel — `min(v, 256 - v)`. Lower magnitudes peak the
3274/// histogram near zero, which the per-channel Huffman codes compress
3275/// better.
3276#[inline]
3277fn channel_magnitude(v: u32) -> u32 {
3278 let v = v & 0xff;
3279 if v <= 128 {
3280 v
3281 } else {
3282 256 - v
3283 }
3284}
3285
3286/// §3.5.2: pick the `(green_to_red, green_to_blue, red_to_blue)`
3287/// element that minimises the residual-magnitude cost on the
3288/// rectangular block `[x0, x0+bw) × [y0, y0+bh)` of the
3289/// `width × height` image.
3290///
3291/// The cost decomposes additively across channels (green is untouched
3292/// by §3.5.2, red depends only on `green_to_red`, blue depends on
3293/// `green_to_blue + red_to_blue`), so a per-axis greedy sweep over
3294/// [`CTE_AXIS_CANDIDATES`] is exact:
3295///
3296/// 1. For each `gtr` candidate, sum `|red - delta(gtr, green)| & 0xff`
3297/// folded onto `[-128, 127]` over the block's pixels; keep the
3298/// smallest.
3299/// 2. For each `gtb` candidate, sum
3300/// `|blue - delta(gtb, green)| & 0xff` folded similarly.
3301/// 3. For each `rtb` candidate, sum
3302/// `|(blue - delta(best_gtb, green)) - delta(rtb, red)| & 0xff`.
3303///
3304/// On ties the candidate appearing earlier in
3305/// [`CTE_AXIS_CANDIDATES`] wins, which makes the chooser deterministic.
3306///
3307/// Public so the `pick_block_cte` criterion bench can drive the
3308/// chooser walk directly (same shelf as [`predictor_subtract`] /
3309/// [`apply_subtract_green`]); encoder callers go through
3310/// `build_color_image`.
3311pub fn pick_block_cte(
3312 pixels: &[u32],
3313 width: usize,
3314 height: usize,
3315 x0: usize,
3316 y0: usize,
3317 bw: usize,
3318 bh: usize,
3319) -> (u8, u8, u8) {
3320 // Gather the block's per-pixel channel triples once.
3321 let mut samples: Vec<(u8, u8, u8)> = Vec::with_capacity(bw * bh);
3322 for dy in 0..bh {
3323 let y = y0 + dy;
3324 if y >= height {
3325 break;
3326 }
3327 for dx in 0..bw {
3328 let x = x0 + dx;
3329 if x >= width {
3330 break;
3331 }
3332 let px = pixels[y * width + x];
3333 let r = ((px >> 16) & 0xff) as u8;
3334 let g = ((px >> 8) & 0xff) as u8;
3335 let b = (px & 0xff) as u8;
3336 samples.push((r, g, b));
3337 }
3338 }
3339 if samples.is_empty() {
3340 return (0, 0, 0);
3341 }
3342
3343 // Axis 1: green → red. The red residual is
3344 // `(red - delta(gtr, green)) & 0xff`, independent of gtb and rtb.
3345 let best_gtr = sweep_cte_axis(&samples, |gtr, r, g, _b| {
3346 channel_magnitude((r as i32 - color_xfrm_delta(gtr, g)) as u32)
3347 });
3348
3349 // Axis 2: green → blue. The intermediate blue residual is
3350 // `(blue - delta(gtb, green)) & 0xff`, independent of rtb. We
3351 // evaluate the GREEN→BLUE contribution alone here; the joint
3352 // (gtb, rtb) choice is exact because the red-to-blue delta is
3353 // additive in `rtb` and depends only on the original red.
3354 let best_gtb = sweep_cte_axis(&samples, |gtb, _r, g, b| {
3355 channel_magnitude((b as i32 - color_xfrm_delta(gtb, g)) as u32)
3356 });
3357
3358 // Axis 3: red → blue. Fold the now-fixed green→blue delta into
3359 // each pixel's intermediate blue, then sweep rtb.
3360 let best_rtb = sweep_cte_axis(&samples, |rtb, r, g, b| {
3361 let inter = b as i32 - color_xfrm_delta(best_gtb, g);
3362 channel_magnitude((inter - color_xfrm_delta(rtb, r)) as u32)
3363 });
3364
3365 (best_gtr, best_gtb, best_rtb)
3366}
3367
3368/// One per-axis greedy sweep of [`pick_block_cte`]: evaluate every
3369/// [`CTE_AXIS_CANDIDATES`] entry's summed per-sample cost and return
3370/// the candidate with the smallest sum (earliest entry wins ties).
3371///
3372/// The prune that used to run per sample (`cost >= best` → abandon
3373/// the candidate) is checked at [`CTE_PRUNE_CHUNK`]-sample
3374/// granularity instead, the same despecialisation the round-280
3375/// §4.1 chooser walker applied at block-row granularity: the
3376/// interior chunk loop carries no data-dependent exit, so the
3377/// monomorphised `cost` closure body auto-vectorises. Pick-identical
3378/// by the round-280 argument — per-sample contributions are
3379/// non-negative, so a partial sum reaching `>= best` implies the
3380/// full sum also compares `>= best`, and a candidate that now runs
3381/// to completion instead of pruning yields its exact full sum, which
3382/// is still `>= best` and therefore still loses; completed sums and
3383/// the strict-`<` tie-break are unchanged. Worst case is one extra
3384/// chunk of work per pruned candidate.
3385///
3386/// The per-chunk partial fits `u32`: each [`channel_magnitude`] is
3387/// `<= 128`, so a chunk sums to `<= 128 * CTE_PRUNE_CHUNK`.
3388#[inline]
3389fn sweep_cte_axis(samples: &[(u8, u8, u8)], cost_of: impl Fn(u8, u8, u8, u8) -> u32) -> u8 {
3390 let mut best: u8 = 0;
3391 let mut best_cost = u64::MAX;
3392 for &cand in &CTE_AXIS_CANDIDATES {
3393 let mut cost = 0u64;
3394 for chunk in samples.chunks(CTE_PRUNE_CHUNK) {
3395 let mut partial = 0u32;
3396 for &(r, g, b) in chunk {
3397 partial += cost_of(cand, r, g, b);
3398 }
3399 cost += partial as u64;
3400 if cost >= best_cost {
3401 break;
3402 }
3403 }
3404 if cost < best_cost {
3405 best_cost = cost;
3406 best = cand;
3407 }
3408 }
3409 best
3410}
3411
3412/// Sample granularity of the [`sweep_cte_axis`] prune check. 32
3413/// samples is two 16-pixel block rows at the encoder-default
3414/// `size_bits = 4` — small enough that a hopeless candidate is
3415/// abandoned after ~12% of a 16×16 block, large enough for the
3416/// branch-free interior loop to amortise the check.
3417const CTE_PRUNE_CHUNK: usize = 32;
3418
3419/// Cost model the §4.2 per-block color-transform-element chooser uses
3420/// to compare [`CTE_AXIS_CANDIDATES`] on each axis.
3421///
3422/// The two strategies sweep the *same* candidate grid with the *same*
3423/// per-axis greedy decomposition — only the per-axis scoring differs:
3424///
3425/// * [`ColorTransformStrategy::L1`] sums the folded per-channel
3426/// residual magnitude ([`channel_magnitude`]) over the block — the
3427/// round-147 proxy `pick_block_cte` has carried since the color
3428/// transform landed.
3429/// * [`ColorTransformStrategy::Entropy`] scores each candidate by the
3430/// Shannon lower-bound bit cost of the resulting per-channel
3431/// residual histogram ([`channel_residual_entropy_milli`]) — the
3432/// §4.2 analogue of the round-161 §4.1 predictor entropy chooser.
3433/// RFC 9649 §3.5 authorises the choice ("transform data can be
3434/// decided based on entropy minimization"); the entropy cost is the
3435/// metric the §5.x per-channel prefix codes actually minimise, so it
3436/// distinguishes a near-zero-with-outliers residual (low L1, but the
3437/// outliers force long codes) from a concentrated spread (slightly
3438/// higher L1, but a cheaper histogram) where the L1 proxy cannot.
3439///
3440/// The per-axis greedy stays exact under either model because the red
3441/// channel depends only on `green_to_red`, the blue channel depends
3442/// only on `(green_to_blue, red_to_blue)`, and red / blue carry
3443/// independent §5.x prefix codes — so red entropy minimises over
3444/// `green_to_red` alone, and the blue pair is chosen greedily
3445/// (`green_to_blue` first, then `red_to_blue` folding in the fixed
3446/// `green_to_blue` delta) exactly as the L1 path already does.
3447#[derive(Clone, Copy, PartialEq, Eq)]
3448enum ColorTransformStrategy {
3449 L1,
3450 Entropy,
3451}
3452
3453/// Shannon lower-bound bit cost (in milli-bits, rounded to nearest) of
3454/// a single 8-bit residual channel's 256-bin histogram.
3455///
3456/// `Σ_b c·log2(N/c)` with the same `log2(N) − log2(c)` expansion and
3457/// nearest-milli-bit rounding [`block_mode_entropy_cost`] uses, so the
3458/// §4.2 entropy chooser is byte-deterministic on the same terms as the
3459/// §4.1 predictor entropy chooser.
3460fn channel_residual_entropy_milli(hist: &[u32; 256]) -> u64 {
3461 let n: u32 = hist.iter().sum();
3462 if n == 0 {
3463 return 0;
3464 }
3465 let n_f = n as f64;
3466 let log2_n = n_f.log2();
3467 let mut milli_bits: f64 = 0.0;
3468 for &count in hist.iter() {
3469 if count == 0 {
3470 continue;
3471 }
3472 let c_f = count as f64;
3473 milli_bits += c_f * (log2_n - c_f.log2());
3474 }
3475 (milli_bits * 1000.0 + 0.5) as u64
3476}
3477
3478/// Entropy-cost analogue of [`sweep_cte_axis`]: pick the
3479/// [`CTE_AXIS_CANDIDATES`] entry whose resulting residual histogram
3480/// has the smallest [`channel_residual_entropy_milli`].
3481///
3482/// `residual_of` maps `(candidate, r, g, b)` to the post-transform
3483/// 8-bit residual the candidate produces for the sample, exactly as
3484/// the L1 closures do; here it feeds a histogram rather than a folded
3485/// magnitude. Earliest entry wins ties, matching [`sweep_cte_axis`],
3486/// so the two strategies share a tie-break rule.
3487#[inline]
3488fn sweep_cte_axis_entropy(
3489 samples: &[(u8, u8, u8)],
3490 residual_of: impl Fn(u8, u8, u8, u8) -> u8,
3491) -> u8 {
3492 let mut best: u8 = 0;
3493 let mut best_cost = u64::MAX;
3494 for &cand in &CTE_AXIS_CANDIDATES {
3495 let mut hist = [0u32; 256];
3496 for &(r, g, b) in samples {
3497 hist[residual_of(cand, r, g, b) as usize] += 1;
3498 }
3499 let cost = channel_residual_entropy_milli(&hist);
3500 if cost < best_cost {
3501 best_cost = cost;
3502 best = cand;
3503 }
3504 }
3505 best
3506}
3507
3508/// §3.5.2 entropy-cost color-transform-element chooser — the
3509/// [`ColorTransformStrategy::Entropy`] counterpart of [`pick_block_cte`].
3510///
3511/// Same per-axis greedy and same residual decomposition as the L1
3512/// chooser; only the per-axis scoring is the Shannon histogram bit
3513/// cost. Returns `(green_to_red, green_to_blue, red_to_blue)`.
3514fn pick_block_cte_entropy(
3515 pixels: &[u32],
3516 width: usize,
3517 height: usize,
3518 x0: usize,
3519 y0: usize,
3520 bw: usize,
3521 bh: usize,
3522) -> (u8, u8, u8) {
3523 let mut samples: Vec<(u8, u8, u8)> = Vec::with_capacity(bw * bh);
3524 for dy in 0..bh {
3525 let y = y0 + dy;
3526 if y >= height {
3527 break;
3528 }
3529 for dx in 0..bw {
3530 let x = x0 + dx;
3531 if x >= width {
3532 break;
3533 }
3534 let px = pixels[y * width + x];
3535 let r = ((px >> 16) & 0xff) as u8;
3536 let g = ((px >> 8) & 0xff) as u8;
3537 let b = (px & 0xff) as u8;
3538 samples.push((r, g, b));
3539 }
3540 }
3541 if samples.is_empty() {
3542 return (0, 0, 0);
3543 }
3544
3545 // Axis 1: green → red residual `(red - delta(gtr, green)) & 0xff`.
3546 let best_gtr = sweep_cte_axis_entropy(&samples, |gtr, r, g, _b| {
3547 ((r as i32 - color_xfrm_delta(gtr, g)) & 0xff) as u8
3548 });
3549 // Axis 2: green → blue intermediate residual.
3550 let best_gtb = sweep_cte_axis_entropy(&samples, |gtb, _r, g, b| {
3551 ((b as i32 - color_xfrm_delta(gtb, g)) & 0xff) as u8
3552 });
3553 // Axis 3: red → blue, folding the fixed green→blue delta in first.
3554 let best_rtb = sweep_cte_axis_entropy(&samples, |rtb, r, g, b| {
3555 let inter = b as i32 - color_xfrm_delta(best_gtb, g);
3556 ((inter - color_xfrm_delta(rtb, r)) & 0xff) as u8
3557 });
3558
3559 (best_gtr, best_gtb, best_rtb)
3560}
3561
3562/// Build the §3.5.2 sub-resolution *color image*: one ARGB pixel per
3563/// `(1 << size_bits)`-pixel-square block of the main image, with the
3564/// chosen [`ColorTransformElement`] packed per §3.5.2 ("each
3565/// `ColorTransformElement` 'cte' is treated as a pixel in a
3566/// subresolution image whose alpha component is 255, red component is
3567/// `cte.red_to_blue`, green component is `cte.green_to_blue`, and
3568/// blue component is `cte.green_to_red`").
3569///
3570/// Returns `(color_image, transform_width, transform_height)`. The
3571/// dimensions follow the §4.2 `DIV_ROUND_UP` rule, identical to the
3572/// §4.1 predictor image's.
3573fn build_color_image(
3574 pixels: &[u32],
3575 width: u32,
3576 height: u32,
3577 size_bits: u8,
3578 strategy: ColorTransformStrategy,
3579) -> (Vec<u32>, u32, u32) {
3580 let block = 1u32 << size_bits;
3581 let tw = predictor_div_round_up(width, block);
3582 let th = predictor_div_round_up(height, block);
3583 let mut img = Vec::with_capacity((tw * th) as usize);
3584 let w = width as usize;
3585 let h = height as usize;
3586 let bsz = block as usize;
3587 for by in 0..th as usize {
3588 for bx in 0..tw as usize {
3589 let x0 = bx * bsz;
3590 let y0 = by * bsz;
3591 let (gtr, gtb, rtb) = match strategy {
3592 ColorTransformStrategy::L1 => pick_block_cte(pixels, w, h, x0, y0, bsz, bsz),
3593 ColorTransformStrategy::Entropy => {
3594 pick_block_cte_entropy(pixels, w, h, x0, y0, bsz, bsz)
3595 }
3596 };
3597 // Pack the CTE into one ARGB pixel exactly as §3.5.2
3598 // specifies: alpha=255, red=red_to_blue, green=green_to_blue,
3599 // blue=green_to_red. The decoder unpacks it in
3600 // `crate::vp8l_transform::inverse_color` via the same
3601 // channel-name mapping.
3602 let argb = 0xff00_0000 | ((rtb as u32) << 16) | ((gtb as u32) << 8) | (gtr as u32);
3603 img.push(argb);
3604 }
3605 }
3606 (img, tw, th)
3607}
3608
3609/// Apply the §3.5.2 *forward* color transform: for each pixel, look up
3610/// the per-block element from `color_image` (with the §3.5.2 channel
3611/// layout) and rewrite the red and blue channels via
3612/// [`forward_color_pixel`]. Green and alpha are passed through.
3613///
3614/// Writes the transformed pixels into `dst` (`width * height` long).
3615/// `src` is the un-modified source; the encoder transforms against the
3616/// originals because the decoder reconstructs identical originals
3617/// channel-by-channel (the inverse adds back the same per-block delta).
3618fn apply_forward_color(
3619 src: &[u32],
3620 dst: &mut [u32],
3621 width: u32,
3622 height: u32,
3623 color_image: &[u32],
3624 transform_width: u32,
3625 size_bits: u8,
3626) {
3627 if width == 0 || height == 0 {
3628 return;
3629 }
3630 let w = width as usize;
3631 let h = height as usize;
3632 for y in 0..h {
3633 for x in 0..w {
3634 let idx = y * w + x;
3635 let bx = (x as u32) >> size_bits;
3636 let by = (y as u32) >> size_bits;
3637 let block_index = (by * transform_width + bx) as usize;
3638 let cte = color_image[block_index];
3639 // §3.5.2 channel mapping: red=red_to_blue, green=green_to_blue,
3640 // blue=green_to_red.
3641 let red_to_blue = ((cte >> 16) & 0xff) as u8;
3642 let green_to_blue = ((cte >> 8) & 0xff) as u8;
3643 let green_to_red = (cte & 0xff) as u8;
3644
3645 let px = src[idx];
3646 let a = ((px >> 24) & 0xff) as u8;
3647 let r = ((px >> 16) & 0xff) as u8;
3648 let g = ((px >> 8) & 0xff) as u8;
3649 let b = (px & 0xff) as u8;
3650 let (new_r, new_b) =
3651 forward_color_pixel(r, g, b, green_to_red, green_to_blue, red_to_blue);
3652 dst[idx] =
3653 ((a as u32) << 24) | ((new_r as u32) << 16) | ((g as u32) << 8) | (new_b as u32);
3654 }
3655 }
3656}
3657
3658/// Default §3.5.2 `size_bits` value the encoder picks for the color
3659/// sub-image: `4` → 16×16 pixel blocks, matching
3660/// [`DEFAULT_PREDICTOR_SIZE_BITS`]. The spec admits `2..=9`
3661/// (`block` sizes 4..=512); finer blocks give better per-block CTE
3662/// fitting at the cost of a larger color sub-image. 16×16 is a
3663/// reasonable middle ground for the typical encoder workloads here.
3664const DEFAULT_COLOR_TRANSFORM_SIZE_BITS: u8 = 4;
3665
3666/// Encode `pixels` taking the §3.5.2 / §4.2 color-transform path: pick
3667/// a per-block `(green_to_red, green_to_blue, red_to_blue)` triple,
3668/// forward-transform the red and blue channels into the per-block
3669/// residuals, then encode the residuals via the standard
3670/// `spatially-coded-image` shape — wrapped by an `optional-transform`
3671/// whose first entry is the §4.2 color transform (header bit `%b1` +
3672/// transform type `Color = 1` + 3-bit `size_bits - 2` + the sub-
3673/// resolution color image as an `entropy-coded-image`).
3674///
3675/// The chooser composes with `cache_code_bits`: when `Some(bits)` a
3676/// §5.2.3 color cache of that size is built over the residual stream's
3677/// literal tokens.
3678///
3679/// **NB:** the color transform requires at least a `1 << size_bits`-
3680/// pixel side on both dimensions so the sub-resolution image has more
3681/// than one block; smaller images fall back to the no-transform
3682/// candidates.
3683fn encode_with_color_transform(
3684 pixels: &[u32],
3685 width: u32,
3686 height: u32,
3687 size_bits: u8,
3688 cache_code_bits: Option<u32>,
3689 image_width: u32,
3690) -> Vec<u8> {
3691 encode_with_color_transform_strategy(
3692 pixels,
3693 width,
3694 height,
3695 size_bits,
3696 cache_code_bits,
3697 image_width,
3698 ColorTransformStrategy::L1,
3699 )
3700}
3701
3702/// `size_bits` + `cache_code_bits` + per-block CTE [`ColorTransformStrategy`]
3703/// variant of [`encode_with_color_transform`]. The chooser sweeps both
3704/// strategies and keeps the byte-shortest stream (round 308), so the
3705/// entropy chooser cannot regress against the L1 baseline. Output is
3706/// round-trip-identical regardless of strategy: the cost model only
3707/// changes which per-block CTE is *recorded*, and the decoder's §4.2
3708/// inverse re-applies whatever CTE the sub-image carries.
3709fn encode_with_color_transform_strategy(
3710 pixels: &[u32],
3711 width: u32,
3712 height: u32,
3713 size_bits: u8,
3714 cache_code_bits: Option<u32>,
3715 image_width: u32,
3716 strategy: ColorTransformStrategy,
3717) -> Vec<u8> {
3718 let mut w = BitWriter::new();
3719
3720 // ---- §3.8.2 / §7.2 optional-transform: color-tx ----
3721 // present bit `%b1`.
3722 w.write_bit(true);
3723 // transform type `Color = 1`, 2 bits.
3724 w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
3725 // 3-bit `size_bits - 2` (decoder adds 2 back per §3.5.2).
3726 debug_assert!((2..=9).contains(&size_bits));
3727 w.write_bits((size_bits - 2) as u32, 3);
3728
3729 // Build the sub-resolution color image then write it as an
3730 // entropy-coded-image per §7.2 `color-image = 3BIT
3731 // entropy-coded-image`.
3732 let (color_image, tw, _th) = build_color_image(pixels, width, height, size_bits, strategy);
3733 write_entropy_coded_image_literals(&mut w, &color_image);
3734
3735 // End of optional-transform list (`%b0`).
3736 w.write_bit(false);
3737
3738 // ---- Forward-transform the main image ----
3739 let mut residuals = vec![0u32; pixels.len()];
3740 apply_forward_color(
3741 pixels,
3742 &mut residuals,
3743 width,
3744 height,
3745 &color_image,
3746 tw,
3747 size_bits,
3748 );
3749
3750 // ---- Tokenise + emit the residual spatially-coded-image ----
3751 let mut tokens = tokenize_lz77(&residuals);
3752 if let Some(bits) = cache_code_bits {
3753 tokens = cacheify_tokens(&tokens, &residuals, bits);
3754 }
3755 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3756
3757 w.into_bytes()
3758}
3759
3760// ---- §4.4 color-indexing transform encoder --------------------------
3761
3762/// §4.4 upper bound on the color-table size that triggers the
3763/// color-indexing transform: the spec describes the inverse with an
3764/// 8-bit on-wire `color_table_size = ReadBits(8) + 1`, so the legal
3765/// range is `1..=256` unique ARGB colors.
3766const MAX_PALETTE_SIZE: usize = 256;
3767
3768/// Scan `pixels` for unique ARGB values and, if the count is below
3769/// [`MAX_PALETTE_SIZE`], return a `(palette, index_of)` pair:
3770///
3771/// * `palette` — the unique ARGB values, sorted numerically. Sorting
3772/// maximises the per-component delta correlation the §4.4
3773/// subtraction-coded color table feeds to the entropy stage:
3774/// adjacent palette entries share similar ARGB bits, so the deltas
3775/// `palette[i] - palette[i-1]` (per-channel, mod 256) concentrate
3776/// near zero — the histogram shape Huffman codes shrink best.
3777///
3778/// * `index_of` — a lookup map from ARGB pixel value to its position
3779/// in `palette`, used by [`pack_indices_into_bundled_image`] to
3780/// replace each pixel with its index.
3781///
3782/// Returns `None` as soon as the unique-color count exceeds
3783/// [`MAX_PALETTE_SIZE`] (the §4.4 on-wire limit), so the early-exit
3784/// cost on photo-like images is bounded.
3785fn collect_palette(pixels: &[u32]) -> Option<(Vec<u32>, std::collections::HashMap<u32, u32>)> {
3786 use std::collections::HashSet;
3787 let mut set: HashSet<u32> = HashSet::new();
3788 for &p in pixels {
3789 set.insert(p);
3790 if set.len() > MAX_PALETTE_SIZE {
3791 return None;
3792 }
3793 }
3794 let mut palette: Vec<u32> = set.into_iter().collect();
3795 palette.sort_unstable();
3796 let mut map: std::collections::HashMap<u32, u32> =
3797 std::collections::HashMap::with_capacity(palette.len());
3798 for (i, &c) in palette.iter().enumerate() {
3799 map.insert(c, i as u32);
3800 }
3801 Some((palette, map))
3802}
3803
3804/// §4.4 *subtraction-encode* a color table in place — the inverse of
3805/// the decoder's [`crate::vp8l_transform::inverse_color_table`].
3806///
3807/// The decoder reconstructs `color_table[i] = color_table[i-1] +
3808/// color_table[i]` (per-channel mod 256), so the encoder emits
3809/// `color_table[i] - color_table[i-1]` (per-channel mod 256) for
3810/// `i >= 1`, leaving `color_table[0]` unchanged. Deltas walk
3811/// back-to-front so each cell still sees the original (pre-encoded)
3812/// previous value at the moment of subtraction.
3813fn forward_color_table(color_table: &mut [u32]) {
3814 if color_table.len() < 2 {
3815 return;
3816 }
3817 for i in (1..color_table.len()).rev() {
3818 let cur = color_table[i];
3819 let prev = color_table[i - 1];
3820 let a = ((cur >> 24) & 0xff).wrapping_sub((prev >> 24) & 0xff) & 0xff;
3821 let r = ((cur >> 16) & 0xff).wrapping_sub((prev >> 16) & 0xff) & 0xff;
3822 let g = ((cur >> 8) & 0xff).wrapping_sub((prev >> 8) & 0xff) & 0xff;
3823 let b = (cur & 0xff).wrapping_sub(prev & 0xff) & 0xff;
3824 color_table[i] = (a << 24) | (r << 16) | (g << 8) | b;
3825 }
3826}
3827
3828/// §4.4 *forward* pixel bundling: replace each ARGB pixel by its
3829/// palette `index`, packing 1/2/4/8 indices into one byte's-worth of
3830/// green channel per the §4.4 LSB-first packing rule. Other channels
3831/// are zeroed (alpha 0, red 0, blue 0) — the decoder reads only the
3832/// green channel via `inverse_color_indexing`.
3833///
3834/// `width_bits` is the value the shared §4.4 threshold table
3835/// [`crate::vp8l_transform::color_indexing_width_bits`] returns for
3836/// the palette size. `packed_width = DIV_ROUND_UP(width,
3837/// 1 << width_bits)` — the new image width fed to the §3 image
3838/// stream.
3839///
3840/// Returns the `packed_width * height` ARGB buffer the
3841/// `spatially-coded-image` writer feeds to the entropy stage. The
3842/// inverse `inverse_color_indexing` reconstructs the original
3843/// `width * height` ARGB image when given this buffer and the
3844/// (un-subtraction-encoded) palette.
3845fn pack_indices_into_bundled_image(
3846 pixels: &[u32],
3847 index_of: &std::collections::HashMap<u32, u32>,
3848 width: u32,
3849 height: u32,
3850 width_bits: u8,
3851) -> (Vec<u32>, u32) {
3852 let count = 1u32 << width_bits;
3853 let bits_per_index = if width_bits == 0 { 8 } else { 8 / count };
3854 let packed_width = width.div_ceil(count);
3855 let pw = packed_width as usize;
3856 let w = width as usize;
3857 let h = height as usize;
3858 let mut out = vec![0u32; pw * h];
3859 for y in 0..h {
3860 for x in 0..w {
3861 let idx = *index_of
3862 .get(&pixels[y * w + x])
3863 .expect("collect_palette covered every pixel");
3864 let packed_x = x / count as usize;
3865 let sub = x % count as usize;
3866 let shift = sub * bits_per_index as usize;
3867 let bits = (idx & ((1u32 << bits_per_index) - 1)) << shift;
3868 out[y * pw + packed_x] |= bits << 8; // pack into the green channel.
3869 }
3870 }
3871 (out, packed_width)
3872}
3873
3874/// Encode `pixels` taking the §4.4 color-indexing transform path:
3875/// build the unique-color palette, replace every pixel with its
3876/// palette index (bundled per the §4.4 `width_bits` rule when the
3877/// palette has ≤16 entries), then emit the bundled-width image via
3878/// the standard `spatially-coded-image` shape — wrapped by an
3879/// `optional-transform` whose first entry is the §4.4 color-indexing
3880/// transform.
3881///
3882/// Wire format produced (§3.8.2 / §7.2 grammar):
3883///
3884/// ```text
3885/// optional-transform =
3886/// %b1 -- transform present
3887/// %b11 -- type ColorIndexing = 3
3888/// 8BIT -- color_table_size - 1
3889/// entropy-coded-image -- the subtraction-encoded palette,
3890/// written at width = color_table_size,
3891/// height = 1
3892/// %b0 -- end of optional-transform list
3893/// spatially-coded-image -- packed indices at packed_width
3894/// ```
3895///
3896/// Returns `None` when the palette size exceeds [`MAX_PALETTE_SIZE`]
3897/// (the §4.4 on-wire limit), so the chooser can skip this candidate
3898/// in O(N) on photo-like content. The chooser composes with
3899/// `cache_code_bits`: when `Some(bits)` a §5.2.3 color cache of that
3900/// size is built over the packed-index stream's literal tokens.
3901fn encode_with_color_indexing(
3902 pixels: &[u32],
3903 width: u32,
3904 height: u32,
3905 cache_code_bits: Option<u32>,
3906) -> Option<Vec<u8>> {
3907 let (palette, index_of) = collect_palette(pixels)?;
3908 if palette.is_empty() {
3909 return None;
3910 }
3911
3912 // §4.4 threshold table — single shared copy.
3913 let width_bits = crate::vp8l_transform::color_indexing_width_bits(palette.len());
3914 let (packed_image, packed_width) =
3915 pack_indices_into_bundled_image(pixels, &index_of, width, height, width_bits);
3916
3917 let mut w = BitWriter::new();
3918
3919 // ---- §3.8.2 / §7.2 optional-transform: color-indexing-tx ----
3920 // Header bit `%b1` (transform present).
3921 w.write_bit(true);
3922 // Transform type `ColorIndexing = 3` (2 bits, LSB-first → value 3
3923 // matches the spec's `%b11` MSB-first ABNF when read through
3924 // `ReadBits(2)`).
3925 w.write_bits(crate::vp8l_stream::TransformType::ColorIndexing as u32, 2);
3926 // 8-bit `color_table_size - 1` (decoder adds 1 back per §4.4).
3927 debug_assert!((1..=MAX_PALETTE_SIZE).contains(&palette.len()));
3928 w.write_bits((palette.len() - 1) as u32, 8);
3929
3930 // Color table = an entropy-coded-image at width = color_table_size,
3931 // height = 1. The on-wire palette is subtraction-encoded; the
3932 // decoder applies `inverse_color_table` to reverse it.
3933 let mut subtraction_encoded = palette.clone();
3934 forward_color_table(&mut subtraction_encoded);
3935 write_entropy_coded_image_literals(&mut w, &subtraction_encoded);
3936
3937 // End of optional-transform list (`%b0`).
3938 w.write_bit(false);
3939
3940 // ---- Spatially-coded-image at the *subsampled* width ------------
3941 // After §4.4, `image_width` is `DIV_ROUND_UP(width, 1 <<
3942 // width_bits)`; that is the width the entropy stage threads
3943 // through the §5.2.2 distance-code chooser. Pixel values are the
3944 // packed-green-channel bytes whose red/blue/alpha channels are
3945 // identically zero, so the per-channel Huffman codes for those
3946 // three channels collapse to a 1-symbol prefix code each (almost
3947 // free header overhead).
3948 let mut tokens = tokenize_lz77(&packed_image);
3949 if let Some(bits) = cache_code_bits {
3950 tokens = cacheify_tokens(&tokens, &packed_image, bits);
3951 }
3952 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, packed_width);
3953
3954 Some(w.into_bytes())
3955}
3956
3957/// Encode `pixels` with the §4.4 color-indexing transform **chained**
3958/// with the §4.1 spatial predictor transform on the bundled-index
3959/// image.
3960///
3961/// RFC 9649 §3.5 allows up to four transforms to be stacked in one
3962/// `optional-transform` list (each used at most once); the inverse
3963/// transforms are applied "in the reverse order that they are read
3964/// from the bitstream, that is, last one first." The bundled palette
3965/// indices the §4.4 transform produces live entirely in the green
3966/// channel and run in long spatially-coherent stretches on palette
3967/// content (icons, line art, screen captures); a §4.1 predictor pass
3968/// over that bundled image turns those runs into near-zero residuals,
3969/// shrinking the entropy stage further than either transform alone.
3970///
3971/// ## Wire / inverse ordering
3972///
3973/// The two transforms are written **color-indexing first, predictor
3974/// second**:
3975///
3976/// ```text
3977/// optional-transform =
3978/// %b1 %b11 8BIT entropy-coded-image -- §4.4 color-indexing-tx (palette)
3979/// %b1 %b00 3BIT entropy-coded-image -- §4.1 predictor-tx (sub-image)
3980/// %b0 -- end of optional-transform list
3981/// spatially-coded-image -- predictor residuals over the
3982/// packed indices, at packed_width
3983/// ```
3984///
3985/// The decoder reads color-indexing first, which subsamples the width
3986/// it threads into the predictor body (`transform_width =
3987/// DIV_ROUND_UP(packed_width, block)`) and into the main image; it then
3988/// applies the inverses in reverse read order — inverse-predictor over
3989/// the packed-index image first (recovering the bundled indices), then
3990/// inverse-color-indexing (un-bundling back to the full-width ARGB
3991/// pixels). This is exactly the order
3992/// [`crate::vp8l_transform::decode_lossless`] already implements for a
3993/// stacked list, so no decoder change is required.
3994///
3995/// The predictor sub-image is built over the **packed** image at
3996/// `packed_width × height`; the predictor's modes therefore decorrelate
3997/// adjacent bundled-index bytes, not the original pixels.
3998///
3999/// Returns `None` when the palette is infeasible (`> MAX_PALETTE_SIZE`
4000/// unique colors) or the packed image is too small for the predictor
4001/// transform to carry a meaningful body (needs at least one full
4002/// `block × block` square, i.e. `packed_width >= block && height >=
4003/// block`). In those cases the single-transform color-indexing
4004/// candidate already covers the input.
4005///
4006/// The chooser composes with `cache_code_bits` over the residual
4007/// stream's literal tokens, identically to the single-transform paths.
4008fn encode_with_color_indexing_predictor(
4009 pixels: &[u32],
4010 width: u32,
4011 height: u32,
4012 size_bits: u8,
4013 cache_code_bits: Option<u32>,
4014 pred_strategy: PredictorSubImageStrategy,
4015) -> Option<Vec<u8>> {
4016 let (palette, index_of) = collect_palette(pixels)?;
4017 if palette.is_empty() {
4018 return None;
4019 }
4020
4021 // §4.4 bundle the indices into the green channel at the subsampled
4022 // width — the same step the single-transform color-indexing path
4023 // takes.
4024 let width_bits = crate::vp8l_transform::color_indexing_width_bits(palette.len());
4025 let (packed_image, packed_width) =
4026 pack_indices_into_bundled_image(pixels, &index_of, width, height, width_bits);
4027
4028 // The §4.1 predictor needs at least one full block square at the
4029 // packed width; otherwise its sub-image is pure overhead and the
4030 // single-transform color-indexing candidate is strictly cheaper.
4031 let block = 1u32 << size_bits;
4032 if packed_width < block || height < block {
4033 return None;
4034 }
4035
4036 let mut w = BitWriter::new();
4037
4038 // ---- Transform #1 (read first): §4.4 color-indexing-tx ----------
4039 w.write_bit(true);
4040 w.write_bits(crate::vp8l_stream::TransformType::ColorIndexing as u32, 2);
4041 debug_assert!((1..=MAX_PALETTE_SIZE).contains(&palette.len()));
4042 w.write_bits((palette.len() - 1) as u32, 8);
4043 let mut subtraction_encoded = palette.clone();
4044 forward_color_table(&mut subtraction_encoded);
4045 write_entropy_coded_image_literals(&mut w, &subtraction_encoded);
4046
4047 // ---- Transform #2 (read second): §4.1 predictor-tx --------------
4048 // Built over the *packed* index image at `packed_width × height`.
4049 // The decoder will have subsampled `current_width` to `packed_width`
4050 // after reading transform #1, so the `transform_width` it derives
4051 // for this body — `DIV_ROUND_UP(packed_width, block)` — matches the
4052 // `tw` produced here.
4053 w.write_bit(true);
4054 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4055 debug_assert!((2..=9).contains(&size_bits));
4056 w.write_bits((size_bits - 2) as u32, 3);
4057 // Round 305: build the predictor sub-image over the *packed index*
4058 // image under `pred_strategy`. The chooser sweeps L1 / entropy /
4059 // sub-image-aware and keeps the byte-shortest.
4060 let (predictor_image, tw, _th) = build_predictor_image_strategy(
4061 &packed_image,
4062 packed_width,
4063 height,
4064 size_bits,
4065 pred_strategy,
4066 );
4067 write_entropy_coded_image_literals(&mut w, &predictor_image);
4068
4069 // End of optional-transform list (`%b0`).
4070 w.write_bit(false);
4071
4072 // ---- Forward-transform the packed image into residuals ----------
4073 let mut residuals = vec![0u32; packed_image.len()];
4074 apply_forward_predictor(
4075 &packed_image,
4076 &mut residuals,
4077 packed_width,
4078 height,
4079 &predictor_image,
4080 tw,
4081 size_bits,
4082 );
4083
4084 // ---- Spatially-coded-image of the residuals at packed_width -----
4085 let mut tokens = tokenize_lz77(&residuals);
4086 if let Some(bits) = cache_code_bits {
4087 tokens = cacheify_tokens(&tokens, &residuals, bits);
4088 }
4089 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, packed_width);
4090
4091 Some(w.into_bytes())
4092}
4093
4094/// Encode `pixels` with the §4.2 cross-color transform **chained** with
4095/// the §4.1 spatial predictor transform, the stacked pair the spec
4096/// targets at photo / natural-image content.
4097///
4098/// RFC 9649 §3.5 allows up to four transforms to be stacked in one
4099/// `optional-transform` list (each used at most once); the inverse
4100/// transforms are applied "in the reverse order that they are read from
4101/// the bitstream, that is, last one first." On photo content the §4.2
4102/// color transform first removes the inter-channel correlation (rewriting
4103/// red and blue as residuals against green per the per-block
4104/// `ColorTransformElement`); a §4.1 spatial-predictor pass over the
4105/// color-decorrelated image then removes the *spatial* correlation that
4106/// survives in each channel, driving the residuals the entropy stage sees
4107/// closer to zero than either transform alone.
4108///
4109/// ## Wire / inverse ordering
4110///
4111/// The two transforms are written **color-transform first, predictor
4112/// second**:
4113///
4114/// ```text
4115/// optional-transform =
4116/// %b1 %b01 3BIT entropy-coded-image -- §4.2 color-tx (color sub-image)
4117/// %b1 %b00 3BIT entropy-coded-image -- §4.1 predictor-tx (sub-image)
4118/// %b0 -- end of optional-transform list
4119/// spatially-coded-image -- predictor residuals over the
4120/// color-transformed image
4121/// ```
4122///
4123/// Neither transform subsamples the width, so both sub-image bodies and
4124/// the main image run at the full canvas `width`. The decoder reads
4125/// color-transform first, predictor second, then applies the inverses in
4126/// reverse read order — inverse-predictor first (recovering the
4127/// color-transformed image), then inverse-color (recovering the original
4128/// ARGB pixels). This is exactly the order
4129/// [`crate::vp8l_transform::decode_lossless`] already implements for a
4130/// stacked list, so no decoder change is required.
4131///
4132/// The predictor sub-image is built over the **color-transformed** image,
4133/// so its per-block modes decorrelate the color residuals, not the raw
4134/// pixels.
4135///
4136/// `size_bits` is shared by both transforms (each writes its own 3-bit
4137/// `size_bits - 2` header). The caller is responsible for gating on
4138/// `width >= block && height >= block` so both sub-images carry at least
4139/// one full block square; the chooser does this before calling.
4140fn encode_with_color_transform_predictor(
4141 pixels: &[u32],
4142 width: u32,
4143 height: u32,
4144 size_bits: u8,
4145 cache_code_bits: Option<u32>,
4146 pred_strategy: PredictorSubImageStrategy,
4147) -> Vec<u8> {
4148 let mut w = BitWriter::new();
4149
4150 // ---- Transform #1 (read first): §4.2 color-tx -------------------
4151 w.write_bit(true);
4152 w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
4153 debug_assert!((2..=9).contains(&size_bits));
4154 w.write_bits((size_bits - 2) as u32, 3);
4155 let (color_image, ctw, _cth) =
4156 build_color_image(pixels, width, height, size_bits, ColorTransformStrategy::L1);
4157 write_entropy_coded_image_literals(&mut w, &color_image);
4158
4159 // Forward the §4.2 color transform over the originals so the
4160 // predictor below sees the color-decorrelated image.
4161 let mut color_transformed = vec![0u32; pixels.len()];
4162 apply_forward_color(
4163 pixels,
4164 &mut color_transformed,
4165 width,
4166 height,
4167 &color_image,
4168 ctw,
4169 size_bits,
4170 );
4171
4172 // ---- Transform #2 (read second): §4.1 predictor-tx --------------
4173 // Built over the color-transformed image at full `width × height`.
4174 // The decoder will have left `current_width` at `width` after the
4175 // color transform (color-tx does not subsample), so the
4176 // `transform_width` it derives matches `ptw` here. The predictor
4177 // sub-image is built under `pred_strategy` (round 305) — the
4178 // chooser sweeps L1 / entropy / sub-image-aware over this
4179 // color-decorrelated residual and keeps the byte-shortest.
4180 w.write_bit(true);
4181 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4182 w.write_bits((size_bits - 2) as u32, 3);
4183 let (predictor_image, ptw, _pth) =
4184 build_predictor_image_strategy(&color_transformed, width, height, size_bits, pred_strategy);
4185 write_entropy_coded_image_literals(&mut w, &predictor_image);
4186
4187 // End of optional-transform list (`%b0`).
4188 w.write_bit(false);
4189
4190 // ---- Forward-transform the color-transformed image into residuals
4191 let mut residuals = vec![0u32; color_transformed.len()];
4192 apply_forward_predictor(
4193 &color_transformed,
4194 &mut residuals,
4195 width,
4196 height,
4197 &predictor_image,
4198 ptw,
4199 size_bits,
4200 );
4201
4202 // ---- Spatially-coded-image of the residuals at full width -------
4203 let mut tokens = tokenize_lz77(&residuals);
4204 if let Some(bits) = cache_code_bits {
4205 tokens = cacheify_tokens(&tokens, &residuals, bits);
4206 }
4207 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, width);
4208
4209 w.into_bytes()
4210}
4211
4212/// Encode `pixels` with a **three-transform** §3.5 stack: §4.2 cross-color
4213/// (read first) → §4.3 subtract-green (read second) → §4.1 spatial
4214/// predictor (read third), chained over one `optional-transform` list.
4215///
4216/// RFC 9649 §3.5 permits up to four transforms stacked in one list, each
4217/// used at most once, with the inverses applied "in the reverse order that
4218/// they are read from the bitstream, that is, last one first." This
4219/// candidate is the natural three-axis extension of the round-303
4220/// color-transform + predictor pair: after the §4.2 per-block color
4221/// transform has removed the *modeled* inter-channel correlation (rewriting
4222/// red / blue as residuals against green per the per-block
4223/// `ColorTransformElement`), a header-free §4.3 subtract-green pass removes
4224/// the *uniform* red/blue-vs-green correlation that survives the per-block
4225/// model (the CTE multipliers are coarse 3.5-bit fixed-point values, so a
4226/// residual green-correlated component routinely remains), and a §4.1
4227/// predictor pass then removes the spatial correlation left in each channel.
4228/// The entropy stage therefore sees residuals driven closer to zero than any
4229/// one- or two-transform path achieves alone, on content where all three
4230/// correlation axes carry mass.
4231///
4232/// ## Wire / inverse ordering
4233///
4234/// ```text
4235/// optional-transform =
4236/// %b1 %b01 3BIT entropy-coded-image -- §4.2 color-tx (color sub-image)
4237/// %b1 %b10 -- §4.3 subtract-green (no data)
4238/// %b1 %b00 3BIT entropy-coded-image -- §4.1 predictor-tx (sub-image)
4239/// %b0 -- end of optional-transform list
4240/// spatially-coded-image -- predictor residuals over the
4241/// color- + subtract-green-
4242/// transformed image
4243/// ```
4244///
4245/// The §4.3 subtract-green transform carries **no** transform data (just
4246/// the `%b1 %b10` presence + type bits), exactly as §4.3 specifies. None of
4247/// the three transforms subsamples the width, so both sub-image bodies and
4248/// the main image run at the full canvas `width`. The decoder reads color
4249/// first, subtract-green second, predictor third, then applies the inverses
4250/// last-read-first — inverse-predictor (recovering the color- +
4251/// subtract-green-transformed image), inverse-subtract-green (recovering the
4252/// color-transformed image), inverse-color (recovering the originals). This
4253/// is exactly the generic reverse-read-order chain
4254/// [`crate::vp8l_transform::decode_lossless`] already applies, so no decoder
4255/// change is required.
4256///
4257/// The predictor sub-image is built over the **color- + subtract-green-
4258/// transformed** image, so its per-block modes decorrelate that residual,
4259/// not the raw pixels.
4260///
4261/// `size_bits` is shared by the §4.2 color and §4.1 predictor transforms
4262/// (the §4.3 subtract-green transform has no `size_bits`); each writes its
4263/// own 3-bit `size_bits - 2` header. The caller gates on
4264/// `width >= block && height >= block` so both sub-images carry at least one
4265/// full block square.
4266fn encode_with_color_transform_subtract_green_predictor(
4267 pixels: &[u32],
4268 width: u32,
4269 height: u32,
4270 size_bits: u8,
4271 cache_code_bits: Option<u32>,
4272 pred_strategy: PredictorSubImageStrategy,
4273) -> Vec<u8> {
4274 let mut w = BitWriter::new();
4275
4276 // ---- Transform #1 (read first): §4.2 color-tx -------------------
4277 w.write_bit(true);
4278 w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
4279 debug_assert!((2..=9).contains(&size_bits));
4280 w.write_bits((size_bits - 2) as u32, 3);
4281 let (color_image, ctw, _cth) =
4282 build_color_image(pixels, width, height, size_bits, ColorTransformStrategy::L1);
4283 write_entropy_coded_image_literals(&mut w, &color_image);
4284
4285 // Forward the §4.2 color transform over the originals.
4286 let mut transformed = vec![0u32; pixels.len()];
4287 apply_forward_color(
4288 pixels,
4289 &mut transformed,
4290 width,
4291 height,
4292 &color_image,
4293 ctw,
4294 size_bits,
4295 );
4296
4297 // ---- Transform #2 (read second): §4.3 subtract-green ------------
4298 // Header-free: just the presence bit + the 2-bit type. The forward
4299 // pass rewrites red/blue against green in place over the
4300 // color-transformed image.
4301 w.write_bit(true);
4302 w.write_bits(crate::vp8l_stream::TransformType::SubtractGreen as u32, 2);
4303 apply_subtract_green(&mut transformed);
4304
4305 // ---- Transform #3 (read third): §4.1 predictor-tx --------------
4306 // Built over the color- + subtract-green-transformed image at full
4307 // `width × height`. Neither earlier transform subsampled the width,
4308 // so the decoder still has `current_width == width` here and the
4309 // `transform_width` it derives matches `ptw`. The predictor
4310 // sub-image is built under `pred_strategy` (round 305).
4311 w.write_bit(true);
4312 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4313 w.write_bits((size_bits - 2) as u32, 3);
4314 let (predictor_image, ptw, _pth) =
4315 build_predictor_image_strategy(&transformed, width, height, size_bits, pred_strategy);
4316 write_entropy_coded_image_literals(&mut w, &predictor_image);
4317
4318 // End of optional-transform list (`%b0`).
4319 w.write_bit(false);
4320
4321 // ---- Forward-transform the transformed image into residuals -----
4322 let mut residuals = vec![0u32; transformed.len()];
4323 apply_forward_predictor(
4324 &transformed,
4325 &mut residuals,
4326 width,
4327 height,
4328 &predictor_image,
4329 ptw,
4330 size_bits,
4331 );
4332
4333 // ---- Spatially-coded-image of the residuals at full width -------
4334 let mut tokens = tokenize_lz77(&residuals);
4335 if let Some(bits) = cache_code_bits {
4336 tokens = cacheify_tokens(&tokens, &residuals, bits);
4337 }
4338 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, width);
4339
4340 w.into_bytes()
4341}
4342
4343// ---- §6.2.2 multi-meta-prefix (entropy-image) encoder ----------------
4344
4345/// Default `prefix_bits` candidate the §6.2.2 multi-meta-prefix
4346/// chooser sweeps. Each value gives a block side of `1 << prefix_bits`
4347/// pixels — larger blocks mean fewer of them (cheap entropy image,
4348/// fewer prefix-code groups) but coarser per-region adaptation; smaller
4349/// blocks mean finer adaptation but a larger entropy-image overhead.
4350/// The sweep across `[4, 5, 6, 7]` gives 16/32/64/128-pixel blocks,
4351/// which span the useful range for the dimensions this crate targets
4352/// (typical lossless WebP fixtures are 16..512 pixels per side).
4353///
4354/// The spec admits `prefix_bits ∈ [2..9]` (i.e. 4..512-pixel blocks);
4355/// the chooser narrows that to four values rather than the full eight
4356/// because the very smallest (4-pixel) blocks rarely beat the
4357/// single-group baseline (the entropy image grows quadratically with
4358/// `1 / block_side`) and the largest (256/512-pixel) blocks are
4359/// useless on the smaller images this candidate targets.
4360const META_PREFIX_BITS_SWEEP: [u8; 4] = [4, 5, 6, 7];
4361
4362/// Largest number of prefix-code groups the §6.2.2 chooser will form.
4363/// Each group costs five additional code-length tables in the stream
4364/// header (~30..120 bits per code), so the chooser only pays the
4365/// overhead when the per-group savings on the LZ77 stream beat the
4366/// header cost. Capping at 4 keeps the chooser's wall-time bounded
4367/// while covering the per-region adaptation that pays for itself on
4368/// natural images (where the per-quadrant statistics diverge enough to
4369/// justify separate codes).
4370const MAX_META_GROUPS: u32 = 4;
4371
4372// ---- §6.2.2 histogram-distance block clusterer -------------------------
4373//
4374// Spec context (RFC 9649 §3.7.2.2 / WebP Lossless §6.2.2): the §5.2 LZ77
4375// + prefix-code-group decoder selects one of `num_prefix_groups` groups
4376// per pixel block. The encoder gets to choose how to *partition* the
4377// image's blocks into groups — the spec only constrains the on-wire
4378// representation (an `entropy-coded-image` whose green+red channels
4379// carry the per-block meta-prefix code).
4380//
4381// The right partition collects blocks whose alphabet-symbol histograms
4382// (green, red, blue, alpha + LZ77 length / distance) match closely, so
4383// each group's shared §6.2 prefix code can compact those symbols
4384// efficiently. A direct symbol-histogram clusterer would have to
4385// pre-tokenise to see which symbols each block produces, which puts a
4386// hard constraint on the matcher (`tokenize_lz77` runs *after* the
4387// clusterer here). We use a pixel-domain proxy instead: a coarse
4388// per-channel RGB histogram. Blocks whose pixel-value distributions
4389// agree at bin resolution will, in expectation, produce closely-matched
4390// literal-symbol frequencies, which is exactly what drives §6.2's
4391// per-group code cost.
4392
4393/// Bin shift collapsing the 256-value channel range into a coarser
4394/// histogram for clustering. `BIN_SHIFT = 4` → 16 bins per channel.
4395///
4396/// The smaller the shift the finer the discrimination but the more
4397/// per-block memory + per-iteration arithmetic; 4 keeps the per-block
4398/// feature vector at 48 `u32` slots (16 × 3 channels) which is small
4399/// enough to scan repeatedly in Lloyd's iteration but large enough to
4400/// distinguish meaningfully different per-region distributions on
4401/// natural-image inputs.
4402const CLUSTER_BIN_SHIFT: u32 = 4;
4403/// Number of histogram bins per channel after [`CLUSTER_BIN_SHIFT`]:
4404/// `256 >> CLUSTER_BIN_SHIFT`.
4405const CLUSTER_BINS_PER_CHANNEL: usize = 256 >> CLUSTER_BIN_SHIFT;
4406/// Channels included in the feature vector. We histogram red / green /
4407/// blue; alpha is omitted because most lossless WebP payloads carry an
4408/// opaque alpha and a uniform-`0xff` alpha bin contributes no signal.
4409const CLUSTER_NUM_CHANNELS: usize = 3;
4410/// Length of one block's feature vector: `bins-per-channel × channels`.
4411const CLUSTER_FEATURE_DIM: usize = CLUSTER_BINS_PER_CHANNEL * CLUSTER_NUM_CHANNELS;
4412
4413/// Maximum Lloyd's-algorithm iteration count. On the diagnostic
4414/// fixtures the assignment settles in 2–3 passes; the cap bounds the
4415/// chooser's wall-time on pathological inputs (the outer chooser will
4416/// often discard this candidate anyway).
4417const CLUSTER_MAX_ITERATIONS: u32 = 8;
4418
4419/// Build the per-block coarse RGB histogram feature vectors.
4420///
4421/// The feature layout per block is three contiguous channel chunks:
4422/// red bins, then green bins, then blue bins, each of length
4423/// [`CLUSTER_BINS_PER_CHANNEL`]. Counts are left raw (not normalised)
4424/// because all blocks of the same `block_side` see the same pixel
4425/// count, so L1 distance between any two block vectors is directly
4426/// comparable. Boundary blocks (where `block_side` doesn't divide
4427/// `width` / `height` evenly) have smaller pixel counts, so their
4428/// vector magnitudes are correspondingly smaller — the L1 metric
4429/// stays meaningful because both sides of every comparison are
4430/// pulled from the same fixed-size bin grid.
4431fn histogram_block_features(
4432 pixels: &[u32],
4433 width: u32,
4434 height: u32,
4435 prefix_bits: u8,
4436) -> (Vec<u32>, usize) {
4437 let block_side = 1u32 << prefix_bits;
4438 let blocks_wide = width.div_ceil(block_side) as usize;
4439 let blocks_high = height.div_ceil(block_side) as usize;
4440 let block_count = blocks_wide * blocks_high;
4441 let mut features = vec![0u32; block_count * CLUSTER_FEATURE_DIM];
4442
4443 let row_stride = width as usize;
4444 let bs = block_side as usize;
4445 for y in 0..height as usize {
4446 let block_row = y / bs;
4447 for x in 0..width as usize {
4448 let block_col = x / bs;
4449 let block_index = block_row * blocks_wide + block_col;
4450 let pixel = pixels[y * row_stride + x];
4451 let r_bin = (((pixel >> 16) & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4452 let g_bin = (((pixel >> 8) & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4453 let b_bin = ((pixel & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4454 let base = block_index * CLUSTER_FEATURE_DIM;
4455 features[base + r_bin] += 1;
4456 features[base + CLUSTER_BINS_PER_CHANNEL + g_bin] += 1;
4457 features[base + 2 * CLUSTER_BINS_PER_CHANNEL + b_bin] += 1;
4458 }
4459 }
4460 (features, block_count)
4461}
4462
4463/// L1 (sum-of-absolute-differences) distance between two
4464/// `CLUSTER_FEATURE_DIM`-length count vectors. Symmetric and integer-
4465/// valued; zero iff every bin matches exactly.
4466fn histogram_l1(a: &[u32], b: &[u32]) -> u64 {
4467 debug_assert_eq!(a.len(), CLUSTER_FEATURE_DIM);
4468 debug_assert_eq!(b.len(), CLUSTER_FEATURE_DIM);
4469 let mut sum: u64 = 0;
4470 for i in 0..CLUSTER_FEATURE_DIM {
4471 let ai = a[i];
4472 let bi = b[i];
4473 sum += ai.abs_diff(bi) as u64;
4474 }
4475 sum
4476}
4477
4478/// Deterministic centroid seeding by farthest-from-already-chosen rule
4479/// (a k-means++-style maximum-minimum-distance variant with no
4480/// randomness so identical inputs always produce identical seeds).
4481///
4482/// Starts with block 0 as the first centroid, then repeatedly picks
4483/// the block whose minimum L1 distance to the already-chosen set is
4484/// the largest. Returns the chosen block indices. If at some step no
4485/// remaining block has positive distance to every chosen centroid
4486/// (i.e. it duplicates one already in the set), the seeding stops
4487/// early — the caller treats a list shorter than `num_groups` as a
4488/// signal that the input cannot be split that finely.
4489fn seed_cluster_centroids(features: &[u32], block_count: usize, num_groups: u32) -> Vec<usize> {
4490 let target = num_groups as usize;
4491 debug_assert!(target >= 1 && target <= block_count);
4492 let mut picks: Vec<usize> = Vec::with_capacity(target);
4493 picks.push(0);
4494 while picks.len() < target {
4495 let mut champion_block = 0usize;
4496 let mut champion_min_dist: u64 = 0;
4497 for cand in 0..block_count {
4498 if picks.contains(&cand) {
4499 continue;
4500 }
4501 let cand_vec = &features[cand * CLUSTER_FEATURE_DIM..(cand + 1) * CLUSTER_FEATURE_DIM];
4502 let mut nearest: u64 = u64::MAX;
4503 for &p in &picks {
4504 let pick_vec = &features[p * CLUSTER_FEATURE_DIM..(p + 1) * CLUSTER_FEATURE_DIM];
4505 let d = histogram_l1(cand_vec, pick_vec);
4506 if d < nearest {
4507 nearest = d;
4508 }
4509 }
4510 if nearest > champion_min_dist {
4511 champion_min_dist = nearest;
4512 champion_block = cand;
4513 }
4514 }
4515 if champion_min_dist == 0 {
4516 // No more distinguishable centroids remain.
4517 break;
4518 }
4519 picks.push(champion_block);
4520 }
4521 picks
4522}
4523
4524/// Partition the image's `prefix_bits`-aligned blocks into at most
4525/// `num_groups` clusters by coarse-RGB-histogram L1 distance, returning
4526/// one meta-prefix code per block in scan-line order.
4527///
4528/// The returned codes are always *compact*: they form the contiguous
4529/// range `0..actual_groups - 1` with no gaps. Per RFC 9649 §3.7.2.2.2
4530/// the entropy image's `num_prefix_groups` is derived as
4531/// `max(entropy image) + 1`, so a gap (an empty group sitting between
4532/// used ones) would force the encoder to emit an unused prefix-code
4533/// group and pay its code-length-table cost for no benefit.
4534///
4535/// Returns `vec![0; block_count]` (a single-group degenerate) when:
4536///
4537/// * `num_groups == 1` (caller asked for one group),
4538/// * `block_count <= 1` (the entropy image holds at most one block, so
4539/// there is no partition to make),
4540/// * seeding cannot find `≥ 2` distinguishable centroids (e.g. all
4541/// blocks have identical histograms), or
4542/// * Lloyd's iteration converges to a single non-empty cluster after
4543/// the compaction pass.
4544///
4545/// The caller's chooser uses the degenerate path as a signal to fall
4546/// through to the single-group baseline rather than paying the
4547/// multi-group meta-prefix header overhead.
4548///
4549/// **Determinism.** Two calls with the same `(pixels, width, height,
4550/// prefix_bits, num_groups)` always produce the same `Vec<u16>` — the
4551/// seeding rule, the Lloyd loop's tie-break (lowest-index centroid
4552/// wins on equal-distance), and the compaction pass are all
4553/// deterministic.
4554///
4555/// Exposed `pub` (like [`pick_block_cte`]) so the `meta_prefix_cluster`
4556/// criterion bench can drive this §6.2.2 entropy-image kernel in
4557/// isolation. Not part of the crate's documented stable surface.
4558pub fn cluster_blocks_by_histogram_distance(
4559 pixels: &[u32],
4560 width: u32,
4561 height: u32,
4562 prefix_bits: u8,
4563 num_groups: u32,
4564) -> Vec<u16> {
4565 debug_assert!(num_groups >= 1);
4566 let (features, block_count) = histogram_block_features(pixels, width, height, prefix_bits);
4567 if num_groups == 1 || block_count <= 1 {
4568 return vec![0u16; block_count];
4569 }
4570
4571 let seeds = seed_cluster_centroids(&features, block_count, num_groups);
4572 if seeds.len() < 2 {
4573 return vec![0u16; block_count];
4574 }
4575 let cluster_k = seeds.len();
4576
4577 // Centroids are stored as running sums of assigned-block feature
4578 // vectors so the update step amortises the per-bin sum across all
4579 // assigned blocks in O(block_count × feat_dim). The per-cluster
4580 // assignment count divides the sum on demand to materialise the
4581 // average for the L1 step.
4582 let mut centroid_sums: Vec<u64> = vec![0u64; cluster_k * CLUSTER_FEATURE_DIM];
4583 let mut centroid_counts: Vec<u64> = vec![1u64; cluster_k];
4584 for (slot, &block_idx) in seeds.iter().enumerate() {
4585 let src = &features[block_idx * CLUSTER_FEATURE_DIM..(block_idx + 1) * CLUSTER_FEATURE_DIM];
4586 for (i, &v) in src.iter().enumerate() {
4587 centroid_sums[slot * CLUSTER_FEATURE_DIM + i] = v as u64;
4588 }
4589 }
4590
4591 let mut assignment: Vec<u16> = vec![0u16; block_count];
4592 let mut centroid_view: Vec<u32> = vec![0u32; CLUSTER_FEATURE_DIM];
4593
4594 for _pass in 0..CLUSTER_MAX_ITERATIONS {
4595 // Assignment step: reassign each block to the nearest centroid.
4596 let mut any_change = false;
4597 for b in 0..block_count {
4598 let block_vec = &features[b * CLUSTER_FEATURE_DIM..(b + 1) * CLUSTER_FEATURE_DIM];
4599 let mut best_group: u16 = 0;
4600 let mut best_dist: u64 = u64::MAX;
4601 for ci in 0..cluster_k {
4602 let divisor = centroid_counts[ci].max(1);
4603 for i in 0..CLUSTER_FEATURE_DIM {
4604 let raw = centroid_sums[ci * CLUSTER_FEATURE_DIM + i];
4605 centroid_view[i] = (raw / divisor) as u32;
4606 }
4607 let d = histogram_l1(block_vec, ¢roid_view);
4608 if d < best_dist {
4609 best_dist = d;
4610 best_group = ci as u16;
4611 }
4612 }
4613 if assignment[b] != best_group {
4614 assignment[b] = best_group;
4615 any_change = true;
4616 }
4617 }
4618 if !any_change {
4619 break;
4620 }
4621
4622 // Update step: rebuild centroid sums + counts from the new
4623 // assignment.
4624 for slot in centroid_sums.iter_mut() {
4625 *slot = 0;
4626 }
4627 for slot in centroid_counts.iter_mut() {
4628 *slot = 0;
4629 }
4630 for b in 0..block_count {
4631 let ci = assignment[b] as usize;
4632 let block_vec = &features[b * CLUSTER_FEATURE_DIM..(b + 1) * CLUSTER_FEATURE_DIM];
4633 let base = ci * CLUSTER_FEATURE_DIM;
4634 for (i, &v) in block_vec.iter().enumerate() {
4635 centroid_sums[base + i] += v as u64;
4636 }
4637 centroid_counts[ci] += 1;
4638 }
4639 }
4640
4641 // Compaction: map the (possibly sparse) assigned group IDs onto
4642 // the contiguous range `0..used - 1`. First-seen-in-scan-order
4643 // wins, so the output is deterministic.
4644 let mut remap: Vec<i32> = vec![-1; cluster_k];
4645 let mut next_id: u16 = 0;
4646 for slot in assignment.iter_mut() {
4647 let group = *slot as usize;
4648 if remap[group] < 0 {
4649 remap[group] = next_id as i32;
4650 next_id += 1;
4651 }
4652 *slot = remap[group] as u16;
4653 }
4654 if next_id < 2 {
4655 return vec![0u16; block_count];
4656 }
4657 assignment
4658}
4659
4660/// §6.2.2 per-pixel group selector backed by a flat block-index map.
4661/// Mirrors the decoder's [`crate::vp8l_decode::MetaPrefixIndex`] but
4662/// owns its data so the encoder can build/inspect it without going
4663/// through the decoder type.
4664struct EncoderMetaIndex {
4665 prefix_bits: u8,
4666 block_width: u32,
4667 /// Per-block meta-prefix code in scan-line order, `block_width *
4668 /// block_height` entries.
4669 codes: Vec<u16>,
4670}
4671
4672impl EncoderMetaIndex {
4673 /// §6.2.2 group selection for pixel `(x, y)`:
4674 /// `codes[(y >> prefix_bits) * block_width + (x >> prefix_bits)]`.
4675 fn group_for(&self, x: u32, y: u32) -> u16 {
4676 let bx = x >> self.prefix_bits;
4677 let by = y >> self.prefix_bits;
4678 self.codes[(by * self.block_width + bx) as usize]
4679 }
4680
4681 /// §6.2.2 `num_prefix_groups = max(entropy image) + 1`.
4682 fn num_groups(&self) -> u32 {
4683 self.codes
4684 .iter()
4685 .copied()
4686 .max()
4687 .map(|c| c as u32 + 1)
4688 .unwrap_or(1)
4689 }
4690
4691 /// Build the entropy-image ARGB pixel buffer the §6.2.2 entropy
4692 /// image is decoded from. Per §6.2.2, the meta-prefix code is the
4693 /// red+green channels of the entropy pixel: `(meta_code >> 8) &
4694 /// 0xffff` — i.e. the low 8 bits of `meta_code` go into the green
4695 /// channel and the next 8 bits into the red channel. Other channels
4696 /// (alpha, blue) are zero.
4697 fn entropy_image_argb(&self) -> Vec<u32> {
4698 self.codes
4699 .iter()
4700 .map(|&c| {
4701 let lo = (c & 0xff) as u32; // green
4702 let hi = ((c >> 8) & 0xff) as u32; // red
4703 (hi << 16) | (lo << 8)
4704 })
4705 .collect()
4706 }
4707}
4708
4709/// Split `tokens` into one bucket per group. The LZ77 token stream was
4710/// generated globally over the whole image, so each token's group is
4711/// determined by the position of the *first* pixel it emits — for a
4712/// `Literal` / `CacheRef` that's a single-pixel position; for a
4713/// `Copy { length, distance }` it's the position of the copy's *start*
4714/// pixel. The §6.2.3 decode loop selects the group per *symbol*, so we
4715/// emit each token's symbols entirely under that single group's prefix
4716/// codes (matching the decoder's group-per-symbol contract, which is
4717/// also group-per-token because each token contributes one indexed
4718/// position via the next-undefined-pixel cursor).
4719///
4720/// Returns a `(group_token_lists, group_pixel_positions)` pair where
4721/// `group_token_lists[i]` is the ordered tokens belonging to group `i`
4722/// and `group_pixel_positions[i]` is the parallel list of starting
4723/// pixel positions (used as a sanity check during `count_frequencies`).
4724fn split_tokens_by_group(
4725 tokens: &[Token],
4726 index: &EncoderMetaIndex,
4727 width: u32,
4728 num_groups: u32,
4729) -> Vec<Vec<Token>> {
4730 let mut buckets: Vec<Vec<Token>> = vec![Vec::new(); num_groups as usize];
4731 let mut pos = 0usize;
4732 let w = width as usize;
4733 for &tok in tokens {
4734 let x = (pos % w) as u32;
4735 let y = (pos / w) as u32;
4736 let g = index.group_for(x, y) as usize;
4737 debug_assert!(g < buckets.len());
4738 buckets[g].push(tok);
4739 let consumed = match tok {
4740 Token::Literal(_) | Token::CacheRef { .. } => 1usize,
4741 Token::Copy { length, .. } => length,
4742 };
4743 pos += consumed;
4744 }
4745 buckets
4746}
4747
4748/// Build the encoder-side per-group [`WriteCode`] tables: for each
4749/// group, count its token-bucket frequencies and Huffman-build the
4750/// five §6.2 prefix codes. The GREEN alphabet size is the same across
4751/// groups (`256 + 24 + color_cache_size`) so the on-wire prefix code
4752/// layouts are uniformly sized; the per-group frequency *distributions*
4753/// differ, which is exactly the point — each group gets a code tailored
4754/// to the bucket it represents.
4755///
4756/// Empty-bucket handling: when a group's bucket has zero tokens (the
4757/// clusterer assigned a block group_id that ends up unused after the
4758/// LZ77 matcher's emission cursor walked past it), every per-channel
4759/// frequency table is all-zero. The standard `WriteCode::from_freqs`
4760/// would yield an incomplete (Kraft-sum-zero) code the decoder
4761/// rejects with §6.2.1's "incomplete" error. We mirror
4762/// `write_prefix_codes_and_tokens`'s empty-distance handling for every
4763/// channel in that degenerate case: emit the §3.7.2.1.1 single-symbol-0
4764/// form, which decodes to a valid (one-leaf) code the bucket will
4765/// never actually exercise.
4766fn build_group_codes(
4767 buckets: &[Vec<Token>],
4768 color_cache_size: usize,
4769 image_width: u32,
4770) -> Vec<[WriteCode; 5]> {
4771 let green_alphabet = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + color_cache_size;
4772 buckets
4773 .iter()
4774 .map(|bucket| {
4775 let freqs = count_frequencies(bucket, color_cache_size, image_width);
4776 // `empty(N)` produces a valid one-leaf code over an
4777 // alphabet of size `N` (the §3.7.2.1.1 single-symbol-0
4778 // form). For each channel, fall back to it when no
4779 // symbols were emitted in this bucket — the decoder
4780 // accepts the resulting one-leaf code without ever
4781 // consuming a symbol from it.
4782 let green = if freqs.green.iter().any(|&f| f > 0) {
4783 WriteCode::from_freqs(&freqs.green)
4784 } else {
4785 WriteCode::empty(green_alphabet)
4786 };
4787 let red = if freqs.red.iter().any(|&f| f > 0) {
4788 WriteCode::from_freqs(&freqs.red)
4789 } else {
4790 WriteCode::empty(256)
4791 };
4792 let blue = if freqs.blue.iter().any(|&f| f > 0) {
4793 WriteCode::from_freqs(&freqs.blue)
4794 } else {
4795 WriteCode::empty(256)
4796 };
4797 let alpha = if freqs.alpha.iter().any(|&f| f > 0) {
4798 WriteCode::from_freqs(&freqs.alpha)
4799 } else {
4800 WriteCode::empty(256)
4801 };
4802 let dist = if freqs.distance.iter().any(|&f| f > 0) {
4803 WriteCode::from_freqs(&freqs.distance)
4804 } else {
4805 WriteCode::empty(40)
4806 };
4807 [green, red, blue, alpha, dist]
4808 })
4809 .collect()
4810}
4811
4812/// Try encoding `pixels` with the §6.2.2 multi-meta-prefix path:
4813///
4814/// 1. Cluster the image's `prefix_bits`-aligned blocks into `num_groups`
4815/// groups by coarse-RGB-histogram L1 distance (see
4816/// [`cluster_blocks_by_histogram_distance`]). Blocks whose pixel-
4817/// value distributions agree at bin resolution end up in the same
4818/// group and share a single five-code prefix-code group.
4819/// 2. Tokenise the image via the standard §5.2.2 LZ77 matcher
4820/// (`tokenize_lz77`), optionally cacheifying with `cache_code_bits`.
4821/// 3. Split tokens into per-group buckets, build per-group prefix codes,
4822/// and emit the §3.8.3 image data with:
4823/// * `%b0` (no §3.8.2 transforms in this candidate),
4824/// * `color-cache-info` (`%b0` or `%b1 4BIT`),
4825/// * `meta-prefix = %b1` + 3-bit `prefix_bits - 2`,
4826/// * the entropy image as an `entropy-coded-image` body via
4827/// [`write_entropy_coded_image_literals`],
4828/// * `num_groups` prefix-code groups (5 prefix codes each),
4829/// * the LZ77 token stream emitted with the group selected per
4830/// pixel block.
4831///
4832/// Returns `None` when the candidate is degenerate (image too small
4833/// for the requested block side; clustering collapsed to one group).
4834/// The chooser must fall back to the single-group path in those cases.
4835fn encode_with_meta_prefix(
4836 pixels: &[u32],
4837 width: u32,
4838 height: u32,
4839 prefix_bits: u8,
4840 num_groups: u32,
4841 cache_code_bits: Option<u32>,
4842 image_width: u32,
4843) -> Option<Vec<u8>> {
4844 debug_assert!((2..=9).contains(&prefix_bits));
4845 debug_assert!((1..=MAX_META_GROUPS).contains(&num_groups));
4846
4847 let block_side = 1u32 << prefix_bits;
4848 // The §6.2.2 entropy image is `DIV_ROUND_UP(image_width, block_side)`
4849 // × `DIV_ROUND_UP(image_height, block_side)`. We need at least two
4850 // blocks for a multi-group split to be possible.
4851 let pw = width.div_ceil(block_side);
4852 let ph = height.div_ceil(block_side);
4853 if (pw * ph) < num_groups {
4854 return None;
4855 }
4856
4857 let codes =
4858 cluster_blocks_by_histogram_distance(pixels, width, height, prefix_bits, num_groups);
4859 let index = EncoderMetaIndex {
4860 prefix_bits,
4861 block_width: pw,
4862 codes,
4863 };
4864 let actual_groups = index.num_groups();
4865 if actual_groups < 2 {
4866 // Clustering collapsed — no point paying the meta-prefix overhead.
4867 return None;
4868 }
4869
4870 // Build the LZ77 token stream globally (matches the
4871 // single-group path's token sequence; the group selection happens
4872 // per *symbol* during emission, not per *match*).
4873 let mut tokens = tokenize_lz77(pixels);
4874 if let Some(bits) = cache_code_bits {
4875 tokens = cacheify_tokens(&tokens, pixels, bits);
4876 }
4877
4878 let buckets = split_tokens_by_group(&tokens, &index, width, actual_groups);
4879 let cache_size = cache_code_bits.map(|b| 1usize << b).unwrap_or(0);
4880 let group_codes = build_group_codes(&buckets, cache_size, image_width);
4881
4882 let mut w = BitWriter::new();
4883
4884 // §3.8.2 optional-transform list: empty (no transforms in this
4885 // candidate). Future revisions can stack §4.1 / §4.2 / §4.4 atop
4886 // the multi-prefix path; for now we keep the candidate small.
4887 w.write_bit(false);
4888
4889 // §3.8.3 / §7.3 spatially-coded-image:
4890 // color-cache-info meta-prefix data
4891 //
4892 // color-cache-info: `%b0` (no cache) or `%b1 4BIT` (enabled).
4893 if let Some(bits) = cache_code_bits {
4894 debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&bits));
4895 w.write_bit(true);
4896 w.write_bits(bits, 4);
4897 } else {
4898 w.write_bit(false);
4899 }
4900 // meta-prefix: `%b1` (multi-group).
4901 w.write_bit(true);
4902 // §6.2.2 `prefix_bits = ReadBits(3) + 2`.
4903 w.write_bits((prefix_bits - 2) as u32, 3);
4904
4905 // §6.2.2 entropy image, written as an `entropy-coded-image`
4906 // (color-cache-info=%b0 + single prefix-code group + LZ77 data).
4907 // The §6.2.2 entropy pixels carry `(meta_code >> 8) & 0xffff` in
4908 // red+green; the literal-only writer feeds the decoder's
4909 // `decode_entropy_coded_image` path exactly.
4910 let entropy_image = index.entropy_image_argb();
4911 write_entropy_coded_image_literals(&mut w, &entropy_image);
4912
4913 // §6.2.2 `num_prefix_groups` prefix-code groups, in canonical
4914 // group-index order (group 0 first, then group 1, …).
4915 for group in &group_codes {
4916 for code in group.iter() {
4917 code.write_code_lengths(&mut w);
4918 }
4919 }
4920
4921 // §6.2.3 LZ77 emission: walk tokens in original order, look up the
4922 // group for each token's *start* pixel, and emit its symbols with
4923 // that group's prefix codes. This matches the decoder's
4924 // group-per-symbol contract — the decoder picks the group for
4925 // each pixel from the meta-prefix index, which is constant across
4926 // every symbol contributing to a single token (literal,
4927 // cache-ref, or backward-reference copy whose covered pixels all
4928 // fall in the same block as the start pixel, ensured by the
4929 // block-aligned tokenisation that the chooser feeds the matcher;
4930 // see `bucket_aligns_with_decoder_groups_test`).
4931 let mut pos = 0usize;
4932 let w_pixels = width as usize;
4933 for &tok in &tokens {
4934 let x = (pos % w_pixels) as u32;
4935 let y = (pos / w_pixels) as u32;
4936 let g = index.group_for(x, y) as usize;
4937 let codes = &group_codes[g];
4938 let green_code = &codes[0];
4939 let red_code = &codes[1];
4940 let blue_code = &codes[2];
4941 let alpha_code = &codes[3];
4942 let dist_code = &codes[4];
4943 match tok {
4944 Token::Literal(p) => {
4945 let a = ((p >> 24) & 0xff) as usize;
4946 let r = ((p >> 16) & 0xff) as usize;
4947 let g_ch = ((p >> 8) & 0xff) as usize;
4948 let b = (p & 0xff) as usize;
4949 green_code.write_symbol(&mut w, g_ch);
4950 red_code.write_symbol(&mut w, r);
4951 blue_code.write_symbol(&mut w, b);
4952 alpha_code.write_symbol(&mut w, a);
4953 pos += 1;
4954 }
4955 Token::CacheRef { index: ix } => {
4956 debug_assert!(cache_size > 0, "CacheRef requires an enabled cache");
4957 let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + ix as usize;
4958 green_code.write_symbol(&mut w, sym);
4959 pos += 1;
4960 }
4961 Token::Copy { length, distance } => {
4962 write_lz77_value(&mut w, green_code, 256, length as u32);
4963 let raw_code = pixel_distance_to_distance_code(distance, image_width);
4964 write_lz77_value(&mut w, dist_code, 0, raw_code);
4965 pos += length;
4966 }
4967 }
4968 }
4969
4970 Some(w.into_bytes())
4971}
4972
4973/// Encode an ARGB image to a VP8L *image-stream* (the bytes that follow the
4974/// §3.4 5-byte image-header), running the §5.2.2 LZ77 backward-reference
4975/// matcher so repeated pixel runs compress.
4976///
4977/// As of round 120, the encoder also evaluates the §3.5.3 / §3.8.2
4978/// **subtract-green transform** and emits whichever of the two paths is
4979/// smaller. The transform header costs only three bits (`%b1 %b10`), so on
4980/// natural images where the green-correlated red/blue channels shrink the
4981/// per-channel entropy, subtract-green is a near-free compression win. On
4982/// images where the transform doesn't help (or hurts), the no-transform
4983/// path is kept.
4984///
4985/// `pixels` is `width * height` ARGB values in scan-line order, each
4986/// `(alpha << 24) | (red << 16) | (green << 8) | blue` — the same layout
4987/// [`crate::vp8l_decode::DecodedImage::pixels`] produces. The returned
4988/// bytes, prefixed with the image-header and wrapped in RIFF/WEBP framing,
4989/// decode back to `pixels` exactly.
4990pub fn encode_argb_literals(pixels: &[u32]) -> Vec<u8> {
4991 // Width-less entry: feed `image_width = 1`, which disables the §5.2.2
4992 // distance-map chooser (no map entry reconstructs to a "row" distance
4993 // when the row is a single pixel wide). Production callers go through
4994 // [`encode_argb_literals_with_width`] via [`encode_vp8l_payload`] so
4995 // the optimisation is wired for `.webp` output.
4996 encode_argb_literals_with_width(pixels, 1)
4997}
4998
4999/// Width-aware variant of [`encode_argb_literals`]: same 2×2
5000/// `(no-tx | subtract-green) × (no-cache | cache)` chooser, but each
5001/// candidate threads `image_width` into [`encode_tokens`] so the
5002/// §5.2.2 distance-map optimisation is exercised. The production
5003/// `.webp` path ([`encode_vp8l_payload`] → [`encode_webp_lossless`] /
5004/// [`encode_vp8l_argb`]) uses this entry; the no-width
5005/// [`encode_argb_literals`] is retained for test callers that exercise
5006/// the entropy stage without spatial structure.
5007pub fn encode_argb_literals_with_width(pixels: &[u32], image_width: u32) -> Vec<u8> {
5008 debug_assert!(image_width >= 1);
5009 // For each `(subtract_green)` choice, evaluate the no-cache
5010 // baseline plus every §5.2.3 `cache_code_bits ∈ [1..11]` and keep
5011 // the smallest stream per the round-148 sweep. The §5.2.3 cache
5012 // size is `1 << code_bits` (2..=2048 entries), so different
5013 // payloads peak at different sizes: small-palette images favour
5014 // narrow caches (less header overhead for the same hit-rate);
5015 // large-palette photo-like images favour wider caches (fewer hash
5016 // collisions). Sweeping is the only way to pick the best per
5017 // payload without an analytical model.
5018 let mut best = select_best_cache_bits(|cache_bits| {
5019 encode_literals_with_options(pixels, false, cache_bits, image_width)
5020 });
5021 let sg_best = select_best_cache_bits(|cache_bits| {
5022 encode_literals_with_options(pixels, true, cache_bits, image_width)
5023 });
5024 if sg_best.len() < best.len() {
5025 best = sg_best;
5026 }
5027 best
5028}
5029
5030/// Sweep §5.2.3 `cache_code_bits ∈ [1..11]` plus the disabled-cache
5031/// (`None`) baseline for an encoder candidate, returning the smallest
5032/// stream the closure produced.
5033///
5034/// `build_with_cache` takes the candidate `cache_code_bits` (`None`
5035/// = disable, `Some(bits)` = enable with the given size) and returns
5036/// the encoded bytes for that choice. The function calls
5037/// `build_with_cache` 12 times: once with `None` and once per value
5038/// in [`COLOR_CACHE_BITS_MIN`]..=[`COLOR_CACHE_BITS_MAX`], i.e. the
5039/// full §5.2.3 `[1..11]` range a compliant decoder accepts.
5040///
5041/// The §5.2.3 cache size is `1 << code_bits`, so the optimum varies
5042/// per payload:
5043///
5044/// * **Disabled** wins on uncorrelated noise (every "hit" is a hash
5045/// collision; the §3.8.3 `color-cache-info` `%b1 4BIT` header costs
5046/// five bits the no-cache path doesn't pay; the GREEN alphabet
5047/// stays at `256 + 24 = 280` symbols rather than growing to
5048/// `256 + 24 + cache_size`).
5049/// * **Narrow caches** (`code_bits` 1..4 → 2..16 entries) win on
5050/// payloads with a tiny effective palette where a 256-entry cache
5051/// wastes alphabet width on slots that never see a hit.
5052/// * **Wide caches** (`code_bits` 9..11 → 512..2048 entries) win on
5053/// photo-like images with hundreds of distinct colors where hash
5054/// collisions in a 256-entry cache prevent a hit.
5055///
5056/// Note that the §3.7.2 prefix code's alphabet length is exactly
5057/// `256 + 24 + (1 << code_bits)`, so a wider cache also widens every
5058/// emitted code-length-table entry; the trade-off between hit rate
5059/// and alphabet overhead is non-monotonic, which is why the chooser
5060/// sweeps the full range instead of using a single heuristic value.
5061fn select_best_cache_bits<F>(mut build_with_cache: F) -> Vec<u8>
5062where
5063 F: FnMut(Option<u32>) -> Vec<u8>,
5064{
5065 let mut best = build_with_cache(None);
5066 for bits in COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX {
5067 let cand = build_with_cache(Some(bits));
5068 if cand.len() < best.len() {
5069 best = cand;
5070 }
5071 }
5072 best
5073}
5074
5075/// Encode `pixels` with explicit knobs: optionally apply the §3.5.3 /
5076/// §3.8.2 subtract-green transform, optionally enable a §5.2.3 color
5077/// cache with the given `code_bits` (`None` disables it). The
5078/// implementation runs the §5.2.2 LZ77 matcher, then (if a cache is
5079/// requested) rewrites literal tokens into §5.2.3 cache references in
5080/// stream order, then emits the §3.8.3 image stream.
5081fn encode_literals_with_options(
5082 pixels: &[u32],
5083 subtract_green: bool,
5084 cache_code_bits: Option<u32>,
5085 image_width: u32,
5086) -> Vec<u8> {
5087 let mut working = pixels.to_vec();
5088 if subtract_green {
5089 apply_subtract_green(&mut working);
5090 }
5091 let mut tokens = tokenize_lz77(&working);
5092 if let Some(bits) = cache_code_bits {
5093 tokens = cacheify_tokens(&tokens, &working, bits);
5094 }
5095 encode_tokens(&tokens, subtract_green, cache_code_bits, image_width)
5096}
5097
5098/// Encode an ARGB image with the literal-only, no-transform path: every
5099/// pixel becomes a §5.2.1 ARGB literal and no §3.8.2 transform is written.
5100/// Retained as the baseline the round-119 size-reduction test compares the
5101/// LZ77 path against; [`encode_argb_literals`] is the default entry point.
5102pub fn encode_argb_literals_only(pixels: &[u32]) -> Vec<u8> {
5103 let tokens: Vec<Token> = pixels.iter().map(|&p| Token::Literal(p)).collect();
5104 // Literal-only stream emits no Copy tokens, so `image_width` is
5105 // unused by the entropy stage; pass 1 as the trivial value.
5106 encode_tokens(&tokens, false, None, 1)
5107}
5108
5109/// Encode an ARGB image forcing the §3.5.3 / §3.8.2 subtract-green
5110/// transform on, regardless of whether it shrinks the stream. Used by the
5111/// round-120 size-reduction comparison test to measure the transform's
5112/// effect on a natural-image-like fixture; production callers use
5113/// [`encode_argb_literals`] which picks the smaller of the two paths.
5114pub fn encode_argb_literals_subtract_green(pixels: &[u32]) -> Vec<u8> {
5115 let mut sg_pixels = pixels.to_vec();
5116 apply_subtract_green(&mut sg_pixels);
5117 let tokens = tokenize_lz77(&sg_pixels);
5118 // Width-less test entry: pass 1 (the chooser falls back to scan-line).
5119 encode_tokens(&tokens, true, None, 1)
5120}
5121
5122/// Encode an ARGB image forcing a §5.2.3 color cache on (size
5123/// `1 << cache_code_bits`), with no §3.8.2 transform. Used by the
5124/// round-121 size-reduction comparison test to isolate the cache's
5125/// effect from the subtract-green chooser; production callers use
5126/// [`encode_argb_literals`] which picks the smallest of the four
5127/// path combinations.
5128pub fn encode_argb_literals_color_cache(pixels: &[u32], cache_code_bits: u32) -> Vec<u8> {
5129 debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&cache_code_bits));
5130 // Width-less test entry: pass 1 (the chooser falls back to scan-line).
5131 encode_literals_with_options(pixels, false, Some(cache_code_bits), 1)
5132}
5133
5134/// Shared entropy stage: from a §5.2.2 token stream, build the five prefix
5135/// codes and emit the §3.8.3 image data (optional-transform header,
5136/// color-cache-info, meta-prefix, the five prefix-code length tables, then
5137/// the LZ77-coded image).
5138///
5139/// `subtract_green` controls the §3.8.2 transform header: `false` emits a
5140/// single `%b0` terminator (no transform); `true` emits `%b1 %b10 %b0` —
5141/// the subtract-green transform (type 2, bodyless) followed by the end-of-
5142/// list terminator.
5143///
5144/// `color_cache_code_bits` controls the §5.2.3 `color-cache-info` field:
5145/// `None` emits `%b0` (no cache); `Some(bits)` emits `%b1 4BIT` with the
5146/// caller-supplied `code_bits ∈ [1, 11]`. The token stream must already
5147/// reflect the choice — `CacheRef` tokens are only meaningful when the
5148/// cache is enabled.
5149///
5150/// `image_width` is the §3.4 image width the encoded stream describes;
5151/// it feeds [`pixel_distance_to_distance_code`] for the §5.2.2 distance
5152/// chooser so backward references whose scan-line distance equals
5153/// `xi + yi*image_width` for some distance-map entry get the smaller
5154/// distance code. Pass `1` to retain the round-119 scan-line-only
5155/// behaviour (no map codes match at width 1 for typical distances).
5156fn encode_tokens(
5157 tokens: &[Token],
5158 subtract_green: bool,
5159 color_cache_code_bits: Option<u32>,
5160 image_width: u32,
5161) -> Vec<u8> {
5162 let mut w = BitWriter::new();
5163
5164 // §3.8.2 optional-transform.
5165 if subtract_green {
5166 // Present-bit `%b1`, then 2-bit TransformType `SubtractGreen` (value
5167 // 2 in LSB-first bit order: bit0=0, bit1=1 — matches the spec's
5168 // `%b10` MSB-first notation when read through the LSB-first
5169 // `ReadBits(2)`). No body for subtract-green per §3.5.3 / §3.8.2.
5170 w.write_bit(true);
5171 w.write_bits(crate::vp8l_stream::TransformType::SubtractGreen as u32, 2);
5172 }
5173 // End-of-list terminator.
5174 w.write_bit(false);
5175
5176 write_spatially_coded_image(&mut w, tokens, color_cache_code_bits, image_width);
5177
5178 w.into_bytes()
5179}
5180
5181/// Write the §3.8.3 / §7.3 `spatially-coded-image` body — everything
5182/// after the §3.8.2 / §7.2 `optional-transform` terminator: the
5183/// `color-cache-info` bit(s), the `meta-prefix` bit (always `%b0` here
5184/// — single prefix-code group), the five prefix codes, and the
5185/// LZ77-coded image.
5186///
5187/// This is the writer counterpart of
5188/// [`crate::vp8l_decode::decode_argb`] for the single-meta-prefix
5189/// case, and the same body the §4.1 / §4.2 transform encoders wrap
5190/// after writing their own optional-transform header(s) (the
5191/// transform headers and any sub-resolution image bodies are written
5192/// by the caller; this function only emits the trailing
5193/// `spatially-coded-image`).
5194fn write_spatially_coded_image(
5195 w: &mut BitWriter,
5196 tokens: &[Token],
5197 color_cache_code_bits: Option<u32>,
5198 image_width: u32,
5199) {
5200 // §3.8.3 spatially-coded-image = color-cache-info meta-prefix data.
5201 // color-cache-info: `%b0` (no cache) or `%b1 4BIT` (enabled).
5202 let color_cache_size = match color_cache_code_bits {
5203 Some(bits) => {
5204 debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&bits));
5205 w.write_bit(true);
5206 w.write_bits(bits, 4);
5207 1usize << bits
5208 }
5209 None => {
5210 w.write_bit(false);
5211 0
5212 }
5213 };
5214 // meta-prefix: `%b0` (single prefix-code group).
5215 w.write_bit(false);
5216
5217 write_prefix_codes_and_tokens(w, tokens, color_cache_size, image_width);
5218}
5219
5220/// Write an §7.3 `entropy-coded-image` (color-cache-info + data) of
5221/// `pixels.len()` ARGB pixels in scan-line order, using a
5222/// literal-only encoding with NO color cache and NO LZ77 matching.
5223///
5224/// This is the body shape required for the §4.1 predictor image and
5225/// the §4.2 color-transform image (per §7.2 ABNF: `predictor-image =
5226/// 3BIT ; sub-pixel code / entropy-coded-image`). The decoder reads
5227/// it via [`crate::vp8l_decode::decode_entropy_coded_image`].
5228///
5229/// Sub-resolution transform images are tiny (one ARGB pixel per
5230/// `block_width × block_height` block of the main image), so the
5231/// per-pixel overhead of the §5.2.2 LZ77 / §5.2.3 cache machinery
5232/// rarely pays off — the literal-only path is the smallest write for
5233/// these bodies in practice.
5234fn write_entropy_coded_image_literals(w: &mut BitWriter, pixels: &[u32]) {
5235 // color-cache-info = `%b0` (no cache).
5236 w.write_bit(false);
5237
5238 let tokens: Vec<Token> = pixels.iter().map(|&p| Token::Literal(p)).collect();
5239 // `image_width = 1` is the trivial value (no Copy tokens are
5240 // emitted by a literal-only stream, so the distance-code chooser
5241 // is unused). `color_cache_size = 0` disables the cache alphabet.
5242 write_prefix_codes_and_tokens(w, &tokens, 0, 1);
5243}
5244
5245/// Shared `data = prefix-codes lz77-coded-image` writer (§3.8.3 /
5246/// §7.3). Builds the five §3.7.2 prefix codes from token
5247/// frequencies, writes their code lengths in green/red/blue/alpha/
5248/// distance order, then emits the token stream.
5249fn write_prefix_codes_and_tokens(
5250 w: &mut BitWriter,
5251 tokens: &[Token],
5252 color_cache_size: usize,
5253 image_width: u32,
5254) {
5255 // Build the five prefix codes from token frequencies. The GREEN
5256 // alphabet covers literals (`< 256`), the §5.2.2 length prefix
5257 // symbols (`256 + length_prefix`), and (when the cache is enabled)
5258 // the §5.2.3 cache indices (`256 + 24 + index`). The distance
5259 // alphabet (40 codes) is exercised only when the matcher emitted at
5260 // least one copy.
5261 let freqs = count_frequencies(tokens, color_cache_size, image_width);
5262 let green_code = WriteCode::from_freqs(&freqs.green);
5263 let red_code = WriteCode::from_freqs(&freqs.red);
5264 let blue_code = WriteCode::from_freqs(&freqs.blue);
5265 let alpha_code = WriteCode::from_freqs(&freqs.alpha);
5266 // Prefix #5 (distance): if no backward references were emitted, the
5267 // frequency table is all-zero → `from_freqs` yields the empty code,
5268 // which `WriteCode` serialises as the §3.7.2.1.1 single-symbol-0 form.
5269 let dist_code = if freqs.distance.iter().any(|&f| f > 0) {
5270 WriteCode::from_freqs(&freqs.distance)
5271 } else {
5272 WriteCode::empty(40)
5273 };
5274
5275 // data = prefix-codes lz77-coded-image.
5276 // prefix-code-group = 5 prefix codes, in bitstream order:
5277 // green, red, blue, alpha, distance.
5278 green_code.write_code_lengths(w);
5279 red_code.write_code_lengths(w);
5280 blue_code.write_code_lengths(w);
5281 alpha_code.write_code_lengths(w);
5282 dist_code.write_code_lengths(w);
5283
5284 // lz77-coded-image: each token is either a §5.2.1 ARGB literal
5285 // (channel order green, red, blue, alpha), a §5.2.3 color-cache
5286 // reference (a single GREEN symbol), or a §5.2.2 length + distance
5287 // backward reference.
5288 for &tok in tokens {
5289 match tok {
5290 Token::Literal(p) => {
5291 let a = ((p >> 24) & 0xff) as usize;
5292 let r = ((p >> 16) & 0xff) as usize;
5293 let g = ((p >> 8) & 0xff) as usize;
5294 let b = (p & 0xff) as usize;
5295 green_code.write_symbol(w, g);
5296 red_code.write_symbol(w, r);
5297 blue_code.write_symbol(w, b);
5298 alpha_code.write_symbol(w, a);
5299 }
5300 Token::CacheRef { index } => {
5301 // §5.2.3: GREEN symbol is `256 + 24 + index`. Red /
5302 // blue / alpha are not transmitted; the decoder
5303 // recovers the full ARGB from the cache slot.
5304 debug_assert!(color_cache_size > 0, "CacheRef requires an enabled cache");
5305 let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + index as usize;
5306 green_code.write_symbol(w, sym);
5307 }
5308 Token::Copy { length, distance } => {
5309 // §5.2.2: length via a GREEN length symbol (base 256), then
5310 // distance via prefix code #5 (base 0). The chooser must
5311 // agree with `count_frequencies` so the prefix-code Huffman
5312 // tree we built actually contains the prefix slot we look up.
5313 write_lz77_value(w, &green_code, 256, length as u32);
5314 let raw_code = pixel_distance_to_distance_code(distance, image_width);
5315 write_lz77_value(w, &dist_code, 0, raw_code);
5316 }
5317 }
5318 }
5319}
5320
5321/// Build the §3.4 / §7.1 5-byte VP8L image-header.
5322///
5323/// `0x2F` signature + 14-bit `(width-1)` + 14-bit `(height-1)` +
5324/// `alpha_is_used` bit + 3-bit `version` (0). The exact inverse of
5325/// [`crate::vp8l_chunk::WebpLosslessChunk::from_payload`]'s header peek.
5326fn build_image_header(width: u32, height: u32, alpha_is_used: bool) -> [u8; 5] {
5327 let packed: u32 =
5328 ((width - 1) & 0x3FFF) | (((height - 1) & 0x3FFF) << 14) | ((alpha_is_used as u32) << 28);
5329 // version is 0 → bits 29..31 stay zero.
5330 [
5331 crate::vp8l_chunk::VP8L_SIGNATURE,
5332 (packed & 0xFF) as u8,
5333 ((packed >> 8) & 0xFF) as u8,
5334 ((packed >> 16) & 0xFF) as u8,
5335 ((packed >> 24) & 0xFF) as u8,
5336 ]
5337}
5338
5339/// Encode an interleaved 8-bit RGBA image to a complete RIFF/WEBP file
5340/// carrying a §2.6 simple-lossless `VP8L` chunk.
5341///
5342/// `rgba` is `width * height * 4` bytes in scan-line order, each pixel
5343/// `[R, G, B, A]` — the `oxideav_core::PixelFormat::Rgba` layout
5344/// [`crate::DecodedWebp::rgba`] uses. The returned file decodes back to the
5345/// same RGBA bytes through [`crate::decode_webp`], a pixel-exact round trip.
5346///
5347/// The encoder takes the simplest spec-conformant path: no §3.8.2
5348/// transform, no §3.8.3 color cache, a single meta-prefix code, and a
5349/// literal-only image (no LZ77 backward references). The §3.7.2 prefix
5350/// codes are built per-image from the pixel data.
5351pub fn encode_webp_lossless(rgba: &[u8], width: u32, height: u32) -> Result<Vec<u8>, EncodeError> {
5352 if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
5353 return Err(EncodeError::InvalidDimensions { width, height });
5354 }
5355 let expected = (width as usize) * (height as usize) * 4;
5356 if rgba.len() != expected {
5357 return Err(EncodeError::PixelBufferMismatch {
5358 got: rgba.len(),
5359 expected,
5360 });
5361 }
5362
5363 // Repack RGBA → ARGB and detect whether alpha is non-trivial.
5364 let mut pixels = Vec::with_capacity(rgba.len() / 4);
5365 let mut alpha_is_used = false;
5366 for px in rgba.chunks_exact(4) {
5367 let (r, g, b, a) = (px[0] as u32, px[1] as u32, px[2] as u32, px[3] as u32);
5368 if a != 0xff {
5369 alpha_is_used = true;
5370 }
5371 pixels.push((a << 24) | (r << 16) | (g << 8) | b);
5372 }
5373
5374 let payload = encode_vp8l_payload(&pixels, width, height, alpha_is_used);
5375
5376 // §2.4 / §2.6 RIFF/WEBP framing around the VP8L payload.
5377 let file = build::build_webp_file(&payload, ImageKind::Lossless, width, height)?;
5378 Ok(file)
5379}
5380
5381/// Validate `width`/`height` against the §3.4 14-bit field range and check
5382/// that an ARGB pixel slice carries exactly `width * height` pixels.
5383///
5384/// Shared by the bare-bitstream [`encode_vp8l_argb`] / [`encode_vp8l_argb_with`]
5385/// entry points. Returns the §3.7.2.1.1 "pixel buffer is N, expected M"
5386/// mismatch error using `pixels.len() * 4` so the byte counts match the
5387/// RGBA-flavoured [`encode_webp_lossless`] error.
5388fn validate_argb(pixels: &[u32], width: u32, height: u32) -> Result<(), EncodeError> {
5389 if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
5390 return Err(EncodeError::InvalidDimensions { width, height });
5391 }
5392 let expected = (width as usize) * (height as usize);
5393 if pixels.len() != expected {
5394 return Err(EncodeError::PixelBufferMismatch {
5395 got: pixels.len() * 4,
5396 expected: expected * 4,
5397 });
5398 }
5399 Ok(())
5400}
5401
5402/// Assemble the bare §2.6 / §3.4 `VP8L` chunk **payload** for an ARGB image:
5403/// the 5-byte §3.4 image-header followed by the §3.8.3 image stream.
5404///
5405/// `pixels` is `width * height` ARGB values in scan-line order, each
5406/// `(alpha << 24) | (red << 16) | (green << 8) | blue`. `alpha_is_used`
5407/// becomes the §3.4 `alpha_is_used` header bit. This is the inner payload a
5408/// `VP8L` chunk wraps — *not* a RIFF/WEBP file. Callers wanting the framed
5409/// file use [`encode_webp_lossless`] / [`encode_vp8l_argb_with_metadata`].
5410fn encode_vp8l_payload(pixels: &[u32], width: u32, height: u32, alpha_is_used: bool) -> Vec<u8> {
5411 // Production path: thread the actual image width so the §5.2.2
5412 // distance-map chooser can swap row-style scan-line codes for
5413 // small distance-map codes (round 130).
5414 let stream = encode_argb_with_predictor_chooser(pixels, width, height);
5415 let header = build_image_header(width, height, alpha_is_used);
5416 let mut payload = Vec::with_capacity(header.len() + stream.len());
5417 payload.extend_from_slice(&header);
5418 payload.extend_from_slice(&stream);
5419 payload
5420}
5421
5422/// Width × height-aware super-chooser: evaluates the four
5423/// `(no-tx | subtract-green) × (no-cache | cache)` candidates plus
5424/// (as of round 155) two §4.1 spatial-predictor `size_bits`
5425/// candidates, two §3.5.2 / §4.2 color-transform `size_bits`
5426/// candidates, and (as of round 150) one §4.4 color-indexing
5427/// candidate when the unique-color count fits in the §4.4
5428/// 256-entry table, each with the round-148 §5.2.3
5429/// `cache_code_bits ∈ [1..11]` sweep plus the disabled-cache
5430/// baseline. Returns the smallest of the resulting streams.
5431///
5432/// The block-based transform-bearing candidates (§4.1 predictor,
5433/// §4.2 color) are only considered when both dimensions are at least
5434/// `1 << size_bits` (otherwise the sub-resolution transform image
5435/// collapses to a single block with no useful per-block resolution).
5436/// The §4.4 color-indexing candidate has no per-block size_bits and
5437/// is gated solely on palette feasibility (≤ 256 unique colors);
5438/// for smaller images or photo-like content the existing
5439/// no-transform / subtract-green chooser remains the only path.
5440fn encode_argb_with_predictor_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
5441 let mut best = encode_argb_literals_with_width(pixels, width);
5442
5443 // The §4.1 predictor and §4.2 color transform pay off once the
5444 // image is at least one block wide AND tall, so each block
5445 // carries some real per-block residual mass. For images smaller
5446 // than a block, the chooser skips both transforms (the no-tx /
5447 // subtract-green paths are strictly cheaper in that regime — no
5448 // transform header, no sub-image bytes).
5449 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
5450 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
5451 let pred_block = 1u32 << pred_size_bits;
5452 let ctx_block = 1u32 << ctx_size_bits;
5453
5454 if width >= pred_block && height >= pred_block {
5455 // Round 155: sweep two `size_bits` values for the §4.1
5456 // spatial predictor, mirroring the §4.2 color-transform shape
5457 // below. The default (16-pixel blocks → per-region predictor-
5458 // mode granularity, good for images whose local statistics
5459 // change across regions) is paired with a maximal single-block
5460 // transform whose `size_bits` is large enough that the entire
5461 // image collapses into one mode (1 sub-image pixel → 4-byte
5462 // sub-image overhead, the cheapest possible §4.1 header). Per
5463 // RFC 9649 §4.1 `size_bits` ranges over `[2..=9]` (`block`
5464 // sizes 4..=512); the maximal value here is whatever `2..=9`
5465 // makes the sub-image at most 1×1. Single-block is best on
5466 // images whose local statistics agree everywhere (one
5467 // dominant predictor mode does the entire image, so the per-
5468 // region mode-image's bits are pure overhead); per-region
5469 // wins on images whose best-mode varies spatially.
5470 let mut pred_single_block_size_bits: u8 = pred_size_bits;
5471 while pred_single_block_size_bits < 9
5472 && ((1u32 << pred_single_block_size_bits) < width
5473 || (1u32 << pred_single_block_size_bits) < height)
5474 {
5475 pred_single_block_size_bits += 1;
5476 }
5477 // Deduplicate when the per-region and single-block size_bits
5478 // collapse onto the same value (small images).
5479 let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
5480 // Round 148: per `size_bits`, sweep §5.2.3
5481 // `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline
5482 // (was hardcoded at `DEFAULT_COLOR_CACHE_BITS = 8`).
5483 let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
5484 encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
5485 })];
5486 // Round 160: add §4.1 slack-cost tie-break candidates.
5487 // `slack > 0` lets the per-block chooser swap to the
5488 // preferred-neighbour mode at a small residual-cost
5489 // increase, dropping the §7.2 predictor-sub-image's symbol
5490 // entropy. The slack budget is expressed in residual-
5491 // magnitude units summed across the whole block, so it
5492 // scales linearly with the block's pixel count to stay a
5493 // bounded per-pixel quantity. Two slack settings (1× and 2×
5494 // the pixel count) are tried; the chooser picks the
5495 // shortest stream and is therefore non-regressing relative
5496 // to the strict-tie-break (slack = 0) baseline.
5497 let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
5498 for slack in [
5499 pred_block_pixels,
5500 2 * pred_block_pixels,
5501 4 * pred_block_pixels,
5502 ] {
5503 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5504 encode_with_predictor_slack(
5505 pixels,
5506 width,
5507 height,
5508 pred_size_bits,
5509 cache_bits,
5510 width,
5511 slack,
5512 )
5513 }));
5514 }
5515 // Round 161: add the Shannon-entropy bit-cost candidate at
5516 // the per-region `size_bits`. Per-block mode is chosen by
5517 // a true Huffman lower-bound bit cost on the residual byte
5518 // histogram rather than the L1-magnitude proxy used by the
5519 // round-159/160 candidates. RFC 9649 §3.5 authorises the
5520 // choice ("transform data can be decided based on entropy
5521 // minimization"); the entropy cost replaces the proxy with
5522 // the actual metric Huffman codes minimise. The chooser
5523 // keeps both the entropy and L1 candidates and emits the
5524 // byte-shortest stream so the round-161 path cannot
5525 // regress against the round-160 baseline.
5526 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5527 encode_with_predictor_entropy(pixels, width, height, pred_size_bits, cache_bits, width)
5528 }));
5529 // Round 162: add the *sub-image-aware* Shannon-entropy
5530 // candidate at the per-region `size_bits` across a small
5531 // lambda sweep. Per-block mode is chosen on a joint cost
5532 // that adds the §7.2 predictor sub-image's marginal Shannon
5533 // bit-cost contribution (weighted by lambda) to the round-
5534 // 161 per-block residual entropy. Where the round-159 hint
5535 // and round-160 slack budget act only on local neighbour
5536 // identity, the round-162 chooser accounts for the running
5537 // sub-image distribution globally. `lambda_milli = 0`
5538 // recovers the round-161 chooser exactly; the swept values
5539 // here weight one sub-image bit at 1×, 4×, 16× a residual
5540 // bit (a 16×16 block contains 256 residual symbols per
5541 // channel — so even modest sub-image weighting can pay back
5542 // through longer mode-runs in the sub-image's prefix code).
5543 // The chooser keeps the byte-shortest stream so the round-
5544 // 162 path cannot regress against the round-161 baseline.
5545 //
5546 // The lambda sweep targets the empirically-observed cost
5547 // crossover on smooth-gradient fixtures (~64000 milli-per-
5548 // bit): below that, the residual cost dominates and the
5549 // round-161 chooser already wins; above that, the sub-
5550 // image's mass dominates and converging the mode set pays
5551 // back through a much smaller §7.2 prefix-code header.
5552 for lambda_milli in [4_000u64, 16_000u64, 64_000u64, 256_000u64] {
5553 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5554 encode_with_predictor_entropy_subaware(
5555 pixels,
5556 width,
5557 height,
5558 pred_size_bits,
5559 cache_bits,
5560 width,
5561 lambda_milli,
5562 )
5563 }));
5564 }
5565 if try_pred_single_block {
5566 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5567 encode_with_predictor(
5568 pixels,
5569 width,
5570 height,
5571 pred_single_block_size_bits,
5572 cache_bits,
5573 width,
5574 )
5575 }));
5576 // Round-160 slack-cost candidates also at the single-
5577 // block size_bits. A single block has one predictor-
5578 // image entry, so the slack-cost variant degenerates to
5579 // the strict variant at this `size_bits` (no neighbour
5580 // hint exists to fire); the candidate is still
5581 // evaluated to keep the sweep regular, but its
5582 // contribution to the byte-best win comes through the
5583 // per-region size_bits.
5584 let single_pred_block_pixels: u64 =
5585 (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
5586 for slack in [
5587 single_pred_block_pixels,
5588 2 * single_pred_block_pixels,
5589 4 * single_pred_block_pixels,
5590 ] {
5591 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5592 encode_with_predictor_slack(
5593 pixels,
5594 width,
5595 height,
5596 pred_single_block_size_bits,
5597 cache_bits,
5598 width,
5599 slack,
5600 )
5601 }));
5602 }
5603 // Round 161: also evaluate the Shannon-entropy candidate
5604 // at the single-block size_bits. With one block the hint
5605 // mechanism never fires (no neighbour exists) and the
5606 // entropy chooser degenerates to "pick the mode whose
5607 // single-block residual histogram has the lowest Huffman
5608 // bit cost" — still a strict improvement over the L1
5609 // proxy on fixtures whose distribution skews the
5610 // ordering between the two metrics.
5611 pred_candidates.push(select_best_cache_bits(|cache_bits| {
5612 encode_with_predictor_entropy(
5613 pixels,
5614 width,
5615 height,
5616 pred_single_block_size_bits,
5617 cache_bits,
5618 width,
5619 )
5620 }));
5621 }
5622 for cand in pred_candidates {
5623 if cand.len() < best.len() {
5624 best = cand;
5625 }
5626 }
5627 }
5628
5629 if width >= ctx_block && height >= ctx_block {
5630 // Sweep two `size_bits` values for the color transform: the
5631 // default (16-pixel blocks → per-region CTE granularity, good
5632 // for varying-correlation natural images) and a maximal
5633 // single-block transform whose `size_bits` is large enough
5634 // that the entire image collapses into one CTE (1 sub-image
5635 // pixel → 4-byte sub-image overhead, the cheapest possible
5636 // header). Single-block is best for high-noise images with
5637 // a single dominant channel correlation; per-region wins on
5638 // images whose correlation varies spatially.
5639 let mut single_block_size_bits: u8 = ctx_size_bits;
5640 while single_block_size_bits < 9
5641 && ((1u32 << single_block_size_bits) < width
5642 || (1u32 << single_block_size_bits) < height)
5643 {
5644 single_block_size_bits += 1;
5645 }
5646 // Deduplicate when the per-region and single-block size_bits
5647 // collapse onto the same value (small images).
5648 let try_single_block = single_block_size_bits != ctx_size_bits;
5649 // Round 148: per `size_bits`, sweep §5.2.3
5650 // `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline
5651 // (was hardcoded at `DEFAULT_COLOR_CACHE_BITS = 8`).
5652 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
5653 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
5654 })];
5655 // Round 308: §4.2 entropy-cost per-block CTE candidate at the
5656 // per-region `size_bits`. Where the L1 chooser above scores
5657 // each candidate by the folded residual magnitude, this one
5658 // scores by the Shannon lower-bound bit cost of the per-channel
5659 // residual histogram — the §4.2 analogue of the round-161 §4.1
5660 // predictor entropy chooser (RFC 9649 §3.5 authorises deciding
5661 // transform data by entropy minimization). The chooser keeps
5662 // the byte-shortest stream, so this candidate cannot regress
5663 // against the L1 path, and round-trip output is identical
5664 // regardless of which CTE the cost model records.
5665 candidates.push(select_best_cache_bits(|cache_bits| {
5666 encode_with_color_transform_strategy(
5667 pixels,
5668 width,
5669 height,
5670 ctx_size_bits,
5671 cache_bits,
5672 width,
5673 ColorTransformStrategy::Entropy,
5674 )
5675 }));
5676 if try_single_block {
5677 candidates.push(select_best_cache_bits(|cache_bits| {
5678 encode_with_color_transform(
5679 pixels,
5680 width,
5681 height,
5682 single_block_size_bits,
5683 cache_bits,
5684 width,
5685 )
5686 }));
5687 // Round 308: entropy-cost CTE candidate at the single-block
5688 // `size_bits`. With one block the histogram is the whole
5689 // image's per-channel residual distribution, so the entropy
5690 // metric selects the single CTE whose red / blue residual
5691 // streams carry the cheapest §5.x prefix codes.
5692 candidates.push(select_best_cache_bits(|cache_bits| {
5693 encode_with_color_transform_strategy(
5694 pixels,
5695 width,
5696 height,
5697 single_block_size_bits,
5698 cache_bits,
5699 width,
5700 ColorTransformStrategy::Entropy,
5701 )
5702 }));
5703 }
5704 for cand in candidates {
5705 if cand.len() < best.len() {
5706 best = cand;
5707 }
5708 }
5709
5710 // Round 303: §3.5 stacked-transform candidate — §4.2 cross-color
5711 // chained with the §4.1 predictor over the color-transformed
5712 // image, the pair the spec targets at photo / natural-image
5713 // content. The color transform decorrelates red / blue against
5714 // green; the predictor then removes the spatial correlation that
5715 // survives in each channel, so the entropy stage sees residuals
5716 // closer to zero than either transform alone. The candidate is
5717 // non-regressing (kept only when strictly smaller than the running
5718 // best) and reuses the same `width >= ctx_block && height >=
5719 // ctx_block` gate (both stacked sub-images need at least one full
5720 // block square). Two `size_bits` are swept — the default
5721 // per-region granularity and a maximal single-block header —
5722 // each across the round-148 cache-bits sweep.
5723 let mut ctp_size_bits = vec![ctx_size_bits];
5724 if try_single_block {
5725 ctp_size_bits.push(single_block_size_bits);
5726 }
5727 // Round 305: sweep the predictor-sub-image strategy (L1 /
5728 // entropy / sub-image-aware) over the color-decorrelated
5729 // residual the chain feeds the predictor. Non-regressing — the
5730 // byte-shortest candidate is kept.
5731 for &sb in &ctp_size_bits {
5732 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5733 let cand = select_best_cache_bits(|cache_bits| {
5734 encode_with_color_transform_predictor(
5735 pixels,
5736 width,
5737 height,
5738 sb,
5739 cache_bits,
5740 pred_strategy,
5741 )
5742 });
5743 if cand.len() < best.len() {
5744 best = cand;
5745 }
5746 }
5747 }
5748
5749 // Round 304: §3.5 *three-transform* stacked candidate — §4.2
5750 // cross-color → §4.3 subtract-green → §4.1 predictor, the natural
5751 // three-axis extension of the round-303 color + predictor pair.
5752 // The per-block §4.2 color transform removes the modeled
5753 // inter-channel correlation; a header-free §4.3 subtract-green pass
5754 // then removes the uniform red/blue-vs-green correlation that
5755 // survives the coarse per-block CTE multipliers; a §4.1 predictor
5756 // pass removes the spatial correlation left in each channel. RFC
5757 // 9649 §3.5 permits up to four transforms stacked (each used once)
5758 // with inverses applied last-read-first; the decoder's generic
5759 // reverse-read-order chain already handles this list, so no decoder
5760 // change is required. The candidate is non-regressing (kept only
5761 // when strictly smaller than the running best) and reuses the same
5762 // `width >= ctx_block && height >= ctx_block` gate, swept over the
5763 // default per-region and maximal single-block `size_bits` each
5764 // across the round-148 cache-bits sweep.
5765 // Round 305: sweep the predictor-sub-image strategy here too.
5766 for &sb in &ctp_size_bits {
5767 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5768 let cand = select_best_cache_bits(|cache_bits| {
5769 encode_with_color_transform_subtract_green_predictor(
5770 pixels,
5771 width,
5772 height,
5773 sb,
5774 cache_bits,
5775 pred_strategy,
5776 )
5777 });
5778 if cand.len() < best.len() {
5779 best = cand;
5780 }
5781 }
5782 }
5783 }
5784
5785 // Round 150: §4.4 color-indexing transform candidate. Considered
5786 // unconditionally (no per-block size_bits to sweep): a single
5787 // O(N) palette probe decides feasibility, so the path is cheap
5788 // to skip on photo-like content. On palette-ish images (icons,
5789 // line art, screen captures) the bundled-index stream shrinks
5790 // the §5 image data dramatically (a 4-color image packs 4 pixels
5791 // per byte at width_bits=2, giving the entropy stage 1/4 the
5792 // symbols to code), more than paying for the palette-write
5793 // overhead.
5794 if collect_palette(pixels).is_some() {
5795 let ci_best = select_best_cache_bits(|cache_bits| {
5796 encode_with_color_indexing(pixels, width, height, cache_bits)
5797 .expect("palette feasibility already confirmed")
5798 });
5799 if ci_best.len() < best.len() {
5800 best = ci_best;
5801 }
5802
5803 // Round 302: §3.5 stacked-transform candidate — §4.4
5804 // color-indexing chained with the §4.1 predictor over the
5805 // bundled-index image. On palette content the bundled green-
5806 // channel indices run in long spatially-coherent stretches, so
5807 // a predictor pass over them drives the residuals toward zero
5808 // and shrinks the entropy stage below the single-transform
5809 // color-indexing path. The candidate is non-regressing: it is
5810 // only kept when strictly smaller than the running best, and it
5811 // self-skips (returns `None`) when the packed image is too
5812 // small to carry a predictor block. Two `size_bits` are swept —
5813 // the default per-region granularity and a maximal single-block
5814 // header — each across the round-148 cache-bits sweep.
5815 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
5816 let mut ci_pred_single_block: u8 = pred_size_bits;
5817 while ci_pred_single_block < 9
5818 && ((1u32 << ci_pred_single_block) < width || (1u32 << ci_pred_single_block) < height)
5819 {
5820 ci_pred_single_block += 1;
5821 }
5822 let mut ci_pred_size_bits = vec![pred_size_bits];
5823 if ci_pred_single_block != pred_size_bits {
5824 ci_pred_size_bits.push(ci_pred_single_block);
5825 }
5826 // Round 305: sweep the predictor-sub-image strategy (L1 /
5827 // entropy / sub-image-aware) over the packed-index residual the
5828 // chain feeds the predictor. Non-regressing — kept only when
5829 // strictly smaller than the running best.
5830 for &sb in &ci_pred_size_bits {
5831 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5832 let mut got_candidate = false;
5833 let cand = select_best_cache_bits(|cache_bits| {
5834 match encode_with_color_indexing_predictor(
5835 pixels,
5836 width,
5837 height,
5838 sb,
5839 cache_bits,
5840 pred_strategy,
5841 ) {
5842 Some(bytes) => {
5843 got_candidate = true;
5844 bytes
5845 }
5846 // Packed image too small for this `size_bits`. Emit
5847 // a sentinel longer than the running best so the
5848 // cache sweep discards it; `got_candidate` stays
5849 // false and the outer comparison is skipped.
5850 None => vec![0u8; best.len() + 1],
5851 }
5852 });
5853 if got_candidate && cand.len() < best.len() {
5854 best = cand;
5855 }
5856 }
5857 }
5858 }
5859
5860 // Round 151: §6.2.2 multi-meta-prefix (entropy-image) candidate.
5861 // Sweeps a small set of `(prefix_bits, num_groups)` combinations,
5862 // each paired with the round-148 `cache_code_bits ∈ [1..11]` plus
5863 // disabled-cache baseline; whichever is smallest is compared
5864 // against the running `best`. The candidate is only built when
5865 // the image is large enough to contain `num_groups` blocks at the
5866 // current `prefix_bits` (the `encode_with_meta_prefix` helper
5867 // returns `None` otherwise). Multi-group encoding pays for itself
5868 // on images whose per-region statistics diverge (e.g. natural
5869 // images with sky-vs-foreground contrast, screenshots with
5870 // distinct UI regions) where separate per-region Huffman codes
5871 // shrink the LZ77 stream by more than the entropy-image +
5872 // additional code-length-table overhead.
5873 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
5874 if mp_best.len() < best.len() {
5875 best = mp_best;
5876 }
5877 }
5878
5879 best
5880}
5881
5882/// Sweep every `(prefix_bits, num_groups, cache_code_bits)` combination
5883/// the §6.2.2 multi-meta-prefix candidate admits and return the smallest
5884/// resulting stream, or `None` if no `(prefix_bits, num_groups)` pair
5885/// produced a non-degenerate stream (i.e. the image was too small for any
5886/// multi-block split, or every clustering collapsed to a single group).
5887fn sweep_meta_prefix_candidate(pixels: &[u32], width: u32, height: u32) -> Option<Vec<u8>> {
5888 let mut best: Option<Vec<u8>> = None;
5889 for &prefix_bits in META_PREFIX_BITS_SWEEP.iter() {
5890 for num_groups in 2..=MAX_META_GROUPS {
5891 // Per-(prefix_bits, num_groups), sweep the cache sizes;
5892 // some shapes are degenerate (None returned). Track the
5893 // best non-degenerate candidate.
5894 let mut shape_best: Option<Vec<u8>> = None;
5895 for cache_opt in
5896 std::iter::once(None).chain((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).map(Some))
5897 {
5898 if let Some(cand) = encode_with_meta_prefix(
5899 pixels,
5900 width,
5901 height,
5902 prefix_bits,
5903 num_groups,
5904 cache_opt,
5905 width,
5906 ) {
5907 match &shape_best {
5908 Some(s) if s.len() <= cand.len() => {}
5909 _ => shape_best = Some(cand),
5910 }
5911 }
5912 }
5913 if let Some(cand) = shape_best {
5914 match &best {
5915 Some(b) if b.len() <= cand.len() => {}
5916 _ => best = Some(cand),
5917 }
5918 }
5919 }
5920 }
5921 best
5922}
5923
5924/// Encode an ARGB image to a **bare** §2.6 / §3.4 `VP8L` bitstream — the
5925/// chunk payload (image-header + image stream), with **no** RIFF/WEBP
5926/// wrapper.
5927///
5928/// `pixels` is `width * height` ARGB values in scan-line order, each
5929/// `(alpha << 24) | (red << 16) | (green << 8) | blue`. The `alpha_is_used`
5930/// §3.4 header bit is auto-detected: it is set iff any pixel's alpha byte is
5931/// not `0xff`. Use [`encode_vp8l_argb_with`] to force the bit explicitly.
5932///
5933/// The output is the exact byte sequence
5934/// [`crate::vp8l_chunk::WebpLosslessChunk::bitstream`] returns for a framed
5935/// file — i.e. wrapping it in `build_chunk(fourcc::VP8L, ..)` (or
5936/// [`build::build_webp_file`] with [`ImageKind::Lossless`]) yields a complete
5937/// `.webp`. Encoding path matches [`encode_webp_lossless`]: no §3.8.2
5938/// transform, no §3.8.3 color cache, single meta-prefix code, literal-only.
5939pub fn encode_vp8l_argb(pixels: &[u32], width: u32, height: u32) -> Result<Vec<u8>, EncodeError> {
5940 let alpha_is_used = pixels.iter().any(|&p| (p >> 24) & 0xff != 0xff);
5941 encode_vp8l_argb_with(pixels, width, height, alpha_is_used)
5942}
5943
5944/// Encode an ARGB image to a bare §2.6 / §3.4 `VP8L` bitstream with the
5945/// §3.4 `alpha_is_used` header bit set **explicitly** by the caller.
5946///
5947/// Identical to [`encode_vp8l_argb`] but with a fixed (non-auto-detected)
5948/// `alpha_is_used`. A caller that already knows whether the image carries
5949/// alpha — e.g. one decoding the §2.7.1 `VP8X` `L` flag — avoids the
5950/// per-pixel scan. Setting `alpha_is_used = true` on a fully-opaque image is
5951/// permitted (a decoder reconstructs the same opaque pixels); setting it
5952/// `false` on an image with non-opaque pixels still round-trips because the
5953/// alpha values are carried in the §3.7.3 ARGB literals regardless of the
5954/// header bit.
5955pub fn encode_vp8l_argb_with(
5956 pixels: &[u32],
5957 width: u32,
5958 height: u32,
5959 alpha_is_used: bool,
5960) -> Result<Vec<u8>, EncodeError> {
5961 validate_argb(pixels, width, height)?;
5962 Ok(encode_vp8l_payload(pixels, width, height, alpha_is_used))
5963}
5964
5965#[cfg(test)]
5966mod tests {
5967 use super::*;
5968 use crate::vp8l_prefix::PrefixCode;
5969 use crate::vp8l_stream::BitReader;
5970
5971 // ---- BitWriter ----
5972
5973 #[test]
5974 fn bit_writer_round_trips_through_bit_reader() {
5975 let mut w = BitWriter::new();
5976 w.write_bits(0b101, 3);
5977 w.write_bits(0xABCD, 16);
5978 w.write_bit(true);
5979 let bytes = w.into_bytes();
5980 let mut r = BitReader::new(&bytes);
5981 assert_eq!(r.read_bits(3).unwrap(), 0b101);
5982 assert_eq!(r.read_bits(16).unwrap(), 0xABCD);
5983 assert!(r.read_bit().unwrap());
5984 }
5985
5986 // ---- canonical code construction ----
5987
5988 #[test]
5989 fn code_lengths_single_symbol_is_length_one() {
5990 let mut freq = vec![0u32; 8];
5991 freq[3] = 10;
5992 let lengths = build_code_lengths(&freq);
5993 assert_eq!(lengths[3], 1);
5994 assert_eq!(lengths.iter().filter(|&&l| l != 0).count(), 1);
5995 }
5996
5997 #[test]
5998 fn code_lengths_two_symbols_length_one_each() {
5999 let mut freq = vec![0u32; 4];
6000 freq[1] = 5;
6001 freq[2] = 5;
6002 let lengths = build_code_lengths(&freq);
6003 assert_eq!(lengths[1], 1);
6004 assert_eq!(lengths[2], 1);
6005 }
6006
6007 #[test]
6008 fn code_lengths_kraft_sum_is_one() {
6009 // A skewed distribution that produces varied lengths.
6010 let freq = vec![100u32, 1, 1, 1, 50, 25, 4, 2];
6011 let lengths = build_code_lengths(&freq);
6012 let mut k = 0f64;
6013 for &l in &lengths {
6014 if l > 0 {
6015 k += 2f64.powi(-(l as i32));
6016 }
6017 }
6018 assert!((k - 1.0).abs() < 1e-9, "Kraft sum {k} != 1");
6019 }
6020
6021 /// Round 303: the §3.7.2.1.2 code-length-code lengths are written in a
6022 /// 3-bit on-wire field, so they must never exceed 7. A skewed CLC
6023 /// frequency histogram (one length value far more common than the rest)
6024 /// drives the plain Huffman build to assign a length-8+ code to a rare
6025 /// CLC symbol; `build_clc_code_lengths` must re-balance it back under 8
6026 /// while keeping the table complete (Kraft sum exactly 1). Without the
6027 /// cap the 3-bit field silently truncated the over-long length to 0,
6028 /// corrupting the table into an incomplete code the decoder rejects.
6029 #[test]
6030 fn clc_code_lengths_capped_at_seven_and_complete() {
6031 // Histogram that drives the plain build past length 7: one
6032 // dominant length value plus a long tail of rare ones, exactly the
6033 // shape that produces a deep Huffman tree.
6034 let clc_freq: Vec<u32> = vec![
6035 1, 100_000, 1, 50_000, 25_000, 12_000, 6_000, 3_000, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
6036 ];
6037 // The plain build does produce an over-long (>7) code here, so the
6038 // cap path is genuinely exercised.
6039 let plain = build_code_lengths(&clc_freq);
6040 assert!(
6041 plain.iter().any(|&l| l as usize > MAX_CLC_CODE_LENGTH),
6042 "test premise: plain build must exceed the 3-bit CLC ceiling"
6043 );
6044
6045 let capped = build_clc_code_lengths(&clc_freq);
6046 assert!(
6047 capped.iter().all(|&l| l as usize <= MAX_CLC_CODE_LENGTH),
6048 "CLC lengths must all fit the 3-bit field: {capped:?}"
6049 );
6050 // Still a complete code (Kraft sum == 1 over the used symbols).
6051 let mut k = 0f64;
6052 for &l in &capped {
6053 if l > 0 {
6054 k += 2f64.powi(-(l as i32));
6055 }
6056 }
6057 assert!((k - 1.0).abs() < 1e-9, "capped CLC Kraft sum {k} != 1");
6058 }
6059
6060 #[test]
6061 fn built_code_decodes_through_prefix_reader() {
6062 // Build a code, emit symbols with it, and decode with the
6063 // round-104 reader to confirm bit-exact agreement.
6064 let freq = vec![40u32, 10, 5, 5, 1, 0, 0, 0];
6065 let code = WriteCode::from_freqs(&freq);
6066 let mut w = BitWriter::new();
6067 code.write_code_lengths(&mut w);
6068 // Emit symbols 0,1,2,3,4 in sequence.
6069 let seq = [0usize, 1, 2, 3, 4, 0, 0, 1];
6070 for &s in &seq {
6071 code.write_symbol(&mut w, s);
6072 }
6073 let bytes = w.into_bytes();
6074 let mut r = BitReader::new(&bytes);
6075 let decoded = PrefixCode::read(&mut r, freq.len()).unwrap();
6076 for &s in &seq {
6077 assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, s);
6078 }
6079 }
6080
6081 #[test]
6082 fn empty_distance_code_is_single_symbol_zero() {
6083 let code = WriteCode::empty(40);
6084 let mut w = BitWriter::new();
6085 code.write_code_lengths(&mut w);
6086 let bytes = w.into_bytes();
6087 let mut r = BitReader::new(&bytes);
6088 let decoded = PrefixCode::read(&mut r, 40).unwrap();
6089 assert_eq!(decoded.single_symbol(), Some(0));
6090 }
6091
6092 // ---- §3.7.2.1.1 simple code length code chooser ----
6093
6094 /// `WriteCode::as_simple_form` rejects any table that the simple form
6095 /// cannot represent verbatim: length > 1, symbol > 255, more than two
6096 /// used symbols, all-zeros table.
6097 #[test]
6098 fn simple_form_rejects_tables_outside_3_7_2_1_1_constraints() {
6099 // Three symbols → too many for simple form.
6100 let mut freq = vec![0u32; 8];
6101 freq[0] = 1;
6102 freq[1] = 1;
6103 freq[2] = 1;
6104 let three_sym = WriteCode::from_freqs(&freq);
6105 assert!(three_sym.as_simple_form().is_none());
6106
6107 // All-zero / empty alphabet → as_simple_form returns None
6108 // (encoder handles the empty case via `WriteCode::empty`).
6109 let lengths_empty = vec![0u8; 16];
6110 let codes_empty = canonical_codes(&lengths_empty);
6111 let empty_code = WriteCode {
6112 lengths: lengths_empty,
6113 codes: codes_empty,
6114 single: None,
6115 };
6116 assert!(empty_code.as_simple_form().is_none());
6117
6118 // Symbol > 255 → simple form's 8-bit symbol field can't carry it.
6119 let mut freq_big = vec![0u32; 300];
6120 freq_big[280] = 1;
6121 let beyond_255 = WriteCode::from_freqs(&freq_big);
6122 assert!(beyond_255.as_simple_form().is_none());
6123
6124 // Length > 1 → cannot be the simple form (every present symbol
6125 // must be at length 1).
6126 let mixed_lengths = vec![0u8, 2, 2, 1];
6127 let mixed_codes = canonical_codes(&mixed_lengths);
6128 let mixed = WriteCode {
6129 lengths: mixed_lengths,
6130 codes: mixed_codes,
6131 single: None,
6132 };
6133 assert!(mixed.as_simple_form().is_none());
6134 }
6135
6136 /// `WriteCode::as_simple_form` accepts the two qualifying shapes
6137 /// (1 used symbol or 2 used symbols, each at length 1).
6138 #[test]
6139 fn simple_form_accepts_one_or_two_length_one_symbols() {
6140 let mut freq1 = vec![0u32; 16];
6141 freq1[7] = 1;
6142 let one = WriteCode::from_freqs(&freq1);
6143 assert_eq!(one.as_simple_form(), Some(vec![7]));
6144
6145 let mut freq2 = vec![0u32; 16];
6146 freq2[3] = 4;
6147 freq2[12] = 4;
6148 let two = WriteCode::from_freqs(&freq2);
6149 assert_eq!(two.as_simple_form(), Some(vec![3, 12]));
6150 }
6151
6152 /// §3.7.2.1.1 exact bit-cost layout: 1 flag + 1 num + 1 width + s0 + s1.
6153 /// `simple_form_bits` must match the bytes [`write_simple_code_lengths`]
6154 /// actually emits.
6155 #[test]
6156 fn simple_form_bits_matches_written_layout() {
6157 // 1 symbol, symbol0 in [0..1] → is_first_8bits = 0 → 1-bit symbol.
6158 // Total = 1 + 1 + 1 + 1 = 4 bits.
6159 assert_eq!(simple_form_bits(&[1]), 4);
6160 // 1 symbol, symbol0 = 7 (> 1) → is_first_8bits = 1 → 8-bit symbol.
6161 // Total = 1 + 1 + 1 + 8 = 11 bits.
6162 assert_eq!(simple_form_bits(&[7]), 11);
6163 // 2 symbols, symbol0 = 0 (fits in 1 bit), symbol1 = 50.
6164 // Total = 1 + 1 + 1 + 1 + 8 = 12 bits.
6165 assert_eq!(simple_form_bits(&[0, 50]), 12);
6166 // 2 symbols, symbol0 = 200 (> 1) → 8 bits; symbol1 = 100 → 8 bits.
6167 // Total = 1 + 1 + 1 + 8 + 8 = 19 bits.
6168 assert_eq!(simple_form_bits(&[200, 100]), 19);
6169
6170 // Round-trip the byte count against an actual writer.
6171 let mut w = BitWriter::new();
6172 write_simple_code_lengths(&mut w, &[200, 100]);
6173 // 19 bits → 3 bytes (24 bits, padded). Confirm the writer's
6174 // bit-position is exactly 19.
6175 let pos_bits = w.bit_position();
6176 assert_eq!(pos_bits, 19);
6177 }
6178
6179 /// The chooser switches to the simple form for a 1-symbol distance
6180 /// code (saves ~14 bits over the normal-form single-leaf path).
6181 #[test]
6182 fn chooser_prefers_simple_form_for_empty_distance_code() {
6183 let code = WriteCode::empty(40);
6184 // Confirm normal form would have been more expensive than simple.
6185 let normal_bits = normal_form_bits(&code.lengths);
6186 let simple = code.as_simple_form().expect("empty(40) is simple-form");
6187 let simple_bits = simple_form_bits(&simple);
6188 assert!(
6189 simple_bits < normal_bits,
6190 "expected simple form (= {simple_bits} bits) to beat normal form (= {normal_bits} bits) for empty distance code"
6191 );
6192
6193 // Now drive write_code_lengths and confirm the leading flag bit is
6194 // 1 (the simple-form selector per §3.7.2.1).
6195 let mut w = BitWriter::new();
6196 code.write_code_lengths(&mut w);
6197 let bytes = w.into_bytes();
6198 let mut r = BitReader::new(&bytes);
6199 assert!(
6200 r.read_bit().expect("flag bit"),
6201 "chooser must select simple form (flag bit = 1) for the empty distance code"
6202 );
6203 }
6204
6205 /// `write_code_lengths` round-trips through the decoder for both
6206 /// branches of the chooser: a 1-symbol code (simple form) and a
6207 /// 4-symbol code (normal form).
6208 #[test]
6209 fn chooser_round_trips_through_decoder_on_both_branches() {
6210 // ---- 1-symbol path: simple form ----
6211 let mut freq = vec![0u32; 16];
6212 freq[9] = 7;
6213 let code1 = WriteCode::from_freqs(&freq);
6214 let mut w1 = BitWriter::new();
6215 code1.write_code_lengths(&mut w1);
6216 let bytes1 = w1.into_bytes();
6217 let mut r1 = BitReader::new(&bytes1);
6218 let decoded1 = PrefixCode::read(&mut r1, 16).expect("decode simple form");
6219 assert_eq!(
6220 decoded1.single_symbol(),
6221 Some(9),
6222 "decoder must recover the single-leaf symbol from the simple form"
6223 );
6224
6225 // ---- 4-symbol path: normal form ----
6226 let freq4 = vec![10u32, 4, 2, 1, 0, 0, 0, 0];
6227 let code4 = WriteCode::from_freqs(&freq4);
6228 let mut w4 = BitWriter::new();
6229 code4.write_code_lengths(&mut w4);
6230 // Emit a representative symbol sequence and round-trip it.
6231 let seq = [0usize, 1, 2, 3, 0, 0, 1, 2];
6232 for &s in &seq {
6233 code4.write_symbol(&mut w4, s);
6234 }
6235 let bytes4 = w4.into_bytes();
6236 let mut r4 = BitReader::new(&bytes4);
6237 let decoded4 = PrefixCode::read(&mut r4, 8).expect("decode normal form");
6238 for &s in &seq {
6239 assert_eq!(
6240 decoded4.read_symbol(&mut r4).expect("symbol") as usize,
6241 s,
6242 "round-trip mismatch on normal-form code"
6243 );
6244 }
6245 }
6246
6247 /// On a 1×1 opaque image the encoder produces 5 prefix codes
6248 /// (G/R/B/A + distance) and every one of them is the single-leaf
6249 /// case (one length-1 symbol, all others zero). Before round 149 the
6250 /// chooser had only the normal-form path, paying ≥ 58 bits per code
6251 /// to send the length table even though the per-symbol body
6252 /// collapses to zero. The simple-form path costs at most 11 bits
6253 /// (1-symbol header + 8-bit value), so the round-149 chooser flips
6254 /// all five codes and shrinks the encoded file by a large fraction
6255 /// on this baseline fixture.
6256 #[test]
6257 fn round_149_simple_form_shrinks_1x1_lossless_baseline() {
6258 let rgba = [0x12, 0x34, 0x56, 0xff];
6259 let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6260 eprintln!("round-149 1x1 lossless byte count: {}", file.len());
6261
6262 // Round-trip confirms the chosen stream still decodes.
6263 let decoded = crate::decode_webp(&file).unwrap();
6264 assert_eq!(decoded.frames[0].rgba, rgba);
6265
6266 // Round-148 baseline for this fixture was 174 bytes (5 prefix
6267 // codes × ≥ 58 bits each, plus container envelope). Round 149
6268 // lands at 32 bytes — a >80% reduction. Assert a conservative
6269 // strict-beat below the round-148 size.
6270 assert!(
6271 file.len() <= 48,
6272 "expected round-149 simple-form chooser to bring the 1×1 baseline well under the round-148 174-byte size; got {}",
6273 file.len()
6274 );
6275 }
6276
6277 /// Same chooser-shrink check on a 16×16 gradient. The chooser
6278 /// trade-off here applies to many of the candidate streams the
6279 /// super-chooser races: each pays substantially less header tax on
6280 /// its prefix codes when the alphabet collapses to one or two
6281 /// length-1 symbols (single-pixel column, alpha-uniform images,
6282 /// solid-color blocks, the bulk of small synthetic fixtures).
6283 #[test]
6284 fn round_149_simple_form_shrinks_synthetic_fixtures() {
6285 // 32×32 solid gray — every channel emits one literal value
6286 // repeated 1024 times. Each of the 4 literal prefix codes is a
6287 // single-leaf code → all four flip to the simple form.
6288 let mut solid = Vec::new();
6289 for _ in 0..1024 {
6290 solid.extend_from_slice(&[0x80, 0x80, 0x80, 0xff]);
6291 }
6292 let file_solid = encode_webp_lossless(&solid, 32, 32).unwrap();
6293 eprintln!("round-149 32×32 solid: {}", file_solid.len());
6294 assert!(
6295 file_solid.len() <= 100,
6296 "round-149 32×32 solid should land far below the round-148 174-byte size; got {}",
6297 file_solid.len()
6298 );
6299
6300 // 8×8 with 2 alpha values, single literal triple — RGB codes
6301 // single-leaf (one value each), alpha code two-symbol (0x80 and
6302 // 0xff). Two-symbol case may pick simple or normal depending on
6303 // the cost — the chooser picks whichever is cheaper.
6304 let mut alpha = Vec::new();
6305 for y in 0..8u32 {
6306 for x in 0..8u32 {
6307 let a = if (x + y) % 2 == 0 { 0xff } else { 0x80 };
6308 alpha.extend_from_slice(&[0x44, 0x88, 0xcc, a]);
6309 }
6310 }
6311 let file_alpha = encode_webp_lossless(&alpha, 8, 8).unwrap();
6312 eprintln!("round-149 8×8 alpha: {}", file_alpha.len());
6313 assert!(
6314 file_alpha.len() <= 110,
6315 "round-149 8×8 alpha should land below the round-148 178-byte size; got {}",
6316 file_alpha.len()
6317 );
6318
6319 // Every chosen stream still decodes byte-exact.
6320 let decoded_solid = crate::decode_webp(&file_solid).unwrap();
6321 assert_eq!(decoded_solid.frames[0].rgba, solid);
6322 let decoded_alpha = crate::decode_webp(&file_alpha).unwrap();
6323 assert_eq!(decoded_alpha.frames[0].rgba, alpha);
6324 }
6325
6326 /// Two-symbol simple-form path: when the alphabet has exactly two
6327 /// length-1 symbols, the chooser may pick simple (≤19 bits) or
6328 /// normal (≥18 bits) — whichever is cheaper. The chooser picks the
6329 /// minimum, and the chosen stream still decodes.
6330 #[test]
6331 fn round_149_two_symbol_simple_form_round_trips() {
6332 // Manually drive the chooser with a 2-symbol length-1 code.
6333 let mut freq = vec![0u32; 16];
6334 freq[2] = 5;
6335 freq[11] = 5;
6336 let code = WriteCode::from_freqs(&freq);
6337 assert_eq!(code.as_simple_form(), Some(vec![2, 11]));
6338
6339 // Confirm bit-costs are within ±1 bit of each other (the
6340 // chooser's interesting regime). Either choice round-trips.
6341 let normal_bits = normal_form_bits(&code.lengths);
6342 let simple_bits = simple_form_bits(&[2, 11]);
6343 eprintln!(
6344 "2-symbol code: simple={} bits, normal={} bits",
6345 simple_bits, normal_bits
6346 );
6347
6348 // Drive write_code_lengths through the chooser + decode.
6349 let mut w = BitWriter::new();
6350 code.write_code_lengths(&mut w);
6351 // Emit a few symbols to confirm the round-trip works.
6352 for _ in 0..3 {
6353 code.write_symbol(&mut w, 2);
6354 code.write_symbol(&mut w, 11);
6355 }
6356 let bytes = w.into_bytes();
6357 let mut r = BitReader::new(&bytes);
6358 let decoded = PrefixCode::read(&mut r, 16).expect("decode chooser output");
6359 for _ in 0..3 {
6360 assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, 2);
6361 assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, 11);
6362 }
6363 }
6364
6365 // ---- image-header ----
6366
6367 #[test]
6368 fn image_header_round_trips_through_chunk_peek() {
6369 use crate::vp8l_chunk::WebpLosslessChunk;
6370 let header = build_image_header(7, 5, true);
6371 // Append a dummy byte so the payload is long enough to peek.
6372 let mut payload = header.to_vec();
6373 payload.push(0);
6374 let h = WebpLosslessChunk::from_payload(&payload).unwrap();
6375 assert_eq!(h.width(), 7);
6376 assert_eq!(h.height(), 5);
6377 assert!(h.alpha_is_used());
6378 assert_eq!(h.version(), 0);
6379 }
6380
6381 // ---- end-to-end round trips ----
6382
6383 #[test]
6384 fn round_trip_1x1_opaque() {
6385 let rgba = [0x12, 0x34, 0x56, 0xff];
6386 let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6387 let decoded = crate::decode_webp(&file).unwrap();
6388 assert_eq!(decoded.frames[0].rgba, rgba);
6389 }
6390
6391 #[test]
6392 fn round_trip_1x1_with_alpha() {
6393 let rgba = [0xaa, 0xbb, 0xcc, 0x40];
6394 let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6395 let img = crate::decode_webp_image(&file).unwrap();
6396 assert_eq!(img.width, 1);
6397 assert_eq!(img.height, 1);
6398 assert_eq!(img.rgba, rgba);
6399 }
6400
6401 #[test]
6402 fn round_trip_small_gradient() {
6403 // 4x3 image with a spread of colors.
6404 let w = 4u32;
6405 let h = 3u32;
6406 let mut rgba = Vec::new();
6407 for y in 0..h {
6408 for x in 0..w {
6409 rgba.push((x * 60) as u8);
6410 rgba.push((y * 80) as u8);
6411 rgba.push(((x + y) * 30) as u8);
6412 rgba.push(0xff);
6413 }
6414 }
6415 let file = encode_webp_lossless(&rgba, w, h).unwrap();
6416 let decoded = crate::decode_webp(&file).unwrap();
6417 assert_eq!(decoded.frames[0].rgba, rgba);
6418 }
6419
6420 #[test]
6421 fn round_trip_solid_color_uses_single_leaf_codes() {
6422 // A solid color makes every channel a single-symbol code. The
6423 // round trip must still be exact.
6424 let w = 8u32;
6425 let h = 8u32;
6426 let mut rgba = Vec::new();
6427 for _ in 0..(w * h) {
6428 rgba.extend_from_slice(&[0x20, 0x40, 0x60, 0xff]);
6429 }
6430 let file = encode_webp_lossless(&rgba, w, h).unwrap();
6431 let decoded = crate::decode_webp(&file).unwrap();
6432 assert_eq!(decoded.frames[0].rgba, rgba);
6433 }
6434
6435 #[test]
6436 fn round_trip_larger_random_like() {
6437 // A deterministic pseudo-random pattern over a 16x16 RGBA image,
6438 // exercising all four channel codes with many distinct symbols.
6439 let w = 16u32;
6440 let h = 16u32;
6441 let mut rgba = Vec::new();
6442 let mut state = 0x1234_5678u32;
6443 for _ in 0..(w * h) {
6444 for _ in 0..4 {
6445 // xorshift
6446 state ^= state << 13;
6447 state ^= state >> 17;
6448 state ^= state << 5;
6449 rgba.push((state & 0xff) as u8);
6450 }
6451 }
6452 let file = encode_webp_lossless(&rgba, w, h).unwrap();
6453 let decoded = crate::decode_webp(&file).unwrap();
6454 assert_eq!(decoded.frames[0].rgba, rgba);
6455 }
6456
6457 #[test]
6458 fn encoded_file_walks_as_simple_lossless_container() {
6459 let rgba = [0x12, 0x34, 0x56, 0xff];
6460 let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6461 let c = crate::parse_container(&file).unwrap();
6462 assert!(c
6463 .first_chunk_with_fourcc(crate::container::fourcc::VP8L)
6464 .is_some());
6465 }
6466
6467 #[test]
6468 fn rejects_dimension_mismatch() {
6469 let rgba = [0u8; 4]; // 1 pixel
6470 match encode_webp_lossless(&rgba, 2, 2) {
6471 Err(EncodeError::PixelBufferMismatch { got, expected }) => {
6472 assert_eq!(got, 4);
6473 assert_eq!(expected, 16);
6474 }
6475 other => panic!("expected PixelBufferMismatch, got {other:?}"),
6476 }
6477 }
6478
6479 #[test]
6480 fn rejects_zero_dimensions() {
6481 match encode_webp_lossless(&[], 0, 0) {
6482 Err(EncodeError::InvalidDimensions { width, height }) => {
6483 assert_eq!(width, 0);
6484 assert_eq!(height, 0);
6485 }
6486 other => panic!("expected InvalidDimensions, got {other:?}"),
6487 }
6488 }
6489
6490 // ---- bare VP8L bitstream (encode_vp8l_argb / _with) ----
6491
6492 /// The bare bitstream wrapped in §2.6 framing equals the file
6493 /// [`encode_webp_lossless`] produces for the same pixels.
6494 #[test]
6495 fn bare_bitstream_wrapped_equals_framed_file() {
6496 // 3x2 ARGB image with a spread of colors and one non-opaque pixel.
6497 let pixels: [u32; 6] = [
6498 0xff10_2030,
6499 0xff40_5060,
6500 0x8070_8090,
6501 0xffa0_b0c0,
6502 0xffd0_e0f0,
6503 0xff00_1122,
6504 ];
6505 let bare = encode_vp8l_argb(&pixels, 3, 2).unwrap();
6506 let framed = build::build_webp_file(&bare, ImageKind::Lossless, 3, 2).unwrap();
6507
6508 // Re-derive the same file via the RGBA entry point.
6509 let mut rgba = Vec::new();
6510 for &p in &pixels {
6511 rgba.push((p >> 16) as u8);
6512 rgba.push((p >> 8) as u8);
6513 rgba.push(p as u8);
6514 rgba.push((p >> 24) as u8);
6515 }
6516 let via_rgba = encode_webp_lossless(&rgba, 3, 2).unwrap();
6517 assert_eq!(framed, via_rgba);
6518 }
6519
6520 /// A bare bitstream has no `RIFF` header — it begins with the §3.4
6521 /// `0x2F` VP8L signature byte.
6522 #[test]
6523 fn bare_bitstream_has_no_riff_wrapper() {
6524 let pixels = [0xff12_3456u32];
6525 let bare = encode_vp8l_argb(&pixels, 1, 1).unwrap();
6526 assert_ne!(&bare[0..4], b"RIFF");
6527 assert_eq!(bare[0], crate::vp8l_chunk::VP8L_SIGNATURE);
6528 }
6529
6530 /// `encode_vp8l_argb` auto-detects the §3.4 `alpha_is_used` bit.
6531 #[test]
6532 fn bare_bitstream_auto_detects_alpha() {
6533 let opaque = [0xff11_2233u32, 0xff44_5566];
6534 let bare = encode_vp8l_argb(&opaque, 2, 1).unwrap();
6535 let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6536 assert!(!h.alpha_is_used());
6537
6538 let translucent = [0x8011_2233u32, 0xff44_5566];
6539 let bare = encode_vp8l_argb(&translucent, 2, 1).unwrap();
6540 let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6541 assert!(h.alpha_is_used());
6542 }
6543
6544 /// `encode_vp8l_argb_with` forces the header bit regardless of pixels.
6545 #[test]
6546 fn bare_bitstream_with_forces_alpha_bit() {
6547 let opaque = [0xff11_2233u32];
6548 let bare = encode_vp8l_argb_with(&opaque, 1, 1, true).unwrap();
6549 let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6550 assert!(h.alpha_is_used());
6551 }
6552
6553 /// The bare bitstream round-trips back to the exact pixels through the
6554 /// full decode chain once framed.
6555 #[test]
6556 fn bare_bitstream_round_trips() {
6557 let pixels: [u32; 4] = [0x80aa_bbcc, 0xff00_1122, 0xc033_4455, 0xff66_7788];
6558 let bare = encode_vp8l_argb(&pixels, 2, 2).unwrap();
6559 let framed = build::build_webp_file(&bare, ImageKind::Lossless, 2, 2).unwrap();
6560 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6561 assert_eq!(img.pixels(), &pixels);
6562 }
6563
6564 #[test]
6565 fn bare_bitstream_rejects_dimension_mismatch() {
6566 let pixels = [0xff00_0000u32]; // 1 pixel
6567 match encode_vp8l_argb(&pixels, 2, 2) {
6568 Err(EncodeError::PixelBufferMismatch { got, expected }) => {
6569 assert_eq!(got, 4);
6570 assert_eq!(expected, 16);
6571 }
6572 other => panic!("expected PixelBufferMismatch, got {other:?}"),
6573 }
6574 }
6575
6576 // ---- §5.2.2 LZ77 prefix-value inverse ----
6577
6578 /// Every value `1..=4` maps to prefix code `value - 1` with no extra
6579 /// bits, matching the `< 4` decoder branch.
6580 #[test]
6581 fn value_to_prefix_small_values_have_no_extra_bits() {
6582 for v in 1u32..=4 {
6583 let (p, e, x) = value_to_prefix(v);
6584 assert_eq!(p, v - 1);
6585 assert_eq!(e, 0);
6586 assert_eq!(x, 0);
6587 }
6588 }
6589
6590 /// Round-trip every length value `1..=MAX_MATCH` through
6591 /// [`value_to_prefix`] back into the §5.2.2 decoder formula.
6592 #[test]
6593 fn value_to_prefix_round_trips_length_range() {
6594 for v in 1u32..=MAX_MATCH as u32 {
6595 let (p, e, x) = value_to_prefix(v);
6596 // Re-apply the §5.2.2 decoder formula.
6597 let recovered = if p < 4 {
6598 p + 1
6599 } else {
6600 let extra_bits = (p - 2) >> 1;
6601 let offset = (2 + (p & 1)) << extra_bits;
6602 assert_eq!(extra_bits, e);
6603 offset + x + 1
6604 };
6605 assert_eq!(recovered, v, "value_to_prefix lost value {v}");
6606 }
6607 }
6608
6609 /// Round-trip via the live decoder helper [`crate::vp8l_decode::read_lz77_value`]
6610 /// to confirm the encoder's split is bit-compatible with what the
6611 /// decoder actually executes.
6612 #[test]
6613 fn value_to_prefix_round_trips_through_decoder() {
6614 use crate::vp8l_decode::read_lz77_value;
6615 use crate::vp8l_stream::BitReader;
6616 // A spread of values across every prefix-code band.
6617 let samples = [
6618 1u32, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 16, 17, 24, 25, 32, 100, 1000, 4096,
6619 ];
6620 for &v in &samples {
6621 let (p, e, x) = value_to_prefix(v);
6622 let mut w = BitWriter::new();
6623 if e > 0 {
6624 w.write_bits(x, e as usize);
6625 }
6626 let data = w.into_bytes();
6627 let mut r = BitReader::new(&data);
6628 let got = read_lz77_value(&mut r, p).unwrap();
6629 assert_eq!(
6630 got, v,
6631 "value {v} → prefix {p}, extra ({e}b: {x:b}) decoded as {got}"
6632 );
6633 }
6634 }
6635
6636 // ---- §5.2.2 LZ77 matcher / encoder round-trips ----
6637
6638 /// A solid-color image's pixels are a single literal followed by one
6639 /// long copy that covers the rest. Round trip must be exact.
6640 #[test]
6641 fn round_trip_solid_color_uses_lz77_copy() {
6642 let w = 32u32;
6643 let h = 32u32;
6644 let pixels = vec![0xff20_4060u32; (w * h) as usize];
6645 let tokens = tokenize_lz77(&pixels);
6646 // 1 literal + ceil((1024 - 1) / 4096) copies; for 1024 pixels: 1 + 1.
6647 let copies = tokens
6648 .iter()
6649 .filter(|t| matches!(t, Token::Copy { .. }))
6650 .count();
6651 assert!(
6652 copies >= 1,
6653 "solid-color image should emit at least one copy"
6654 );
6655 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6656 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6657 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6658 assert_eq!(img.pixels(), pixels.as_slice());
6659 }
6660
6661 /// A repeated 4-pixel pattern (cycle length 4) compresses to a long
6662 /// copy with `distance = 4`, which the §5.2.2 overlap rule
6663 /// (`distance < length`) self-replicates correctly.
6664 #[test]
6665 fn round_trip_periodic_pattern_uses_overlapping_copy() {
6666 let pattern = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
6667 let w = 16u32;
6668 let h = 4u32;
6669 let mut pixels = Vec::with_capacity((w * h) as usize);
6670 for i in 0..(w * h) {
6671 pixels.push(pattern[(i % 4) as usize]);
6672 }
6673 let tokens = tokenize_lz77(&pixels);
6674 let copies: Vec<_> = tokens
6675 .iter()
6676 .filter_map(|t| match t {
6677 Token::Copy { length, distance } => Some((*length, *distance)),
6678 _ => None,
6679 })
6680 .collect();
6681 assert!(!copies.is_empty(), "periodic pattern should emit a copy");
6682 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6683 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6684 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6685 assert_eq!(img.pixels(), pixels.as_slice());
6686 }
6687
6688 /// The §5.2.2 LZ77 path produces a strictly smaller chunk than the
6689 /// literal-only baseline on a compressible (repetitive) image. This is
6690 /// the round-119 headline measurement.
6691 #[test]
6692 fn lz77_beats_literal_only_on_repetitive_image() {
6693 // 64x64 image whose first scan-line is a small palette of distinct
6694 // colors and the remaining 63 lines copy the first line verbatim.
6695 let w = 64u32;
6696 let h = 64u32;
6697 let mut pixels = Vec::with_capacity((w * h) as usize);
6698 let palette = [
6699 0xff10_2030u32,
6700 0xff40_5060,
6701 0xff70_8090,
6702 0xffa0_b0c0,
6703 0xffd0_e0f0,
6704 0xff00_1122,
6705 0xff33_4455,
6706 0xff66_7788,
6707 ];
6708 for x in 0..w {
6709 pixels.push(palette[(x as usize) % palette.len()]);
6710 }
6711 for _ in 1..h {
6712 for x in 0..w {
6713 pixels.push(palette[(x as usize) % palette.len()]);
6714 }
6715 }
6716 let lz77 = encode_argb_literals(&pixels);
6717 let lit_only = encode_argb_literals_only(&pixels);
6718 assert!(
6719 lz77.len() < lit_only.len(),
6720 "LZ77 stream ({} B) not smaller than literal-only ({} B)",
6721 lz77.len(),
6722 lit_only.len(),
6723 );
6724 // And, more strongly, at least a 50% reduction on this case.
6725 assert!(
6726 lz77.len() * 2 < lit_only.len(),
6727 "LZ77 stream ({} B) failed to halve literal-only ({} B)",
6728 lz77.len(),
6729 lit_only.len(),
6730 );
6731
6732 // Round trip is exact.
6733 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6734 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6735 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6736 assert_eq!(img.pixels(), pixels.as_slice());
6737 }
6738
6739 /// A pixel buffer with no exploitable repetition (deterministic
6740 /// xorshift) still round-trips through the LZ77 encoder — even when
6741 /// the matcher emits no copies and the distance code stays empty.
6742 #[test]
6743 fn lz77_round_trips_incompressible_pixels() {
6744 let w = 17u32;
6745 let h = 19u32;
6746 let mut pixels = Vec::with_capacity((w * h) as usize);
6747 let mut state = 0xdead_beefu32;
6748 for _ in 0..(w * h) {
6749 state ^= state << 13;
6750 state ^= state >> 17;
6751 state ^= state << 5;
6752 pixels.push(state);
6753 }
6754 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6755 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6756 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6757 assert_eq!(img.pixels(), pixels.as_slice());
6758 }
6759
6760 // ---- §3.5.3 / §3.8.2 subtract-green forward transform ----
6761
6762 /// `apply_subtract_green` is the per-pixel inverse of
6763 /// [`crate::vp8l_transform::inverse_subtract_green`]: subtracting
6764 /// then re-adding green restores the originals, even across the
6765 /// `& 0xff` wrap.
6766 #[test]
6767 fn apply_subtract_green_is_inverse_of_inverse_subtract_green() {
6768 let mut pixels = [
6769 0xff00_0000u32, // black
6770 0xff7f_ff00, // greenish
6771 0xffff_ffff, // white
6772 0x8012_3456, // mid alpha
6773 0x0001_0203, // wrapping case: r=01, g=02, b=03
6774 ];
6775 let original = pixels;
6776 apply_subtract_green(&mut pixels);
6777 // Run the decoder's inverse and confirm we're back at the start.
6778 crate::vp8l_transform::inverse_subtract_green(&mut pixels);
6779 assert_eq!(pixels, original);
6780 }
6781
6782 /// `apply_subtract_green` preserves the green and alpha channels and
6783 /// only mutates red/blue per the §3.5.3 spec.
6784 #[test]
6785 fn apply_subtract_green_only_touches_red_and_blue() {
6786 let mut pixels = [0x80_70_60_50u32]; // a=80 r=70 g=60 b=50
6787 apply_subtract_green(&mut pixels);
6788 // a, g unchanged; r := (0x70 - 0x60) & 0xff = 0x10; b := 0xf0.
6789 assert_eq!((pixels[0] >> 24) & 0xff, 0x80);
6790 assert_eq!((pixels[0] >> 16) & 0xff, 0x10);
6791 assert_eq!((pixels[0] >> 8) & 0xff, 0x60);
6792 assert_eq!(pixels[0] & 0xff, 0xf0); // 0x50 - 0x60 = -0x10 → 0xf0
6793 }
6794
6795 /// On a synthetic natural-image-like fixture (a gradient where red and
6796 /// blue track green), the subtract-green path is strictly smaller than
6797 /// the no-transform path. This is the round-120 headline measurement.
6798 #[test]
6799 fn subtract_green_beats_no_transform_on_green_correlated_image() {
6800 // 32x32 image whose r and b channels each closely track g, so
6801 // (r - g) and (b - g) cluster tightly around 0 — exactly the
6802 // distribution §3.5.3 is designed to exploit.
6803 let w = 32u32;
6804 let h = 32u32;
6805 let mut pixels = Vec::with_capacity((w * h) as usize);
6806 let mut state = 0xC0FFEE12u32;
6807 for _ in 0..(w * h) {
6808 // xorshift-driven green; r/b are green plus small noise.
6809 state ^= state << 13;
6810 state ^= state >> 17;
6811 state ^= state << 5;
6812 let g = state & 0xff;
6813 let r = g.wrapping_add(((state >> 8) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6814 let b = g.wrapping_add(((state >> 16) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6815 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
6816 }
6817 let no_tx = {
6818 let tokens = tokenize_lz77(&pixels);
6819 // Width-less baseline (matches `encode_argb_literals_subtract_green`
6820 // below, which also uses width=1) so the comparison isolates
6821 // the subtract-green transform from the round-130 distance-map
6822 // chooser.
6823 encode_tokens(&tokens, false, None, 1)
6824 };
6825 let sg = encode_argb_literals_subtract_green(&pixels);
6826 eprintln!(
6827 "[round-120] 32x32 green-correlated: no-tx={} B, subtract-green={} B ({:.1}% reduction)",
6828 no_tx.len(),
6829 sg.len(),
6830 100.0 * (no_tx.len() as f64 - sg.len() as f64) / no_tx.len() as f64,
6831 );
6832 assert!(
6833 sg.len() < no_tx.len(),
6834 "subtract-green ({} B) did not beat no-transform ({} B)",
6835 sg.len(),
6836 no_tx.len(),
6837 );
6838
6839 // Round trip through the full decode chain stays pixel-exact.
6840 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6841 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6842 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6843 assert_eq!(img.pixels(), pixels.as_slice());
6844 }
6845
6846 /// `encode_argb_literals` picks the smallest of the four
6847 /// `(no-tx | sg) × (no-cache | cache)` paths it evaluates, so on
6848 /// any image its output equals the minimum of all four candidate
6849 /// streams.
6850 #[test]
6851 fn encode_argb_literals_chooses_smaller_path() {
6852 let w = 32u32;
6853 let h = 32u32;
6854 let mut pixels = Vec::with_capacity((w * h) as usize);
6855 // A solid green tint with slight per-pixel red/blue noise — the
6856 // subtract-green path concentrates r and b near zero.
6857 let mut state = 0x12345678u32;
6858 for _ in 0..(w * h) {
6859 state ^= state << 13;
6860 state ^= state >> 17;
6861 state ^= state << 5;
6862 let g = 0x80u32;
6863 let r = g.wrapping_add((state & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6864 let b = g.wrapping_add(((state >> 4) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6865 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
6866 }
6867 let chosen = encode_argb_literals(&pixels);
6868 // `encode_argb_literals` defaults to width=1 (no distance-map
6869 // optimisation); match it for the per-option comparison.
6870 let no_tx = encode_literals_with_options(&pixels, false, None, 1);
6871 let sg = encode_literals_with_options(&pixels, true, None, 1);
6872 let cc = encode_literals_with_options(&pixels, false, Some(DEFAULT_COLOR_CACHE_BITS), 1);
6873 let sg_cc = encode_literals_with_options(&pixels, true, Some(DEFAULT_COLOR_CACHE_BITS), 1);
6874 let best = no_tx.len().min(sg.len()).min(cc.len()).min(sg_cc.len());
6875 assert_eq!(chosen.len(), best);
6876 }
6877
6878 /// A subtract-green-encoded image survives a full encode → decode
6879 /// round trip via the public entry points: the encoder writes the
6880 /// §3.8.2 transform header, the decoder reads it back and applies the
6881 /// §4.3 inverse, restoring the originals.
6882 #[test]
6883 fn subtract_green_path_round_trips_via_public_entry_points() {
6884 let w = 8u32;
6885 let h = 8u32;
6886 let pixels: Vec<u32> = (0..(w * h))
6887 .map(|i| {
6888 let g = (i * 4) & 0xff;
6889 let r = g.wrapping_add(3) & 0xff;
6890 let b = g.wrapping_sub(2) & 0xff;
6891 0xff00_0000 | (r << 16) | (g << 8) | b
6892 })
6893 .collect();
6894 // Force the subtract-green path via the test-only entry.
6895 let stream = encode_argb_literals_subtract_green(&pixels);
6896 let header = build_image_header(w, h, false);
6897 let mut payload = header.to_vec();
6898 payload.extend_from_slice(&stream);
6899 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
6900 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6901 assert_eq!(img.pixels(), pixels.as_slice());
6902 }
6903
6904 /// On a pure-noise image (no green correlation) the chooser falls
6905 /// back to the no-transform path — `encode_argb_literals` should
6906 /// never produce a stream larger than the literal-only baseline by
6907 /// applying a transform that doesn't help.
6908 #[test]
6909 fn encode_argb_literals_does_not_regress_on_uncorrelated_noise() {
6910 let w = 16u32;
6911 let h = 16u32;
6912 let mut pixels = Vec::with_capacity((w * h) as usize);
6913 let mut state = 0xDEAD_BEEFu32;
6914 for _ in 0..(w * h) {
6915 state ^= state << 13;
6916 state ^= state >> 17;
6917 state ^= state << 5;
6918 pixels.push(state | 0xff00_0000);
6919 }
6920 let chosen = encode_argb_literals(&pixels);
6921 let no_tx = {
6922 let tokens = tokenize_lz77(&pixels);
6923 // Match `encode_argb_literals`'s width-less form (width=1) so
6924 // the chooser comparison stays apples-to-apples regardless of
6925 // the round-130 distance-map optimisation.
6926 encode_tokens(&tokens, false, None, 1)
6927 };
6928 assert!(
6929 chosen.len() <= no_tx.len(),
6930 "chooser regressed: {} B with chooser vs {} B no-transform",
6931 chosen.len(),
6932 no_tx.len(),
6933 );
6934 }
6935
6936 /// A maximum-length copy (>= MAX_MATCH pixels of identical color) is
6937 /// split into consecutive §5.2.2 copies, each bounded by `MAX_MATCH`.
6938 #[test]
6939 fn round_trip_splits_match_at_max_length() {
6940 // A solid-color image with `> MAX_MATCH` pixels: the first row
6941 // is the literal source, subsequent rows are copies.
6942 let total = MAX_MATCH + 100;
6943 let pixels = vec![0xff80_8080u32; total];
6944 let tokens = tokenize_lz77(&pixels);
6945 for tok in &tokens {
6946 if let Token::Copy { length, .. } = tok {
6947 assert!(
6948 *length <= MAX_MATCH,
6949 "copy length {length} exceeded MAX_MATCH"
6950 );
6951 }
6952 }
6953 // Round trip via the full encoder/decoder chain (1-row image of
6954 // `total` pixels).
6955 let w = total as u32;
6956 let h = 1u32;
6957 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6958 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6959 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6960 assert_eq!(img.pixels(), pixels.as_slice());
6961 }
6962
6963 // ---- §5.2.1 / §5.2.3 color cache (round 121) ----
6964
6965 /// The encoder's `EncoderColorCache` uses the spec's §5.2.3 hash
6966 /// formula and matches the decoder's
6967 /// [`crate::vp8l_decode::ColorCache::hash`] bit-for-bit at every
6968 /// allowed `code_bits`.
6969 #[test]
6970 fn encoder_color_cache_hash_matches_decoder_hash() {
6971 use crate::vp8l_decode::ColorCache;
6972 for bits in COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX {
6973 let enc = EncoderColorCache::new(bits);
6974 let dec = ColorCache::new(bits);
6975 // A spread of synthetic ARGB pixels: black, white, the
6976 // wrap-around 0x01020304, a saturated red, a mid-alpha
6977 // greenish, plus a zero (which all caches start with).
6978 for argb in [
6979 0x0000_0000u32,
6980 0xffff_ffff,
6981 0x0102_0304,
6982 0xffff_0000,
6983 0x8000_ff80,
6984 0x1234_5678,
6985 ] {
6986 assert_eq!(
6987 enc.hash(argb),
6988 dec.hash(argb),
6989 "hash mismatch at code_bits={bits} for argb=0x{argb:08x}"
6990 );
6991 }
6992 assert_eq!(enc.size(), 1 << bits);
6993 }
6994 }
6995
6996 /// A fresh cache holds zeros, so `contains(0)` succeeds *before*
6997 /// any insertion — exactly the §5.2.3 "all entries set to zero"
6998 /// invariant the decoder relies on.
6999 #[test]
7000 fn encoder_color_cache_starts_zero_initialized() {
7001 let cache = EncoderColorCache::new(4);
7002 // Index 0's slot starts at the all-zero pixel.
7003 let zero_idx = cache.hash(0);
7004 assert_eq!(cache.entries[zero_idx], 0);
7005 assert_eq!(cache.contains(0), Some(zero_idx));
7006 }
7007
7008 /// Inserting a pixel makes a subsequent `contains` for that same
7009 /// pixel resolve to the matching slot; an unrelated pixel does
7010 /// not collide (with overwhelming probability at 8 cache bits).
7011 #[test]
7012 fn encoder_color_cache_insert_then_contains_round_trips() {
7013 let mut cache = EncoderColorCache::new(8);
7014 let argb = 0xff12_3456u32;
7015 assert!(cache.contains(argb).is_none() || cache.entries[cache.hash(argb)] != argb);
7016 cache.insert(argb);
7017 assert_eq!(cache.contains(argb), Some(cache.hash(argb)));
7018 }
7019
7020 /// `cacheify_tokens` converts a literal back-to-back repeat into
7021 /// a `CacheRef` token whose `index` matches the cache slot, while
7022 /// leaving the first (unique) literal as a literal.
7023 #[test]
7024 fn cacheify_tokens_collapses_repeat_literal_into_cache_ref() {
7025 let argb = 0xff20_4060u32;
7026 let pixels = vec![argb, argb];
7027 let raw = vec![Token::Literal(argb), Token::Literal(argb)];
7028 let out = cacheify_tokens(&raw, &pixels, 8);
7029 assert!(matches!(out[0], Token::Literal(p) if p == argb));
7030 let cache = EncoderColorCache::new(8);
7031 let idx = cache.hash(argb) as u32;
7032 assert_eq!(out[1], Token::CacheRef { index: idx });
7033 }
7034
7035 /// A backward-reference `Copy` token inserts each copied pixel
7036 /// into the cache, so a subsequent literal that hashes to the
7037 /// same slot is collapsed to a `CacheRef`.
7038 #[test]
7039 fn cacheify_tokens_copy_updates_cache_for_subsequent_literal() {
7040 let argb = 0xff80_4010u32;
7041 // pixels: [argb, argb, argb, argb] — represented as a literal
7042 // followed by a Copy {length: 3, distance: 1}, then later
7043 // (at position 4) we add the same argb as a literal again.
7044 let pixels = vec![argb, argb, argb, argb, argb];
7045 let raw = vec![
7046 Token::Literal(argb),
7047 Token::Copy {
7048 length: 3,
7049 distance: 1,
7050 },
7051 Token::Literal(argb),
7052 ];
7053 let out = cacheify_tokens(&raw, &pixels, 8);
7054 // The first literal is still a literal; the copy passes
7055 // through; the trailing literal is now a CacheRef.
7056 assert!(matches!(out[0], Token::Literal(p) if p == argb));
7057 assert!(matches!(
7058 out[1],
7059 Token::Copy {
7060 length: 3,
7061 distance: 1,
7062 }
7063 ));
7064 let cache = EncoderColorCache::new(8);
7065 let idx = cache.hash(argb) as u32;
7066 assert_eq!(out[2], Token::CacheRef { index: idx });
7067 }
7068
7069 /// Forcing the color-cache path on a repetitive 16-color palette
7070 /// fixture round-trips bit-exactly through the decoder. This is
7071 /// the headline round-121 sanity test: the encoder emits §5.2.3
7072 /// cache codes; the decoder reads them back via its own
7073 /// [`crate::vp8l_decode::ColorCache`] and reconstructs the same
7074 /// pixels.
7075 #[test]
7076 fn color_cache_path_round_trips_via_public_entry_points() {
7077 let w = 8u32;
7078 let h = 8u32;
7079 // 16 distinct ARGB colors cycling per scan-line; every color
7080 // appears multiple times so the cache gets exercised.
7081 let palette: [u32; 16] = [
7082 0xff00_0000,
7083 0xff00_00ff,
7084 0xff00_ff00,
7085 0xff00_ffff,
7086 0xffff_0000,
7087 0xffff_00ff,
7088 0xffff_ff00,
7089 0xffff_ffff,
7090 0xff80_8080,
7091 0xff20_4060,
7092 0xff60_4020,
7093 0xff10_2030,
7094 0xff30_2010,
7095 0xffa0_b0c0,
7096 0xffc0_b0a0,
7097 0xff55_aa55,
7098 ];
7099 let pixels: Vec<u32> = (0..(w * h))
7100 .map(|i| palette[(i as usize) % palette.len()])
7101 .collect();
7102 // Force the color-cache path via the test-only entry.
7103 let stream = encode_argb_literals_color_cache(&pixels, DEFAULT_COLOR_CACHE_BITS);
7104 let header = build_image_header(w, h, false);
7105 let mut payload = header.to_vec();
7106 payload.extend_from_slice(&stream);
7107 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
7108 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7109 assert_eq!(img.pixels(), pixels.as_slice());
7110 }
7111
7112 /// On a small palette of repeated colors (a synthetic but
7113 /// realistic case for palette-heavy artwork), the §5.2.3
7114 /// color-cache path produces a smaller stream than the
7115 /// no-cache LZ77 path. This is the round-121 headline
7116 /// measurement.
7117 #[test]
7118 fn color_cache_beats_no_cache_on_small_palette_image() {
7119 // 32x32 image where every pixel is drawn from an 8-color
7120 // palette, in a pseudo-random pattern (so the LZ77 matcher
7121 // can't collapse them all into long copies and the
7122 // color-cache codes get to do real work).
7123 let w = 32u32;
7124 let h = 32u32;
7125 let palette: [u32; 8] = [
7126 0xff10_2030,
7127 0xff40_5060,
7128 0xff70_8090,
7129 0xffa0_b0c0,
7130 0xffd0_e0f0,
7131 0xff00_1122,
7132 0xff33_4455,
7133 0xff66_7788,
7134 ];
7135 let mut pixels = Vec::with_capacity((w * h) as usize);
7136 let mut state = 0x1357_9bdfu32;
7137 for _ in 0..(w * h) {
7138 state ^= state << 13;
7139 state ^= state >> 17;
7140 state ^= state << 5;
7141 pixels.push(palette[(state as usize) % palette.len()]);
7142 }
7143 // Width-less form (matches `encode_argb_literals_color_cache`,
7144 // which also uses width=1) so the comparison isolates the
7145 // color-cache effect from the round-130 distance-map chooser.
7146 let no_cache = encode_literals_with_options(&pixels, false, None, 1);
7147 let cache = encode_literals_with_options(&pixels, false, Some(DEFAULT_COLOR_CACHE_BITS), 1);
7148 eprintln!(
7149 "[round-121] 32x32 small-palette pseudo-random: no-cache={} B, color-cache={} B ({:.1}% reduction)",
7150 no_cache.len(),
7151 cache.len(),
7152 100.0 * (no_cache.len() as f64 - cache.len() as f64) / no_cache.len() as f64,
7153 );
7154 assert!(
7155 cache.len() < no_cache.len(),
7156 "color-cache stream ({} B) did not beat no-cache LZ77 ({} B)",
7157 cache.len(),
7158 no_cache.len(),
7159 );
7160
7161 // Round trip through the full encoder/decoder chain is exact.
7162 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7163 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7164 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7165 assert_eq!(img.pixels(), pixels.as_slice());
7166 }
7167
7168 /// On a noisy image with effectively-zero color repetition the
7169 /// chooser never selects the cache path (it would just inflate
7170 /// the GREEN alphabet for no compression gain), so
7171 /// `encode_argb_literals` never produces a stream larger than the
7172 /// no-cache baseline on uncorrelated noise.
7173 #[test]
7174 fn color_cache_chooser_does_not_regress_on_uncorrelated_noise() {
7175 let w = 16u32;
7176 let h = 16u32;
7177 let mut pixels = Vec::with_capacity((w * h) as usize);
7178 let mut state = 0xfeed_b00bu32;
7179 for _ in 0..(w * h) {
7180 state ^= state << 13;
7181 state ^= state >> 17;
7182 state ^= state << 5;
7183 pixels.push(state | 0xff00_0000);
7184 }
7185 let chosen = encode_argb_literals(&pixels);
7186 // Match `encode_argb_literals`'s width=1 form so the comparison
7187 // is apples-to-apples.
7188 let no_cache_no_tx = encode_literals_with_options(&pixels, false, None, 1);
7189 assert!(
7190 chosen.len() <= no_cache_no_tx.len(),
7191 "chooser regressed on noise: {} B chosen vs {} B no-cache no-tx",
7192 chosen.len(),
7193 no_cache_no_tx.len(),
7194 );
7195 }
7196
7197 /// The §5.2.3 `color-cache-info` header field encodes the
7198 /// chosen `code_bits` value: when the cache is enabled, the
7199 /// decoder reads `%b1` followed by `ReadBits(4) = code_bits`,
7200 /// and the `ColorCacheInfo::is_enabled()` flag flips on. This
7201 /// test routes the encoded stream through the live decoder's
7202 /// `MetaPrefixHeader::read` and confirms it sees the cache.
7203 #[test]
7204 fn color_cache_header_round_trips_through_meta_prefix_reader() {
7205 use crate::meta_prefix::{ImageRole, MetaPrefixHeader};
7206 use crate::vp8l_stream::BitReader;
7207 let w = 4u32;
7208 let h = 4u32;
7209 let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
7210 let pixels: Vec<u32> = (0..(w * h))
7211 .map(|i| palette[(i as usize) % palette.len()])
7212 .collect();
7213 let stream = encode_argb_literals_color_cache(&pixels, DEFAULT_COLOR_CACHE_BITS);
7214 // Read straight off the image-stream — no §3.8.2 transform
7215 // header is present (we forced the no-tx path), so the
7216 // very first bit is the transform-list terminator `%b0`,
7217 // followed by the §3.8.3 `color-cache-info`.
7218 let mut r = BitReader::new(&stream);
7219 // Skip the transform-list terminator.
7220 assert!(!r.read_bit().unwrap());
7221 let header = MetaPrefixHeader::read(&mut r, ImageRole::Argb, w, h).unwrap();
7222 assert!(header.color_cache.is_enabled());
7223 assert_eq!(header.color_cache.code_bits, DEFAULT_COLOR_CACHE_BITS);
7224 assert_eq!(header.color_cache.size(), 1 << DEFAULT_COLOR_CACHE_BITS);
7225 }
7226
7227 // ---- round 130: §5.2.2 distance-map chooser ----
7228
7229 /// `pixel_distance_to_distance_code` reconstructs the spec's
7230 /// `xi + yi * W` for the chosen code, identical to the decoder.
7231 /// Across every distance-map entry at a fixed width, the chooser
7232 /// must pick a code that round-trips through
7233 /// `distance_code_to_pixel_distance` to the original distance.
7234 #[test]
7235 fn distance_chooser_reconstructs_each_distance_map_entry() {
7236 use crate::vp8l_decode::{distance_code_to_pixel_distance, DISTANCE_MAP};
7237 let width = 256u32;
7238 for &(xi, yi) in DISTANCE_MAP.iter() {
7239 let raw = xi + yi * width as i32;
7240 let d = if raw < 1 { 1 } else { raw as usize };
7241 let code = pixel_distance_to_distance_code(d, width);
7242 assert_eq!(
7243 distance_code_to_pixel_distance(code, width),
7244 d,
7245 "chooser code {code} for d={d} (xi={xi},yi={yi}) does not round-trip",
7246 );
7247 }
7248 }
7249
7250 /// The smallest-code early-out must produce byte-for-byte the same
7251 /// code as a full no-early-out linear scan that tracks the minimum
7252 /// matching code. The reference below re-implements the round-119
7253 /// full-scan-with-tie-break (start at the scan-line code, visit every
7254 /// one of the 120 entries, keep the smallest matching code); the
7255 /// production [`pixel_distance_to_distance_code`] returns on the first
7256 /// match. Across a representative distance range and several widths,
7257 /// both must agree on every input.
7258 #[test]
7259 fn distance_chooser_early_out_matches_full_scan() {
7260 use crate::vp8l_decode::{DISTANCE_MAP, NUM_DISTANCE_MAP_CODES};
7261
7262 // Full no-early-out linear scan with smallest-code tie-break —
7263 // the behaviour the early-out replaces. Bit-exactness against the
7264 // production chooser is what this test pins.
7265 fn full_scan(distance: usize, image_width: u32) -> u32 {
7266 let scan_line_code = distance as u32 + NUM_DISTANCE_MAP_CODES as u32;
7267 let mut best = scan_line_code;
7268 let width_i32 = image_width as i32;
7269 for (idx, &(xi, yi)) in DISTANCE_MAP.iter().enumerate() {
7270 let raw = xi + yi * width_i32;
7271 let mapped = if raw < 1 { 1 } else { raw as usize };
7272 if mapped == distance {
7273 let candidate = (idx + 1) as u32;
7274 if candidate < best {
7275 best = candidate;
7276 }
7277 }
7278 }
7279 best
7280 }
7281
7282 // Widths spanning width-1 (no spatial structure), narrow, typical
7283 // tile, and a wide row so the clamp-to-1 and large-distance
7284 // regimes are all exercised.
7285 for &width in &[1u32, 2, 16, 128, 256, 1024] {
7286 // Distance 1..=400 covers every clamp-to-1 hit, every
7287 // single-row / multi-row map distance for these widths, and
7288 // a long tail that has no map representation (scan-line
7289 // fallback). Plus a few large distances past any map reach.
7290 for distance in (1usize..=400).chain([1000, 4096, 70_000]) {
7291 assert_eq!(
7292 pixel_distance_to_distance_code(distance, width),
7293 full_scan(distance, width),
7294 "early-out diverged from full scan at distance={distance} width={width}",
7295 );
7296 }
7297 }
7298 }
7299
7300 /// For a 256-wide image, pixel distance 256 (one row above) must be
7301 /// represented by distance-map code 1 ((0, 1)), not the scan-line
7302 /// code 376 (`256 + 120`). This is the headline round-130 win on
7303 /// natural images.
7304 #[test]
7305 fn distance_chooser_picks_map_code_for_row_distance() {
7306 let width = 256u32;
7307 let code = pixel_distance_to_distance_code(width as usize, width);
7308 assert_eq!(code, 1, "row distance must collapse to map code 1");
7309 // And legacy scan-line code is the bigger alternative.
7310 assert_eq!(distance_to_code(width as usize), width + 120);
7311 }
7312
7313 /// A distance with no §5.2.2 map representation at the chosen width
7314 /// falls back to the scan-line code `D + 120`. At width 256, a
7315 /// distance of 1000 has no `(xi, yi)` entry that reconstructs it, so
7316 /// the chooser emits `1000 + 120 = 1120`.
7317 #[test]
7318 fn distance_chooser_falls_back_to_scan_line_when_no_map_match() {
7319 let width = 256u32;
7320 let code = pixel_distance_to_distance_code(1000, width);
7321 assert_eq!(code, 1000 + 120);
7322 }
7323
7324 /// Width-1 (the no-spatial-structure form) admits no distance-map
7325 /// entry whose `xi + yi*1` exceeds 8+7 = 15, so any distance >= 16
7326 /// must use the scan-line form. The chooser must agree.
7327 #[test]
7328 fn distance_chooser_width_one_uses_scan_line_for_large_distances() {
7329 for d in [16usize, 32, 64, 100, 500] {
7330 assert_eq!(
7331 pixel_distance_to_distance_code(d, 1),
7332 (d as u32) + 120,
7333 "width=1 distance {d} should not collapse",
7334 );
7335 }
7336 }
7337
7338 /// On a row-correlated image (every scan-line copies the row above
7339 /// verbatim), the round-130 width-aware encoder must produce a
7340 /// strictly smaller stream than the round-119 scan-line-only form.
7341 /// This is the headline round-130 size-reduction measurement.
7342 #[test]
7343 fn width_aware_distance_beats_scan_line_only_on_row_correlated_image() {
7344 // 128x128 image whose every row is a fresh pseudo-random
7345 // 128-pixel pattern repeated for the next scan-line. The LZ77
7346 // matcher emits a single `Copy { length: ~MAX_MATCH, distance:
7347 // 128 }` per row (and chains thereafter). At width 128, distance
7348 // 128 = `(0, 1)` = distance-map code 1, far smaller than the
7349 // scan-line code 248.
7350 let w = 128u32;
7351 let h = 128u32;
7352 let mut pixels = Vec::with_capacity((w * h) as usize);
7353 let mut state = 0xC0DE_FACEu32;
7354 for _ in 0..w {
7355 state ^= state << 13;
7356 state ^= state >> 17;
7357 state ^= state << 5;
7358 pixels.push((state & 0x00ff_ffff) | 0xff00_0000);
7359 }
7360 for y in 1..h {
7361 for x in 0..w {
7362 pixels.push(pixels[(x + (y - 1) * w) as usize]);
7363 }
7364 }
7365
7366 let width_aware = encode_argb_literals_with_width(&pixels, w);
7367 let scan_line_only = encode_argb_literals(&pixels); // width=1
7368
7369 eprintln!(
7370 "[round-130] 128x128 row-correlated: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7371 scan_line_only.len(),
7372 width_aware.len(),
7373 100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7374 / scan_line_only.len() as f64,
7375 );
7376 assert!(
7377 width_aware.len() < scan_line_only.len(),
7378 "width-aware stream ({} B) not smaller than scan-line-only ({} B)",
7379 width_aware.len(),
7380 scan_line_only.len(),
7381 );
7382
7383 // Round trip is exact via the public entry point.
7384 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7385 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7386 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7387 assert_eq!(img.pixels(), pixels.as_slice());
7388 }
7389
7390 /// A photo-like fixture (smooth luma gradient + per-pixel small
7391 /// noise to fill the LZ77 hash chains) gets the round-130 chooser
7392 /// to find numerous small `(xi, yi)` matches in the §5.2.2
7393 /// distance-map neighbourhood. Compared to the width=1 scan-line
7394 /// baseline, the width-aware path is strictly smaller.
7395 #[test]
7396 fn width_aware_distance_beats_scan_line_only_on_photo_like_image() {
7397 let w = 64u32;
7398 let h = 64u32;
7399 let mut pixels = Vec::with_capacity((w * h) as usize);
7400 // Each row is a low-amplitude noise pattern around a luma ramp;
7401 // adjacent rows share the same noise seed but with a tiny offset,
7402 // so 2-D neighbour matches are abundant.
7403 let mut state = 0x1234_5678u32;
7404 for y in 0..h {
7405 let luma = (y * 4) as u8;
7406 for _x in 0..w {
7407 state ^= state << 13;
7408 state ^= state >> 17;
7409 state ^= state << 5;
7410 let n = (state & 0x07) as i32 - 3; // [-3, 4)
7411 let g = (luma as i32 + n).clamp(0, 255) as u32;
7412 let r = g;
7413 let b = g;
7414 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
7415 }
7416 }
7417 let width_aware = encode_argb_literals_with_width(&pixels, w);
7418 let scan_line_only = encode_argb_literals(&pixels);
7419 eprintln!(
7420 "[round-130] 64x64 photo-like: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7421 scan_line_only.len(),
7422 width_aware.len(),
7423 100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7424 / scan_line_only.len() as f64,
7425 );
7426 assert!(
7427 width_aware.len() <= scan_line_only.len(),
7428 "width-aware regressed: {} B vs scan-line-only {} B",
7429 width_aware.len(),
7430 scan_line_only.len(),
7431 );
7432
7433 // Round trip stays exact.
7434 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7435 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7436 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7437 assert_eq!(img.pixels(), pixels.as_slice());
7438 }
7439
7440 /// Round trip is exact across a spread of image widths. The chooser
7441 /// must never emit a distance code that reconstructs to a different
7442 /// pixel distance on the decode side.
7443 #[test]
7444 fn width_aware_round_trip_across_assorted_widths() {
7445 for &(w, h) in &[
7446 (1u32, 16u32),
7447 (3u32, 16u32),
7448 (16u32, 16u32),
7449 (97u32, 13u32),
7450 (200u32, 3u32),
7451 (256u32, 8u32),
7452 ] {
7453 let mut pixels = Vec::with_capacity((w * h) as usize);
7454 // A row-repeating pattern so the LZ77 matcher emits copies
7455 // at row-multiple distances, exercising the chooser.
7456 for y in 0..h {
7457 for x in 0..w {
7458 let v = (x.wrapping_mul(31).wrapping_add(y)) & 0xff;
7459 pixels.push(0xff00_0000 | (v << 16) | (v << 8) | v);
7460 }
7461 }
7462 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7463 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7464 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7465 assert_eq!(
7466 img.pixels(),
7467 pixels.as_slice(),
7468 "round trip mismatch at {w}x{h}",
7469 );
7470 }
7471 }
7472
7473 /// A 64x64 image whose every row is row 0 shifted by `(y % 4) - 1`
7474 /// pixels — the resulting per-row matches are short (3-pixel-aligned
7475 /// hashes mostly), at distances clustered near `width = 64`. The
7476 /// matcher emits many small Copy tokens whose distances are 60–65
7477 /// (= 64-4..64+1), all of which the round-130 chooser collapses to
7478 /// distance-map codes 1, 3, 4 (prefix 0–2). With dozens of emissions
7479 /// the chooser's per-token saving compounds against the scan-line
7480 /// baseline (which would assign each to prefix-14 buckets).
7481 #[test]
7482 fn width_aware_distance_compounds_on_many_short_row_offset_matches() {
7483 let w = 64u32;
7484 let h = 64u32;
7485 let mut row0 = Vec::with_capacity(w as usize);
7486 let mut state = 0x1357_2468u32;
7487 for _ in 0..w {
7488 state ^= state << 13;
7489 state ^= state >> 17;
7490 state ^= state << 5;
7491 row0.push((state & 0x00ff_ffff) | 0xff00_0000);
7492 }
7493 let mut pixels = Vec::with_capacity((w * h) as usize);
7494 pixels.extend_from_slice(&row0);
7495 for y in 1..h {
7496 // Per-row 0..3 horizontal shift, ringing back into row0.
7497 let shift = (y as usize) & 0x3;
7498 for x in 0..(w as usize) {
7499 pixels.push(row0[(x + shift) % (w as usize)]);
7500 }
7501 }
7502 let width_aware = encode_argb_literals_with_width(&pixels, w);
7503 let scan_line_only = encode_argb_literals(&pixels);
7504 eprintln!(
7505 "[round-130] 64x64 row-shifted: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7506 scan_line_only.len(),
7507 width_aware.len(),
7508 100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7509 / scan_line_only.len() as f64,
7510 );
7511 assert!(
7512 width_aware.len() < scan_line_only.len(),
7513 "width-aware ({} B) not smaller than scan-line-only ({} B)",
7514 width_aware.len(),
7515 scan_line_only.len(),
7516 );
7517
7518 // Round trip stays exact via the production path.
7519 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7520 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7521 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7522 assert_eq!(img.pixels(), pixels.as_slice());
7523 }
7524
7525 /// A 256x256 row-repeating image (every scan-line a copy of row 1)
7526 /// drives the round-130 chooser to swap the scan-line code `256+120
7527 /// = 376` (prefix 16, 7 extra bits) for the map code 1 (prefix 0,
7528 /// 0 extra bits) — the largest single-emission saving the chooser
7529 /// can produce. The aggregate stream-size delta is the round-130
7530 /// headline measurement on row-correlated content.
7531 #[test]
7532 fn width_aware_distance_headline_256x256_row_repeating() {
7533 let w = 256u32;
7534 let h = 256u32;
7535 let mut pixels = Vec::with_capacity((w * h) as usize);
7536 let mut state = 0xABCD_1234u32;
7537 for _ in 0..w {
7538 state ^= state << 13;
7539 state ^= state >> 17;
7540 state ^= state << 5;
7541 pixels.push((state & 0x00ff_ffff) | 0xff00_0000);
7542 }
7543 for y in 1..h {
7544 for x in 0..w {
7545 pixels.push(pixels[(x + (y - 1) * w) as usize]);
7546 }
7547 }
7548
7549 let width_aware = encode_argb_literals_with_width(&pixels, w);
7550 let scan_line_only = encode_argb_literals(&pixels);
7551 eprintln!(
7552 "[round-130] 256x256 row-repeating: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7553 scan_line_only.len(),
7554 width_aware.len(),
7555 100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7556 / scan_line_only.len() as f64,
7557 );
7558 assert!(
7559 width_aware.len() < scan_line_only.len(),
7560 "width-aware stream ({} B) not smaller than scan-line-only ({} B)",
7561 width_aware.len(),
7562 scan_line_only.len(),
7563 );
7564
7565 // Round trip stays exact via the production path.
7566 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7567 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7568 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7569 assert_eq!(img.pixels(), pixels.as_slice());
7570 }
7571
7572 /// Re-encode an existing lossless fixture (decoded to ARGB) through
7573 /// both the width=1 scan-line-only form and the round-130 width-aware
7574 /// form, and confirm the width-aware variant is strictly smaller and
7575 /// round-trips bit-exactly. This exercises the chooser on
7576 /// non-synthetic distance distributions (the fixture's encoder
7577 /// produced whatever natural-image-style matches it found).
7578 #[test]
7579 fn width_aware_re_encode_of_real_fixture_is_smaller() {
7580 // 32x32 RGBA fixture committed in-tree (no external decode).
7581 let bytes: &[u8] = include_bytes!("../tests/data/lossless-32x32-rgba.webp");
7582 let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
7583 let w = decoded.width();
7584 let h = decoded.height();
7585 let pixels = decoded.pixels().to_vec();
7586
7587 let width_aware = encode_argb_literals_with_width(&pixels, w);
7588 let scan_line_only = encode_argb_literals(&pixels);
7589 eprintln!(
7590 "[round-130] {}x{} re-encoded fixture: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7591 w,
7592 h,
7593 scan_line_only.len(),
7594 width_aware.len(),
7595 100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7596 / scan_line_only.len() as f64,
7597 );
7598 assert!(
7599 width_aware.len() <= scan_line_only.len(),
7600 "width-aware regressed: {} B vs scan-line-only {} B",
7601 width_aware.len(),
7602 scan_line_only.len(),
7603 );
7604
7605 // Round trip through the encoder + decoder is exact.
7606 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7607 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7608 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7609 assert_eq!(img.pixels(), pixels.as_slice());
7610 }
7611
7612 /// The chooser must never inflate a distance: the chosen code's
7613 /// prefix code is always less than or equal to the scan-line
7614 /// alternative's prefix code, since the chooser picks the smaller
7615 /// raw code and `value_to_prefix` is monotonic in the value.
7616 #[test]
7617 fn chooser_never_picks_larger_prefix_than_scan_line() {
7618 let width = 320u32;
7619 for d in 1..=(width as usize * 4) {
7620 let chooser_code = pixel_distance_to_distance_code(d, width);
7621 let scan_code = distance_to_code(d);
7622 let (chooser_prefix, _, _) = value_to_prefix(chooser_code);
7623 let (scan_prefix, _, _) = value_to_prefix(scan_code);
7624 assert!(
7625 chooser_prefix <= scan_prefix,
7626 "d={d}: chooser code {chooser_code} (prefix {chooser_prefix}) > scan-line {scan_code} (prefix {scan_prefix})",
7627 );
7628 }
7629 }
7630
7631 // ---- round 146: §4.1 spatial-predictor forward transform ----
7632
7633 /// Round-224 cross-check (kept after the SWAR-form regression
7634 /// finding documented on the function itself): the public
7635 /// `predictor_subtract` body must remain bit-identical to the
7636 /// per-channel `wrapping_sub` semantics. Sweep 1 024 deterministic
7637 /// LCG `(original, pred)` pairs plus six hand-picked boundary
7638 /// pairs (every-channel underflow, every-channel positive,
7639 /// all-zero, all-0xff, mixed) against a verbatim copy of the
7640 /// closure-of-four reference. Acts as a regression guard so any
7641 /// future re-attempt at a SWAR / `std::simd` rewrite of this
7642 /// function can re-use this test to pin the new body against the
7643 /// reference semantics.
7644 #[test]
7645 fn predictor_subtract_matches_per_byte_reference_random() {
7646 // Verbatim copy of the closure-of-four reference body. The
7647 // published function must be bit-identical to this for every
7648 // input — this is a cross-check, not a baseline measurement.
7649 fn reference(original: u32, pred: u32) -> u32 {
7650 let a = ((original >> 24) & 0xff).wrapping_sub((pred >> 24) & 0xff) & 0xff;
7651 let r = ((original >> 16) & 0xff).wrapping_sub((pred >> 16) & 0xff) & 0xff;
7652 let g = ((original >> 8) & 0xff).wrapping_sub((pred >> 8) & 0xff) & 0xff;
7653 let b = (original & 0xff).wrapping_sub(pred & 0xff) & 0xff;
7654 (a << 24) | (r << 16) | (g << 8) | b
7655 }
7656 // Boundary cases: every-channel underflow, no-underflow, mixed.
7657 for &(orig, pred) in &[
7658 (0x0000_0000u32, 0x0000_0000u32),
7659 (0xffff_ffffu32, 0xffff_ffffu32),
7660 (0x0000_0000u32, 0xffff_ffffu32), // every channel underflows
7661 (0xffff_ffffu32, 0x0000_0000u32), // every channel saturates positive
7662 (0x10_20_30_40u32, 0x05_30_20_50u32), // mixed: r,b underflow; a,g positive
7663 (0x80_80_80_80u32, 0x80_80_80_80u32), // zero residual
7664 ] {
7665 assert_eq!(
7666 predictor_subtract(orig, pred),
7667 reference(orig, pred),
7668 "predictor_subtract diverges from per-byte reference at \
7669 orig=0x{orig:08x} pred=0x{pred:08x}"
7670 );
7671 }
7672 let mut seed: u32 = 0xcafe_d00d;
7673 let mut rng = || {
7674 seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
7675 seed
7676 };
7677 for _ in 0..1_024 {
7678 let orig = rng();
7679 let pred = rng();
7680 assert_eq!(
7681 predictor_subtract(orig, pred),
7682 reference(orig, pred),
7683 "predictor_subtract diverges from per-byte reference at \
7684 orig=0x{orig:08x} pred=0x{pred:08x}"
7685 );
7686 }
7687 }
7688
7689 /// Round-280 cross-check for the mode-specialised block-residual
7690 /// walker: `block_mode_cost`, `block_mode_entropy_cost`, and the
7691 /// capped walks driving `pick_block_mode_with_hint` /
7692 /// `pick_block_mode_with_hint_slack` must stay bit-identical to a
7693 /// verbatim copy of the pre-round-280 per-pixel `predictor_at`
7694 /// loops. Sweeps deterministic-LCG images over shapes covering
7695 /// every walker boundary regime — 1×N / N×1 (border-only rows and
7696 /// columns), 2×2, blocks overlapping the right and bottom edges,
7697 /// blocks larger than the image, interior blocks not touching any
7698 /// border — for every mode `0..=13` plus an out-of-range mode,
7699 /// and pins the hinted pickers (whose row-granular prune must be
7700 /// pick-identical to the reference per-pixel early-out) for every
7701 /// `prefer_mode` and a slack sweep.
7702 #[test]
7703 fn block_walker_matches_predictor_at_reference_random() {
7704 // Verbatim pre-round-280 `block_mode_cost` body.
7705 #[allow(clippy::too_many_arguments)]
7706 fn ref_cost(
7707 pixels: &[u32],
7708 width: usize,
7709 height: usize,
7710 x0: usize,
7711 y0: usize,
7712 bw: usize,
7713 bh: usize,
7714 mode: u8,
7715 ) -> u64 {
7716 let mut cost: u64 = 0;
7717 for dy in 0..bh {
7718 let y = y0 + dy;
7719 if y >= height {
7720 break;
7721 }
7722 for dx in 0..bw {
7723 let x = x0 + dx;
7724 if x >= width {
7725 break;
7726 }
7727 let pred = predictor_at(pixels, width, x, y, mode);
7728 let original = pixels[y * width + x];
7729 let residual = predictor_subtract(original, pred);
7730 cost += residual_magnitude(residual) as u64;
7731 }
7732 }
7733 cost
7734 }
7735 // Verbatim pre-round-280 `block_mode_entropy_cost` histogram
7736 // fill (the Shannon sum over it is unchanged, so comparing
7737 // the histograms pins the whole function).
7738 #[allow(clippy::too_many_arguments)]
7739 fn ref_hist(
7740 pixels: &[u32],
7741 width: usize,
7742 height: usize,
7743 x0: usize,
7744 y0: usize,
7745 bw: usize,
7746 bh: usize,
7747 mode: u8,
7748 ) -> ([[u32; 256]; 4], u32) {
7749 let mut hist: [[u32; 256]; 4] = [[0u32; 256]; 4];
7750 let mut n: u32 = 0;
7751 for dy in 0..bh {
7752 let y = y0 + dy;
7753 if y >= height {
7754 break;
7755 }
7756 for dx in 0..bw {
7757 let x = x0 + dx;
7758 if x >= width {
7759 break;
7760 }
7761 let pred = predictor_at(pixels, width, x, y, mode);
7762 let original = pixels[y * width + x];
7763 let residual = predictor_subtract(original, pred);
7764 hist[0][((residual >> 24) & 0xff) as usize] += 1;
7765 hist[1][((residual >> 16) & 0xff) as usize] += 1;
7766 hist[2][((residual >> 8) & 0xff) as usize] += 1;
7767 hist[3][(residual & 0xff) as usize] += 1;
7768 n += 1;
7769 }
7770 }
7771 (hist, n)
7772 }
7773 let mut seed: u32 = 0x2b80_c0de;
7774 let mut rng = move || {
7775 seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
7776 seed
7777 };
7778 // (width, height, x0, y0, bw, bh) — every walker regime.
7779 let shapes: &[(usize, usize, usize, usize, usize, usize)] = &[
7780 (1, 1, 0, 0, 4, 4), // single pixel, block larger than image
7781 (1, 9, 0, 0, 4, 4), // single column (left-column rule only)
7782 (1, 9, 0, 8, 4, 4), // single column, partial bottom block
7783 (9, 1, 0, 0, 4, 4), // single row (top-row rule only)
7784 (9, 1, 4, 0, 4, 4), // single row, interior-start block
7785 (2, 2, 0, 0, 2, 2), // smallest full-rules image
7786 (8, 8, 0, 0, 8, 8), // block == image (all four borders)
7787 (8, 8, 4, 4, 4, 4), // bottom-right block (TR wraparound)
7788 (8, 8, 4, 0, 4, 4), // top-right block (top row + wraparound)
7789 (8, 8, 0, 4, 4, 4), // bottom-left block (left column)
7790 (11, 7, 8, 4, 4, 4), // overlaps right and bottom edges
7791 (16, 16, 4, 4, 4, 4), // pure interior block (no borders)
7792 (5, 5, 0, 0, 16, 16), // block much larger than image
7793 ];
7794 for &(width, height, x0, y0, bw, bh) in shapes {
7795 let pixels: Vec<u32> = (0..width * height).map(|_| rng()).collect();
7796 // Cost + histogram equivalence for every mode, including
7797 // one §4.1-undefined mode (predicts solid black).
7798 for mode in 0u8..=14 {
7799 assert_eq!(
7800 block_mode_cost(&pixels, width, height, x0, y0, bw, bh, mode),
7801 ref_cost(&pixels, width, height, x0, y0, bw, bh, mode),
7802 "block_mode_cost diverges at {width}x{height} block \
7803 ({x0},{y0},{bw},{bh}) mode {mode}"
7804 );
7805 let mut sink = ResidualHistogramSink {
7806 hist: [[0u32; 256]; 4],
7807 n: 0,
7808 };
7809 for_each_block_residual(&pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
7810 let (hist, n) = ref_hist(&pixels, width, height, x0, y0, bw, bh, mode);
7811 assert_eq!(
7812 (sink.hist, sink.n),
7813 (hist, n),
7814 "residual histogram diverges at {width}x{height} block \
7815 ({x0},{y0},{bw},{bh}) mode {mode}"
7816 );
7817 }
7818 // Pick equivalence: the row-granular capped walk must
7819 // select the same mode as a reference full-cost argmin
7820 // (lowest mode wins ties) for every hint, and the slack
7821 // variant for a slack sweep.
7822 let mut ref_best_mode: u8 = 0;
7823 let mut ref_best_cost = u64::MAX;
7824 for mode in 0u8..=13 {
7825 let cost = ref_cost(&pixels, width, height, x0, y0, bw, bh, mode);
7826 if cost < ref_best_cost {
7827 ref_best_cost = cost;
7828 ref_best_mode = mode;
7829 }
7830 }
7831 for hint in std::iter::once(None).chain((0u8..=13).map(Some)) {
7832 let mut want = ref_best_mode;
7833 if let Some(m) = hint {
7834 if m != want
7835 && ref_cost(&pixels, width, height, x0, y0, bw, bh, m) == ref_best_cost
7836 {
7837 want = m;
7838 }
7839 }
7840 assert_eq!(
7841 pick_block_mode_with_hint(&pixels, width, height, x0, y0, bw, bh, hint),
7842 want,
7843 "hinted pick diverges at {width}x{height} block \
7844 ({x0},{y0},{bw},{bh}) hint {hint:?}"
7845 );
7846 for slack in [0u64, 1, 7, 64] {
7847 let mut want_slack = ref_best_mode;
7848 if let Some(m) = hint {
7849 if m != want_slack
7850 && ref_cost(&pixels, width, height, x0, y0, bw, bh, m)
7851 <= ref_best_cost.saturating_add(slack)
7852 {
7853 want_slack = m;
7854 }
7855 }
7856 assert_eq!(
7857 pick_block_mode_with_hint_slack(
7858 &pixels, width, height, x0, y0, bw, bh, hint, slack
7859 ),
7860 want_slack,
7861 "slack pick diverges at {width}x{height} block \
7862 ({x0},{y0},{bw},{bh}) hint {hint:?} slack {slack}"
7863 );
7864 }
7865 }
7866 }
7867 }
7868
7869 /// `predictor_subtract` is the per-channel mod-256 inverse of the
7870 /// decoder's `add_pred`: re-adding the same prediction recovers
7871 /// the original, regardless of which channels wrap.
7872 #[test]
7873 fn predictor_subtract_is_inverse_of_add() {
7874 let cases = [
7875 (0xff00_0000u32, 0xff00_0000u32),
7876 (0x1234_5678u32, 0x0000_0000u32),
7877 (0xff80_4020u32, 0x8040_2010u32),
7878 (0x0000_ff00u32, 0xff00_ff00u32),
7879 ];
7880 for (orig, pred) in cases {
7881 let residual = predictor_subtract(orig, pred);
7882 // Reconstruct via add_pred semantics: per-channel
7883 // wrapping_add must restore the original.
7884 let a = ((residual >> 24) & 0xff).wrapping_add((pred >> 24) & 0xff) & 0xff;
7885 let r = ((residual >> 16) & 0xff).wrapping_add((pred >> 16) & 0xff) & 0xff;
7886 let g = ((residual >> 8) & 0xff).wrapping_add((pred >> 8) & 0xff) & 0xff;
7887 let b = (residual & 0xff).wrapping_add(pred & 0xff) & 0xff;
7888 let rebuilt = (a << 24) | (r << 16) | (g << 8) | b;
7889 assert_eq!(
7890 rebuilt, orig,
7891 "subtract+add did not round-trip for orig=0x{orig:08x} pred=0x{pred:08x}"
7892 );
7893 }
7894 }
7895
7896 /// On a solid block, mode 1 (L) and mode 2 (T) both predict the
7897 /// neighbour exactly → zero residual on every channel for every
7898 /// interior pixel. `pick_block_mode` returns the lowest such
7899 /// mode by tie-breaking convention; either 0 (border-only block)
7900 /// or 1 is acceptable for the top-left block of a solid image.
7901 #[test]
7902 fn pick_block_mode_zero_cost_on_solid_block() {
7903 let w = 8usize;
7904 let h = 8usize;
7905 let pixels = vec![0xff50_6070u32; w * h];
7906 // Block covering rows 1..8, cols 1..8 — all interior except
7907 // the strip at x=0 / y=0, but those are clamped out by the
7908 // edge rules in `predictor_at`.
7909 let mode = pick_block_mode(&pixels, w, h, 0, 0, w, h);
7910 // Any mode that uses an immediate neighbour (1=L, 2=T, etc.)
7911 // produces zero residual on a constant image, so the cost
7912 // is zero; with the tie-breaker, the lowest mode wins. Mode
7913 // 0 (solid black) only matches when the image *is* solid
7914 // black — here the constant is grey, so mode 0 costs more
7915 // than 1/2/.../13, and one of those wins.
7916 assert!(mode <= 13, "mode out of range: {mode}");
7917 // Sanity: residual under the picked mode must indeed be
7918 // zero everywhere (the top-left predicts 0xff000000 → cost
7919 // 0x60 + 0x70 + 0x50 = 0xe0 fold worth, but interior pixels
7920 // dominate — total cost ≪ what mode 0 produces).
7921 let mode_cost = |m: u8| -> u64 {
7922 let mut c = 0u64;
7923 for y in 0..h {
7924 for x in 0..w {
7925 let pred = predictor_at(&pixels, w, x, y, m);
7926 let r = predictor_subtract(pixels[y * w + x], pred);
7927 c += residual_magnitude(r) as u64;
7928 }
7929 }
7930 c
7931 };
7932 let picked_cost = mode_cost(mode);
7933 let mode0_cost = mode_cost(0);
7934 assert!(
7935 picked_cost < mode0_cost,
7936 "expected picked-mode cost ({picked_cost}) < mode-0 cost ({mode0_cost})"
7937 );
7938 }
7939
7940 /// Forward + inverse predictor round-trips bit-exact: applying
7941 /// the encoder's forward transform then the decoder's inverse
7942 /// transform recovers the original pixels.
7943 #[test]
7944 fn forward_predictor_round_trips_through_decoder_inverse() {
7945 use crate::vp8l_transform::inverse_predictor;
7946 let w = 16u32;
7947 let h = 16u32;
7948 // Smooth gradient — mode 7 (Average2(L, T)) should predict
7949 // most pixels well.
7950 let mut pixels = Vec::with_capacity((w * h) as usize);
7951 for y in 0..h {
7952 for x in 0..w {
7953 let r = x * 16;
7954 let g = y * 16;
7955 let b = (x + y) * 8;
7956 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
7957 }
7958 }
7959 let size_bits = 4u8; // 16x16 blocks → tw=th=1.
7960 let (pred_img, tw, _th) = build_predictor_image(&pixels, w, h, size_bits);
7961 let mut residuals = vec![0u32; pixels.len()];
7962 apply_forward_predictor(&pixels, &mut residuals, w, h, &pred_img, tw, size_bits);
7963 // Apply the decoder's inverse pass and confirm we recover
7964 // the originals.
7965 inverse_predictor(&mut residuals, w, h, &pred_img, tw, size_bits);
7966 assert_eq!(residuals, pixels);
7967 }
7968
7969 /// End-to-end: encode + decode via the public `encode_webp_lossless`
7970 /// path round-trips a smooth-gradient image bit-exactly. The
7971 /// chooser is free to pick the predictor candidate or not; the
7972 /// round-trip property must hold for *whatever* path it picks.
7973 #[test]
7974 fn round_trip_smooth_gradient_with_predictor_candidate() {
7975 let w = 32u32;
7976 let h = 32u32;
7977 let mut rgba = Vec::with_capacity((w * h * 4) as usize);
7978 for y in 0..h {
7979 for x in 0..w {
7980 rgba.push((x * 8) as u8); // r
7981 rgba.push((y * 8) as u8); // g
7982 rgba.push(((x + y) * 4) as u8); // b
7983 rgba.push(0xff); // a
7984 }
7985 }
7986 let file = encode_webp_lossless(&rgba, w, h).unwrap();
7987 let decoded = crate::decode_webp(&file).unwrap();
7988 assert_eq!(decoded.frames[0].rgba, rgba);
7989 }
7990
7991 /// On a smooth gradient the §4.1 predictor candidate should
7992 /// produce a smaller stream than the no-transform / subtract-
7993 /// green baseline: per-pixel residuals concentrate near zero,
7994 /// shrinking the green/red/blue Huffman codes. The chooser
7995 /// must select the predictor (or another equally-good
7996 /// candidate), so the final stream size is at most the
7997 /// no-tx baseline.
7998 #[test]
7999 fn predictor_path_shrinks_smooth_gradient() {
8000 let w = 64u32;
8001 let h = 64u32;
8002 let mut pixels = Vec::with_capacity((w * h) as usize);
8003 for y in 0..h {
8004 for x in 0..w {
8005 let r = (x * 4) & 0xff;
8006 let g = (y * 4) & 0xff;
8007 let b = ((x + y) * 2) & 0xff;
8008 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8009 }
8010 }
8011 // No-tx + no-cache baseline (the round-119 path).
8012 let baseline = encode_literals_with_options(&pixels, false, None, w);
8013 // The full chooser (which now includes the predictor path).
8014 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8015 eprintln!(
8016 "[round-146] {}x{} smooth gradient: no-tx baseline={} B, chooser={} B ({:.1}% reduction)",
8017 w,
8018 h,
8019 baseline.len(),
8020 chosen.len(),
8021 100.0 * (baseline.len() as f64 - chosen.len() as f64) / baseline.len() as f64,
8022 );
8023 assert!(
8024 chosen.len() <= baseline.len(),
8025 "chooser regressed on smooth gradient: {} B vs no-tx baseline {} B",
8026 chosen.len(),
8027 baseline.len(),
8028 );
8029
8030 // Round trip through the full encoder/decoder is exact.
8031 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8032 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8033 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8034 assert_eq!(img.pixels(), pixels.as_slice());
8035 }
8036
8037 /// On uncorrelated random noise the predictor never helps (no
8038 /// neighbour predicts the next pixel any better than random),
8039 /// so the chooser stays on the no-tx no-cache path (or
8040 /// subtract-green if that happens to win). The final stream
8041 /// must not regress vs the no-predictor chooser.
8042 #[test]
8043 fn predictor_chooser_does_not_regress_on_noise() {
8044 let w = 32u32;
8045 let h = 32u32;
8046 let mut pixels = Vec::with_capacity((w * h) as usize);
8047 let mut state = 0xc0ff_eeeeu32;
8048 for _ in 0..(w * h) {
8049 state ^= state << 13;
8050 state ^= state >> 17;
8051 state ^= state << 5;
8052 pixels.push(state | 0xff00_0000);
8053 }
8054 let no_predictor = encode_argb_literals_with_width(&pixels, w);
8055 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8056 assert!(
8057 chosen.len() <= no_predictor.len(),
8058 "predictor chooser regressed on noise: {} B vs {} B",
8059 chosen.len(),
8060 no_predictor.len(),
8061 );
8062 }
8063
8064 /// Round-trip the published `lossless-128x128-natural` fixture:
8065 /// decode it, re-encode via the full predictor-aware chooser,
8066 /// decode again. The decoded pixels must match the originals
8067 /// bit-exactly, and the re-encoded stream size should
8068 /// demonstrate the predictor path is being exercised on a
8069 /// natural image (we don't assert a specific size, only
8070 /// log it).
8071 #[test]
8072 fn natural_fixture_round_trips_through_predictor_aware_encoder() {
8073 let bytes: &[u8] = include_bytes!("../tests/data/lossless-128x128-natural.webp");
8074 let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
8075 let w = decoded.width();
8076 let h = decoded.height();
8077 let pixels = decoded.pixels().to_vec();
8078
8079 let pre_predictor = encode_argb_literals_with_width(&pixels, w);
8080 let with_predictor = encode_argb_with_predictor_chooser(&pixels, w, h);
8081 eprintln!(
8082 "[round-146] {}x{} natural fixture re-encoded: pre-predictor chooser={} B, predictor chooser={} B ({:.1}% reduction)",
8083 w,
8084 h,
8085 pre_predictor.len(),
8086 with_predictor.len(),
8087 100.0 * (pre_predictor.len() as f64 - with_predictor.len() as f64)
8088 / pre_predictor.len() as f64,
8089 );
8090 assert!(
8091 with_predictor.len() <= pre_predictor.len(),
8092 "predictor chooser regressed on natural fixture: {} B vs {} B",
8093 with_predictor.len(),
8094 pre_predictor.len(),
8095 );
8096
8097 // End-to-end round trip is bit-exact through the public API.
8098 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8099 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8100 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8101 assert_eq!(img.pixels(), pixels.as_slice());
8102 }
8103
8104 // ---- round 147: §3.5.2 / §4.2 color-transform forward pass ----
8105
8106 /// `color_xfrm_delta` matches the §3.5.2 formula
8107 /// `(int8(t) * int8(c)) >> 5` for both signed inputs.
8108 #[test]
8109 fn color_xfrm_delta_matches_spec_examples() {
8110 // t = -1, c = 64 → (-1 * 64) >> 5 = -2.
8111 assert_eq!(color_xfrm_delta(0xff, 0x40), -2);
8112 // t = 2, c = 64 → (2 * 64) >> 5 = 4.
8113 assert_eq!(color_xfrm_delta(2, 0x40), 4);
8114 // t = 0, c = anything → 0.
8115 assert_eq!(color_xfrm_delta(0, 0x7f), 0);
8116 // Identity case: t = 0 (no slope) ⇒ no contribution.
8117 assert_eq!(color_xfrm_delta(0, 0xff), 0);
8118 }
8119
8120 /// Forward + inverse §3.5.2 color transform round-trips per-pixel
8121 /// for arbitrary CTE values. Validates [`forward_color_pixel`]
8122 /// against the decoder's [`crate::vp8l_transform::inverse_color`]
8123 /// math.
8124 #[test]
8125 fn forward_color_pixel_round_trips_through_decoder_inverse() {
8126 use crate::vp8l_transform;
8127 let cases: &[(u8, u8, u8, u8, u8, u8)] = &[
8128 // (r, g, b, gtr, gtb, rtb)
8129 (120, 80, 200, 0x12, 0xf0, 0x05),
8130 (255, 0, 0, 0x20, 0x00, 0x00),
8131 (0, 255, 0, 0x00, 0x20, 0x00),
8132 (0, 0, 255, 0x00, 0x00, 0x20),
8133 (200, 100, 50, 0xe0, 0xd0, 0x10),
8134 ];
8135 for &(r, g, b, gtr, gtb, rtb) in cases {
8136 let (enc_r, enc_b) = forward_color_pixel(r, g, b, gtr, gtb, rtb);
8137 // Drive the decoder's helper through a 1×1 sub-image so
8138 // we exercise the actual published inverse path.
8139 let mut argb = vec![
8140 ((0xffu32) << 24) | ((enc_r as u32) << 16) | ((g as u32) << 8) | (enc_b as u32),
8141 ];
8142 // Build the §3.5.2 CTE pixel: red=rtb, green=gtb, blue=gtr.
8143 let cte = ((0xffu32) << 24) | ((rtb as u32) << 16) | ((gtb as u32) << 8) | (gtr as u32);
8144 let color_img = vec![cte];
8145 // size_bits = 9 → block 512, single block covers a 1×1 image.
8146 vp8l_transform::inverse_color(&mut argb, 1, 1, &color_img, 1, 9);
8147 assert_eq!(
8148 (argb[0] >> 16) & 0xff,
8149 r as u32,
8150 "red mismatch for r={r} g={g} b={b} gtr=0x{gtr:02x} gtb=0x{gtb:02x} rtb=0x{rtb:02x}",
8151 );
8152 assert_eq!(argb[0] & 0xff, b as u32, "blue mismatch");
8153 assert_eq!((argb[0] >> 8) & 0xff, g as u32, "green altered");
8154 }
8155 }
8156
8157 /// On a solid-color block the per-axis sweep is free to pick any
8158 /// CTE — but whichever CTE it picks must minimise the per-pixel
8159 /// folded-magnitude proxy that drove the choice. Verifying the
8160 /// picker against the all-zero baseline (which leaves residuals at
8161 /// the source's pixel values) confirms the chooser is not
8162 /// inflating cost: a constant image's red channel can still be
8163 /// "decorrelated" against the constant green if some `gtr` value
8164 /// brings `red - delta(gtr, green)` closer to zero (mod 256) than
8165 /// the raw `red`.
8166 #[test]
8167 fn pick_block_cte_is_minimum_on_solid_block() {
8168 let w = 8usize;
8169 let h = 8usize;
8170 let pixels = vec![0xff50_6070u32; w * h];
8171
8172 // Per-pixel folded-magnitude cost summed across the block, for
8173 // an arbitrary CTE.
8174 let block_cost = |gtr: u8, gtb: u8, rtb: u8| -> u64 {
8175 let mut c = 0u64;
8176 for &px in &pixels {
8177 let r = ((px >> 16) & 0xff) as u8;
8178 let g = ((px >> 8) & 0xff) as u8;
8179 let b = (px & 0xff) as u8;
8180 // Decompose like pick_block_cte does (additive across
8181 // channels): red proxy + blue proxy.
8182 let red_residual = (r as i32 - color_xfrm_delta(gtr, g)) as u32;
8183 let inter_blue = b as i32 - color_xfrm_delta(gtb, g);
8184 let blue_residual = (inter_blue - color_xfrm_delta(rtb, r)) as u32;
8185 c += channel_magnitude(red_residual) as u64;
8186 c += channel_magnitude(blue_residual) as u64;
8187 }
8188 c
8189 };
8190
8191 let (gtr, gtb, rtb) = pick_block_cte(&pixels, w, h, 0, 0, w, h);
8192 let picked_cost = block_cost(gtr, gtb, rtb);
8193 let zero_cost = block_cost(0, 0, 0);
8194 assert!(
8195 picked_cost <= zero_cost,
8196 "picked CTE (0x{gtr:02x}, 0x{gtb:02x}, 0x{rtb:02x}) cost {picked_cost} > all-zero cost {zero_cost}",
8197 );
8198 }
8199
8200 /// On a strongly green-correlated image (`red ≈ green / 2`), the
8201 /// per-axis sweep must pick a non-zero `green_to_red` to cancel
8202 /// the slope. A slope of 1/2 corresponds to a fixed-point value
8203 /// of 16 (since `>> 5` divides by 32: 16/32 = 0.5).
8204 #[test]
8205 fn pick_block_cte_recovers_known_slope() {
8206 let w = 16usize;
8207 let h = 16usize;
8208 let mut pixels = Vec::with_capacity(w * h);
8209 for y in 0..h {
8210 for x in 0..w {
8211 let g = ((x + y) * 4) as u32 & 0xff;
8212 // red = green / 2 (deterministic linear correlation):
8213 let r = (g / 2) & 0xff;
8214 // blue uncorrelated → keep at a constant so gtb/rtb
8215 // don't have a clear winner.
8216 let b = 0x80u32;
8217 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8218 }
8219 }
8220 let (gtr, _gtb, _rtb) = pick_block_cte(&pixels, w, h, 0, 0, w, h);
8221 // gtr should land on or near 16 (slope 0.5). Allow ±16 wiggle
8222 // because the grid is coarser than the optimum and the
8223 // residual-magnitude proxy is not strictly convex.
8224 let gtr_signed = gtr as i8 as i32;
8225 assert!(
8226 (0..=32).contains(>r_signed),
8227 "expected gtr ≈ +16 for red≈green/2 correlation, got {gtr_signed} (raw 0x{gtr:02x})",
8228 );
8229 }
8230
8231 /// Forward + inverse over a multi-block image round-trips bit-
8232 /// exactly: encoder builds the per-block color image, forward-
8233 /// transforms the pixels, decoder applies its inverse pass and
8234 /// recovers the originals.
8235 #[test]
8236 fn forward_color_round_trips_through_decoder_inverse() {
8237 use crate::vp8l_transform::inverse_color;
8238 let w = 32u32;
8239 let h = 32u32;
8240 let mut pixels = Vec::with_capacity((w * h) as usize);
8241 for y in 0..h {
8242 for x in 0..w {
8243 // Some correlation between channels (so the picker
8244 // chooses non-trivial CTEs in at least some blocks).
8245 let r = (x * 7) & 0xff;
8246 let g = (y * 5) & 0xff;
8247 let b = ((x + y) * 3) & 0xff;
8248 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8249 }
8250 }
8251 let size_bits = 4u8;
8252 let (color_img, tw, _th) =
8253 build_color_image(&pixels, w, h, size_bits, ColorTransformStrategy::L1);
8254 let mut residuals = vec![0u32; pixels.len()];
8255 apply_forward_color(&pixels, &mut residuals, w, h, &color_img, tw, size_bits);
8256 inverse_color(&mut residuals, w, h, &color_img, tw, size_bits);
8257 assert_eq!(residuals, pixels);
8258 }
8259
8260 /// End-to-end: encode + decode via the public `encode_webp_lossless`
8261 /// path round-trips a chroma-correlated image bit-exactly. The
8262 /// chooser is free to pick the color-transform candidate or not;
8263 /// the round-trip property must hold for *whatever* path it picks.
8264 #[test]
8265 fn round_trip_chroma_correlated_image_with_color_transform_candidate() {
8266 let w = 32u32;
8267 let h = 32u32;
8268 let mut rgba = Vec::with_capacity((w * h * 4) as usize);
8269 for y in 0..h {
8270 for x in 0..w {
8271 let g = ((x + y) * 4) as u8;
8272 let r = g.wrapping_div(2);
8273 let b = g.wrapping_div(3);
8274 rgba.push(r);
8275 rgba.push(g);
8276 rgba.push(b);
8277 rgba.push(0xff);
8278 }
8279 }
8280 let file = encode_webp_lossless(&rgba, w, h).unwrap();
8281 let decoded = crate::decode_webp(&file).unwrap();
8282 assert_eq!(decoded.frames[0].rgba, rgba);
8283 }
8284
8285 /// On a chroma-correlated synthetic image the §4.2 color-transform
8286 /// candidate should at worst tie the existing pre-color-transform
8287 /// chooser: even if the predictor path already wins, the chooser
8288 /// must never inflate the stream by adding the color transform as
8289 /// a new option.
8290 #[test]
8291 fn color_transform_chooser_never_regresses() {
8292 let w = 64u32;
8293 let h = 64u32;
8294 let mut pixels = Vec::with_capacity((w * h) as usize);
8295 for y in 0..h {
8296 for x in 0..w {
8297 let g = ((x + y) * 4) & 0xff;
8298 let r = (g / 2) & 0xff;
8299 let b = (g / 3) & 0xff;
8300 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8301 }
8302 }
8303 let pre_color = pre_round_147_chooser(&pixels, w, h);
8304 let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8305 eprintln!(
8306 "[round-147] {}x{} chroma-correlated synth: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8307 w,
8308 h,
8309 pre_color.len(),
8310 with_color.len(),
8311 100.0 * (pre_color.len() as f64 - with_color.len() as f64) / pre_color.len() as f64,
8312 );
8313 assert!(
8314 with_color.len() <= pre_color.len(),
8315 "color-transform chooser regressed: {} B vs pre-color {} B",
8316 with_color.len(),
8317 pre_color.len(),
8318 );
8319
8320 // Round trip through the full encoder/decoder is exact.
8321 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8322 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8323 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8324 assert_eq!(img.pixels(), pixels.as_slice());
8325 }
8326
8327 /// Build a 128×128 channel-correlated noise fixture with
8328 /// *spatially varying* correlation slopes — each 16×16 block has a
8329 /// different `(green_to_red, green_to_blue)` correlation drawn
8330 /// from a small palette, giving the §3.5.2 per-block color
8331 /// transform a clear advantage over §3.5.3 subtract-green (which
8332 /// applies the same all-channels-equal correction everywhere).
8333 /// Within a block: spatially random green (LCG-driven), red and
8334 /// blue are `(slope × green + jitter) mod 256` in signed-mod-256
8335 /// arithmetic, with 6-bit jitter (high unique-pixel count keeps
8336 /// the §5.2.3 cache from dominating).
8337 fn make_channel_correlated_noise(w: u32, h: u32) -> Vec<u32> {
8338 let mut pixels = vec![0u32; (w * h) as usize];
8339 // Per-block (gtr, gtb) palette: four slopes giving distinct
8340 // per-block correlations so a single subtract-green delta
8341 // can't simultaneously cancel them all.
8342 let slopes: [(u32, u32); 4] = [(1, 1), (2, 2), (1, 2), (2, 1)];
8343 let block = 16u32;
8344 let bw = w.div_ceil(block);
8345 let mut state = 0x1234_5678u32;
8346 for by in 0..h.div_ceil(block) {
8347 for bx in 0..bw {
8348 let (sr, sb) = slopes[((by * bw + bx) % 4) as usize];
8349 for dy in 0..block {
8350 let y = by * block + dy;
8351 if y >= h {
8352 break;
8353 }
8354 for dx in 0..block {
8355 let x = bx * block + dx;
8356 if x >= w {
8357 break;
8358 }
8359 state = state.wrapping_mul(1664525).wrapping_add(1013904223);
8360 let g = (state >> 8) & 0xff;
8361 let jitter_r = state & 0x3f;
8362 let jitter_b = (state >> 16) & 0x3f;
8363 let r = (g.wrapping_mul(sr)).wrapping_add(jitter_r) & 0xff;
8364 let b = (g.wrapping_mul(sb)).wrapping_add(jitter_b) & 0xff;
8365 pixels[(y * w + x) as usize] = 0xff00_0000 | (r << 16) | (g << 8) | b;
8366 }
8367 }
8368 }
8369 }
8370 pixels
8371 }
8372
8373 /// Spatially-noisy + channel-correlated synthetic fixture: full-
8374 /// entropy noise across all three channels (no spatial structure
8375 /// → predictor can't help; high unique-pixel count → §5.2.3
8376 /// color cache can't slot every pixel), but `red ≈ green / 2`
8377 /// and `blue ≈ green / 4` with a few bits of jitter (strong
8378 /// linear channel correlation → color transform should help).
8379 /// On this construction the color-transform candidate must
8380 /// *strictly* beat the round-146 chooser, exercising the new
8381 /// path end-to-end.
8382 #[test]
8383 fn color_transform_path_beats_predictor_on_channel_correlated_noise() {
8384 let w = 128u32;
8385 let h = 128u32;
8386 let pixels = make_channel_correlated_noise(w, h);
8387 let pre_color = pre_round_147_chooser(&pixels, w, h);
8388 let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8389 eprintln!(
8390 "[round-147] {}x{} channel-correlated noise: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8391 w,
8392 h,
8393 pre_color.len(),
8394 with_color.len(),
8395 100.0 * (pre_color.len() as f64 - with_color.len() as f64) / pre_color.len() as f64,
8396 );
8397 // Strict inequality: the color-transform candidate must be
8398 // chosen because the channel correlation is the only available
8399 // redundancy this fixture admits.
8400 assert!(
8401 with_color.len() < pre_color.len(),
8402 "color-transform path failed to beat the round-146 chooser on a channel-correlated-noise fixture: {} B vs {} B",
8403 with_color.len(),
8404 pre_color.len(),
8405 );
8406
8407 // Round trip through the full encoder/decoder is exact.
8408 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8409 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8410 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8411 assert_eq!(img.pixels(), pixels.as_slice());
8412 }
8413
8414 /// On uncorrelated random pixels the color transform has nothing
8415 /// to decorrelate, so the chooser must keep one of the no-transform
8416 /// / subtract-green / predictor candidates and never regress.
8417 #[test]
8418 fn color_transform_chooser_does_not_regress_on_noise() {
8419 let w = 32u32;
8420 let h = 32u32;
8421 let mut pixels = Vec::with_capacity((w * h) as usize);
8422 let mut state = 0xbadd_caf3u32;
8423 for _ in 0..(w * h) {
8424 state ^= state << 13;
8425 state ^= state >> 17;
8426 state ^= state << 5;
8427 pixels.push(state | 0xff00_0000);
8428 }
8429 let pre_color = pre_round_147_chooser(&pixels, w, h);
8430 let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8431 assert!(
8432 with_color.len() <= pre_color.len(),
8433 "color-transform chooser regressed on noise: {} B vs {} B",
8434 with_color.len(),
8435 pre_color.len(),
8436 );
8437 }
8438
8439 /// Round 308: the §4.2 entropy-cost per-block CTE chooser builds a
8440 /// color sub-image whose forward transform inverts exactly — the
8441 /// cost model only changes *which* CTE is recorded, never the
8442 /// round-trip contract. Asserted across a channel-correlated noise
8443 /// fixture (spatially varying per-block slopes) at the per-region
8444 /// `size_bits`.
8445 #[test]
8446 fn pick_block_cte_entropy_color_image_round_trips() {
8447 let w = 64u32;
8448 let h = 64u32;
8449 use crate::vp8l_transform::inverse_color;
8450 let pixels = make_channel_correlated_noise(w, h);
8451 let size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
8452 let (color_img, tw, _th) =
8453 build_color_image(&pixels, w, h, size_bits, ColorTransformStrategy::Entropy);
8454 let mut residuals = vec![0u32; pixels.len()];
8455 apply_forward_color(&pixels, &mut residuals, w, h, &color_img, tw, size_bits);
8456 inverse_color(&mut residuals, w, h, &color_img, tw, size_bits);
8457 assert_eq!(residuals, pixels);
8458 }
8459
8460 /// Round 308: on a channel-correlated noise fixture the §4.2
8461 /// entropy-cost CTE candidate must never produce a *longer* stream
8462 /// than the L1-magnitude CTE candidate at the same `size_bits` and
8463 /// cache sweep — the entropy metric scores the same candidate grid
8464 /// by the bit cost the §5.x prefix codes actually minimise, so it
8465 /// at worst ties. Both round-trip bit-exact through the decoder.
8466 #[test]
8467 fn color_transform_entropy_candidate_does_not_regress_vs_l1() {
8468 let w = 128u32;
8469 let h = 128u32;
8470 let pixels = make_channel_correlated_noise(w, h);
8471 let size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
8472
8473 let l1 = select_best_cache_bits(|cache_bits| {
8474 encode_with_color_transform_strategy(
8475 &pixels,
8476 w,
8477 h,
8478 size_bits,
8479 cache_bits,
8480 w,
8481 ColorTransformStrategy::L1,
8482 )
8483 });
8484 let entropy = select_best_cache_bits(|cache_bits| {
8485 encode_with_color_transform_strategy(
8486 &pixels,
8487 w,
8488 h,
8489 size_bits,
8490 cache_bits,
8491 w,
8492 ColorTransformStrategy::Entropy,
8493 )
8494 });
8495 eprintln!(
8496 "[round-308] {}x{} channel-correlated noise §4.2 CTE chooser: L1={} B, entropy={} B",
8497 w,
8498 h,
8499 l1.len(),
8500 entropy.len(),
8501 );
8502
8503 // Both image streams decode to the original pixels once the
8504 // §3.4 5-byte VP8L header is prepended (the candidate writers
8505 // emit the post-header image stream, exactly as the chooser's
8506 // `best` is assembled in `encode_vp8l_payload`).
8507 let header = build_image_header(w, h, false);
8508 for stream in [&l1, &entropy] {
8509 let mut bare = Vec::with_capacity(header.len() + stream.len());
8510 bare.extend_from_slice(&header);
8511 bare.extend_from_slice(stream);
8512 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8513 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8514 assert_eq!(img.pixels(), pixels.as_slice());
8515 }
8516
8517 // The whole-image super-chooser keeps the byte-shortest of all
8518 // candidates (L1 + entropy + every other transform path), so it
8519 // can never be longer than the L1 color-transform candidate
8520 // alone.
8521 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8522 assert!(
8523 chosen.len() <= l1.len(),
8524 "super-chooser regressed against the L1 color-transform candidate: {} B vs {} B",
8525 chosen.len(),
8526 l1.len(),
8527 );
8528 }
8529
8530 /// Round-trip the published `lossless-128x128-natural` fixture
8531 /// through the round-147 super-chooser. The size must be at most
8532 /// the round-146 chooser's output; on a natural image the §3.5.2
8533 /// color-transform candidate's correlation cancellation usually
8534 /// shrinks the chosen stream further. Pixels round-trip bit-exact.
8535 #[test]
8536 fn natural_fixture_round_trips_through_color_transform_aware_encoder() {
8537 let bytes: &[u8] = include_bytes!("../tests/data/lossless-128x128-natural.webp");
8538 let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
8539 let w = decoded.width();
8540 let h = decoded.height();
8541 let pixels = decoded.pixels().to_vec();
8542
8543 let pre_color = pre_round_147_chooser(&pixels, w, h);
8544 let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8545 eprintln!(
8546 "[round-147] {}x{} natural fixture re-encoded: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8547 w,
8548 h,
8549 pre_color.len(),
8550 with_color.len(),
8551 100.0 * (pre_color.len() as f64 - with_color.len() as f64)
8552 / pre_color.len() as f64,
8553 );
8554 assert!(
8555 with_color.len() <= pre_color.len(),
8556 "color-transform chooser regressed on natural fixture: {} B vs {} B",
8557 with_color.len(),
8558 pre_color.len(),
8559 );
8560
8561 // End-to-end round trip is bit-exact through the public API.
8562 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8563 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8564 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8565 assert_eq!(img.pixels(), pixels.as_slice());
8566 }
8567
8568 /// Local copy of the round-146 chooser (no §4.2 color transform):
8569 /// evaluates the four
8570 /// `(no-tx | subtract-green) × (no-cache | cache)` candidates plus
8571 /// the two §4.1 predictor candidates, picking the smallest. Used
8572 /// as the regression baseline for the round-147 non-regression
8573 /// tests so they exercise *only* the color-transform delta the
8574 /// chooser added.
8575 fn pre_round_147_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
8576 let mut best = encode_argb_literals_with_width(pixels, width);
8577 let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
8578 let block = 1u32 << size_bits;
8579 if width >= block && height >= block {
8580 let candidates = [
8581 encode_with_predictor(pixels, width, height, size_bits, None, width),
8582 encode_with_predictor(
8583 pixels,
8584 width,
8585 height,
8586 size_bits,
8587 Some(DEFAULT_COLOR_CACHE_BITS),
8588 width,
8589 ),
8590 ];
8591 for cand in candidates {
8592 if cand.len() < best.len() {
8593 best = cand;
8594 }
8595 }
8596 }
8597 best
8598 }
8599
8600 // ---- round 148: §5.2.3 color-cache code-bits sweep ----
8601
8602 /// Local copy of the pre-round-148 chooser for
8603 /// [`encode_argb_literals_with_width`]: hardcoded to the round-121
8604 /// `DEFAULT_COLOR_CACHE_BITS = 8` cache size for the two
8605 /// `(no-tx | subtract-green) × cache` candidates. Used by the
8606 /// round-148 regression tests to confirm that sweeping the full
8607 /// §5.2.3 `[1..11]` `cache_code_bits` range never produces a
8608 /// larger stream than the hardcoded-8 chooser.
8609 fn pre_round_148_literals_chooser(pixels: &[u32], image_width: u32) -> Vec<u8> {
8610 debug_assert!(image_width >= 1);
8611 let mut best = encode_literals_with_options(pixels, false, None, image_width);
8612 let candidates = [
8613 encode_literals_with_options(pixels, true, None, image_width),
8614 encode_literals_with_options(
8615 pixels,
8616 false,
8617 Some(DEFAULT_COLOR_CACHE_BITS),
8618 image_width,
8619 ),
8620 encode_literals_with_options(pixels, true, Some(DEFAULT_COLOR_CACHE_BITS), image_width),
8621 ];
8622 for cand in candidates {
8623 if cand.len() < best.len() {
8624 best = cand;
8625 }
8626 }
8627 best
8628 }
8629
8630 /// `select_best_cache_bits` evaluates the disabled-cache baseline
8631 /// plus all eleven §5.2.3 sizes (`code_bits ∈ [1..11]`), i.e. it
8632 /// calls the closure exactly twelve times and returns whichever
8633 /// stream is the shortest.
8634 #[test]
8635 fn select_best_cache_bits_explores_full_spec_range() {
8636 let mut calls: Vec<Option<u32>> = Vec::new();
8637 let _ = select_best_cache_bits(|bits| {
8638 calls.push(bits);
8639 // Return a stream whose length encodes the cache-bits
8640 // choice so we can verify the chooser inspects every
8641 // candidate (smallest is `Some(7)` here).
8642 let len = match bits {
8643 None => 100,
8644 Some(b) => 200 - (b as usize) * 10 + (7 - b as i32).unsigned_abs() as usize,
8645 };
8646 vec![0u8; len]
8647 });
8648 // 12 calls: None + 11 cache sizes.
8649 assert_eq!(calls.len(), 12, "expected 12 candidates");
8650 assert_eq!(calls[0], None);
8651 for (i, bits) in (COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).enumerate() {
8652 assert_eq!(calls[i + 1], Some(bits));
8653 }
8654 }
8655
8656 /// `select_best_cache_bits` returns the smallest stream produced.
8657 #[test]
8658 fn select_best_cache_bits_returns_minimum() {
8659 // Crafted: cache_code_bits = 5 produces a 50-byte stream; all
8660 // others are larger. The sweep must return the 50-byte stream.
8661 let chosen = select_best_cache_bits(|bits| match bits {
8662 None => vec![0u8; 200],
8663 Some(5) => vec![0u8; 50],
8664 Some(b) => vec![0u8; 200 - (b as usize)],
8665 });
8666 assert_eq!(chosen.len(), 50);
8667 }
8668
8669 /// On every payload, the round-148 chooser produces a stream at
8670 /// most as large as the round-121-style hardcoded-8 chooser: the
8671 /// `cache_code_bits = 8` candidate is always among the sweep's
8672 /// twelve candidates, so the sweep can only improve.
8673 #[test]
8674 fn round_148_sweep_never_regresses_versus_hardcoded_8() {
8675 // Three contrasting payloads:
8676 // (a) small palette favouring narrow caches;
8677 // (b) wide palette favouring wide caches;
8678 // (c) random noise favouring disabled cache.
8679 let palette4: Vec<u32> = {
8680 let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
8681 let mut state = 0x1357_9bdfu32;
8682 (0..(8 * 8))
8683 .map(|_| {
8684 state ^= state << 13;
8685 state ^= state >> 17;
8686 state ^= state << 5;
8687 palette[(state as usize) % palette.len()]
8688 })
8689 .collect()
8690 };
8691 let mut wide_palette: Vec<u32> = Vec::with_capacity(32 * 32);
8692 let mut wstate = 0xabad_1deau32;
8693 for _ in 0..(32 * 32) {
8694 wstate ^= wstate << 13;
8695 wstate ^= wstate >> 17;
8696 wstate ^= wstate << 5;
8697 // 1024-color palette (10-bit truncation), opaque alpha.
8698 wide_palette.push(0xff00_0000 | (wstate & 0x3fff_3fff));
8699 }
8700 let noise: Vec<u32> = {
8701 let mut state = 0xc0de_d00du32;
8702 (0..(16 * 16))
8703 .map(|_| {
8704 state ^= state << 13;
8705 state ^= state >> 17;
8706 state ^= state << 5;
8707 state | 0xff00_0000
8708 })
8709 .collect()
8710 };
8711
8712 for (label, pixels, width) in [
8713 ("small-palette 8x8", palette4, 8u32),
8714 ("wide-palette 32x32", wide_palette, 32u32),
8715 ("noise 16x16", noise, 16u32),
8716 ] {
8717 let pre = pre_round_148_literals_chooser(&pixels, width);
8718 let post = encode_argb_literals_with_width(&pixels, width);
8719 eprintln!(
8720 "[round-148] {label}: pre={} B, post-sweep={} B",
8721 pre.len(),
8722 post.len(),
8723 );
8724 assert!(
8725 post.len() <= pre.len(),
8726 "round-148 sweep regressed on {label}: post {} B vs pre {} B",
8727 post.len(),
8728 pre.len(),
8729 );
8730 }
8731 }
8732
8733 /// On a 32×32 image whose pixels are drawn from a 16-color
8734 /// palette in a pseudo-random pattern, the round-148 sweep picks
8735 /// a `cache_code_bits` value that produces a *strictly smaller*
8736 /// stream than the hardcoded `DEFAULT_COLOR_CACHE_BITS = 8`
8737 /// choice — the four-bit difference in alphabet width pays for
8738 /// itself when the effective palette is only 16 colors.
8739 #[test]
8740 fn round_148_sweep_beats_hardcoded_8_on_small_palette() {
8741 let w = 32u32;
8742 let h = 32u32;
8743 let palette: Vec<u32> = (0..16u32)
8744 .map(|i| 0xff00_0000 | (i * 0x0011_2233))
8745 .collect();
8746 let mut pixels = Vec::with_capacity((w * h) as usize);
8747 let mut state = 0xfeed_face_u32;
8748 for _ in 0..(w * h) {
8749 state ^= state << 13;
8750 state ^= state >> 17;
8751 state ^= state << 5;
8752 pixels.push(palette[(state as usize) % palette.len()]);
8753 }
8754 let pre = pre_round_148_literals_chooser(&pixels, w);
8755 let post = encode_argb_literals_with_width(&pixels, w);
8756 eprintln!(
8757 "[round-148] small-palette 32x32: hardcoded-8={} B, sweep={} B ({:.1}% reduction)",
8758 pre.len(),
8759 post.len(),
8760 100.0 * (pre.len() as f64 - post.len() as f64) / pre.len() as f64,
8761 );
8762 assert!(
8763 post.len() < pre.len(),
8764 "expected sweep to beat hardcoded-8 on 16-color palette: post {} B vs pre {} B",
8765 post.len(),
8766 pre.len(),
8767 );
8768
8769 // Round trip through the full encoder/decoder chain is exact.
8770 let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8771 let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8772 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8773 assert_eq!(img.pixels(), pixels.as_slice());
8774 }
8775
8776 /// Verify the round-148 sweep can pick a non-default
8777 /// `cache_code_bits` value: on at least one of several
8778 /// payloads, the sweep chooses a `code_bits` value that differs
8779 /// from the round-121 hardcoded default of `8` — proving the
8780 /// chooser is exercising the full §5.2.3 `[1..11]` range rather
8781 /// than locking to the historical fixed value.
8782 ///
8783 /// The sweep is allowed to disable the cache or pick `8` on any
8784 /// individual payload (the chooser only commits to the smallest
8785 /// stream); the assertion is that at least one of the surveyed
8786 /// payloads landed on a non-default enabled cache.
8787 #[test]
8788 fn round_148_sweep_picks_non_default_cache_bits_on_some_payload() {
8789 use crate::meta_prefix::{ImageRole, MetaPrefixHeader};
8790 use crate::vp8l_stream::BitReader;
8791
8792 // Three payloads with varying palette / size / repetition
8793 // structure. Each is run through `encode_literals_with_options`
8794 // via the round-148 sweep (no §3.8.2 transform header in front,
8795 // so the chosen stream's first bit is the optional-transform
8796 // terminator `%b0` followed directly by the §3.8.3
8797 // `color-cache-info`).
8798 let mut payloads: Vec<(u32, u32, Vec<u32>)> = Vec::new();
8799
8800 // 32x32 4-color pseudo-random palette.
8801 {
8802 let w = 32u32;
8803 let h = 32u32;
8804 let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
8805 let mut pixels = Vec::with_capacity((w * h) as usize);
8806 let mut state = 0x1357_9bdfu32;
8807 for _ in 0..(w * h) {
8808 state ^= state << 13;
8809 state ^= state >> 17;
8810 state ^= state << 5;
8811 pixels.push(palette[(state as usize) % palette.len()]);
8812 }
8813 payloads.push((w, h, pixels));
8814 }
8815
8816 // 64x64 32-color pseudo-random palette.
8817 {
8818 let w = 64u32;
8819 let h = 64u32;
8820 let palette: Vec<u32> = (0..32u32)
8821 .map(|i| 0xff00_0000 | (i * 0x0008_4210))
8822 .collect();
8823 let mut pixels = Vec::with_capacity((w * h) as usize);
8824 let mut state = 0xdead_beefu32;
8825 for _ in 0..(w * h) {
8826 state ^= state << 13;
8827 state ^= state >> 17;
8828 state ^= state << 5;
8829 pixels.push(palette[(state as usize) % palette.len()]);
8830 }
8831 payloads.push((w, h, pixels));
8832 }
8833
8834 // 64x64 256-color pseudo-random palette.
8835 {
8836 let w = 64u32;
8837 let h = 64u32;
8838 let palette: Vec<u32> = (0..256u32)
8839 .map(|i| 0xff00_0000 | (i * 0x0001_0101))
8840 .collect();
8841 let mut pixels = Vec::with_capacity((w * h) as usize);
8842 let mut state = 0xc0ff_eeefu32;
8843 for _ in 0..(w * h) {
8844 state ^= state << 13;
8845 state ^= state >> 17;
8846 state ^= state << 5;
8847 pixels.push(palette[(state as usize) % palette.len()]);
8848 }
8849 payloads.push((w, h, pixels));
8850 }
8851
8852 let mut saw_non_default_enabled = false;
8853 for (w, h, pixels) in &payloads {
8854 let chosen = select_best_cache_bits(|cache_bits| {
8855 encode_literals_with_options(pixels, false, cache_bits, *w)
8856 });
8857 let mut r = BitReader::new(&chosen);
8858 assert!(!r.read_bit().unwrap());
8859 let header = MetaPrefixHeader::read(&mut r, ImageRole::Argb, *w, *h).unwrap();
8860 if header.color_cache.is_enabled() {
8861 assert!(
8862 (COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX)
8863 .contains(&header.color_cache.code_bits),
8864 "chosen code_bits {} outside §5.2.3 [{COLOR_CACHE_BITS_MIN}..{COLOR_CACHE_BITS_MAX}]",
8865 header.color_cache.code_bits,
8866 );
8867 eprintln!(
8868 "[round-148] {}x{} palette payload: sweep enabled cache with code_bits={}",
8869 w, h, header.color_cache.code_bits
8870 );
8871 if header.color_cache.code_bits != DEFAULT_COLOR_CACHE_BITS {
8872 saw_non_default_enabled = true;
8873 }
8874 } else {
8875 eprintln!(
8876 "[round-148] {}x{} palette payload: sweep disabled cache",
8877 w, h
8878 );
8879 }
8880 }
8881 assert!(
8882 saw_non_default_enabled,
8883 "expected the round-148 sweep to pick a non-default code_bits on at least one payload"
8884 );
8885 }
8886
8887 // ---- round 150: §4.4 color-indexing transform encoder ----
8888
8889 /// The §4.4 color-indexing encoder derives its bundling from the
8890 /// shared threshold table: at each boundary palette size of the
8891 /// spec's "Color Table Size to Bundled Pixel Bit Width Mapping",
8892 /// the emitted bitstream's transform header parses back (via the
8893 /// §4 transform-list reader) to the expected on-wire
8894 /// `color_table_size` and the shared accessor's `width_bits`.
8895 #[test]
8896 fn encoder_color_indexing_header_matches_shared_width_bits_table() {
8897 for (n_colors, expected_bits) in [
8898 (1usize, 3u8),
8899 (2, 3),
8900 (3, 2),
8901 (4, 2),
8902 (5, 1),
8903 (16, 1),
8904 (17, 0),
8905 (256, 0),
8906 ] {
8907 // `n_colors` distinct grays, one pixel per color.
8908 let pixels: Vec<u32> = (0..n_colors as u32)
8909 .map(|i| 0xff00_0000 | (i << 16) | (i << 8) | i)
8910 .collect();
8911 let bytes = encode_with_color_indexing(&pixels, n_colors as u32, 1, None)
8912 .expect("palette path applies to <= 256 unique colors");
8913 let mut r = crate::vp8l_stream::BitReader::new(&bytes);
8914 let list = crate::vp8l_stream::TransformList::read(&mut r).unwrap();
8915 assert_eq!(
8916 list.transforms(),
8917 &[crate::vp8l_stream::Transform::ColorIndexing {
8918 color_table_size: n_colors as u16,
8919 width_bits: expected_bits,
8920 }],
8921 "{n_colors} colors"
8922 );
8923 assert!(list.stopped_at_entropy_body());
8924 assert_eq!(
8925 expected_bits,
8926 crate::vp8l_transform::color_indexing_width_bits(n_colors),
8927 "boundary expectation drifted from the shared accessor"
8928 );
8929 }
8930 }
8931
8932 /// `forward_color_table` is the bit-exact inverse of the decoder's
8933 /// `inverse_color_table`: applying one after the other recovers
8934 /// the original palette per-channel mod 256.
8935 #[test]
8936 fn forward_color_table_round_trips_with_decoder_inverse() {
8937 let original: Vec<u32> = vec![
8938 0xff00_0000,
8939 0xff01_0203,
8940 0xff80_4020,
8941 0x7f12_3456,
8942 0x0000_00ff,
8943 ];
8944 let mut encoded = original.clone();
8945 forward_color_table(&mut encoded);
8946 crate::vp8l_transform::inverse_color_table(&mut encoded);
8947 assert_eq!(encoded, original);
8948 }
8949
8950 /// `collect_palette` returns `None` for an image with > 256 unique
8951 /// ARGB values, and `Some((palette, map))` otherwise. The palette
8952 /// is sorted, no duplicates, and every pixel maps back via `map`.
8953 #[test]
8954 fn collect_palette_early_exits_above_256_unique_colors() {
8955 // Easy under-threshold case: 4 unique colors.
8956 let small = vec![0xff10_2030, 0xff40_5060, 0xff10_2030, 0xff70_8090];
8957 let (p, m) = collect_palette(&small).expect("4-color palette fits");
8958 assert_eq!(p.len(), 3); // 0xff10_2030 appears twice, so 3 uniques.
8959 // Sorted.
8960 assert!(p.windows(2).all(|w| w[0] < w[1]));
8961 // Round-trip every pixel through the map.
8962 for px in &small {
8963 let idx = m[px] as usize;
8964 assert_eq!(p[idx], *px);
8965 }
8966
8967 // Over-threshold: 257 distinct colors → None.
8968 let big: Vec<u32> = (0..257u32).map(|i| 0xff00_0000 | i).collect();
8969 assert!(collect_palette(&big).is_none());
8970 }
8971
8972 /// End-to-end §4.4 color-indexing round trip through the decoder
8973 /// across the four `width_bits` regimes: a 2-color image
8974 /// (width_bits=3, 8-per-byte bundling), a 4-color image
8975 /// (width_bits=2, 4-per-byte), a 16-color image (width_bits=1,
8976 /// 2-per-byte), and a 64-color image (width_bits=0, 1-per-byte).
8977 /// Each round trip must reproduce the exact input ARGB pixels.
8978 #[test]
8979 fn color_indexing_round_trip_across_all_width_bits_regimes() {
8980 // Pseudo-random index pattern that visits every palette
8981 // entry at least once over each test image.
8982 let palette_64: Vec<u32> = (0..64u32)
8983 .map(|i| 0xff00_0000 | (i << 18) | (i << 10) | (i << 2))
8984 .collect();
8985 let scenarios: [(u32, u32, &[u32]); 4] = [
8986 // 2-color: width_bits = 3.
8987 (32, 4, &[0xff00_0000, 0xffff_ffff]),
8988 // 4-color: width_bits = 2.
8989 (16, 4, &[0xff10_2030, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0]),
8990 // 16-color: width_bits = 1. Pick non-zero palettes that
8991 // exercise the subtraction coding (varied deltas).
8992 (
8993 16,
8994 4,
8995 &[
8996 0xff00_0000,
8997 0xff10_2030,
8998 0xff20_4060,
8999 0xff30_6090,
9000 0xff40_80c0,
9001 0xff50_a0e0,
9002 0xff60_c0ff,
9003 0xff70_ff00,
9004 0xff80_8080,
9005 0xff90_9090,
9006 0xffa0_a0a0,
9007 0xffb0_b0b0,
9008 0xffc0_c0c0,
9009 0xffd0_d0d0,
9010 0xffe0_e0e0,
9011 0xfff0_f0f0,
9012 ],
9013 ),
9014 // 64-color: width_bits = 0 (no bundling).
9015 (16, 4, palette_64.as_slice()),
9016 ];
9017 for (w, h, palette) in scenarios {
9018 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9019 let mut state: u32 = 0xC0FF_EE12;
9020 for _ in 0..(w * h) {
9021 state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9022 pixels.push(palette[(state as usize) % palette.len()]);
9023 }
9024 let stream = encode_with_color_indexing(&pixels, w, h, None)
9025 .expect("palette fits below 256 unique");
9026 // Build a complete VP8L chunk payload (5-byte header + stream)
9027 // and decode it back through the decoder.
9028 let header = build_image_header(w, h, false);
9029 let mut payload = header.to_vec();
9030 payload.extend_from_slice(&stream);
9031 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9032 .expect("decode color-indexing round trip");
9033 assert_eq!(
9034 decoded.pixels(),
9035 pixels.as_slice(),
9036 "round-trip mismatch on {}-color palette ({}x{} image)",
9037 palette.len(),
9038 w,
9039 h
9040 );
9041 }
9042 }
9043
9044 /// Round 302: the stacked §4.4 color-indexing + §4.1 predictor
9045 /// candidate must round-trip bit-exactly through the decoder. The
9046 /// decoder reads color-indexing first (subsampling the width it
9047 /// threads into the predictor body and main image), then applies
9048 /// the inverses last-first — inverse-predictor over the bundled
9049 /// indices, then inverse-color-indexing — recovering the original
9050 /// pixels. Exercise the bundling regimes that admit a predictor
9051 /// block at the packed width (width_bits 3 / 2 / 1 / 0) so the
9052 /// `packed_width >= block` self-skip never trips for the chosen
9053 /// dimensions.
9054 #[test]
9055 fn round_302_color_indexing_predictor_round_trips_through_decoder() {
9056 let palette_64: Vec<u32> = (0..64u32)
9057 .map(|i| 0xff00_0000 | (i << 18) | (i << 10) | (i << 2))
9058 .collect();
9059 // Dimensions are chosen so `packed_width >= 16` (the default
9060 // predictor block side) and `height >= 16`, i.e. the chained
9061 // candidate produces a non-`None` stream.
9062 let scenarios: [(u32, u32, &[u32]); 4] = [
9063 // 2-color: width_bits = 3 → packed_width = ceil(W/8).
9064 (256, 32, &[0xff00_0000, 0xffff_ffff]),
9065 // 4-color: width_bits = 2 → packed_width = ceil(W/4).
9066 (
9067 128,
9068 32,
9069 &[0xff10_2030, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0],
9070 ),
9071 // 16-color: width_bits = 1 → packed_width = ceil(W/2).
9072 (
9073 64,
9074 32,
9075 &[
9076 0xff00_0000,
9077 0xff10_2030,
9078 0xff20_4060,
9079 0xff30_6090,
9080 0xff40_80c0,
9081 0xff50_a0e0,
9082 0xff60_c0ff,
9083 0xff70_ff00,
9084 0xff80_8080,
9085 0xff90_9090,
9086 0xffa0_a0a0,
9087 0xffb0_b0b0,
9088 0xffc0_c0c0,
9089 0xffd0_d0d0,
9090 0xffe0_e0e0,
9091 0xfff0_f0f0,
9092 ],
9093 ),
9094 // 64-color: width_bits = 0 (no bundling) → packed_width = W.
9095 (32, 32, palette_64.as_slice()),
9096 ];
9097 for (w, h, palette) in scenarios {
9098 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9099 // Row-coherent fill: each row is a smooth run over palette
9100 // indices so the predictor over the bundled bytes has real
9101 // spatial structure to model.
9102 for y in 0..h {
9103 for x in 0..w {
9104 let idx = ((x / 3 + y) as usize) % palette.len();
9105 pixels.push(palette[idx]);
9106 }
9107 }
9108 let stream = encode_with_color_indexing_predictor(
9109 &pixels,
9110 w,
9111 h,
9112 DEFAULT_PREDICTOR_SIZE_BITS,
9113 None,
9114 PredictorSubImageStrategy::L1,
9115 )
9116 .expect("palette feasible and packed image admits a predictor block");
9117 let header = build_image_header(w, h, false);
9118 let mut payload = header.to_vec();
9119 payload.extend_from_slice(&stream);
9120 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9121 .expect("decode color-indexing+predictor round trip");
9122 assert_eq!(
9123 decoded.pixels(),
9124 pixels.as_slice(),
9125 "round-trip mismatch on {}-color palette ({}x{} image)",
9126 palette.len(),
9127 w,
9128 h
9129 );
9130 }
9131 }
9132
9133 /// Round 302: the stacked candidate must also round-trip with a
9134 /// §5.2.3 color cache enabled over the residual stream, and at the
9135 /// single-block predictor `size_bits` the chooser also sweeps.
9136 #[test]
9137 fn round_302_color_indexing_predictor_round_trips_with_cache_and_single_block() {
9138 let palette = [0xff00_0000u32, 0xff20_4060, 0xff60_c0ff, 0xffff_ffff];
9139 let (w, h) = (96u32, 48u32);
9140 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9141 for y in 0..h {
9142 for x in 0..w {
9143 let idx = ((x / 5 + y / 2) as usize) % palette.len();
9144 pixels.push(palette[idx]);
9145 }
9146 }
9147 // Single-block size_bits large enough to collapse the packed
9148 // image into one predictor block.
9149 // Round 305: also sweep the predictor-sub-image strategy so the
9150 // entropy / sub-image-aware builders are exercised on the
9151 // packed-index residual round-trip.
9152 for size_bits in [DEFAULT_PREDICTOR_SIZE_BITS, 7u8] {
9153 for cache in [None, Some(4u32), Some(8u32)] {
9154 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9155 if let Some(stream) = encode_with_color_indexing_predictor(
9156 &pixels,
9157 w,
9158 h,
9159 size_bits,
9160 cache,
9161 pred_strategy,
9162 ) {
9163 let header = build_image_header(w, h, false);
9164 let mut payload = header.to_vec();
9165 payload.extend_from_slice(&stream);
9166 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9167 .expect("decode chained round trip");
9168 assert_eq!(
9169 decoded.pixels(),
9170 pixels.as_slice(),
9171 "mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9172 );
9173 }
9174 }
9175 }
9176 }
9177 }
9178
9179 /// Round 302: the chained candidate self-skips (returns `None`)
9180 /// when the packed image is smaller than one predictor block at the
9181 /// requested `size_bits`, so the chooser never emits a degenerate
9182 /// stream. A 4-pixel-wide, 2-color image packs to a 1-byte-wide
9183 /// bundled image (width_bits = 3), which is below the default
9184 /// 16-pixel predictor block.
9185 #[test]
9186 fn round_302_color_indexing_predictor_skips_subblock_packed_image() {
9187 let pixels = [
9188 0xff00_0000u32,
9189 0xffff_ffff,
9190 0xff00_0000,
9191 0xffff_ffff,
9192 0xffff_ffff,
9193 0xff00_0000,
9194 0xffff_ffff,
9195 0xff00_0000,
9196 ];
9197 // 4x2 image: width_bits = 3 → packed_width = 1 < block (16).
9198 assert!(
9199 encode_with_color_indexing_predictor(
9200 &pixels,
9201 4,
9202 2,
9203 DEFAULT_PREDICTOR_SIZE_BITS,
9204 None,
9205 PredictorSubImageStrategy::L1
9206 )
9207 .is_none(),
9208 "sub-block packed image must self-skip the predictor chain"
9209 );
9210 }
9211
9212 /// Round 302: the full super-chooser stays non-regressing with the
9213 /// new stacked candidate in the mix — the chosen stream is never
9214 /// larger than the best of the pre-302 single-transform candidates,
9215 /// and it still decodes back to the source pixels. Exercised on a
9216 /// row-coherent palette image where the chained transform has a
9217 /// real opportunity to win.
9218 #[test]
9219 fn round_302_chooser_never_regresses_and_round_trips() {
9220 let palette = [
9221 0xff00_0000u32,
9222 0xff20_4060,
9223 0xff40_80c0,
9224 0xff60_c0ff,
9225 0xff80_8080,
9226 0xffff_ffff,
9227 ];
9228 let (w, h) = (128u32, 64u32);
9229 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9230 for y in 0..h {
9231 for x in 0..w {
9232 // Smooth diagonal ramp through the palette → strong
9233 // spatial coherence in the bundled-index image.
9234 let idx = (((x + y) / 4) as usize) % palette.len();
9235 pixels.push(palette[idx]);
9236 }
9237 }
9238
9239 // Pre-302 best single-transform candidate: the single color-
9240 // indexing path (with the cache sweep) is the strongest
9241 // single-transform option on this palette image.
9242 let single_ci = select_best_cache_bits(|cache_bits| {
9243 encode_with_color_indexing(&pixels, w, h, cache_bits).expect("palette feasible")
9244 });
9245
9246 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9247 assert!(
9248 chosen.len() <= single_ci.len(),
9249 "chooser regressed: chosen {} > single color-indexing {}",
9250 chosen.len(),
9251 single_ci.len()
9252 );
9253
9254 // The chosen stream must decode back to the source pixels.
9255 let header = build_image_header(w, h, false);
9256 let mut payload = header.to_vec();
9257 payload.extend_from_slice(&chosen);
9258 let decoded =
9259 crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9260 assert_eq!(decoded.pixels(), pixels.as_slice());
9261 }
9262
9263 /// Round 303: the stacked §4.2 color-transform + §4.1 predictor
9264 /// candidate must round-trip bit-exactly through the decoder. The
9265 /// decoder reads color-transform first, predictor second (neither
9266 /// subsamples the width), then applies the inverses last-first —
9267 /// inverse-predictor over the color-transformed image, then
9268 /// inverse-color — recovering the original pixels. Exercise photo-
9269 /// like content across a default and single-block `size_bits`, with
9270 /// and without a residual-stream color cache.
9271 #[test]
9272 fn round_303_color_transform_predictor_round_trips_through_decoder() {
9273 // Synthetic photo-like content: smooth channel gradients plus a
9274 // deterministic noise term, so red / blue carry real correlation
9275 // against green (the §4.2 transform has something to model) and
9276 // the gradients give the §4.1 predictor real spatial structure.
9277 let (w, h) = (128u32, 96u32);
9278 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9279 let mut state: u32 = 0x1234_5678;
9280 for y in 0..h {
9281 for x in 0..w {
9282 state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9283 let n = (state >> 27) as i32 - 16;
9284 let g = ((x + y) % 256) as i32;
9285 let r = (g + 24 + n).clamp(0, 255) as u32;
9286 let b = (g - 18 - n).clamp(0, 255) as u32;
9287 pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9288 }
9289 }
9290 // Round 305: sweep the predictor-sub-image strategy too.
9291 for size_bits in [DEFAULT_COLOR_TRANSFORM_SIZE_BITS, 7u8] {
9292 for cache in [None, Some(4u32), Some(9u32)] {
9293 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9294 let stream = encode_with_color_transform_predictor(
9295 &pixels,
9296 w,
9297 h,
9298 size_bits,
9299 cache,
9300 pred_strategy,
9301 );
9302 let header = build_image_header(w, h, false);
9303 let mut payload = header.to_vec();
9304 payload.extend_from_slice(&stream);
9305 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9306 .expect("decode color-transform+predictor round trip");
9307 assert_eq!(
9308 decoded.pixels(),
9309 pixels.as_slice(),
9310 "round-trip mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9311 );
9312 }
9313 }
9314 }
9315 }
9316
9317 /// Round 303: a single-row / single-column-degenerate image still
9318 /// round-trips. With `width < block` the chooser never calls the
9319 /// chained path, but the encoder itself must still produce a valid
9320 /// stream when handed a one-block image (the smallest admissible
9321 /// input: exactly `block × block`).
9322 #[test]
9323 fn round_303_color_transform_predictor_single_block_round_trips() {
9324 let block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9325 let (w, h) = (block, block);
9326 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9327 for y in 0..h {
9328 for x in 0..w {
9329 let g = (x * 7 + y * 5) % 256;
9330 let r = (g + 13) % 256;
9331 let b = (g + 200) % 256;
9332 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9333 }
9334 }
9335 let stream = encode_with_color_transform_predictor(
9336 &pixels,
9337 w,
9338 h,
9339 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9340 None,
9341 PredictorSubImageStrategy::L1,
9342 );
9343 let header = build_image_header(w, h, false);
9344 let mut payload = header.to_vec();
9345 payload.extend_from_slice(&stream);
9346 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9347 .expect("decode single-block chain");
9348 assert_eq!(decoded.pixels(), pixels.as_slice());
9349 }
9350
9351 /// Round 303: the full super-chooser stays non-regressing with the
9352 /// new color-transform + predictor candidate in the mix — the chosen
9353 /// stream is never larger than the best of the pre-303 candidates,
9354 /// and it still decodes back to the source pixels. Exercised on a
9355 /// photo-like image where the chained transform has a real chance to
9356 /// win against the single color-transform and single predictor paths.
9357 #[test]
9358 fn round_303_chooser_never_regresses_and_round_trips() {
9359 let (w, h) = (160u32, 120u32);
9360 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9361 let mut state: u32 = 0xfeed_face;
9362 for y in 0..h {
9363 for x in 0..w {
9364 state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9365 let n = (state >> 28) as i32 - 8;
9366 let g = ((x as i32) - (y as i32)).rem_euclid(256);
9367 let r = (g + 40 + n).clamp(0, 255) as u32;
9368 let b = (g - 30 + n).clamp(0, 255) as u32;
9369 pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9370 }
9371 }
9372
9373 // Pre-303 best of the two single block-transform paths on this
9374 // photo-like image (predictor + color transform, each with the
9375 // cache sweep).
9376 let single_pred = select_best_cache_bits(|cache_bits| {
9377 encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
9378 });
9379 let single_color = select_best_cache_bits(|cache_bits| {
9380 encode_with_color_transform(
9381 &pixels,
9382 w,
9383 h,
9384 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9385 cache_bits,
9386 w,
9387 )
9388 });
9389 let pre303 = single_pred.len().min(single_color.len());
9390
9391 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9392 assert!(
9393 chosen.len() <= pre303,
9394 "chooser regressed: chosen {} > pre-303 best {}",
9395 chosen.len(),
9396 pre303
9397 );
9398
9399 let header = build_image_header(w, h, false);
9400 let mut payload = header.to_vec();
9401 payload.extend_from_slice(&chosen);
9402 let decoded =
9403 crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9404 assert_eq!(decoded.pixels(), pixels.as_slice());
9405 }
9406
9407 /// Round 304: the three-transform §4.2 color → §4.3 subtract-green →
9408 /// §4.1 predictor stack must round-trip bit-exactly through the
9409 /// decoder. The decoder reads color first, subtract-green second,
9410 /// predictor third (none subsample the width), then applies the
9411 /// inverses last-first — inverse-predictor, inverse-subtract-green,
9412 /// inverse-color — recovering the original pixels. Exercise photo-like
9413 /// content across a default and single-block `size_bits`, with and
9414 /// without a residual-stream color cache.
9415 #[test]
9416 fn round_304_color_subtract_green_predictor_round_trips_through_decoder() {
9417 let (w, h) = (128u32, 96u32);
9418 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9419 let mut state: u32 = 0x0bad_c0de;
9420 for y in 0..h {
9421 for x in 0..w {
9422 state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9423 let n = (state >> 27) as i32 - 16;
9424 let g = ((x + y) % 256) as i32;
9425 let r = (g + 31 + n).clamp(0, 255) as u32;
9426 let b = (g - 22 - n).clamp(0, 255) as u32;
9427 pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9428 }
9429 }
9430 // Round 305: sweep the predictor-sub-image strategy too.
9431 for size_bits in [DEFAULT_COLOR_TRANSFORM_SIZE_BITS, 7u8] {
9432 for cache in [None, Some(4u32), Some(9u32)] {
9433 for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9434 let stream = encode_with_color_transform_subtract_green_predictor(
9435 &pixels,
9436 w,
9437 h,
9438 size_bits,
9439 cache,
9440 pred_strategy,
9441 );
9442 let header = build_image_header(w, h, false);
9443 let mut payload = header.to_vec();
9444 payload.extend_from_slice(&stream);
9445 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9446 .expect("decode color+subtract-green+predictor round trip");
9447 assert_eq!(
9448 decoded.pixels(),
9449 pixels.as_slice(),
9450 "round-trip mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9451 );
9452 }
9453 }
9454 }
9455 }
9456
9457 /// Round 304: the smallest admissible input (exactly `block × block`)
9458 /// still round-trips through the three-transform stack.
9459 #[test]
9460 fn round_304_color_subtract_green_predictor_single_block_round_trips() {
9461 let block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9462 let (w, h) = (block, block);
9463 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9464 for y in 0..h {
9465 for x in 0..w {
9466 let g = (x * 5 + y * 3) % 256;
9467 let r = (g + 27) % 256;
9468 let b = (g + 180) % 256;
9469 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9470 }
9471 }
9472 let stream = encode_with_color_transform_subtract_green_predictor(
9473 &pixels,
9474 w,
9475 h,
9476 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9477 None,
9478 PredictorSubImageStrategy::L1,
9479 );
9480 let header = build_image_header(w, h, false);
9481 let mut payload = header.to_vec();
9482 payload.extend_from_slice(&stream);
9483 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9484 .expect("decode single-block 3-stack");
9485 assert_eq!(decoded.pixels(), pixels.as_slice());
9486 }
9487
9488 /// Round 304: the full super-chooser stays non-regressing with the new
9489 /// three-transform color → subtract-green → predictor candidate in the
9490 /// mix — the chosen stream is never larger than the best of the
9491 /// pre-304 candidates (the round-303 color + predictor 2-stack plus the
9492 /// single color / predictor / subtract-green paths) and still decodes
9493 /// back to the source pixels.
9494 #[test]
9495 fn round_304_chooser_never_regresses_and_round_trips() {
9496 let (w, h) = (160u32, 120u32);
9497 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9498 let mut state: u32 = 0x1357_9bdf;
9499 for y in 0..h {
9500 for x in 0..w {
9501 state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9502 let n = (state >> 28) as i32 - 8;
9503 let g = ((x as i32) - (y as i32)).rem_euclid(256);
9504 let r = (g + 44 + n).clamp(0, 255) as u32;
9505 let b = (g - 33 + n).clamp(0, 255) as u32;
9506 pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9507 }
9508 }
9509
9510 // Pre-304 best across the single block transforms plus the
9511 // round-303 color + predictor 2-stack.
9512 let single_pred = select_best_cache_bits(|cache_bits| {
9513 encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
9514 });
9515 let single_color = select_best_cache_bits(|cache_bits| {
9516 encode_with_color_transform(
9517 &pixels,
9518 w,
9519 h,
9520 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9521 cache_bits,
9522 w,
9523 )
9524 });
9525 let color_pred = select_best_cache_bits(|cache_bits| {
9526 encode_with_color_transform_predictor(
9527 &pixels,
9528 w,
9529 h,
9530 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9531 cache_bits,
9532 PredictorSubImageStrategy::L1,
9533 )
9534 });
9535 let pre304 = single_pred
9536 .len()
9537 .min(single_color.len())
9538 .min(color_pred.len());
9539
9540 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9541 assert!(
9542 chosen.len() <= pre304,
9543 "chooser regressed: chosen {} > pre-304 best {}",
9544 chosen.len(),
9545 pre304
9546 );
9547
9548 let header = build_image_header(w, h, false);
9549 let mut payload = header.to_vec();
9550 payload.extend_from_slice(&chosen);
9551 let decoded =
9552 crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9553 assert_eq!(decoded.pixels(), pixels.as_slice());
9554 }
9555
9556 /// Round 305: every predictor-sub-image strategy threaded through
9557 /// the stacked §3.5 chains must round-trip bit-exactly. Each
9558 /// strategy only changes which §4.1 mode is recorded per block in
9559 /// the sub-image; the forward transform recomputes residuals against
9560 /// the chosen modes and the decoder reads them back, so the
9561 /// reconstruction is strategy-independent. Exercise all three
9562 /// stacked chains (color + predictor, color + subtract-green +
9563 /// predictor, color-indexing + predictor) across the full strategy
9564 /// set on content where each chain is admissible.
9565 #[test]
9566 fn round_305_stacked_predictor_strategies_round_trip() {
9567 // Photo-like content for the two color-transform chains.
9568 let (w, h) = (128u32, 96u32);
9569 let mut photo: Vec<u32> = Vec::with_capacity((w * h) as usize);
9570 let mut state: u32 = 0x9e37_79b9;
9571 for y in 0..h {
9572 for x in 0..w {
9573 state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9574 let n = (state >> 27) as i32 - 16;
9575 let g = ((x + y) % 256) as i32;
9576 let r = (g + 28 + n).clamp(0, 255) as u32;
9577 let b = (g - 19 - n).clamp(0, 255) as u32;
9578 photo.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9579 }
9580 }
9581 for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9582 let sb = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9583 let s1 = encode_with_color_transform_predictor(&photo, w, h, sb, Some(8), strategy);
9584 let s2 = encode_with_color_transform_subtract_green_predictor(
9585 &photo,
9586 w,
9587 h,
9588 sb,
9589 Some(8),
9590 strategy,
9591 );
9592 for stream in [&s1, &s2] {
9593 let header = build_image_header(w, h, false);
9594 let mut payload = header.to_vec();
9595 payload.extend_from_slice(stream);
9596 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9597 .unwrap_or_else(|_| panic!("decode failed for strategy {strategy:?}"));
9598 assert_eq!(
9599 decoded.pixels(),
9600 photo.as_slice(),
9601 "color-chain round-trip mismatch strategy={strategy:?}"
9602 );
9603 }
9604 }
9605
9606 // Round 306: the stacked-chain sub-image-aware lambda set must
9607 // match the single-transform predictor path's lambda sweep
9608 // (encode_argb_with_predictor_chooser, [4_000, 16_000, 64_000,
9609 // 256_000]) so the two paths land on the same residual-vs-
9610 // sub-image cost crossover. If the single-transform sweep ever
9611 // changes, this asserts the stacked sweep is updated in lockstep.
9612 let stacked_lambdas: Vec<u64> = STACKED_PREDICTOR_STRATEGIES
9613 .iter()
9614 .filter_map(|s| match s {
9615 PredictorSubImageStrategy::EntropySubaware { lambda_milli } => Some(*lambda_milli),
9616 _ => None,
9617 })
9618 .collect();
9619 assert_eq!(
9620 stacked_lambdas,
9621 vec![4_000u64, 16_000, 64_000, 256_000],
9622 "stacked sub-image-aware lambda sweep must mirror the single-transform path"
9623 );
9624
9625 // Palette content for the color-indexing chain.
9626 let palette = [0xff00_0000u32, 0xff20_4060, 0xff60_c0ff, 0xffff_ffff];
9627 let (pw, ph) = (96u32, 48u32);
9628 let mut pal: Vec<u32> = Vec::with_capacity((pw * ph) as usize);
9629 for y in 0..ph {
9630 for x in 0..pw {
9631 let idx = ((x / 3 + y) as usize) % palette.len();
9632 pal.push(palette[idx]);
9633 }
9634 }
9635 for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9636 let stream = encode_with_color_indexing_predictor(
9637 &pal,
9638 pw,
9639 ph,
9640 DEFAULT_PREDICTOR_SIZE_BITS,
9641 Some(4),
9642 strategy,
9643 )
9644 .expect("palette feasible, packed image admits a predictor block");
9645 let header = build_image_header(pw, ph, false);
9646 let mut payload = header.to_vec();
9647 payload.extend_from_slice(&stream);
9648 let decoded = crate::vp8l_transform::decode_lossless(&payload, pw, ph)
9649 .unwrap_or_else(|_| panic!("decode failed for strategy {strategy:?}"));
9650 assert_eq!(
9651 decoded.pixels(),
9652 pal.as_slice(),
9653 "color-indexing-chain round-trip mismatch strategy={strategy:?}"
9654 );
9655 }
9656 }
9657
9658 /// Round 305: the entropy-aware strategies must *actually win* on
9659 /// the stacked color-transform + predictor chain for at least one
9660 /// real input — guarding the feature from becoming dead code. On
9661 /// smooth, mildly-noisy photo-like content the color transform
9662 /// decorrelates the channels, leaving a residual the §4.1 predictor
9663 /// sub-image models far better under a true Huffman bit-cost than
9664 /// under the L1 magnitude proxy: the per-block mode histogram
9665 /// concentrates, shrinking both the §7.2 sub-image and the residual
9666 /// stream. The non-L1 best here is materially smaller than L1.
9667 #[test]
9668 fn round_305_entropy_strategy_beats_l1_on_photo_chain() {
9669 let (w, h) = (96u32, 64u32);
9670 let mut px: Vec<u32> = Vec::with_capacity((w * h) as usize);
9671 let mut state: u32 = 0x1000_0000;
9672 for y in 0..h {
9673 for x in 0..w {
9674 state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9675 let n = (state >> 29) as i32 - 2;
9676 let g = ((x + y) % 256) as i32;
9677 let r = (g + 20 + n).clamp(0, 255) as u32;
9678 let b = (g - 15 - n).clamp(0, 255) as u32;
9679 px.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9680 }
9681 }
9682 let sb = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9683 let l1 = encode_with_color_transform_predictor(
9684 &px,
9685 w,
9686 h,
9687 sb,
9688 Some(8),
9689 PredictorSubImageStrategy::L1,
9690 )
9691 .len();
9692 let entropy = encode_with_color_transform_predictor(
9693 &px,
9694 w,
9695 h,
9696 sb,
9697 Some(8),
9698 PredictorSubImageStrategy::Entropy,
9699 )
9700 .len();
9701 let subaware = encode_with_color_transform_predictor(
9702 &px,
9703 w,
9704 h,
9705 sb,
9706 Some(8),
9707 PredictorSubImageStrategy::EntropySubaware {
9708 lambda_milli: 16_000,
9709 },
9710 )
9711 .len();
9712 let best_non_l1 = entropy.min(subaware);
9713 assert!(
9714 best_non_l1 < l1,
9715 "expected an entropy-aware strategy to beat L1: L1={l1} entropy={entropy} subaware={subaware}"
9716 );
9717 }
9718
9719 /// Round 305: the strategy sweep is non-regressing — the
9720 /// super-chooser's chosen stream is never larger than the round-304
9721 /// baseline (which built the stacked chains with only the L1
9722 /// predictor strategy). Since the L1 strategy remains in
9723 /// [`STACKED_PREDICTOR_STRATEGIES`], adding the entropy variants can
9724 /// only ever keep a smaller stream, never a larger one.
9725 #[test]
9726 fn round_305_strategy_sweep_never_regresses() {
9727 let (w, h) = (160u32, 120u32);
9728 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9729 let mut state: u32 = 0x2545_f491;
9730 for y in 0..h {
9731 for x in 0..w {
9732 state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9733 let n = (state >> 29) as i32 - 2;
9734 let g = ((x as i32) - (y as i32)).rem_euclid(256);
9735 let r = (g + 36 + n).clamp(0, 255) as u32;
9736 let b = (g - 25 + n).clamp(0, 255) as u32;
9737 pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9738 }
9739 }
9740
9741 // Baseline: the two color-transform stacked chains built with
9742 // only the L1 strategy (the round-304 behaviour), best across the
9743 // cache sweep and the per-region / single-block size_bits.
9744 let mut sb_sweep = vec![DEFAULT_COLOR_TRANSFORM_SIZE_BITS];
9745 let mut single = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9746 while single < 9 && ((1u32 << single) < w || (1u32 << single) < h) {
9747 single += 1;
9748 }
9749 if single != DEFAULT_COLOR_TRANSFORM_SIZE_BITS {
9750 sb_sweep.push(single);
9751 }
9752 let mut l1_best = usize::MAX;
9753 for &sb in &sb_sweep {
9754 let a = select_best_cache_bits(|cb| {
9755 encode_with_color_transform_predictor(
9756 &pixels,
9757 w,
9758 h,
9759 sb,
9760 cb,
9761 PredictorSubImageStrategy::L1,
9762 )
9763 });
9764 let b = select_best_cache_bits(|cb| {
9765 encode_with_color_transform_subtract_green_predictor(
9766 &pixels,
9767 w,
9768 h,
9769 sb,
9770 cb,
9771 PredictorSubImageStrategy::L1,
9772 )
9773 });
9774 l1_best = l1_best.min(a.len()).min(b.len());
9775 }
9776
9777 // With the full strategy sweep, the best stacked candidate is
9778 // never larger than the L1-only baseline.
9779 let mut swept_best = usize::MAX;
9780 for &sb in &sb_sweep {
9781 for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9782 let a = select_best_cache_bits(|cb| {
9783 encode_with_color_transform_predictor(&pixels, w, h, sb, cb, strategy)
9784 });
9785 let b = select_best_cache_bits(|cb| {
9786 encode_with_color_transform_subtract_green_predictor(
9787 &pixels, w, h, sb, cb, strategy,
9788 )
9789 });
9790 swept_best = swept_best.min(a.len()).min(b.len());
9791 }
9792 }
9793 assert!(
9794 swept_best <= l1_best,
9795 "strategy sweep regressed: swept {swept_best} > L1-only {l1_best}"
9796 );
9797 }
9798
9799 /// Probe across palette-shaped synthetic payloads to find at
9800 /// least one for which the round-150 super-chooser picks the
9801 /// §4.4 color-indexing path and the chosen stream is materially
9802 /// smaller than the round-149 baseline (no-tx / subtract-green /
9803 /// predictor / color-transform).
9804 ///
9805 /// The §4.4 path doesn't dominate every palette image — the
9806 /// §5.2.3 color cache + LZ77 already crunch a binary scan-line
9807 /// random image to ~1 bit/pixel, which §4.4 bundling cannot beat
9808 /// without spatial coherence to amortise the palette-table
9809 /// header. The strong §4.4 case is a *binary* image whose packed
9810 /// rows are exact LZ77 copies of preceding rows: at width_bits=3
9811 /// (8 pixels per byte), an N-pixel-wide row collapses to N/8
9812 /// bytes; row-to-row LZ77 matches in the bundled stream cover
9813 /// the row's full N/8 bytes in one Copy token, vs N/3-ish
9814 /// literal pixel tokens without bundling.
9815 #[test]
9816 fn round_150_color_indexing_beats_other_candidates_on_palette_image() {
9817 // 64x32 binary image with row repetition: each row's binary
9818 // pattern is the previous row XOR a fixed-period mask. The
9819 // §4.4 bundled stream (width_bits=3 → 8 bytes wide) has 8
9820 // packed bytes per row of distinct patterns the matcher
9821 // chains; pixel-level LZ77 has 64 literal tokens per row to
9822 // chain. The bundled path's Huffman code over the 8 packed
9823 // bytes is tighter and the row-to-row Copy tokens have a
9824 // smaller distance (8 vs 64), so the entropy stage shrinks
9825 // them further.
9826 let palette: [u32; 2] = [0xff00_0000, 0xffff_ffff];
9827 let w = 64u32;
9828 let h = 32u32;
9829 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9830 let mut row_pattern: u64 = 0xa5a5_a5a5_a5a5_a5a5;
9831 for _y in 0..h {
9832 for x in 0..w {
9833 let bit = (row_pattern >> (x % 64)) & 1;
9834 pixels.push(palette[bit as usize]);
9835 }
9836 // Rotate the row pattern by one bit each row so rows are
9837 // similar (LZ77 finds long matches in the bundled
9838 // stream) but not identical.
9839 row_pattern = row_pattern.rotate_left(1);
9840 }
9841 // The chosen stream is what the chooser actually emits.
9842 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9843 // Force the no-color-indexing baseline by sampling the chooser's
9844 // pre-CI candidates. The §4.4 candidate must beat the baseline
9845 // measurably (palette-coded images get 2..8× index bundling on
9846 // top of the subtraction-coded palette).
9847 let no_tx_baseline =
9848 select_best_cache_bits(|bits| encode_literals_with_options(&pixels, false, bits, w));
9849 let sg_baseline =
9850 select_best_cache_bits(|bits| encode_literals_with_options(&pixels, true, bits, w));
9851 let pred_baseline = select_best_cache_bits(|bits| {
9852 encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, bits, w)
9853 });
9854 let ctx_baseline = select_best_cache_bits(|bits| {
9855 encode_with_color_transform(&pixels, w, h, DEFAULT_COLOR_TRANSFORM_SIZE_BITS, bits, w)
9856 });
9857 let baseline = no_tx_baseline
9858 .len()
9859 .min(sg_baseline.len())
9860 .min(pred_baseline.len())
9861 .min(ctx_baseline.len());
9862 let ci_only = select_best_cache_bits(|bits| {
9863 encode_with_color_indexing(&pixels, w, h, bits).expect("palette fits")
9864 });
9865 eprintln!(
9866 "[round-150] 64x32 binary row-rotation: chosen={} B, baseline (no §4.4)={} B, ci_only={} B ({:.1}% reduction vs baseline)",
9867 chosen.len(),
9868 baseline,
9869 ci_only.len(),
9870 (1.0 - chosen.len() as f64 / baseline as f64) * 100.0
9871 );
9872 assert!(
9873 chosen.len() < baseline,
9874 "round-150 color-indexing must beat the round-149 baseline on a palette image: \
9875 chosen={} B vs baseline={} B (ci_only={} B)",
9876 chosen.len(),
9877 baseline,
9878 ci_only.len(),
9879 );
9880
9881 // And the chosen stream must still round-trip through the
9882 // top-level decoder when wrapped in a complete RIFF/WEBP file.
9883 let rgba: Vec<u8> = pixels
9884 .iter()
9885 .flat_map(|&p| {
9886 let a = ((p >> 24) & 0xff) as u8;
9887 let r = ((p >> 16) & 0xff) as u8;
9888 let g = ((p >> 8) & 0xff) as u8;
9889 let b = (p & 0xff) as u8;
9890 [r, g, b, a]
9891 })
9892 .collect();
9893 let webp_bytes = encode_webp_lossless(&rgba, w, h).expect("encode round-150 webp");
9894 let decoded = crate::decode_webp(&webp_bytes).expect("decode round-150 webp");
9895 assert_eq!(decoded.frames.len(), 1);
9896 assert_eq!(decoded.frames[0].rgba.as_slice(), rgba.as_slice());
9897 }
9898
9899 /// On photo-like noise (>256 unique colors), the §4.4 candidate
9900 /// is unreachable (the O(N) palette probe returns `None`) and the
9901 /// chooser silently keeps the best of the round-149 candidates.
9902 /// This guarantees the round-150 path never regresses on
9903 /// non-palette content.
9904 #[test]
9905 fn color_indexing_chooser_skips_photo_like_content() {
9906 let w = 64u32;
9907 let h = 64u32;
9908 let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9909 // 64x64 = 4096 unique values, well above the §4.4 256-entry
9910 // threshold.
9911 let mut state: u32 = 0xFEED_FACE;
9912 for _ in 0..(w * h) {
9913 state = state.wrapping_mul(1_103_515_245).wrapping_add(12345);
9914 pixels.push(0xff00_0000 | (state & 0x00ff_ffff));
9915 }
9916 assert!(collect_palette(&pixels).is_none());
9917 // The chooser must still return a valid stream that decodes
9918 // exactly — the §4.4 path is just silently skipped.
9919 let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
9920 let header = build_image_header(w, h, false);
9921 let mut payload = header.to_vec();
9922 payload.extend_from_slice(&stream);
9923 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9924 .expect("decode photo-like content");
9925 assert_eq!(decoded.pixels(), pixels.as_slice());
9926 }
9927
9928 // ---- Round 151: §6.2.2 multi-meta-prefix (entropy image) ----
9929
9930 /// Build a synthetic two-region image: the top half draws from a
9931 /// smooth low-green gradient, the bottom half from a smooth
9932 /// high-green gradient. The per-region green statistics diverge
9933 /// sharply, so the encoder's mean-green clusterer should split the
9934 /// image cleanly along the horizontal midpoint and the per-region
9935 /// Huffman codes get tighter than a single shared code over both
9936 /// regions' bimodal histogram.
9937 fn two_region_bimodal_image(width: u32, height: u32) -> Vec<u32> {
9938 let w = width as usize;
9939 let h = height as usize;
9940 let mut pixels = Vec::with_capacity(w * h);
9941 for y in 0..h {
9942 for x in 0..w {
9943 let (r, g, b) = if y < h / 2 {
9944 // Top: low green, varying red.
9945 let g = 32u32.wrapping_add(((x as u32) & 0x1f) * 2);
9946 let r = 64u32.wrapping_add((y as u32) & 0x0f);
9947 (r, g, 16u32)
9948 } else {
9949 // Bottom: high green, varying blue.
9950 let g = 200u32.wrapping_add((x as u32) & 0x1f);
9951 let b = 96u32.wrapping_add((y as u32) & 0x0f);
9952 (16u32, g, b)
9953 };
9954 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9955 }
9956 }
9957 pixels
9958 }
9959
9960 /// Build a noisy two-region image whose unique-color count blows
9961 /// the §4.4 palette path (forcing the chooser onto the LZ77 /
9962 /// predictor / color-transform candidates). The top half draws
9963 /// red/green/blue from one PRNG state, the bottom half from a
9964 /// disjoint PRNG state biased to different per-channel means; the
9965 /// per-region histograms diverge enough that per-region Huffman
9966 /// codes beat a single shared code.
9967 fn two_region_noisy_image(width: u32, height: u32) -> Vec<u32> {
9968 let w = width as usize;
9969 let h = height as usize;
9970 let mut pixels = Vec::with_capacity(w * h);
9971 let mut s_top: u32 = 0xC0FF_EE00;
9972 let mut s_bot: u32 = 0xBADC_AFE5;
9973 for y in 0..h {
9974 for x in 0..w {
9975 let argb = if y < h / 2 {
9976 s_top = s_top.wrapping_mul(1_103_515_245).wrapping_add(12345);
9977 let r = s_top & 0x3f; // 0..63
9978 let g = ((s_top >> 8) & 0x3f).wrapping_add(192); // 192..255
9979 let b = (s_top >> 16) & 0x1f; // 0..31
9980 (0xffu32 << 24) | (r << 16) | (g << 8) | b
9981 } else {
9982 s_bot = s_bot.wrapping_mul(1_103_515_245).wrapping_add(12345);
9983 let r = ((s_bot >> 8) & 0x3f).wrapping_add(192); // 192..255
9984 let g = s_bot & 0x3f; // 0..63
9985 let b = ((s_bot >> 16) & 0x1f).wrapping_add(192); // 192..223
9986 (0xffu32 << 24) | (r << 16) | (g << 8) | b
9987 };
9988 // `x` is intentionally unused: we want per-pixel hashes
9989 // to diverge from the PRNG state alone so per-region
9990 // histograms remain stable across columns.
9991 let _ = x;
9992 pixels.push(argb);
9993 }
9994 }
9995 pixels
9996 }
9997
9998 /// The histogram-distance clusterer must produce a non-degenerate
9999 /// (≥ 2-group) split on the headline two-region bimodal fixture
10000 /// (top and bottom halves use disjoint per-channel ranges), and
10001 /// the resulting meta-codes must reflect the top-vs-bottom split.
10002 #[test]
10003 fn meta_prefix_clusterer_splits_two_region_bimodal_fixture() {
10004 let w = 64u32;
10005 let h = 64u32;
10006 let pixels = two_region_bimodal_image(w, h);
10007 // prefix_bits = 4 → 16-pixel blocks → 4x4 entropy image; the
10008 // horizontal midpoint sits on the block-row-2/3 boundary, so
10009 // clustering should put rows 0..2 in one group and rows 2..4 in
10010 // the other.
10011 let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 2);
10012 assert_eq!(codes.len(), 16);
10013 // Top two block-rows should agree; bottom two should agree;
10014 // the two halves must differ from each other.
10015 let top = codes[0];
10016 let bot = codes[12];
10017 assert_ne!(
10018 top, bot,
10019 "top half group must differ from bottom half group"
10020 );
10021 for c in &codes[0..8] {
10022 assert_eq!(*c, top, "top-half blocks must share a group");
10023 }
10024 for c in &codes[8..16] {
10025 assert_eq!(*c, bot, "bottom-half blocks must share a group");
10026 }
10027 }
10028
10029 /// The histogram-distance clusterer must separate two regions
10030 /// whose per-block *mean green* coincides but whose per-block
10031 /// green *distribution* diverges — the failure mode of the
10032 /// round-151 mean-statistic bucketiser. Top half: bimodal green
10033 /// alternating 16/240 (mean ≈ 128). Bottom half: flat green at
10034 /// 128 (also mean ≈ 128).
10035 #[test]
10036 fn histogram_clusterer_separates_blocks_sharing_a_mean() {
10037 let w = 32u32;
10038 let h = 32u32;
10039 let w_us = w as usize;
10040 let h_us = h as usize;
10041 let mut pixels: Vec<u32> = Vec::with_capacity(w_us * h_us);
10042 for y in 0..h_us {
10043 for x in 0..w_us {
10044 let g = if y < h_us / 2 {
10045 if (x ^ y) & 1 == 0 {
10046 16u32
10047 } else {
10048 240u32
10049 }
10050 } else {
10051 128u32
10052 };
10053 pixels.push(0xff00_0000 | (g << 8));
10054 }
10055 }
10056 // prefix_bits = 4 → 16-pixel blocks → 2x2 entropy image. The
10057 // top row of two blocks should differ from the bottom row.
10058 let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 2);
10059 assert_eq!(codes.len(), 4);
10060 let top_left = codes[0];
10061 let bot_left = codes[2];
10062 assert_ne!(
10063 top_left, bot_left,
10064 "bimodal-vs-flat green regions must split into distinct groups",
10065 );
10066 }
10067
10068 /// Clustering must be a pure function of its inputs: two calls
10069 /// with the same arguments produce the same `Vec<u16>`. Encoder
10070 /// reproducibility depends on this.
10071 #[test]
10072 fn histogram_clusterer_is_deterministic() {
10073 let w = 64u32;
10074 let h = 64u32;
10075 let pixels = two_region_noisy_image(w, h);
10076 let first = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 3);
10077 let second = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 3);
10078 assert_eq!(first, second);
10079 }
10080
10081 /// A uniform image (every pixel the same value) has no per-block
10082 /// histogram divergence, so the clusterer must collapse to a
10083 /// single group. The encoder relies on this `actual_groups < 2`
10084 /// signal to skip the multi-group path cleanly.
10085 #[test]
10086 fn histogram_clusterer_collapses_on_uniform_image() {
10087 let w = 64u32;
10088 let h = 64u32;
10089 let pixels = vec![0xff80_8080u32; (w * h) as usize];
10090 let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 4);
10091 assert_eq!(codes.len(), 16);
10092 for c in &codes {
10093 assert_eq!(*c, 0, "uniform image must collapse to one group");
10094 }
10095 }
10096
10097 /// `num_groups = 1` must short-circuit straight to an all-zeros
10098 /// map (the caller asked for one group; running Lloyd's iteration
10099 /// would only waste cycles confirming the trivial answer).
10100 #[test]
10101 fn histogram_clusterer_num_groups_one_returns_all_zeros() {
10102 let w = 32u32;
10103 let h = 32u32;
10104 let pixels = two_region_noisy_image(w, h);
10105 let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 1);
10106 assert!(codes.iter().all(|&c| c == 0));
10107 }
10108
10109 /// The returned meta-codes must form the *compact* contiguous
10110 /// range `0..max + 1` with no gaps. Per RFC 9649 §3.7.2.2.2,
10111 /// `num_prefix_groups = max(entropy image) + 1`, so an unused
10112 /// group sitting between used ones would inflate the encoder's
10113 /// per-group prefix-code-table cost without ever being read.
10114 #[test]
10115 fn histogram_clusterer_returns_compact_group_ids() {
10116 let w = 64u32;
10117 let h = 64u32;
10118 let pixels = two_region_noisy_image(w, h);
10119 let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 4);
10120 let max_code = codes.iter().copied().max().unwrap_or(0) as usize;
10121 let mut seen = vec![false; max_code + 1];
10122 for &c in &codes {
10123 seen[c as usize] = true;
10124 }
10125 for (i, &s) in seen.iter().enumerate() {
10126 assert!(s, "gap at group id {i} — compaction failed");
10127 }
10128 }
10129
10130 /// `encode_with_meta_prefix` produces a stream the decoder reads
10131 /// back to the exact input pixels — the end-to-end round trip on
10132 /// a non-trivial multi-group image.
10133 #[test]
10134 fn meta_prefix_two_group_round_trips_through_decoder() {
10135 let w = 64u32;
10136 let h = 64u32;
10137 let pixels = two_region_bimodal_image(w, h);
10138 let stream = encode_with_meta_prefix(&pixels, w, h, 4, 2, None, w)
10139 .expect("two-region image admits a 2-group split");
10140 let header = build_image_header(w, h, false);
10141 let mut payload = header.to_vec();
10142 payload.extend_from_slice(&stream);
10143 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10144 .expect("decode meta-prefix stream");
10145 assert_eq!(decoded.pixels(), pixels.as_slice());
10146 }
10147
10148 /// Same round-trip as above but with the §5.2.3 color cache
10149 /// enabled at the median cache size (`code_bits = 8` → 256-entry
10150 /// cache). Verifies the cache + multi-group composition.
10151 #[test]
10152 fn meta_prefix_two_group_with_cache_round_trips_through_decoder() {
10153 let w = 32u32;
10154 let h = 32u32;
10155 let pixels = two_region_bimodal_image(w, h);
10156 let stream = encode_with_meta_prefix(&pixels, w, h, 4, 2, Some(8), w)
10157 .expect("two-region image admits a 2-group split with cache");
10158 let header = build_image_header(w, h, false);
10159 let mut payload = header.to_vec();
10160 payload.extend_from_slice(&stream);
10161 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10162 .expect("decode meta-prefix-with-cache stream");
10163 assert_eq!(decoded.pixels(), pixels.as_slice());
10164 }
10165
10166 /// Cross-check round-trip with 3 and 4 groups on a noisy
10167 /// multi-region image. Verifies the encoder's per-group code
10168 /// emission is correct for `num_prefix_groups > 2`.
10169 #[test]
10170 fn meta_prefix_three_and_four_groups_round_trip_through_decoder() {
10171 let w = 64u32;
10172 let h = 64u32;
10173 let pixels = two_region_noisy_image(w, h);
10174 for num_groups in [3u32, 4u32] {
10175 let stream = encode_with_meta_prefix(&pixels, w, h, 4, num_groups, None, w)
10176 .unwrap_or_else(|| panic!("noisy image admits {num_groups} groups"));
10177 let header = build_image_header(w, h, false);
10178 let mut payload = header.to_vec();
10179 payload.extend_from_slice(&stream);
10180 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10181 .unwrap_or_else(|e| panic!("decode {num_groups}-group stream: {e}"));
10182 assert_eq!(
10183 decoded.pixels(),
10184 pixels.as_slice(),
10185 "round-trip failed for num_groups={num_groups}"
10186 );
10187 }
10188 }
10189
10190 /// Cross-check round-trip across every `prefix_bits` value the
10191 /// chooser sweeps. Verifies the per-block size dispatch (and
10192 /// therefore the on-wire `prefix_bits - 2` field) for the full
10193 /// `META_PREFIX_BITS_SWEEP`. Image is 256x256 so the largest
10194 /// sweep value (`prefix_bits = 7` → 128-pixel blocks) still
10195 /// admits a 2×2 entropy image; smaller values produce
10196 /// proportionally larger entropy images.
10197 #[test]
10198 fn meta_prefix_all_sweep_prefix_bits_round_trip_through_decoder() {
10199 let w = 256u32;
10200 let h = 256u32;
10201 let pixels = two_region_noisy_image(w, h);
10202 for &pb in META_PREFIX_BITS_SWEEP.iter() {
10203 let stream =
10204 encode_with_meta_prefix(&pixels, w, h, pb, 2, None, w).unwrap_or_else(|| {
10205 panic!("256x256 noisy image admits 2-group at prefix_bits={pb}")
10206 });
10207 let header = build_image_header(w, h, false);
10208 let mut payload = header.to_vec();
10209 payload.extend_from_slice(&stream);
10210 let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10211 .unwrap_or_else(|e| panic!("decode prefix_bits={pb} stream: {e}"));
10212 assert_eq!(
10213 decoded.pixels(),
10214 pixels.as_slice(),
10215 "round-trip failed for prefix_bits={pb}"
10216 );
10217 }
10218 }
10219
10220 /// Degenerate cases (image too small for any multi-block split,
10221 /// uniform image whose clustering collapses to one group) must
10222 /// surface as `None` so the chooser can skip the candidate
10223 /// cleanly.
10224 #[test]
10225 fn meta_prefix_returns_none_when_too_small_for_a_split() {
10226 // 1x1 image — no `prefix_bits ∈ [4..7]` admits two blocks.
10227 let pixels = vec![0xff10_2030u32];
10228 for &pb in META_PREFIX_BITS_SWEEP.iter() {
10229 for num_groups in 2..=MAX_META_GROUPS {
10230 assert!(
10231 encode_with_meta_prefix(&pixels, 1, 1, pb, num_groups, None, 1).is_none(),
10232 "1x1 image must not produce a multi-group stream (prefix_bits={pb}, num_groups={num_groups})"
10233 );
10234 }
10235 }
10236 }
10237
10238 #[test]
10239 fn meta_prefix_returns_none_on_uniform_image() {
10240 let w = 64u32;
10241 let h = 64u32;
10242 let pixels = vec![0xff80_8080u32; (w * h) as usize];
10243 // All blocks have identical mean green → clustering collapses.
10244 assert!(encode_with_meta_prefix(&pixels, w, h, 4, 2, None, w).is_none());
10245 }
10246
10247 /// The full chooser must still produce a decodable stream when the
10248 /// multi-meta-prefix candidate sometimes wins. End-to-end via the
10249 /// top-level `decode_webp`.
10250 #[test]
10251 fn round_151_chooser_round_trips_on_two_region_image() {
10252 let w = 64u32;
10253 let h = 64u32;
10254 let pixels = two_region_bimodal_image(w, h);
10255 let rgba: Vec<u8> = pixels
10256 .iter()
10257 .flat_map(|&p| {
10258 let a = ((p >> 24) & 0xff) as u8;
10259 let r = ((p >> 16) & 0xff) as u8;
10260 let g = ((p >> 8) & 0xff) as u8;
10261 let b = (p & 0xff) as u8;
10262 [r, g, b, a]
10263 })
10264 .collect();
10265 let webp_bytes = encode_webp_lossless(&rgba, w, h).expect("encode round-151 webp");
10266 let decoded = crate::decode_webp(&webp_bytes).expect("decode round-151 webp");
10267 assert_eq!(decoded.frames.len(), 1);
10268 assert_eq!(decoded.frames[0].rgba.as_slice(), rgba.as_slice());
10269 }
10270
10271 /// Diagnostic-only sweep: prints baseline vs multi-meta-prefix
10272 /// candidate sizes across a handful of image shapes / sizes. Used
10273 /// to inform the chooser's `META_PREFIX_BITS_SWEEP` choice and to
10274 /// quantify whether the candidate ever shrinks the chosen stream
10275 /// on the round-150 super-chooser's hardest cases. Test is
10276 /// observational — no assertion beyond the round-trip — so a
10277 /// future round can re-tune the sweep without changing the
10278 /// invariant set.
10279 #[test]
10280 fn round_151_diagnostic_sweep_records_per_shape_costs() {
10281 let shapes = [
10282 (
10283 "64x64 noisy 2-region",
10284 two_region_noisy_image(64, 64),
10285 64u32,
10286 64u32,
10287 ),
10288 (
10289 "128x128 noisy 2-region",
10290 two_region_noisy_image(128, 128),
10291 128u32,
10292 128u32,
10293 ),
10294 (
10295 "64x128 noisy 2-region",
10296 two_region_noisy_image(64, 128),
10297 64u32,
10298 128u32,
10299 ),
10300 (
10301 "256x256 noisy 2-region",
10302 two_region_noisy_image(256, 256),
10303 256u32,
10304 256u32,
10305 ),
10306 ];
10307 for (name, pixels, w, h) in &shapes {
10308 let baseline = encode_argb_with_predictor_chooser(pixels, *w, *h);
10309 let mp_opt = sweep_meta_prefix_candidate(pixels, *w, *h);
10310 let mp_len = mp_opt.as_ref().map(|v| v.len()).unwrap_or(usize::MAX);
10311 eprintln!(
10312 "[round-151 diag] {name}: baseline={} B, mp_only={} B, mp_wins={}",
10313 baseline.len(),
10314 mp_len,
10315 mp_len < baseline.len()
10316 );
10317 }
10318 }
10319
10320 /// Headline regression: on a large two-region noisy image whose
10321 /// per-region channel histograms diverge sharply (and the §4.4
10322 /// palette path is unreachable because of unique-color count),
10323 /// the round-151 multi-meta-prefix path's per-region Huffman codes
10324 /// shrink the chosen stream below the round-150 super-chooser's
10325 /// best pre-round-151 candidate. Prints the delta so the round
10326 /// report can quote a measured percentage.
10327 #[test]
10328 fn round_151_multi_meta_prefix_beats_single_group_on_noisy_image() {
10329 let w = 128u32;
10330 let h = 128u32;
10331 let pixels = two_region_noisy_image(w, h);
10332
10333 // Round-150 baseline: the chooser without the round-151
10334 // multi-meta-prefix candidate.
10335 let mut baseline = encode_argb_literals_with_width(&pixels, w);
10336 let pred_block = 1u32 << DEFAULT_PREDICTOR_SIZE_BITS;
10337 let ctx_block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
10338 if w >= pred_block && h >= pred_block {
10339 let pred = select_best_cache_bits(|cache_bits| {
10340 encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
10341 });
10342 if pred.len() < baseline.len() {
10343 baseline = pred;
10344 }
10345 }
10346 if w >= ctx_block && h >= ctx_block {
10347 let ctx = select_best_cache_bits(|cache_bits| {
10348 encode_with_color_transform(
10349 &pixels,
10350 w,
10351 h,
10352 DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
10353 cache_bits,
10354 w,
10355 )
10356 });
10357 if ctx.len() < baseline.len() {
10358 baseline = ctx;
10359 }
10360 }
10361 if collect_palette(&pixels).is_some() {
10362 let ci = select_best_cache_bits(|cache_bits| {
10363 encode_with_color_indexing(&pixels, w, h, cache_bits).expect("palette fits")
10364 });
10365 if ci.len() < baseline.len() {
10366 baseline = ci;
10367 }
10368 }
10369
10370 // Round-151 multi-meta-prefix candidate (the smallest
10371 // (prefix_bits, num_groups, cache_bits) it admits).
10372 let mp = sweep_meta_prefix_candidate(&pixels, w, h)
10373 .expect("two-region 128x128 image admits a multi-group split");
10374
10375 // And the full chooser including round 151.
10376 let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
10377 eprintln!(
10378 "[round-151] 128x128 two-region noisy: chosen={} B, baseline (no §6.2.2)={} B, mp_only={} B ({:.1}% reduction vs baseline)",
10379 chosen.len(),
10380 baseline.len(),
10381 mp.len(),
10382 (1.0 - chosen.len() as f64 / baseline.len() as f64) * 100.0
10383 );
10384 assert!(
10385 chosen.len() <= baseline.len(),
10386 "round-151 chooser must never regress on the round-150 baseline: \
10387 chosen={} B vs baseline={} B (mp_only={} B)",
10388 chosen.len(),
10389 baseline.len(),
10390 mp.len(),
10391 );
10392 }
10393
10394 // ---- Round-152 measurement harness -----------------------------
10395 //
10396 // Reproduces the round-151 mean-green clusterer locally so the test
10397 // can measure the multi-meta-prefix candidate's byte cost with both
10398 // partitioners and confirm the histogram path is strictly smaller
10399 // on the diagnostic two-region noisy fixture. The mean-green
10400 // implementation here is a verbatim copy of the round-151 helper
10401 // that lived in this file before this round; it's `#[cfg(test)]`-
10402 // only and never reachable from the encoder.
10403 fn cluster_blocks_by_mean_green_for_bench(
10404 pixels: &[u32],
10405 width: u32,
10406 height: u32,
10407 prefix_bits: u8,
10408 num_groups: u32,
10409 ) -> Vec<u16> {
10410 let block_side = 1u32 << prefix_bits;
10411 let pw = width.div_ceil(block_side);
10412 let ph = height.div_ceil(block_side);
10413 let num_blocks = (pw * ph) as usize;
10414 let mut block_mean: Vec<f64> = vec![0.0; num_blocks];
10415 let mut block_count: Vec<u32> = vec![0; num_blocks];
10416 let row = width as usize;
10417 let pw_u = pw as usize;
10418 for y in 0..height as usize {
10419 let by = y / block_side as usize;
10420 for x in 0..width as usize {
10421 let bx = x / block_side as usize;
10422 let b = by * pw_u + bx;
10423 let g = ((pixels[y * row + x] >> 8) & 0xff) as f64;
10424 block_mean[b] += g;
10425 block_count[b] += 1;
10426 }
10427 }
10428 for b in 0..num_blocks {
10429 if block_count[b] > 0 {
10430 block_mean[b] /= block_count[b] as f64;
10431 }
10432 }
10433 if num_groups == 1 {
10434 return vec![0u16; num_blocks];
10435 }
10436 let mut lo = f64::INFINITY;
10437 let mut hi = f64::NEG_INFINITY;
10438 for &m in &block_mean {
10439 if m < lo {
10440 lo = m;
10441 }
10442 if m > hi {
10443 hi = m;
10444 }
10445 }
10446 if hi <= lo {
10447 return vec![0u16; num_blocks];
10448 }
10449 let span = hi - lo;
10450 let step = span / num_groups as f64;
10451 let mut codes = Vec::with_capacity(num_blocks);
10452 for &m in &block_mean {
10453 let bucket = (((m - lo) / step).floor() as i64).clamp(0, num_groups as i64 - 1);
10454 codes.push(bucket as u16);
10455 }
10456 codes
10457 }
10458
10459 /// Body-shared bencher: encode `pixels` via the multi-meta-prefix
10460 /// candidate using either the mean-green or histogram-distance
10461 /// clusterer, returning the encoded byte count. Drives
10462 /// `encode_with_meta_prefix` directly by overriding the cluster
10463 /// step's output through a tiny shim.
10464 fn measure_mp_bytes_at(
10465 pixels: &[u32],
10466 w: u32,
10467 h: u32,
10468 prefix_bits: u8,
10469 num_groups: u32,
10470 use_histogram: bool,
10471 ) -> Option<usize> {
10472 let block_side = 1u32 << prefix_bits;
10473 let pw = w.div_ceil(block_side);
10474 let ph = h.div_ceil(block_side);
10475 if (pw * ph) < num_groups {
10476 return None;
10477 }
10478 let codes = if use_histogram {
10479 cluster_blocks_by_histogram_distance(pixels, w, h, prefix_bits, num_groups)
10480 } else {
10481 cluster_blocks_by_mean_green_for_bench(pixels, w, h, prefix_bits, num_groups)
10482 };
10483 // Reach into encode_with_meta_prefix's internals by reusing
10484 // its emitter parts: build the EncoderMetaIndex from `codes`
10485 // and run the same writer path. Easier: call the encoder
10486 // directly when `use_histogram` is true (it uses the new
10487 // clusterer); the mean-green branch needs a manual emit.
10488 // Since the two paths share every step except the codes
10489 // vector, the round-trip is much cleaner if we just call
10490 // `encode_with_meta_prefix` for the histogram branch and a
10491 // tiny re-emit for the mean-green branch that mirrors the
10492 // same writer steps.
10493 //
10494 // For a measurement test it's enough to compare the two byte
10495 // counts at the same `(prefix_bits, num_groups)`, which is
10496 // exactly what the chooser ablation needs. We achieve that by
10497 // letting `encode_with_meta_prefix` drive the histogram path
10498 // and replaying the same steps inline for the mean-green
10499 // path.
10500 if use_histogram {
10501 return encode_with_meta_prefix(pixels, w, h, prefix_bits, num_groups, None, w)
10502 .map(|v| v.len());
10503 }
10504 // Mean-green inline emission (same shape as
10505 // encode_with_meta_prefix).
10506 let index = EncoderMetaIndex {
10507 prefix_bits,
10508 block_width: pw,
10509 codes,
10510 };
10511 let actual_groups = index.num_groups();
10512 if actual_groups < 2 {
10513 return None;
10514 }
10515 let tokens = tokenize_lz77(pixels);
10516 let buckets = split_tokens_by_group(&tokens, &index, w, actual_groups);
10517 let group_codes = build_group_codes(&buckets, 0, w);
10518 let mut bw = BitWriter::new();
10519 bw.write_bit(false);
10520 bw.write_bit(false);
10521 bw.write_bit(true);
10522 bw.write_bits((prefix_bits - 2) as u32, 3);
10523 let entropy_image = index.entropy_image_argb();
10524 write_entropy_coded_image_literals(&mut bw, &entropy_image);
10525 for group in &group_codes {
10526 for code in group.iter() {
10527 code.write_code_lengths(&mut bw);
10528 }
10529 }
10530 let mut pos = 0usize;
10531 let w_pixels = w as usize;
10532 for &tok in &tokens {
10533 let x = (pos % w_pixels) as u32;
10534 let y = (pos / w_pixels) as u32;
10535 let g = index.group_for(x, y) as usize;
10536 let codes = &group_codes[g];
10537 let green_code = &codes[0];
10538 let red_code = &codes[1];
10539 let blue_code = &codes[2];
10540 let alpha_code = &codes[3];
10541 let dist_code = &codes[4];
10542 match tok {
10543 Token::Literal(p) => {
10544 let a = ((p >> 24) & 0xff) as usize;
10545 let r = ((p >> 16) & 0xff) as usize;
10546 let g_ch = ((p >> 8) & 0xff) as usize;
10547 let b = (p & 0xff) as usize;
10548 green_code.write_symbol(&mut bw, g_ch);
10549 red_code.write_symbol(&mut bw, r);
10550 blue_code.write_symbol(&mut bw, b);
10551 alpha_code.write_symbol(&mut bw, a);
10552 pos += 1;
10553 }
10554 Token::CacheRef { .. } => unreachable!("no cache in measurement"),
10555 Token::Copy { length, distance } => {
10556 write_lz77_value(&mut bw, green_code, 256, length as u32);
10557 let raw_code = pixel_distance_to_distance_code(distance, w);
10558 write_lz77_value(&mut bw, dist_code, 0, raw_code);
10559 pos += length;
10560 }
10561 }
10562 }
10563 Some(bw.into_bytes().len())
10564 }
10565
10566 /// A four-region fixture where the top-left quadrant has the same
10567 /// per-channel mean as the bottom-right but a very different
10568 /// per-channel distribution, and the top-right has the same mean
10569 /// as the bottom-left also with a divergent distribution. The
10570 /// mean-green clusterer at `num_groups = 2` can only find one
10571 /// axis of separation and folds two distinct distributions onto
10572 /// the same group; the histogram clusterer separates by full
10573 /// distribution and finds the right partition.
10574 fn four_region_mean_collision_image(width: u32, height: u32) -> Vec<u32> {
10575 let w = width as usize;
10576 let h = height as usize;
10577 let mut pixels = Vec::with_capacity(w * h);
10578 let mut s: u32 = 0x12345678;
10579 for y in 0..h {
10580 for x in 0..w {
10581 s = s.wrapping_mul(1_103_515_245).wrapping_add(12345);
10582 let top = y < h / 2;
10583 let left = x < w / 2;
10584 // Pick (g, r) pairs whose means match across the
10585 // top-left vs bottom-right and top-right vs bottom-left
10586 // diagonals but whose distributions are very different.
10587 let (g, r, b) = match (top, left) {
10588 (true, true) => {
10589 // top-left: g bimodal {16, 240} mean ≈ 128
10590 let gv = if (s & 1) == 0 { 16 } else { 240 };
10591 let rv = (s >> 8) & 0x3f;
10592 let bv = (s >> 16) & 0x3f;
10593 (gv, rv, bv)
10594 }
10595 (true, false) => {
10596 // top-right: g flat 128
10597 let gv = 128u32;
10598 let rv = ((s >> 8) & 0x3f).wrapping_add(192);
10599 let bv = (s >> 16) & 0x3f;
10600 (gv, rv, bv)
10601 }
10602 (false, true) => {
10603 // bottom-left: g bimodal but {64, 192} mean ≈ 128
10604 let gv = if (s & 1) == 0 { 64 } else { 192 };
10605 let rv = (s >> 8) & 0x3f;
10606 let bv = ((s >> 16) & 0x3f).wrapping_add(192);
10607 (gv, rv, bv)
10608 }
10609 (false, false) => {
10610 // bottom-right: g flat 128 too
10611 let gv = 128u32;
10612 let rv = ((s >> 8) & 0x3f).wrapping_add(192);
10613 let bv = ((s >> 16) & 0x3f).wrapping_add(192);
10614 (gv, rv, bv)
10615 }
10616 };
10617 pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
10618 }
10619 }
10620 pixels
10621 }
10622
10623 /// For a given fixture, sweep every `(prefix_bits, num_groups)`
10624 /// the round-151 chooser searches and return the smallest
10625 /// non-degenerate multi-meta-prefix byte cost under the named
10626 /// clusterer. Returns `None` if every combination collapsed.
10627 fn best_mp_bytes_over_sweep(
10628 pixels: &[u32],
10629 w: u32,
10630 h: u32,
10631 use_histogram: bool,
10632 ) -> Option<usize> {
10633 let mut best: Option<usize> = None;
10634 for &prefix_bits in META_PREFIX_BITS_SWEEP.iter() {
10635 for num_groups in 2u32..=MAX_META_GROUPS {
10636 if let Some(bytes) =
10637 measure_mp_bytes_at(pixels, w, h, prefix_bits, num_groups, use_histogram)
10638 {
10639 best = Some(match best {
10640 Some(b) => b.min(bytes),
10641 None => bytes,
10642 });
10643 }
10644 }
10645 }
10646 best
10647 }
10648
10649 /// Confirm the round-152 histogram-distance clusterer beats (or at
10650 /// worst ties) the round-151 mean-green bucketiser on the
10651 /// diagnostic two-region noisy sweep. Prints byte counts (run with
10652 /// `--nocapture`).
10653 #[test]
10654 fn histogram_clusterer_reduces_mp_bytes_on_two_region_sweep() {
10655 let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (64, 128), (256, 256)];
10656 for &(w, h) in shapes {
10657 let pixels = two_region_noisy_image(w, h);
10658 let mg = best_mp_bytes_over_sweep(&pixels, w, h, false)
10659 .expect("mean-green path must produce a candidate");
10660 let hi = best_mp_bytes_over_sweep(&pixels, w, h, true)
10661 .expect("histogram path must produce a candidate");
10662 assert!(
10663 hi <= mg,
10664 "{w}x{h}: histogram path produced {hi} B, mean-green produced {mg} B \
10665 — histogram path must not regress on the two-region sweep",
10666 );
10667 println!(
10668 "r152 measurement {w}x{h}: mean-green={mg} B histogram={hi} B \
10669 delta={} B ({:.2}%)",
10670 mg as i64 - hi as i64,
10671 100.0 * (mg as f64 - hi as f64) / mg as f64,
10672 );
10673 }
10674 }
10675
10676 /// Confirm the histogram clusterer is *strictly* better than
10677 /// mean-green on the four-region mean-collision fixture, where
10678 /// blocks sharing a green mean diverge in distribution. Prints
10679 /// byte counts (run with `--nocapture`).
10680 #[test]
10681 fn histogram_clusterer_reduces_mp_bytes_on_mean_collision_sweep() {
10682 let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (64, 128), (256, 256)];
10683 for &(w, h) in shapes {
10684 let pixels = four_region_mean_collision_image(w, h);
10685 let mg_opt = best_mp_bytes_over_sweep(&pixels, w, h, false);
10686 let hi = best_mp_bytes_over_sweep(&pixels, w, h, true)
10687 .expect("histogram path must produce a candidate");
10688 match mg_opt {
10689 Some(mg) => {
10690 assert!(
10691 hi < mg,
10692 "{w}x{h}: histogram path produced {hi} B, mean-green produced {mg} B \
10693 — histogram path must strictly improve on mean-collision fixture",
10694 );
10695 println!(
10696 "r152 mean-collision {w}x{h}: mean-green={mg} B histogram={hi} B \
10697 delta={} B ({:.2}%)",
10698 mg as i64 - hi as i64,
10699 100.0 * (mg as f64 - hi as f64) / mg as f64,
10700 );
10701 }
10702 None => {
10703 println!(
10704 "r152 mean-collision {w}x{h}: mean-green collapsed (no candidate); \
10705 histogram={hi} B",
10706 );
10707 }
10708 }
10709 }
10710 }
10711
10712 // ---- round 155: §4.1 predictor size_bits two-value sweep ----------
10713 //
10714 // The round-155 step extends the predictor candidate from a single
10715 // `DEFAULT_PREDICTOR_SIZE_BITS = 4` block-grid to a two-value sweep
10716 // mirroring the round-147 §4.2 color-transform shape: per-region
10717 // (`size_bits = 4` → 16×16 pixel blocks) plus a maximal single-block
10718 // candidate (`size_bits` promoted up to 9 so the sub-image is 1×1).
10719 // Each value composes with the round-148 `cache_code_bits ∈ [1..11]`
10720 // + disabled-cache baseline.
10721 //
10722 // The tests below establish three contracts:
10723 //
10724 // 1) Non-regression — the round-155 chooser never produces a stream
10725 // longer than the pre-round-155 chooser (which only evaluated the
10726 // default `size_bits = 4` predictor).
10727 // 2) Strict-beat on a synthetic fixture where the maximal-single-
10728 // block predictor wins (a small image whose `size_bits = 4`
10729 // per-region path emits a costly 1×1 sub-image equal to the
10730 // single-block one but where the per-region wraps in the same
10731 // 16×16 mode, leaving the two effectively identical except for
10732 // sub-image layout — and small enough that the single-block path
10733 // wins on noise).
10734 // 3) Round-trip — every emitted stream still round-trips through
10735 // `decode_lossless_image`, so the size_bits promotion did not
10736 // break the §4.1 header.
10737
10738 /// Local copy of the pre-round-155 chooser: identical to
10739 /// [`encode_argb_with_predictor_chooser`] but evaluates only the
10740 /// default-size predictor candidate (no maximal single-block sweep).
10741 /// Used as the regression baseline for the round-155 non-regression
10742 /// tests so they exercise *only* the size_bits-sweep delta the
10743 /// chooser added.
10744 fn pre_round_155_predictor_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
10745 let mut best = encode_argb_literals_with_width(pixels, width);
10746
10747 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
10748 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
10749 let pred_block = 1u32 << pred_size_bits;
10750 let ctx_block = 1u32 << ctx_size_bits;
10751
10752 if width >= pred_block && height >= pred_block {
10753 // Pre-round-155: single `size_bits = 4` predictor only.
10754 let pred_best = select_best_cache_bits(|cache_bits| {
10755 encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
10756 });
10757 if pred_best.len() < best.len() {
10758 best = pred_best;
10759 }
10760 }
10761
10762 // §4.2 color transform unchanged (round-147 two-value sweep).
10763 if width >= ctx_block && height >= ctx_block {
10764 let mut single_block_size_bits: u8 = ctx_size_bits;
10765 while single_block_size_bits < 9
10766 && ((1u32 << single_block_size_bits) < width
10767 || (1u32 << single_block_size_bits) < height)
10768 {
10769 single_block_size_bits += 1;
10770 }
10771 let try_single_block = single_block_size_bits != ctx_size_bits;
10772 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
10773 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
10774 })];
10775 if try_single_block {
10776 candidates.push(select_best_cache_bits(|cache_bits| {
10777 encode_with_color_transform(
10778 pixels,
10779 width,
10780 height,
10781 single_block_size_bits,
10782 cache_bits,
10783 width,
10784 )
10785 }));
10786 }
10787 for cand in candidates {
10788 if cand.len() < best.len() {
10789 best = cand;
10790 }
10791 }
10792 }
10793
10794 if collect_palette(pixels).is_some() {
10795 let ci_best = select_best_cache_bits(|cache_bits| {
10796 encode_with_color_indexing(pixels, width, height, cache_bits)
10797 .expect("palette feasibility already confirmed")
10798 });
10799 if ci_best.len() < best.len() {
10800 best = ci_best;
10801 }
10802 }
10803
10804 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
10805 if mp_best.len() < best.len() {
10806 best = mp_best;
10807 }
10808 }
10809
10810 best
10811 }
10812
10813 /// Round 155 non-regression: across a fixture matrix spanning
10814 /// gradient / noise / palette-ish images and several shapes, the
10815 /// round-155 chooser must never produce a stream longer than the
10816 /// pre-round-155 chooser (which had only the default-size predictor
10817 /// candidate). The round-155 chooser is a strict superset of the
10818 /// pre-round-155 candidate set, so this is a structural guarantee.
10819 #[test]
10820 fn round_155_predictor_size_bits_sweep_never_regresses() {
10821 let shapes: &[(u32, u32)] = &[
10822 (16, 16),
10823 (20, 20),
10824 (24, 24),
10825 (32, 32),
10826 (48, 48),
10827 (16, 32),
10828 (64, 16),
10829 (40, 24),
10830 ];
10831 for &(w, h) in shapes {
10832 // Three fixture families: smooth gradient, dense noise,
10833 // small-palette stripes.
10834 let gradient: Vec<u32> = (0..(w * h) as usize)
10835 .map(|i| {
10836 let x = (i as u32) % w;
10837 let y = (i as u32) / w;
10838 let g = (x + y) & 0xFF;
10839 0xFF00_0000 | (g << 16) | (g << 8) | g
10840 })
10841 .collect();
10842 let mut seed = 0xC0FFEE_u32;
10843 let noise: Vec<u32> = (0..(w * h) as usize)
10844 .map(|_| {
10845 seed ^= seed << 13;
10846 seed ^= seed >> 17;
10847 seed ^= seed << 5;
10848 0xFF00_0000 | (seed & 0x00FF_FFFF)
10849 })
10850 .collect();
10851 let stripes: Vec<u32> = (0..(w * h) as usize)
10852 .map(|i| {
10853 let x = (i as u32) % w;
10854 match x % 4 {
10855 0 => 0xFFAA_5500,
10856 1 => 0xFF55_AA00,
10857 2 => 0xFF00_55AA,
10858 _ => 0xFF55_00AA,
10859 }
10860 })
10861 .collect();
10862
10863 for (name, pixels) in [
10864 ("gradient", &gradient),
10865 ("noise", &noise),
10866 ("stripes", &stripes),
10867 ] {
10868 let pre = pre_round_155_predictor_chooser(pixels, w, h);
10869 let post = encode_argb_with_predictor_chooser(pixels, w, h);
10870 assert!(
10871 post.len() <= pre.len(),
10872 "round-155 chooser regression on {name} {w}x{h}: pre={} B post={} B",
10873 pre.len(),
10874 post.len(),
10875 );
10876 }
10877 }
10878 }
10879
10880 /// Round 155 strict-beat: on a fixture small enough that the
10881 /// default-size predictor block-image has no useful resolution
10882 /// (a 20×20 image gives one 16×16 in-bounds block plus border
10883 /// padding that still pays a 1-pixel sub-image), the maximal
10884 /// single-block predictor strictly shrinks the chosen stream
10885 /// because both candidates share the same block-image cost while
10886 /// the single-block path picks a globally-optimal predictor mode
10887 /// over the noise pattern. The test prints the byte-saved delta so
10888 /// the round report can quote a measured number.
10889 #[test]
10890 fn round_155_predictor_size_bits_sweep_strictly_beats_default_on_some_fixture() {
10891 // 20×20 dense-residual fixture: per-pixel green channel changes
10892 // every pixel so the per-region 16×16 block path can't dominate
10893 // and the chooser's two candidates differ only in sub-image
10894 // shape + global predictor pick.
10895 let w = 20u32;
10896 let h = 20u32;
10897 let mut seed = 0xDEADBEEF_u32;
10898 let pixels: Vec<u32> = (0..(w * h) as usize)
10899 .map(|_| {
10900 seed ^= seed << 13;
10901 seed ^= seed >> 17;
10902 seed ^= seed << 5;
10903 0xFF00_0000 | (seed & 0x00FF_FFFF)
10904 })
10905 .collect();
10906
10907 let pre = pre_round_155_predictor_chooser(&pixels, w, h);
10908 let post = encode_argb_with_predictor_chooser(&pixels, w, h);
10909
10910 eprintln!(
10911 "[round-155] {w}x{h} dense-residual: pre={} B post={} B delta={} B ({:.2}%)",
10912 pre.len(),
10913 post.len(),
10914 pre.len() as i64 - post.len() as i64,
10915 (pre.len() as f64 - post.len() as f64) / pre.len() as f64 * 100.0,
10916 );
10917 assert!(
10918 post.len() < pre.len(),
10919 "round-155 maximal-single-block predictor must strictly shrink the chosen \
10920 stream on the 20x20 dense-residual fixture: pre={} B post={} B",
10921 pre.len(),
10922 post.len(),
10923 );
10924 }
10925
10926 /// Round 155 round-trip: the maximal-single-block predictor
10927 /// candidate (size_bits promoted up to 9) must still emit a valid
10928 /// §4.1 transform header that the decoder accepts; the resulting
10929 /// stream must round-trip back to the exact input pixels via
10930 /// [`crate::decode_lossless_image`]. The test directly invokes
10931 /// `encode_with_predictor` at the largest size_bits the sweep can
10932 /// pick (matching the chooser's promotion loop) and frames it with
10933 /// `build_image_header` for the round-trip path.
10934 #[test]
10935 fn round_155_predictor_single_block_round_trips_through_decoder() {
10936 let w = 64u32;
10937 let h = 16u32;
10938 let mut seed = 0xA5A5_F00D_u32;
10939 let pixels: Vec<u32> = (0..(w * h) as usize)
10940 .map(|_| {
10941 seed ^= seed << 13;
10942 seed ^= seed >> 17;
10943 seed ^= seed << 5;
10944 0xFF00_0000 | (seed & 0x00FF_FFFF)
10945 })
10946 .collect();
10947
10948 // 1) The chooser's chosen stream must round-trip end-to-end
10949 // through `build::build_webp_file` + `decode_lossless_image`.
10950 let stream_chooser = encode_argb_with_predictor_chooser(&pixels, w, h);
10951 let header_chooser = build_image_header(w, h, true);
10952 let mut payload_chooser = header_chooser.to_vec();
10953 payload_chooser.extend_from_slice(&stream_chooser);
10954 let framed_chooser =
10955 build::build_webp_file(&payload_chooser, ImageKind::Lossless, w, h).unwrap();
10956 let img = crate::decode_lossless_image(&framed_chooser)
10957 .unwrap()
10958 .unwrap();
10959 assert_eq!(img.pixels(), pixels.as_slice());
10960
10961 // 2) The single-block predictor path directly: pick the
10962 // smallest size_bits such that `1 << size_bits ≥ max(w, h)`,
10963 // matching the chooser's promotion loop.
10964 let mut single_block_size_bits: u8 = DEFAULT_PREDICTOR_SIZE_BITS;
10965 while single_block_size_bits < 9
10966 && ((1u32 << single_block_size_bits) < w || (1u32 << single_block_size_bits) < h)
10967 {
10968 single_block_size_bits += 1;
10969 }
10970 // 64×16 promotes to size_bits = 6 (block 64).
10971 assert_eq!(single_block_size_bits, 6);
10972 let stream = encode_with_predictor(&pixels, w, h, single_block_size_bits, None, w);
10973 let header = build_image_header(w, h, true);
10974 let mut payload = header.to_vec();
10975 payload.extend_from_slice(&stream);
10976 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
10977 let img2 = crate::decode_lossless_image(&framed).unwrap().unwrap();
10978 assert_eq!(img2.pixels(), pixels.as_slice());
10979 }
10980
10981 // ---- round 156: §5.2.2 single-position lazy LZ77 matching --------
10982 //
10983 // The round-156 step adds a single-position look-ahead to the §5.2.2
10984 // hash-chain matcher in `tokenize_lz77`: when a match `(L_a, _)` is
10985 // found at `pos`, the encoder also probes `pos + 1` and, if the
10986 // look-ahead yields a strictly longer match, emits `pixels[pos]` as
10987 // a literal and uses the longer match from `pos + 1` instead. The
10988 // decoder output is bit-identical for any input — only the token
10989 // partition shifts — so the property under test is *byte-count*,
10990 // not pixel correctness (which the existing round-trip tests cover).
10991 //
10992 // The internal `tokenize_lz77_inner` exposes a `lazy_depth: u32`
10993 // toggle so a test can build the strict-greedy r155 baseline token
10994 // stream (`lazy_depth = 0`) alongside the round-156 depth-1 stream
10995 // (`lazy_depth = 1`) and the round-157 depth-2 stream
10996 // (`lazy_depth = 2`) on the same fixture, then compare token counts.
10997 // Three contracts:
10998 //
10999 // 1) Round-trip — every lazy-matched stream still round-trips
11000 // end-to-end through `decode_lossless_image`.
11001 // 2) Strict-beat — on a hand-crafted fixture where the strict-
11002 // greedy matcher gets trapped in a short match, the lazy matcher
11003 // emits strictly fewer tokens (and the test asserts the headline
11004 // drop, printing the per-fixture numbers).
11005 // 3) Non-regression — on a broader fixture matrix the lazy token
11006 // count is `<=` the strict-greedy token count everywhere (the
11007 // look-ahead only ever swaps when the longer match strictly
11008 // wins, so this is a structural guarantee — the test ensures
11009 // no off-by-one in the insert-bookkeeping reintroduces a
11010 // regression on future refactors).
11011
11012 /// Round 156 round-trip: a noisy 64×16 fixture encoded with the
11013 /// round-156 lazy matcher must still decode bit-exactly back to the
11014 /// original ARGB pixels. The fixture is large enough that the
11015 /// matcher produces many `Copy` tokens, so the lazy branch is
11016 /// exercised throughout the run (and not just at the tail).
11017 #[test]
11018 fn round_156_lazy_match_round_trips_through_decoder() {
11019 let w = 64u32;
11020 let h = 16u32;
11021 let mut seed = 0xF00D_BABE_u32;
11022 let pixels: Vec<u32> = (0..(w * h) as usize)
11023 .map(|_| {
11024 seed ^= seed << 13;
11025 seed ^= seed >> 17;
11026 seed ^= seed << 5;
11027 0xFF00_0000 | (seed & 0x00FF_FFFF)
11028 })
11029 .collect();
11030
11031 // The full chooser includes the lazy matcher via
11032 // `tokenize_lz77`; the round-trip through the framed file must
11033 // recover the exact input.
11034 let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11035 let header = build_image_header(w, h, true);
11036 let mut payload = header.to_vec();
11037 payload.extend_from_slice(&stream);
11038 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11039 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11040 assert_eq!(img.pixels(), pixels.as_slice());
11041
11042 // The direct lazy-only token stream against
11043 // `encode_argb_literals_with_width` must also round-trip — this
11044 // catches the case where lazy on the no-transform path
11045 // mis-tracks the hash-chain insert bookkeeping.
11046 let stream_direct = encode_argb_literals_with_width(&pixels, w);
11047 let header_direct = build_image_header(w, h, true);
11048 let mut payload_direct = header_direct.to_vec();
11049 payload_direct.extend_from_slice(&stream_direct);
11050 let framed_direct =
11051 build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11052 let img_direct = crate::decode_lossless_image(&framed_direct)
11053 .unwrap()
11054 .unwrap();
11055 assert_eq!(img_direct.pixels(), pixels.as_slice());
11056 }
11057
11058 /// Round 156 strict-beat: a hand-crafted look-ahead-trap fixture
11059 /// where the strict-greedy matcher accepts a short match at
11060 /// position `p` that prevents a strictly longer match at `p + 1`.
11061 /// The fixture engineers two 4-pixel-hash chains so the strict
11062 /// matcher finds a length-4 match at `p` while `p + 1` finds a
11063 /// length-6 match; lazy resolves to the longer partition.
11064 ///
11065 /// Layout (each pixel is a unique ARGB constant):
11066 ///
11067 /// ```text
11068 /// pos 0..7 [A B C D E F G H] — primary prefix, gives the
11069 /// [A,B,C,D] chain entry
11070 /// at pos 0 and the
11071 /// [B,C,D,E] entry at pos 1.
11072 /// pos 8 Z — separator
11073 /// pos 9..15 [A B C D E F G] — `find(p=10)` matches the
11074 /// primary prefix [A,B,C,D,E,F,G]
11075 /// at pos 0 — length 7. Lazy
11076 /// irrelevant here (no longer
11077 /// match exists past length 7
11078 /// at pos 11).
11079 /// ```
11080 ///
11081 /// That doesn't trap. A real trap requires the `p` match to be
11082 /// strictly shorter than the `p + 1` match. The construction below
11083 /// achieves this by deliberately mismatching the 4th byte at pos
11084 /// `p`'s candidate so the strict match stops at length 4, while
11085 /// pos `p + 1` walks a second pre-seeded chain with a 6+ pixel run.
11086 /// Specifically:
11087 ///
11088 /// ```text
11089 /// pos 0..3 [A B C D] — first chain (pos 0).
11090 /// pos 4..6 [Z Z Z] — separator.
11091 /// pos 7..13 [B C D E F G H] — second chain (pos 7's
11092 /// window is [B,C,D,E]).
11093 /// pos 14..16 [Z Z Z]
11094 /// pos 17 A ← trap start. `find(17)`'s window is
11095 /// [A,B,C,D] → matches pos 0,
11096 /// extension stops at length 4
11097 /// because pos 4 = Z ≠ pos 21.
11098 /// pos 18..23 [B C D E F G] — `find(18)`'s window is
11099 /// [B,C,D,E] → matches pos 7,
11100 /// extension goes 7 long (B-H)
11101 /// against the second chain.
11102 /// ```
11103 ///
11104 /// Greedy: emits `Copy{len=4, dist=17}` at pos 17, then has to
11105 /// emit `[E,F,G]` as literals (pos 21,22,23) because the chain at
11106 /// pos 21's window is gone.
11107 ///
11108 /// Lazy: emits `Literal(A)` at pos 17, then `Copy{len=7, dist=11}`
11109 /// at pos 18, covering `[B,C,D,E,F,G,H]` from pos 7. Net: -2 tokens.
11110 #[test]
11111 fn round_156_lazy_match_strictly_beats_greedy_on_trap_fixture() {
11112 let a = 0xFF11_2233_u32;
11113 let b = 0xFF22_3344_u32;
11114 let c = 0xFF33_4455_u32;
11115 let d = 0xFF44_5566_u32;
11116 let e = 0xFF55_6677_u32;
11117 let f = 0xFF66_7788_u32;
11118 let g = 0xFF77_8899_u32;
11119 let h = 0xFF88_99AA_u32;
11120 let z = 0xFF00_0000_u32;
11121
11122 // The buffer layout (per the doc comment above). Indices are
11123 // explicit so the trap is unambiguous.
11124 let mut pixels: Vec<u32> = vec![
11125 a, b, c, d, // 0..4 primary chain anchor [A,B,C,D]
11126 z, z, z, // 4..7 separator
11127 b, c, d, e, f, g, h, // 7..14 secondary chain anchor [B,C,D,E,...]
11128 z, z, z, // 14..17 separator
11129 a, // 17 trap-start: find(17)→pos0, length 4
11130 b, c, d, e, f, g, h, // 18..25 decoy: find(18)→pos7, length 7
11131 ];
11132 // Pad to 64 pixels so the framing call has a non-degenerate
11133 // image; tail content is uniform Z so it does not interact
11134 // with the trap region.
11135 while pixels.len() < 64 {
11136 pixels.push(z);
11137 }
11138
11139 let greedy = tokenize_lz77_inner(&pixels, 0);
11140 let lazy = tokenize_lz77_inner(&pixels, 1);
11141
11142 let greedy_copies = greedy
11143 .iter()
11144 .filter(|t| matches!(t, Token::Copy { .. }))
11145 .count();
11146 let lazy_copies = lazy
11147 .iter()
11148 .filter(|t| matches!(t, Token::Copy { .. }))
11149 .count();
11150 // Sum of pixels covered by each partition: must equal the
11151 // input length for both partitions (sanity).
11152 let coverage = |toks: &[Token]| -> usize {
11153 toks.iter()
11154 .map(|t| match *t {
11155 Token::Literal(_) => 1,
11156 Token::CacheRef { .. } => 1,
11157 Token::Copy { length, .. } => length,
11158 })
11159 .sum()
11160 };
11161 assert_eq!(coverage(&greedy), pixels.len());
11162 assert_eq!(coverage(&lazy), pixels.len());
11163
11164 eprintln!(
11165 "[round-156] trap fixture: greedy tokens={} (copies={}), \
11166 lazy tokens={} (copies={}), copy delta={}",
11167 greedy.len(),
11168 greedy_copies,
11169 lazy.len(),
11170 lazy_copies,
11171 greedy_copies as i64 - lazy_copies as i64,
11172 );
11173
11174 // The trap region has greedy emit
11175 // [Copy{4, 17}, Copy{7, 11}, Copy{36, 1}] = 3 copies
11176 // while lazy emits
11177 // [Literal(A), Copy{10, 11}, Copy{36, 1}] = 2 copies
11178 // covering the same 11-pixel trap span. The lazy partition
11179 // collapses two separate copies into one longer copy, which is
11180 // the round-156 structural win. (The literal-symbol count rises
11181 // by one to compensate; total tokens may match but the *copy
11182 // count* — and the prefix-code statistics — diverge.)
11183 assert!(
11184 lazy_copies < greedy_copies,
11185 "round-156 lazy matcher must emit strictly fewer Copy tokens on the trap \
11186 fixture: greedy copies={} lazy copies={}\ngreedy partition: {:?}\n\
11187 lazy partition: {:?}",
11188 greedy_copies,
11189 lazy_copies,
11190 greedy,
11191 lazy,
11192 );
11193
11194 // Round-trip the bytes through the no-transform encoder for
11195 // good measure: the lazy path must still decode back exactly.
11196 let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11197 let w = pixels.len() as u32;
11198 let h = 1u32;
11199 let header = build_image_header(w, h, true);
11200 let mut payload = header.to_vec();
11201 payload.extend_from_slice(&stream);
11202 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11203 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11204 assert_eq!(img.pixels(), pixels.as_slice());
11205 }
11206
11207 /// Round 156 non-regression: across a broad fixture matrix
11208 /// (gradient / noise / stripes shapes), the lazy matcher's token
11209 /// count is `<=` the strict-greedy matcher's everywhere. Structural
11210 /// because the look-ahead only swaps when the alternate match is
11211 /// strictly longer, so the lazy partition uses at most as many
11212 /// tokens as the greedy partition. The test guards against
11213 /// off-by-one bugs in the hash-chain insert bookkeeping (the
11214 /// insert-of-`pos`-for-lookahead path) that future refactors might
11215 /// introduce.
11216 #[test]
11217 fn round_156_lazy_never_increases_token_count() {
11218 let shapes: &[(u32, u32)] = &[
11219 (16, 16),
11220 (20, 20),
11221 (24, 24),
11222 (32, 32),
11223 (48, 48),
11224 (16, 32),
11225 (64, 16),
11226 (40, 24),
11227 ];
11228 for &(w, h) in shapes {
11229 let gradient: Vec<u32> = (0..(w * h) as usize)
11230 .map(|i| {
11231 let x = (i as u32) % w;
11232 let y = (i as u32) / w;
11233 let g = (x + y) & 0xFF;
11234 0xFF00_0000 | (g << 16) | (g << 8) | g
11235 })
11236 .collect();
11237 let mut seed = 0xC0FFEE_u32;
11238 let noise: Vec<u32> = (0..(w * h) as usize)
11239 .map(|_| {
11240 seed ^= seed << 13;
11241 seed ^= seed >> 17;
11242 seed ^= seed << 5;
11243 0xFF00_0000 | (seed & 0x00FF_FFFF)
11244 })
11245 .collect();
11246 let stripes: Vec<u32> = (0..(w * h) as usize)
11247 .map(|i| {
11248 let x = (i as u32) % w;
11249 match x % 4 {
11250 0 => 0xFFAA_5500,
11251 1 => 0xFF55_AA00,
11252 2 => 0xFF00_55AA,
11253 _ => 0xFF55_00AA,
11254 }
11255 })
11256 .collect();
11257
11258 for (name, pixels) in [
11259 ("gradient", &gradient),
11260 ("noise", &noise),
11261 ("stripes", &stripes),
11262 ] {
11263 let greedy = tokenize_lz77_inner(pixels, 0);
11264 let lazy = tokenize_lz77_inner(pixels, 1);
11265 assert!(
11266 lazy.len() <= greedy.len(),
11267 "round-156 lazy regression on {name} {w}x{h}: greedy={} tokens, \
11268 lazy={} tokens",
11269 greedy.len(),
11270 lazy.len(),
11271 );
11272 }
11273 }
11274 }
11275
11276 // ---- round 157: §5.2.2 two-position lazy LZ77 matching -----------
11277 //
11278 // The round-157 step extends the round-156 single-position lazy
11279 // matcher with a second look-ahead position. After finding a match
11280 // `(L_a, _)` at `pos` and (depth-1) probing `pos + 1` for a strictly
11281 // longer `L_b`, the matcher also (depth-2) probes `pos + 2` for an
11282 // `L_c > max(L_a, L_b)`. When the depth-2 probe wins, the encoder
11283 // emits two literals (`pixels[pos]` and `pixels[pos + 1]`) and takes
11284 // the longer match from `pos + 2`. This recovers a *second-order*
11285 // strict-greedy trap that the round-156 depth-1 matcher could not
11286 // escape — a short match at `pos` AND a short match at `pos + 1`
11287 // together blocking a strictly longer match at `pos + 2`. The
11288 // decoder output is bit-identical for any input — only the token
11289 // *partition* shifts by up to two pixels — so round-trips remain
11290 // bit-exact under any input.
11291 //
11292 // Three contracts (mirroring the round-156 layout):
11293 //
11294 // 1) Round-trip — every depth-2 lazy-matched stream still
11295 // round-trips end-to-end through `decode_lossless_image`.
11296 // 2) Strict-beat — on a hand-crafted depth-2-trap fixture, the
11297 // depth-2 matcher emits strictly fewer Copy tokens than both
11298 // the strict-greedy matcher and the depth-1 lazy matcher.
11299 // 3) Non-regression — on a broader fixture matrix the depth-2
11300 // token count is `<=` the depth-1 token count everywhere.
11301
11302 /// Round 157 round-trip: a noisy 80×16 fixture encoded with the
11303 /// round-157 depth-2 lazy matcher (now the production
11304 /// `tokenize_lz77` default) must still decode bit-exactly back to
11305 /// the original ARGB pixels. Uses an independent xorshift seed
11306 /// from the round-156 test so both fixtures exercise the matcher
11307 /// over distinct entropy.
11308 #[test]
11309 fn round_157_depth2_lazy_match_round_trips_through_decoder() {
11310 let w = 80u32;
11311 let h = 16u32;
11312 let mut seed = 0xCAFE_F00D_u32;
11313 let pixels: Vec<u32> = (0..(w * h) as usize)
11314 .map(|_| {
11315 seed ^= seed << 13;
11316 seed ^= seed >> 17;
11317 seed ^= seed << 5;
11318 0xFF00_0000 | (seed & 0x00FF_FFFF)
11319 })
11320 .collect();
11321
11322 // The full chooser delegates to `tokenize_lz77` (depth-2 as of
11323 // round 157); end-to-end round-trip through the framed file
11324 // must recover the exact input.
11325 let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11326 let header = build_image_header(w, h, true);
11327 let mut payload = header.to_vec();
11328 payload.extend_from_slice(&stream);
11329 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11330 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11331 assert_eq!(img.pixels(), pixels.as_slice());
11332
11333 // The direct depth-2 token stream against the no-transform
11334 // encoder must also round-trip — guards against bookkeeping
11335 // bugs in the new depth-2 insert/skip dedup path.
11336 let stream_direct = encode_argb_literals_with_width(&pixels, w);
11337 let header_direct = build_image_header(w, h, true);
11338 let mut payload_direct = header_direct.to_vec();
11339 payload_direct.extend_from_slice(&stream_direct);
11340 let framed_direct =
11341 build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11342 let img_direct = crate::decode_lossless_image(&framed_direct)
11343 .unwrap()
11344 .unwrap();
11345 assert_eq!(img_direct.pixels(), pixels.as_slice());
11346 }
11347
11348 /// Round 157 strict-beat: a hand-crafted depth-2-trap fixture where
11349 /// the strict-greedy matcher AND the round-156 depth-1 lazy matcher
11350 /// both accept a short match at `pos` that prevents a strictly
11351 /// longer match at `pos + 2`. The depth-2 lazy matcher emits two
11352 /// literals and takes the longer match.
11353 ///
11354 /// Layout (each capital letter is a unique ARGB constant; the
11355 /// `Z*` family are unique separator pixels that share no 4-pixel
11356 /// window with the anchors):
11357 ///
11358 /// ```text
11359 /// pos 0..3 [P Q R S] — anchor A (4 px)
11360 /// pos 4..6 [Z1 Z2 Z3] — separator
11361 /// pos 7..10 [Q R S T] — anchor B (4 px)
11362 /// pos 11..13 [Z4 Z5 Z6] — separator
11363 /// pos 14..21 [R S T U V W X Y] — anchor C (8 px)
11364 /// pos 22..24 [Z7 Z8 Z9] — separator
11365 /// pos 25 P — trap start
11366 /// pos 26 Q
11367 /// pos 27..33 [R S T U V W X] — depth-2 chain region
11368 /// pos 34.. fill with a fresh Zfill color (no 4-window match)
11369 /// ```
11370 ///
11371 /// At pos 25:
11372 ///
11373 /// * `find(25)` window `[P,Q,R,S]` → matches anchor A (pos 0),
11374 /// extension stops at length 4 because pos 4 (Z1) ≠ pos 29 (T).
11375 /// * `find(26)` window `[Q,R,S,T]` → matches anchor B (pos 7),
11376 /// extension stops at length 4 because pos 11 (Z4) ≠ pos 30 (U).
11377 /// `L_b = 4 = L_a`, **not strictly greater**, so the depth-1
11378 /// lazy matcher does NOT swap.
11379 /// * `find(27)` window `[R,S,T,U]` → matches anchor C (pos 14),
11380 /// extension goes `[R,S,T,U,V,W,X]` (length 7) before pos 21
11381 /// (Y) ≠ pos 34 (Zfill). `L_c = 7 > 4`, so the depth-2 lazy
11382 /// matcher swaps to two literals + the length-7 match.
11383 ///
11384 /// Strict-greedy AND depth-1 partition at the trap:
11385 /// `[Copy{4, dist=25}, ...]`. Depth-2 partition: `[Lit(P),
11386 /// Lit(Q), Copy{7, dist=13}, ...]`. Net: depth-2 collapses a
11387 /// short-then-short pair into one longer copy — strictly fewer
11388 /// Copy tokens, at the cost of one extra literal (mirroring the
11389 /// round-156 pattern).
11390 #[test]
11391 fn round_157_depth2_lazy_match_strictly_beats_depth1_on_trap_fixture() {
11392 // Distinct ARGB constants. Anchor letters P..Y carry the
11393 // structural matches; Z1..Z9 + Zfill are deliberately unique
11394 // so they cannot seed a parasitic chain.
11395 let p_ = 0xFF11_2200_u32;
11396 let q_ = 0xFF22_3300_u32;
11397 let r_ = 0xFF33_4400_u32;
11398 let s_ = 0xFF44_5500_u32;
11399 let t_ = 0xFF55_6600_u32;
11400 let u_ = 0xFF66_7700_u32;
11401 let v_ = 0xFF77_8800_u32;
11402 let w_ = 0xFF88_9900_u32;
11403 let x_ = 0xFF99_AA00_u32;
11404 let y_ = 0xFFAA_BB00_u32;
11405 let z1 = 0xFFCC_DD01_u32;
11406 let z2 = 0xFFCC_DD02_u32;
11407 let z3 = 0xFFCC_DD03_u32;
11408 let z4 = 0xFFCC_DD04_u32;
11409 let z5 = 0xFFCC_DD05_u32;
11410 let z6 = 0xFFCC_DD06_u32;
11411 let z7 = 0xFFCC_DD07_u32;
11412 let z8 = 0xFFCC_DD08_u32;
11413 let z9 = 0xFFCC_DD09_u32;
11414
11415 let mut pixels: Vec<u32> = vec![
11416 p_, q_, r_, s_, // 0..4 anchor A
11417 z1, z2, z3, // 4..7 separator
11418 q_, r_, s_, t_, // 7..11 anchor B
11419 z4, z5, z6, // 11..14 separator
11420 r_, s_, t_, u_, v_, w_, x_, y_, // 14..22 anchor C
11421 z7, z8, z9, // 22..25 separator
11422 p_, q_, // 25..27 trap start (depth-1 cannot escape)
11423 r_, s_, t_, u_, v_, w_, x_, // 27..34 depth-2 chain region
11424 ];
11425 // Pad the tail with unique colors so the depth-2 swap's
11426 // post-match region cannot trigger another long match that
11427 // might mask the trap's copy-count delta.
11428 let mut filler = 0xFFE0_0000_u32;
11429 while pixels.len() < 80 {
11430 filler = filler.wrapping_add(1);
11431 pixels.push(filler);
11432 }
11433
11434 let greedy = tokenize_lz77_inner(&pixels, 0);
11435 let lazy1 = tokenize_lz77_inner(&pixels, 1);
11436 let lazy2 = tokenize_lz77_inner(&pixels, 2);
11437
11438 let copies = |toks: &[Token]| -> usize {
11439 toks.iter()
11440 .filter(|t| matches!(t, Token::Copy { .. }))
11441 .count()
11442 };
11443 let coverage = |toks: &[Token]| -> usize {
11444 toks.iter()
11445 .map(|t| match *t {
11446 Token::Literal(_) => 1,
11447 Token::CacheRef { .. } => 1,
11448 Token::Copy { length, .. } => length,
11449 })
11450 .sum()
11451 };
11452 // Sanity: all three partitions cover the exact image.
11453 assert_eq!(coverage(&greedy), pixels.len());
11454 assert_eq!(coverage(&lazy1), pixels.len());
11455 assert_eq!(coverage(&lazy2), pixels.len());
11456
11457 let g_c = copies(&greedy);
11458 let l1_c = copies(&lazy1);
11459 let l2_c = copies(&lazy2);
11460 eprintln!(
11461 "[round-157] depth-2 trap fixture: greedy tokens={} (copies={}), \
11462 depth-1 tokens={} (copies={}), depth-2 tokens={} (copies={}), \
11463 copy delta vs depth-1={}",
11464 greedy.len(),
11465 g_c,
11466 lazy1.len(),
11467 l1_c,
11468 lazy2.len(),
11469 l2_c,
11470 l1_c as i64 - l2_c as i64,
11471 );
11472
11473 // The trap forces depth-2 to collapse a length-4 copy into a
11474 // 2-literals + length-7 copy that subsumes 7 pixels of what
11475 // greedy / depth-1 would have to cover with multiple matches.
11476 // The structural win is on Copy count: depth-2 must emit
11477 // strictly fewer Copy tokens than BOTH baselines.
11478 assert_eq!(
11479 g_c, l1_c,
11480 "round-157 fixture: depth-1 must agree with greedy here \
11481 (no depth-1 swap fires) — greedy={g_c}, depth-1={l1_c}"
11482 );
11483 assert!(
11484 l2_c < l1_c,
11485 "round-157 depth-2 matcher must emit strictly fewer Copy \
11486 tokens than the depth-1 matcher on the depth-2 trap \
11487 fixture: depth-1 copies={l1_c} depth-2 copies={l2_c}\n\
11488 depth-1 partition: {lazy1:?}\n\
11489 depth-2 partition: {lazy2:?}"
11490 );
11491
11492 // Round-trip the bytes through the no-transform encoder for
11493 // good measure: the depth-2 path must decode back exactly.
11494 let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11495 let w = pixels.len() as u32;
11496 let h = 1u32;
11497 let header = build_image_header(w, h, true);
11498 let mut payload = header.to_vec();
11499 payload.extend_from_slice(&stream);
11500 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11501 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11502 assert_eq!(img.pixels(), pixels.as_slice());
11503 }
11504
11505 /// Round 157 non-regression: across a broad fixture matrix the
11506 /// depth-2 lazy token count is `<=` the depth-1 lazy token count
11507 /// everywhere. Structural because the depth-2 probe only swaps
11508 /// when the alternate match is strictly longer than the depth-1
11509 /// best, so the depth-2 partition uses at most as many tokens as
11510 /// the depth-1 partition. The test guards against off-by-one in
11511 /// the new depth-2 insert/skip dedup (where `pos` and `pos + 1`
11512 /// can both be pre-inserted before the chosen match starts at
11513 /// `pos`, `pos + 1`, or `pos + 2`).
11514 #[test]
11515 fn round_157_depth2_never_increases_token_count_over_depth1() {
11516 let shapes: &[(u32, u32)] = &[
11517 (16, 16),
11518 (20, 20),
11519 (24, 24),
11520 (32, 32),
11521 (48, 48),
11522 (16, 32),
11523 (64, 16),
11524 (40, 24),
11525 ];
11526 for &(w, h) in shapes {
11527 let gradient: Vec<u32> = (0..(w * h) as usize)
11528 .map(|i| {
11529 let x = (i as u32) % w;
11530 let y = (i as u32) / w;
11531 let g = (x + y) & 0xFF;
11532 0xFF00_0000 | (g << 16) | (g << 8) | g
11533 })
11534 .collect();
11535 let mut seed = 0xC0FFEE_u32;
11536 let noise: Vec<u32> = (0..(w * h) as usize)
11537 .map(|_| {
11538 seed ^= seed << 13;
11539 seed ^= seed >> 17;
11540 seed ^= seed << 5;
11541 0xFF00_0000 | (seed & 0x00FF_FFFF)
11542 })
11543 .collect();
11544 let stripes: Vec<u32> = (0..(w * h) as usize)
11545 .map(|i| {
11546 let x = (i as u32) % w;
11547 match x % 4 {
11548 0 => 0xFFAA_5500,
11549 1 => 0xFF55_AA00,
11550 2 => 0xFF00_55AA,
11551 _ => 0xFF55_00AA,
11552 }
11553 })
11554 .collect();
11555
11556 for (name, pixels) in [
11557 ("gradient", &gradient),
11558 ("noise", &noise),
11559 ("stripes", &stripes),
11560 ] {
11561 let lazy1 = tokenize_lz77_inner(pixels, 1);
11562 let lazy2 = tokenize_lz77_inner(pixels, 2);
11563 assert!(
11564 lazy2.len() <= lazy1.len(),
11565 "round-157 depth-2 regression on {name} {w}x{h}: \
11566 depth-1={} tokens, depth-2={} tokens",
11567 lazy1.len(),
11568 lazy2.len(),
11569 );
11570 // Round-trip the depth-2 stream as a defensive check
11571 // for hash-chain insert bookkeeping.
11572 let stream = encode_argb_literals_with_width(pixels, w);
11573 let header = build_image_header(w, h, true);
11574 let mut payload = header.to_vec();
11575 payload.extend_from_slice(&stream);
11576 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11577 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11578 assert_eq!(
11579 img.pixels(),
11580 pixels.as_slice(),
11581 "round-157 depth-2 round-trip mismatch on {name} {w}x{h}"
11582 );
11583 }
11584 }
11585 }
11586
11587 // ---- round 158: §5.2.2 three-position lazy LZ77 matching ---------
11588 //
11589 // The round-158 step extends the round-157 two-position lazy
11590 // matcher with a third look-ahead position. After finding a match
11591 // `(L_a, _)` at `pos` and (depth-1) probing `pos + 1` for a strictly
11592 // longer `L_b`, and (depth-2) probing `pos + 2` for a strictly
11593 // longer `L_c`, the matcher also (depth-3) probes `pos + 3` for an
11594 // `L_d > max(L_a, L_b, L_c)`. When the depth-3 probe wins, the
11595 // encoder emits three literals (`pixels[pos]`, `pixels[pos + 1]`,
11596 // and `pixels[pos + 2]`) and takes the longer match from `pos + 3`.
11597 // This recovers a *third-order* strict-greedy trap that the
11598 // round-157 depth-2 matcher could not escape — three consecutive
11599 // short matches at `pos`, `pos + 1`, `pos + 2` together blocking a
11600 // strictly longer match at `pos + 3`. The decoder output is
11601 // bit-identical for any input — only the token *partition* shifts
11602 // by up to three pixels — so round-trips remain bit-exact under
11603 // any input.
11604 //
11605 // Three contracts (mirroring the round-156 / round-157 layout):
11606 //
11607 // 1) Round-trip — every depth-3 lazy-matched stream still
11608 // round-trips end-to-end through `decode_lossless_image`.
11609 // 2) Strict-beat — on a hand-crafted depth-3-trap fixture, the
11610 // depth-3 matcher emits strictly fewer Copy tokens than the
11611 // strict-greedy, depth-1, and depth-2 matchers.
11612 // 3) Non-regression — on a broader fixture matrix the depth-3
11613 // token count is `<=` the depth-2 token count everywhere.
11614
11615 /// Round 158 round-trip: a noisy 96×16 fixture encoded with the
11616 /// round-158 depth-3 lazy matcher (now the production
11617 /// `tokenize_lz77` default) must still decode bit-exactly back to
11618 /// the original ARGB pixels. Uses an independent xorshift seed
11619 /// from the round-156 / round-157 tests so all three fixtures
11620 /// exercise the matcher over distinct entropy.
11621 #[test]
11622 fn round_158_depth3_lazy_match_round_trips_through_decoder() {
11623 let w = 96u32;
11624 let h = 16u32;
11625 let mut seed = 0xDEAD_BEEF_u32;
11626 let pixels: Vec<u32> = (0..(w * h) as usize)
11627 .map(|_| {
11628 seed ^= seed << 13;
11629 seed ^= seed >> 17;
11630 seed ^= seed << 5;
11631 0xFF00_0000 | (seed & 0x00FF_FFFF)
11632 })
11633 .collect();
11634
11635 // The full chooser delegates to `tokenize_lz77` (depth-3 as of
11636 // round 158); end-to-end round-trip through the framed file
11637 // must recover the exact input.
11638 let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11639 let header = build_image_header(w, h, true);
11640 let mut payload = header.to_vec();
11641 payload.extend_from_slice(&stream);
11642 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11643 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11644 assert_eq!(img.pixels(), pixels.as_slice());
11645
11646 // The direct depth-3 token stream against the no-transform
11647 // encoder must also round-trip — guards against bookkeeping
11648 // bugs in the new depth-3 insert/skip dedup path (where `pos`,
11649 // `pos + 1`, and `pos + 2` can all be pre-inserted before the
11650 // chosen match starts at `pos`, `pos + 1`, `pos + 2`, or
11651 // `pos + 3`).
11652 let stream_direct = encode_argb_literals_with_width(&pixels, w);
11653 let header_direct = build_image_header(w, h, true);
11654 let mut payload_direct = header_direct.to_vec();
11655 payload_direct.extend_from_slice(&stream_direct);
11656 let framed_direct =
11657 build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11658 let img_direct = crate::decode_lossless_image(&framed_direct)
11659 .unwrap()
11660 .unwrap();
11661 assert_eq!(img_direct.pixels(), pixels.as_slice());
11662 }
11663
11664 /// Round 158 strict-beat: a hand-crafted depth-3-trap fixture
11665 /// where the strict-greedy matcher, the round-156 depth-1 lazy
11666 /// matcher, AND the round-157 depth-2 lazy matcher all accept a
11667 /// short match at `pos` that prevents a strictly longer match at
11668 /// `pos + 3`. The depth-3 lazy matcher emits three literals and
11669 /// takes the longer match.
11670 ///
11671 /// Layout (each capital letter is a unique ARGB constant; the
11672 /// `Z*` family are unique separator pixels that share no 4-pixel
11673 /// window with the anchors):
11674 ///
11675 /// ```text
11676 /// pos 0..4 [P Q R S] — anchor A (4 px)
11677 /// pos 4..7 [Z1 Z2 Z3] — separator
11678 /// pos 7..11 [Q R S T] — anchor B (4 px)
11679 /// pos 11..14 [Z4 Z5 Z6] — separator
11680 /// pos 14..18 [R S T U] — anchor C (4 px)
11681 /// pos 18..21 [Z7 Z8 Z9] — separator
11682 /// pos 21..30 [S T U V W X Y A B] — anchor D (9 px)
11683 /// pos 30..33 [Z10 Z11 Z12] — separator
11684 /// pos 33 P — trap start
11685 /// pos 34 Q
11686 /// pos 35 R
11687 /// pos 36..45 [S T U V W X Y A B] — depth-3 chain region
11688 /// pos 45.. fill with unique Zfill colors (no 4-window match)
11689 /// ```
11690 ///
11691 /// At pos 33:
11692 ///
11693 /// * `find(33)` window `[P,Q,R,S]` → matches anchor A (pos 0),
11694 /// extension stops at length 4 because pos 4 (Z1) ≠ pos 37 (T).
11695 /// * `find(34)` window `[Q,R,S,T]` → matches anchor B (pos 7),
11696 /// extension stops at length 4 because pos 11 (Z4) ≠ pos 38 (U).
11697 /// `L_b = 4 = L_a`, **not strictly greater**, so the depth-1
11698 /// lazy matcher does NOT swap.
11699 /// * `find(35)` window `[R,S,T,U]` → matches anchor C (pos 14),
11700 /// extension stops at length 4 because pos 18 (Z7) ≠ pos 39 (V).
11701 /// `L_c = 4 = L_a`, **not strictly greater**, so the depth-2
11702 /// lazy matcher does NOT swap.
11703 /// * `find(36)` window `[S,T,U,V]` → matches anchor D (pos 21),
11704 /// extension goes the full `[S,T,U,V,W,X,Y,A,B]` (length 9)
11705 /// before pos 30 (Z10) ≠ pos 45 (Zfill). `L_d = 9 > 4`, so the
11706 /// depth-3 lazy matcher swaps to three literals + the length-9
11707 /// match.
11708 ///
11709 /// Strict-greedy, depth-1, AND depth-2 partition at the trap:
11710 /// `[Copy{4, dist=33}, ...]`. Depth-3 partition: `[Lit(P), Lit(Q),
11711 /// Lit(R), Copy{9, dist=15}, ...]`. Net: depth-3 collapses a
11712 /// short-then-short-then-short triple into one longer copy.
11713 #[test]
11714 fn round_158_depth3_lazy_match_strictly_beats_depth2_on_trap_fixture() {
11715 // Distinct ARGB constants. Anchor letters P..Y + A..B carry
11716 // the structural matches; Z1..Z12 + Zfill are deliberately
11717 // unique so they cannot seed a parasitic chain.
11718 let p_ = 0xFF11_2200_u32;
11719 let q_ = 0xFF22_3300_u32;
11720 let r_ = 0xFF33_4400_u32;
11721 let s_ = 0xFF44_5500_u32;
11722 let t_ = 0xFF55_6600_u32;
11723 let u_ = 0xFF66_7700_u32;
11724 let v_ = 0xFF77_8800_u32;
11725 let w_ = 0xFF88_9900_u32;
11726 let x_ = 0xFF99_AA00_u32;
11727 let y_ = 0xFFAA_BB00_u32;
11728 let a_ = 0xFFBB_CC00_u32;
11729 let b_ = 0xFFCC_DD00_u32;
11730 let z01 = 0xFFEE_0001_u32;
11731 let z02 = 0xFFEE_0002_u32;
11732 let z03 = 0xFFEE_0003_u32;
11733 let z04 = 0xFFEE_0004_u32;
11734 let z05 = 0xFFEE_0005_u32;
11735 let z06 = 0xFFEE_0006_u32;
11736 let z07 = 0xFFEE_0007_u32;
11737 let z08 = 0xFFEE_0008_u32;
11738 let z09 = 0xFFEE_0009_u32;
11739 let z10 = 0xFFEE_000A_u32;
11740 let z11 = 0xFFEE_000B_u32;
11741 let z12 = 0xFFEE_000C_u32;
11742
11743 let mut pixels: Vec<u32> = vec![
11744 p_, q_, r_, s_, // 0..4 anchor A
11745 z01, z02, z03, // 4..7 separator
11746 q_, r_, s_, t_, // 7..11 anchor B
11747 z04, z05, z06, // 11..14 separator
11748 r_, s_, t_, u_, // 14..18 anchor C
11749 z07, z08, z09, // 18..21 separator
11750 s_, t_, u_, v_, w_, x_, y_, a_, b_, // 21..30 anchor D (9 px)
11751 z10, z11, z12, // 30..33 separator
11752 p_, q_, r_, // 33..36 trap start (depth-1/2 cannot escape)
11753 s_, t_, u_, v_, w_, x_, y_, a_, b_, // 36..45 depth-3 chain region
11754 ];
11755 // Pad the tail with unique colors so the depth-3 swap's
11756 // post-match region cannot trigger another long match that
11757 // might mask the trap's copy-count delta.
11758 let mut filler = 0xFFF0_0000_u32;
11759 while pixels.len() < 96 {
11760 filler = filler.wrapping_add(1);
11761 pixels.push(filler);
11762 }
11763
11764 let greedy = tokenize_lz77_inner(&pixels, 0);
11765 let lazy1 = tokenize_lz77_inner(&pixels, 1);
11766 let lazy2 = tokenize_lz77_inner(&pixels, 2);
11767 let lazy3 = tokenize_lz77_inner(&pixels, 3);
11768
11769 let copies = |toks: &[Token]| -> usize {
11770 toks.iter()
11771 .filter(|t| matches!(t, Token::Copy { .. }))
11772 .count()
11773 };
11774 let coverage = |toks: &[Token]| -> usize {
11775 toks.iter()
11776 .map(|t| match *t {
11777 Token::Literal(_) => 1,
11778 Token::CacheRef { .. } => 1,
11779 Token::Copy { length, .. } => length,
11780 })
11781 .sum()
11782 };
11783 // Sanity: all four partitions cover the exact image.
11784 assert_eq!(coverage(&greedy), pixels.len());
11785 assert_eq!(coverage(&lazy1), pixels.len());
11786 assert_eq!(coverage(&lazy2), pixels.len());
11787 assert_eq!(coverage(&lazy3), pixels.len());
11788
11789 let g_c = copies(&greedy);
11790 let l1_c = copies(&lazy1);
11791 let l2_c = copies(&lazy2);
11792 let l3_c = copies(&lazy3);
11793 eprintln!(
11794 "[round-158] depth-3 trap fixture: greedy tokens={} (copies={}), \
11795 depth-1 tokens={} (copies={}), depth-2 tokens={} (copies={}), \
11796 depth-3 tokens={} (copies={}), copy delta vs depth-2={}",
11797 greedy.len(),
11798 g_c,
11799 lazy1.len(),
11800 l1_c,
11801 lazy2.len(),
11802 l2_c,
11803 lazy3.len(),
11804 l3_c,
11805 l2_c as i64 - l3_c as i64,
11806 );
11807
11808 // The trap forces depth-3 to collapse a length-4 copy + a
11809 // follow-on length-8 copy into a 3-literals + length-9 copy
11810 // that subsumes 12 pixels of what greedy / depth-1 / depth-2
11811 // would have to cover with two matches. The structural win
11812 // is on Copy count: depth-3 must emit strictly fewer Copy
11813 // tokens than all three baselines.
11814 assert_eq!(
11815 g_c, l1_c,
11816 "round-158 fixture: depth-1 must agree with greedy here \
11817 (no depth-1 swap fires) — greedy={g_c}, depth-1={l1_c}"
11818 );
11819 assert_eq!(
11820 g_c, l2_c,
11821 "round-158 fixture: depth-2 must agree with greedy here \
11822 (no depth-2 swap fires) — greedy={g_c}, depth-2={l2_c}"
11823 );
11824 assert!(
11825 l3_c < l2_c,
11826 "round-158 depth-3 matcher must emit strictly fewer Copy \
11827 tokens than the depth-2 matcher on the depth-3 trap \
11828 fixture: depth-2 copies={l2_c} depth-3 copies={l3_c}\n\
11829 depth-2 partition: {lazy2:?}\n\
11830 depth-3 partition: {lazy3:?}"
11831 );
11832
11833 // Round-trip the bytes through the no-transform encoder for
11834 // good measure: the depth-3 path must decode back exactly.
11835 let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11836 let w = pixels.len() as u32;
11837 let h = 1u32;
11838 let header = build_image_header(w, h, true);
11839 let mut payload = header.to_vec();
11840 payload.extend_from_slice(&stream);
11841 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11842 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11843 assert_eq!(img.pixels(), pixels.as_slice());
11844 }
11845
11846 /// Round 158 non-regression: across a broad fixture matrix the
11847 /// depth-3 lazy token count is `<=` the depth-2 lazy token count
11848 /// everywhere. Structural because the depth-3 probe only swaps
11849 /// when the alternate match is strictly longer than the depth-2
11850 /// best, so the depth-3 partition uses at most as many tokens as
11851 /// the depth-2 partition. The test guards against off-by-one in
11852 /// the new depth-3 insert/skip dedup (where `pos`, `pos + 1`, and
11853 /// `pos + 2` can all be pre-inserted before the chosen match
11854 /// starts at `pos`, `pos + 1`, `pos + 2`, or `pos + 3`).
11855 #[test]
11856 fn round_158_depth3_never_increases_token_count_over_depth2() {
11857 let shapes: &[(u32, u32)] = &[
11858 (16, 16),
11859 (20, 20),
11860 (24, 24),
11861 (32, 32),
11862 (48, 48),
11863 (16, 32),
11864 (64, 16),
11865 (40, 24),
11866 ];
11867 for &(w, h) in shapes {
11868 let gradient: Vec<u32> = (0..(w * h) as usize)
11869 .map(|i| {
11870 let x = (i as u32) % w;
11871 let y = (i as u32) / w;
11872 let g = (x + y) & 0xFF;
11873 0xFF00_0000 | (g << 16) | (g << 8) | g
11874 })
11875 .collect();
11876 let mut seed = 0xC0FFEE_u32;
11877 let noise: Vec<u32> = (0..(w * h) as usize)
11878 .map(|_| {
11879 seed ^= seed << 13;
11880 seed ^= seed >> 17;
11881 seed ^= seed << 5;
11882 0xFF00_0000 | (seed & 0x00FF_FFFF)
11883 })
11884 .collect();
11885 let stripes: Vec<u32> = (0..(w * h) as usize)
11886 .map(|i| {
11887 let x = (i as u32) % w;
11888 match x % 4 {
11889 0 => 0xFFAA_5500,
11890 1 => 0xFF55_AA00,
11891 2 => 0xFF00_55AA,
11892 _ => 0xFF55_00AA,
11893 }
11894 })
11895 .collect();
11896
11897 for (name, pixels) in [
11898 ("gradient", &gradient),
11899 ("noise", &noise),
11900 ("stripes", &stripes),
11901 ] {
11902 let lazy2 = tokenize_lz77_inner(pixels, 2);
11903 let lazy3 = tokenize_lz77_inner(pixels, 3);
11904 assert!(
11905 lazy3.len() <= lazy2.len(),
11906 "round-158 depth-3 regression on {name} {w}x{h}: \
11907 depth-2={} tokens, depth-3={} tokens",
11908 lazy2.len(),
11909 lazy3.len(),
11910 );
11911 // Round-trip the depth-3 stream as a defensive check
11912 // for hash-chain insert bookkeeping.
11913 let stream = encode_argb_literals_with_width(pixels, w);
11914 let header = build_image_header(w, h, true);
11915 let mut payload = header.to_vec();
11916 payload.extend_from_slice(&stream);
11917 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11918 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11919 assert_eq!(
11920 img.pixels(),
11921 pixels.as_slice(),
11922 "round-158 depth-3 round-trip mismatch on {name} {w}x{h}"
11923 );
11924 }
11925 }
11926 }
11927
11928 // ---- round 163: §5.2.2 guarded depth-4 lazy LZ77 ----
11929 //
11930 // Three tests, mirroring the round-156 / 157 / 158 contract:
11931 //
11932 // 1) End-to-end round-trip — a noisy 96×16 fixture encoded with
11933 // the round-163 guarded depth-4 lazy matcher (now the production
11934 // `tokenize_lz77` default) must still decode bit-exactly back
11935 // to the original ARGB pixels.
11936 // 2) Diminishing-returns guard — a hand-crafted fixture where the
11937 // depth-3 best at `pos` is a long run (`>= DEPTH4_GUARD_THRESHOLD`)
11938 // and a depth-4 swap candidate exists. The guard must suppress
11939 // the depth-4 work so depth-4 == depth-3 byte-for-byte on that
11940 // fixture; the unguarded depth-4 (simulated with `DEPTH4_GUARD_THRESHOLD`
11941 // set to `MAX_MATCH`) would have swapped. We exercise the
11942 // boundary by toggling the depth around the guard rather than
11943 // monkey-patching the constant — the two depth values that
11944 // bracket the guard (`3` vs `4`) produce identical partitions
11945 // on the long-run fixture, proving the guard suppressed the
11946 // probe.
11947 // 3) Non-regression — on a broader fixture matrix the depth-4
11948 // token count is `<=` the depth-3 token count everywhere
11949 // (structural: the depth-4 probe only swaps to a *strictly*
11950 // longer match, so it can only remove tokens, never add them).
11951
11952 /// Round 163 round-trip: a noisy 96×16 fixture encoded with the
11953 /// round-163 guarded depth-4 lazy matcher (now the production
11954 /// `tokenize_lz77` default) must still decode bit-exactly back to
11955 /// the original ARGB pixels. Uses an independent xorshift seed
11956 /// from the round-156 / 157 / 158 tests so all four fixtures
11957 /// exercise the matcher over distinct entropy.
11958 #[test]
11959 fn round_163_depth4_lazy_match_round_trips_through_decoder() {
11960 let w = 96u32;
11961 let h = 16u32;
11962 let mut seed = 0xFEED_FACE_u32;
11963 let pixels: Vec<u32> = (0..(w * h) as usize)
11964 .map(|_| {
11965 seed ^= seed << 13;
11966 seed ^= seed >> 17;
11967 seed ^= seed << 5;
11968 0xFF00_0000 | (seed & 0x00FF_FFFF)
11969 })
11970 .collect();
11971
11972 // The full chooser delegates to `tokenize_lz77` (depth-4 as of
11973 // round 163); end-to-end round-trip through the framed file
11974 // must recover the exact input.
11975 let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11976 let header = build_image_header(w, h, true);
11977 let mut payload = header.to_vec();
11978 payload.extend_from_slice(&stream);
11979 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11980 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11981 assert_eq!(img.pixels(), pixels.as_slice());
11982
11983 // The direct depth-4 token stream against the no-transform
11984 // encoder must also round-trip — guards against bookkeeping
11985 // bugs in the new depth-4 insert/skip dedup path (where `pos`,
11986 // `pos + 1`, `pos + 2`, and `pos + 3` can all be pre-inserted
11987 // before the chosen match starts at `pos`, `pos + 1`, `pos + 2`,
11988 // `pos + 3`, or `pos + 4`).
11989 let stream_direct = encode_argb_literals_with_width(&pixels, w);
11990 let header_direct = build_image_header(w, h, true);
11991 let mut payload_direct = header_direct.to_vec();
11992 payload_direct.extend_from_slice(&stream_direct);
11993 let framed_direct =
11994 build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11995 let img_direct = crate::decode_lossless_image(&framed_direct)
11996 .unwrap()
11997 .unwrap();
11998 assert_eq!(img_direct.pixels(), pixels.as_slice());
11999 }
12000
12001 /// Round 163 guard contract: on a fixture whose depth-3 best at
12002 /// some position is already a long run (length strictly `>=
12003 /// DEPTH4_GUARD_THRESHOLD`), the depth-4 probe MUST be suppressed
12004 /// by the guard. We construct an input where a long literal run
12005 /// at the start seeds a long match for the second copy. The
12006 /// depth-3 matcher emits a long match at the first probe; the
12007 /// depth-4 probe, if it were unguarded, would attempt a `find` at
12008 /// `pos + 4`. The guard's structural contract is that whenever
12009 /// the depth-3 best already covers `>= DEPTH4_GUARD_THRESHOLD`
12010 /// pixels, depth-4 produces the IDENTICAL token sequence as
12011 /// depth-3 — i.e. the guard fired and the depth-4 work was
12012 /// skipped.
12013 ///
12014 /// The simpler property the test asserts: on a long-run fixture
12015 /// the depth-4 partition (depth = 4) is byte-for-byte equal to
12016 /// the depth-3 partition (depth = 3). If the guard fails to fire,
12017 /// depth-4 would still find some marginal swap somewhere in the
12018 /// fixture and the two partitions would diverge.
12019 #[test]
12020 fn round_163_depth4_guard_suppresses_long_run_swap() {
12021 // A long, smoothly-varying run guarantees that almost every
12022 // match the matcher finds is significantly longer than
12023 // `DEPTH4_GUARD_THRESHOLD == 6` — so the guard should fire at
12024 // every probe site and depth-4 should produce the same token
12025 // partition as depth-3.
12026 //
12027 // We use a 4-pixel repeating motif that the matcher can find
12028 // long copies of after the first cycle: `[A, B, C, D, A, B, C,
12029 // D, …]`. After 12 pixels of warm-up, a `find` will return a
12030 // match length up to MAX_MATCH (well over the guard threshold).
12031 let a_ = 0xFF10_2030_u32;
12032 let b_ = 0xFF40_5060_u32;
12033 let c_ = 0xFF70_8090_u32;
12034 let d_ = 0xFFA0_B0C0_u32;
12035 let motif = [a_, b_, c_, d_];
12036 let mut pixels: Vec<u32> = Vec::with_capacity(512);
12037 for i in 0..512 {
12038 pixels.push(motif[i & 3]);
12039 }
12040
12041 let lazy3 = tokenize_lz77_inner(&pixels, 3);
12042 let lazy4 = tokenize_lz77_inner(&pixels, 4);
12043
12044 // Guard contract: when the depth-3 best is already long, the
12045 // depth-4 probe is suppressed and the two partitions are
12046 // byte-for-byte equal.
12047 assert_eq!(
12048 lazy3,
12049 lazy4,
12050 "round-163 depth-4 guard should suppress the depth-4 probe \
12051 on a long-run fixture (every depth-3 best `>= DEPTH4_GUARD_THRESHOLD == {}`), \
12052 producing the identical depth-3 partition; depth-3={} tokens, \
12053 depth-4={} tokens",
12054 DEPTH4_GUARD_THRESHOLD,
12055 lazy3.len(),
12056 lazy4.len(),
12057 );
12058
12059 // Sanity: both partitions must cover the input exactly.
12060 let coverage = |toks: &[Token]| -> usize {
12061 toks.iter()
12062 .map(|t| match *t {
12063 Token::Literal(_) => 1,
12064 Token::CacheRef { .. } => 1,
12065 Token::Copy { length, .. } => length,
12066 })
12067 .sum()
12068 };
12069 assert_eq!(coverage(&lazy3), pixels.len());
12070 assert_eq!(coverage(&lazy4), pixels.len());
12071
12072 // End-to-end round-trip via the production chooser for good
12073 // measure: the depth-4-default `tokenize_lz77` must still
12074 // decode back exactly on this long-run fixture.
12075 let w = pixels.len() as u32;
12076 let h = 1u32;
12077 let stream = encode_argb_literals_with_width(&pixels, w);
12078 let header = build_image_header(w, h, true);
12079 let mut payload = header.to_vec();
12080 payload.extend_from_slice(&stream);
12081 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12082 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12083 assert_eq!(img.pixels(), pixels.as_slice());
12084 }
12085
12086 /// Round 163 non-regression: across a broad fixture matrix the
12087 /// depth-4 lazy token count is `<=` the depth-3 lazy token count
12088 /// everywhere. Structural because the depth-4 probe — when the
12089 /// guard allows it to fire — only swaps when the alternate match
12090 /// is strictly longer than the depth-3 best, so the depth-4
12091 /// partition uses at most as many tokens as the depth-3 partition.
12092 /// When the guard suppresses the probe, depth-4 produces the same
12093 /// tokens as depth-3 directly. The test also guards against
12094 /// off-by-one in the new depth-4 insert/skip dedup (where `pos`,
12095 /// `pos + 1`, `pos + 2`, and `pos + 3` can all be pre-inserted
12096 /// before the chosen match starts at any of those positions or
12097 /// `pos + 4`).
12098 #[test]
12099 fn round_163_depth4_never_increases_token_count_over_depth3() {
12100 let shapes: &[(u32, u32)] = &[
12101 (16, 16),
12102 (20, 20),
12103 (24, 24),
12104 (32, 32),
12105 (48, 48),
12106 (16, 32),
12107 (64, 16),
12108 (40, 24),
12109 ];
12110 for &(w, h) in shapes {
12111 let gradient: Vec<u32> = (0..(w * h) as usize)
12112 .map(|i| {
12113 let x = (i as u32) % w;
12114 let y = (i as u32) / w;
12115 let g = (x + y) & 0xFF;
12116 0xFF00_0000 | (g << 16) | (g << 8) | g
12117 })
12118 .collect();
12119 let mut seed = 0xBADD_CAFE_u32;
12120 let noise: Vec<u32> = (0..(w * h) as usize)
12121 .map(|_| {
12122 seed ^= seed << 13;
12123 seed ^= seed >> 17;
12124 seed ^= seed << 5;
12125 0xFF00_0000 | (seed & 0x00FF_FFFF)
12126 })
12127 .collect();
12128 let stripes: Vec<u32> = (0..(w * h) as usize)
12129 .map(|i| {
12130 let x = (i as u32) % w;
12131 match x % 4 {
12132 0 => 0xFFAA_5500,
12133 1 => 0xFF55_AA00,
12134 2 => 0xFF00_55AA,
12135 _ => 0xFF55_00AA,
12136 }
12137 })
12138 .collect();
12139
12140 for (name, pixels) in [
12141 ("gradient", &gradient),
12142 ("noise", &noise),
12143 ("stripes", &stripes),
12144 ] {
12145 let lazy3 = tokenize_lz77_inner(pixels, 3);
12146 let lazy4 = tokenize_lz77_inner(pixels, 4);
12147 assert!(
12148 lazy4.len() <= lazy3.len(),
12149 "round-163 depth-4 regression on {name} {w}x{h}: \
12150 depth-3={} tokens, depth-4={} tokens",
12151 lazy3.len(),
12152 lazy4.len(),
12153 );
12154 // Round-trip the depth-4 stream as a defensive check
12155 // for hash-chain insert bookkeeping.
12156 let stream = encode_argb_literals_with_width(pixels, w);
12157 let header = build_image_header(w, h, true);
12158 let mut payload = header.to_vec();
12159 payload.extend_from_slice(&stream);
12160 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12161 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12162 assert_eq!(
12163 img.pixels(),
12164 pixels.as_slice(),
12165 "round-163 depth-4 round-trip mismatch on {name} {w}x{h}"
12166 );
12167 }
12168 }
12169 }
12170
12171 // ---- round 159: §4.1 entropy-image-aware tie-break ----
12172
12173 /// `pick_block_mode_with_hint` accepts the preferred neighbour
12174 /// mode when it ties with the otherwise-lowest mode at the same
12175 /// minimal residual cost. The block is a solid-colour fill, so
12176 /// modes 1..=13 all predict the left/top neighbour exactly →
12177 /// every interior pixel has zero residual, and ties run across
12178 /// every mode whose residual sum equals the lowest sum found.
12179 /// Without a hint the chooser picks the lowest mode (mode 1 on a
12180 /// non-black solid); with a hint of `Some(7)` it returns mode 7.
12181 #[test]
12182 fn round_159_pick_block_mode_with_hint_swaps_on_tie() {
12183 let w = 8usize;
12184 let h = 8usize;
12185 let pixels = vec![0xff50_6070u32; w * h];
12186
12187 // No hint: the lowest tied mode wins (deterministic baseline).
12188 let baseline = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, None);
12189 // The exact value depends on the border rule for mode 0 vs
12190 // the per-channel residual; what matters here is that the
12191 // hint can swap to a different mode that ties at the same
12192 // cost.
12193 let baseline_cost = block_mode_cost(&pixels, w, h, 0, 0, w, h, baseline);
12194
12195 // Probe every mode 0..=13 to find one that ties baseline but
12196 // is not equal to it.
12197 let mut tied_other: Option<u8> = None;
12198 for m in 0u8..=13 {
12199 if m == baseline {
12200 continue;
12201 }
12202 let c = block_mode_cost(&pixels, w, h, 0, 0, w, h, m);
12203 if c == baseline_cost {
12204 tied_other = Some(m);
12205 break;
12206 }
12207 }
12208 let other = tied_other
12209 .expect("a solid-fill block has at least two modes tied at minimal residual cost");
12210
12211 // With hint == Some(other) and `other` strictly distinct
12212 // from `baseline` but tied at the same cost, the chooser
12213 // must return `other`.
12214 let with_hint = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, Some(other));
12215 assert_eq!(
12216 with_hint, other,
12217 "round-159 tie-break did not adopt the preferred mode: \
12218 baseline={baseline}, other={other}, returned={with_hint}"
12219 );
12220 }
12221
12222 /// `pick_block_mode_with_hint` does NOT swap when the preferred
12223 /// mode is strictly worse than the cost-minimal mode. A diagonal
12224 /// 2-D ramp `pixels[y, x] = (x + 2y) & 0xff` makes the L-based
12225 /// modes pay residual `1` per pixel while the T-based modes pay
12226 /// residual `2` per pixel, so the chooser picks an L-based mode
12227 /// uniquely. Probing every mode confirms which one is strictly
12228 /// worse than the picked baseline; with that mode as the hint
12229 /// the chooser must still return the baseline.
12230 #[test]
12231 fn round_159_pick_block_mode_with_hint_keeps_best_when_hint_worse() {
12232 let w = 16usize;
12233 let h = 16usize;
12234 // 2-D ramp: L-based modes pay 1/pixel; T-based modes pay 2/pixel.
12235 let pixels: Vec<u32> = (0..(w * h))
12236 .map(|i| {
12237 let x = (i % w) as u32;
12238 let y = (i / w) as u32;
12239 let v = (x + 2 * y) & 0xff;
12240 0xff00_0000 | (v << 16) | (v << 8) | v
12241 })
12242 .collect();
12243
12244 let baseline = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, None);
12245 let baseline_cost = block_mode_cost(&pixels, w, h, 0, 0, w, h, baseline);
12246 // Find any mode whose cost is strictly worse than baseline.
12247 let mut worse: Option<u8> = None;
12248 for m in 0u8..=13 {
12249 let c = block_mode_cost(&pixels, w, h, 0, 0, w, h, m);
12250 if c > baseline_cost {
12251 worse = Some(m);
12252 break;
12253 }
12254 }
12255 let worse = worse
12256 .expect("test premise: the 2-D ramp should produce at least one strictly-worse mode");
12257 let with_hint = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, Some(worse));
12258 assert_eq!(
12259 with_hint, baseline,
12260 "round-159 tie-break must not adopt a strictly-worse hint \
12261 (baseline={baseline}, worse-hint={worse})"
12262 );
12263 }
12264
12265 /// Local pre-round-159 copy of `build_predictor_image`. Mirrors
12266 /// the round-158 behaviour exactly: every block calls the
12267 /// hint-aware chooser with `prefer_mode = None`, so ties resolve
12268 /// to the lowest mode regardless of any spatial coherence. Used
12269 /// by the round-159 non-regression and strict-beat tests as the
12270 /// before-after baseline.
12271 fn pre_round_159_build_predictor_image(
12272 pixels: &[u32],
12273 width: u32,
12274 height: u32,
12275 size_bits: u8,
12276 ) -> (Vec<u32>, u32, u32) {
12277 let block = 1u32 << size_bits;
12278 let tw = predictor_div_round_up(width, block);
12279 let th = predictor_div_round_up(height, block);
12280 let mut img = Vec::with_capacity((tw * th) as usize);
12281 let w = width as usize;
12282 let h = height as usize;
12283 let bsz = block as usize;
12284 for by in 0..th as usize {
12285 for bx in 0..tw as usize {
12286 let x0 = bx * bsz;
12287 let y0 = by * bsz;
12288 let mode = pick_block_mode_with_hint(pixels, w, h, x0, y0, bsz, bsz, None);
12289 img.push(0xff00_0000 | ((mode as u32) << 8));
12290 }
12291 }
12292 (img, tw, th)
12293 }
12294
12295 /// Round 159 structural correctness: the entropy-image-aware
12296 /// tie-break is residual-cost-neutral, so for *every* block the
12297 /// post-r159 chosen mode has identical residual cost to the
12298 /// pre-r159 chosen mode (only the mode *value* may differ on
12299 /// ties). The check is per-block: across a fixture matrix the
12300 /// summed per-block residual cost must be exactly equal under
12301 /// the two choosers.
12302 #[test]
12303 fn round_159_predictor_image_tie_break_is_cost_neutral() {
12304 let shapes: &[(u32, u32, u8)] = &[
12305 (32, 32, 4),
12306 (48, 48, 4),
12307 (64, 32, 4),
12308 (32, 64, 4),
12309 (24, 24, 3),
12310 ];
12311 for &(w, h, size_bits) in shapes {
12312 // Two fixtures: smooth gradient (many ties on flat regions
12313 // between modes 1/2/3 etc.) and palette-ish stripes
12314 // (column-aligned ties between L-based modes).
12315 let gradient: Vec<u32> = (0..(w * h) as usize)
12316 .map(|i| {
12317 let x = (i as u32) % w;
12318 let y = (i as u32) / w;
12319 let g = (x + y) & 0x0F;
12320 0xFF00_0000 | (g << 16) | (g << 8) | g
12321 })
12322 .collect();
12323 let stripes: Vec<u32> = (0..(w * h) as usize)
12324 .map(|i| {
12325 let x = (i as u32) % w;
12326 match x % 4 {
12327 0 => 0xFFAA_5500,
12328 1 => 0xFF55_AA00,
12329 2 => 0xFF00_55AA,
12330 _ => 0xFF55_00AA,
12331 }
12332 })
12333 .collect();
12334
12335 for (name, pixels) in [("gradient", &gradient), ("stripes", &stripes)] {
12336 let (pre_img, _, _) = pre_round_159_build_predictor_image(pixels, w, h, size_bits);
12337 let (post_img, _, _) = build_predictor_image(pixels, w, h, size_bits);
12338 assert_eq!(
12339 pre_img.len(),
12340 post_img.len(),
12341 "pre/post mode-image length differs on {name} {w}x{h} size_bits={size_bits}"
12342 );
12343 let block = 1u32 << size_bits;
12344 let tw = predictor_div_round_up(w, block) as usize;
12345 let bsz = block as usize;
12346 let wu = w as usize;
12347 let hu = h as usize;
12348 for (idx, (pre_px, post_px)) in pre_img.iter().zip(post_img.iter()).enumerate() {
12349 let bx = idx % tw;
12350 let by = idx / tw;
12351 let x0 = bx * bsz;
12352 let y0 = by * bsz;
12353 let pre_mode = ((pre_px >> 8) & 0xff) as u8;
12354 let post_mode = ((post_px >> 8) & 0xff) as u8;
12355 let pre_cost = block_mode_cost(pixels, wu, hu, x0, y0, bsz, bsz, pre_mode);
12356 let post_cost = block_mode_cost(pixels, wu, hu, x0, y0, bsz, bsz, post_mode);
12357 assert_eq!(
12358 pre_cost, post_cost,
12359 "round-159 tie-break changed residual cost on {name} {w}x{h} \
12360 block=({bx},{by}): pre mode {pre_mode} cost {pre_cost}, \
12361 post mode {post_mode} cost {post_cost}"
12362 );
12363 }
12364 }
12365 }
12366 }
12367
12368 /// Round 159 non-regression: across a fixture matrix the
12369 /// post-r159 predictor-chooser stream must never be longer than
12370 /// the pre-r159 stream. Since the tie-break is a strict subset
12371 /// of the pre-r159 candidate space (the chosen mode is always a
12372 /// cost-minimal mode under both choosers), the residual stream
12373 /// is identical and only the predictor sub-image's entropy can
12374 /// differ. The standalone chooser is invoked end-to-end through
12375 /// the lossless decoder to confirm round-trips on every fixture.
12376 #[test]
12377 fn round_159_predictor_chooser_never_regresses() {
12378 let shapes: &[(u32, u32)] = &[(16, 16), (24, 24), (32, 32), (48, 48), (32, 16), (24, 40)];
12379 for &(w, h) in shapes {
12380 let gradient: Vec<u32> = (0..(w * h) as usize)
12381 .map(|i| {
12382 let x = (i as u32) % w;
12383 let y = (i as u32) / w;
12384 let g = (x + y) & 0x0F;
12385 0xFF00_0000 | (g << 16) | (g << 8) | g
12386 })
12387 .collect();
12388 let stripes: Vec<u32> = (0..(w * h) as usize)
12389 .map(|i| {
12390 let x = (i as u32) % w;
12391 match x % 4 {
12392 0 => 0xFFAA_5500,
12393 1 => 0xFF55_AA00,
12394 2 => 0xFF00_55AA,
12395 _ => 0xFF55_00AA,
12396 }
12397 })
12398 .collect();
12399 let mut seed = 0xDEAD_BEEFu32;
12400 let noise: Vec<u32> = (0..(w * h) as usize)
12401 .map(|_| {
12402 seed ^= seed << 13;
12403 seed ^= seed >> 17;
12404 seed ^= seed << 5;
12405 0xFF00_0000 | (seed & 0x000F_0F0F)
12406 })
12407 .collect();
12408
12409 for (name, pixels) in [
12410 ("gradient", &gradient),
12411 ("stripes", &stripes),
12412 ("low-noise", &noise),
12413 ] {
12414 // Encode under the production chooser (with r159 tie-break).
12415 let post = encode_argb_with_predictor_chooser(pixels, w, h);
12416 // Decode round-trip — strict invariant.
12417 let header = build_image_header(w, h, true);
12418 let mut payload = header.to_vec();
12419 payload.extend_from_slice(&post);
12420 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12421 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12422 assert_eq!(
12423 img.pixels(),
12424 pixels.as_slice(),
12425 "round-159 round-trip mismatch on {name} {w}x{h}"
12426 );
12427 // Non-regression: the chooser's output with the
12428 // r159 hint must be no larger than the chooser with
12429 // the hint stubbed out. Since the hint is a strict
12430 // tie-break (same residual cost), the residual
12431 // stream is identical; only the predictor sub-image
12432 // can change, and it changes in the entropy-
12433 // reducing direction (so the writer emits fewer
12434 // bytes for it).
12435 let pre = encode_argb_with_predictor_chooser_no_r159_hint(pixels, w, h);
12436 assert!(
12437 post.len() <= pre.len(),
12438 "round-159 chooser regressed on {name} {w}x{h}: \
12439 pre={} B post={} B",
12440 pre.len(),
12441 post.len(),
12442 );
12443 }
12444 }
12445 }
12446
12447 /// Round 159 structural strict-beat: across a sweep of
12448 /// perturbation seeds, at least one fixture must reach a
12449 /// strictly more-uniform predictor sub-image under the r159
12450 /// hint-aware chooser than under the no-hint baseline — i.e.
12451 /// the mode-image's distinct-mode count drops by at least 1.
12452 /// The sweep verifies the entropy-image-aware tie-break
12453 /// actually fires on realistic small fixtures and reports the
12454 /// byte delta in the §4.1 predictor candidate's output for the
12455 /// first such fixture.
12456 ///
12457 /// Operates on `encode_with_predictor` directly (vs the full
12458 /// chooser) so the savings aren't masked by a competing
12459 /// candidate winning the chooser.
12460 #[test]
12461 fn round_159_predictor_candidate_strictly_beats_no_hint_on_some_fixture() {
12462 let w = 48u32;
12463 let h = 48u32;
12464 let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12465 let mut found_strict_image = false;
12466 let mut found_strict_bytes = false;
12467 let mut best_savings: i64 = 0;
12468 let mut seed_winner: u32 = 0;
12469 for seed_init in [
12470 0xCAFE_BABEu32,
12471 0xC0FFEE00,
12472 0xDEAD_BEEF,
12473 0xFACE_F00D,
12474 0xFEED_F00D,
12475 0x1234_5678,
12476 0xABCD_1234,
12477 0x90AB_CDEF,
12478 0x5A5A_5A5A,
12479 0xA5A5_A5A5,
12480 0xBA5E_BA11,
12481 0xB16B_00B5,
12482 ] {
12483 // Solid-fill canvas with a small perturbed region.
12484 // Vary the perturbation extent so different fixtures
12485 // trigger different mode-image patterns.
12486 let solid = 0xff60_8050u32;
12487 let mut pixels = vec![solid; (w * h) as usize];
12488 let mut s = seed_init;
12489 // 8×8 perturbation in the top-left so the right /
12490 // bottom neighbours' left-/top-column reads stay
12491 // mostly on solid pixels.
12492 for y in 0..8u32 {
12493 for x in 0..8u32 {
12494 s ^= s << 13;
12495 s ^= s >> 17;
12496 s ^= s << 5;
12497 let v = (s & 0x0007_0707) | 0xFF00_0000;
12498 pixels[(y * w + x) as usize] = v;
12499 }
12500 }
12501 let (pre_img, _, _) = pre_round_159_build_predictor_image(&pixels, w, h, size_bits);
12502 let (post_img, _, _) = build_predictor_image(&pixels, w, h, size_bits);
12503 let pre_modes: Vec<u8> = pre_img.iter().map(|p| ((p >> 8) & 0xff) as u8).collect();
12504 let post_modes: Vec<u8> = post_img.iter().map(|p| ((p >> 8) & 0xff) as u8).collect();
12505 let pre_distinct: std::collections::BTreeSet<u8> = pre_modes.iter().copied().collect();
12506 let post_distinct: std::collections::BTreeSet<u8> =
12507 post_modes.iter().copied().collect();
12508 if post_distinct.len() < pre_distinct.len() {
12509 found_strict_image = true;
12510 // Encode the predictor candidate under both
12511 // variants and check the byte delta.
12512 let post = encode_with_predictor(&pixels, w, h, size_bits, None, w);
12513 let pre = encode_with_predictor_no_r159_hint(&pixels, w, h, size_bits, None, w);
12514 let saved = pre.len() as i64 - post.len() as i64;
12515 if saved > best_savings {
12516 best_savings = saved;
12517 seed_winner = seed_init;
12518 }
12519 if post.len() < pre.len() {
12520 found_strict_bytes = true;
12521 // Round-trip the post stream end-to-end.
12522 let header = build_image_header(w, h, true);
12523 let mut payload = header.to_vec();
12524 payload.extend_from_slice(&post);
12525 let framed =
12526 build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12527 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12528 assert_eq!(
12529 img.pixels(),
12530 pixels.as_slice(),
12531 "round-159 strict-beat predictor candidate round-trip mismatch on \
12532 seed=0x{seed_init:08x}"
12533 );
12534 eprintln!(
12535 "[round-159] strict-beat predictor candidate: seed=0x{seed_init:08x}, \
12536 pre modes={pre_modes:?} post modes={post_modes:?} (distinct \
12537 pre={} post={}), pre={} B post={} B, saved={saved} B",
12538 pre_distinct.len(),
12539 post_distinct.len(),
12540 pre.len(),
12541 post.len(),
12542 );
12543 }
12544 // Non-regression always holds (residual cost is the
12545 // same under the tie-break, so the encoded bytes
12546 // can never increase).
12547 assert!(
12548 post.len() <= pre.len(),
12549 "round-159 tie-break regressed on seed=0x{seed_init:08x}: \
12550 pre={} B post={} B",
12551 pre.len(),
12552 post.len(),
12553 );
12554 }
12555 }
12556 assert!(
12557 found_strict_image,
12558 "round-159 sweep did not produce a single strictly-more-uniform mode image \
12559 — the hint propagation never fired across the fixture set"
12560 );
12561 assert!(
12562 found_strict_bytes,
12563 "round-159 sweep found a strict mode-image reduction but never a strict byte \
12564 reduction; entropy savings stayed within the LSB packing slack \
12565 (best_savings={best_savings} on seed=0x{seed_winner:08x})"
12566 );
12567 }
12568
12569 /// Local pre-round-159 copy of `encode_argb_with_predictor_chooser`
12570 /// that forces every predictor-image build to use the no-hint
12571 /// chooser. Used by `round_159_predictor_chooser_never_regresses`
12572 /// as the before-after baseline. The chooser's other candidate
12573 /// paths (no-tx, subtract-green, color-transform, color-indexing,
12574 /// meta-prefix) are re-used verbatim — only the predictor
12575 /// candidate is swapped for the no-hint variant.
12576 fn encode_argb_with_predictor_chooser_no_r159_hint(
12577 pixels: &[u32],
12578 width: u32,
12579 height: u32,
12580 ) -> Vec<u8> {
12581 let mut best = encode_argb_literals_with_width(pixels, width);
12582
12583 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12584 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
12585 let pred_block = 1u32 << pred_size_bits;
12586 let ctx_block = 1u32 << ctx_size_bits;
12587
12588 if width >= pred_block && height >= pred_block {
12589 let mut pred_single_block_size_bits: u8 = pred_size_bits;
12590 while pred_single_block_size_bits < 9
12591 && ((1u32 << pred_single_block_size_bits) < width
12592 || (1u32 << pred_single_block_size_bits) < height)
12593 {
12594 pred_single_block_size_bits += 1;
12595 }
12596 let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
12597 let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
12598 encode_with_predictor_no_r159_hint(
12599 pixels,
12600 width,
12601 height,
12602 pred_size_bits,
12603 cache_bits,
12604 width,
12605 )
12606 })];
12607 if try_pred_single_block {
12608 pred_candidates.push(select_best_cache_bits(|cache_bits| {
12609 encode_with_predictor_no_r159_hint(
12610 pixels,
12611 width,
12612 height,
12613 pred_single_block_size_bits,
12614 cache_bits,
12615 width,
12616 )
12617 }));
12618 }
12619 for cand in pred_candidates {
12620 if cand.len() < best.len() {
12621 best = cand;
12622 }
12623 }
12624 }
12625
12626 if width >= ctx_block && height >= ctx_block {
12627 let mut single_block_size_bits: u8 = ctx_size_bits;
12628 while single_block_size_bits < 9
12629 && ((1u32 << single_block_size_bits) < width
12630 || (1u32 << single_block_size_bits) < height)
12631 {
12632 single_block_size_bits += 1;
12633 }
12634 let try_single_block = single_block_size_bits != ctx_size_bits;
12635 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
12636 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
12637 })];
12638 if try_single_block {
12639 candidates.push(select_best_cache_bits(|cache_bits| {
12640 encode_with_color_transform(
12641 pixels,
12642 width,
12643 height,
12644 single_block_size_bits,
12645 cache_bits,
12646 width,
12647 )
12648 }));
12649 }
12650 for cand in candidates {
12651 if cand.len() < best.len() {
12652 best = cand;
12653 }
12654 }
12655 }
12656
12657 if collect_palette(pixels).is_some() {
12658 let ci_best = select_best_cache_bits(|cache_bits| {
12659 encode_with_color_indexing(pixels, width, height, cache_bits)
12660 .expect("palette feasibility already confirmed")
12661 });
12662 if ci_best.len() < best.len() {
12663 best = ci_best;
12664 }
12665 }
12666
12667 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
12668 if mp_best.len() < best.len() {
12669 best = mp_best;
12670 }
12671 }
12672
12673 best
12674 }
12675
12676 /// Local pre-round-159 copy of `encode_with_predictor` — same
12677 /// shape, but builds the predictor sub-image via the no-hint
12678 /// chooser (`pre_round_159_build_predictor_image`) so the
12679 /// before-after comparison isolates exactly the round-159
12680 /// tie-break change.
12681 fn encode_with_predictor_no_r159_hint(
12682 pixels: &[u32],
12683 width: u32,
12684 height: u32,
12685 size_bits: u8,
12686 cache_code_bits: Option<u32>,
12687 image_width: u32,
12688 ) -> Vec<u8> {
12689 let mut w = BitWriter::new();
12690 w.write_bit(true);
12691 w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
12692 debug_assert!((2..=9).contains(&size_bits));
12693 w.write_bits((size_bits - 2) as u32, 3);
12694 let (predictor_image, tw, _th) =
12695 pre_round_159_build_predictor_image(pixels, width, height, size_bits);
12696 write_entropy_coded_image_literals(&mut w, &predictor_image);
12697 w.write_bit(false);
12698 let mut residuals = vec![0u32; pixels.len()];
12699 apply_forward_predictor(
12700 pixels,
12701 &mut residuals,
12702 width,
12703 height,
12704 &predictor_image,
12705 tw,
12706 size_bits,
12707 );
12708 let mut tokens = tokenize_lz77(&residuals);
12709 if let Some(bits) = cache_code_bits {
12710 tokens = cacheify_tokens(&tokens, &residuals, bits);
12711 }
12712 write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
12713 w.into_bytes()
12714 }
12715
12716 // ---- round-160 §4.1 slack-cost tie-break tests ------------------
12717
12718 /// Round 160 hint-aware chooser contract (slack form): given a
12719 /// preferred mode whose residual cost is **within `slack`** of
12720 /// the otherwise-best cost, the chooser returns the preferred
12721 /// mode rather than the lowest-tied (or lowest-best) mode.
12722 /// Constructs a small 4×4 block with carefully-chosen
12723 /// per-channel values such that the lowest-best mode is 0
12724 /// (Black) but a non-trivial L-based mode has cost only one
12725 /// magnitude unit higher; the slack=1 chooser must select the
12726 /// preferred mode.
12727 #[test]
12728 fn round_160_pick_block_mode_with_hint_slack_swaps_within_budget() {
12729 // Solid-fill 4×4: every mode 1..=13 ties at zero residual
12730 // cost across the block interior; mode 0 (Black) gives a
12731 // strictly larger cost (the solid color is far from black).
12732 // The slack-cost chooser with `prefer = Some(7)` and slack
12733 // >= 0 must select mode 7 (the preferred tied mode), and
12734 // the strict-tie chooser must agree.
12735 let solid = 0xff60_8050u32;
12736 let pixels: Vec<u32> = vec![solid; 16];
12737 let strict = pick_block_mode_with_hint(&pixels, 4, 4, 0, 0, 4, 4, Some(7));
12738 let slack0 = pick_block_mode_with_hint_slack(&pixels, 4, 4, 0, 0, 4, 4, Some(7), 0);
12739 assert_eq!(
12740 strict, slack0,
12741 "slack=0 must be byte-identical to the round-159 strict tie-break"
12742 );
12743 assert_eq!(
12744 slack0, 7,
12745 "preferred tied mode must win on slack=0 when cost is equal"
12746 );
12747
12748 // Now construct a block where mode 0 has cost 0 (strictly
12749 // best) and another mode has small positive cost. The slack
12750 // chooser at sufficiently-large slack must swap to the
12751 // preferred mode; at slack=0 it must keep mode 0.
12752 //
12753 // Choose a 2×2 block of solid black (all zeros). The Black
12754 // predictor returns 0 (matches), and every other mode that
12755 // predicts from a neighbour also returns 0 (neighbours are
12756 // solid black). So *every* mode has cost 0 — not the
12757 // shape we want.
12758 //
12759 // Instead, place the test block inside a larger fixture so
12760 // that the block's *neighbour* pixels (above/left) differ
12761 // and force the L/T/etc. modes to non-zero cost while
12762 // Black mode stays at 0.
12763 //
12764 // 8×8 fixture: top half black, bottom half a non-zero
12765 // colour. Place the test block at (0, 4) — the row of
12766 // pixels above is the boundary between black (y=3) and
12767 // colour (y=4), so the T mode reads the row-3 black pixels
12768 // while the block itself is non-zero → T mode has non-zero
12769 // cost. The Black mode is `pred = 0` everywhere → cost is
12770 // the sum-magnitudes of the block's non-zero pixels.
12771 let mut big = vec![0xff00_0000u32; 64];
12772 for y in 4..8u32 {
12773 for x in 0..8u32 {
12774 big[(y * 8 + x) as usize] = 0xff01_0101;
12775 }
12776 }
12777 let best_default = pick_block_mode_with_hint(&big, 8, 8, 0, 4, 4, 4, None);
12778 let best_cost = block_mode_cost(&big, 8, 8, 0, 4, 4, 4, best_default);
12779
12780 // Pick a non-best mode and find its cost.
12781 let mut preferred: u8 = u8::MAX;
12782 let mut pref_cost: u64 = u64::MAX;
12783 for m in 0u8..=13 {
12784 if m == best_default {
12785 continue;
12786 }
12787 let c = block_mode_cost(&big, 8, 8, 0, 4, 4, 4, m);
12788 if c > best_cost && c < pref_cost {
12789 preferred = m;
12790 pref_cost = c;
12791 }
12792 }
12793 if preferred != u8::MAX {
12794 let extra = pref_cost - best_cost;
12795 // Strict tie-break must keep the best mode (cost
12796 // mismatch).
12797 let strict = pick_block_mode_with_hint(&big, 8, 8, 0, 4, 4, 4, Some(preferred));
12798 assert_eq!(
12799 strict, best_default,
12800 "strict round-159 tie-break must NOT swap when costs differ"
12801 );
12802 // Slack = extra - 1 must also keep the best mode.
12803 if extra > 0 {
12804 let slack_too_small = pick_block_mode_with_hint_slack(
12805 &big,
12806 8,
12807 8,
12808 0,
12809 4,
12810 4,
12811 4,
12812 Some(preferred),
12813 extra - 1,
12814 );
12815 assert_eq!(
12816 slack_too_small, best_default,
12817 "slack < (pref_cost - best_cost) must NOT swap"
12818 );
12819 }
12820 // Slack = extra must now allow the swap.
12821 let slack_exact =
12822 pick_block_mode_with_hint_slack(&big, 8, 8, 0, 4, 4, 4, Some(preferred), extra);
12823 assert_eq!(
12824 slack_exact, preferred,
12825 "slack >= (pref_cost - best_cost) must accept the preferred mode swap"
12826 );
12827 }
12828 }
12829
12830 /// Round 160 strict round-159 equivalence: with `slack = 0` the
12831 /// slack-cost chooser must produce byte-identical predictor
12832 /// sub-images and byte-identical encoded streams to the
12833 /// round-159 strict-tie-break baseline, across a fixture
12834 /// matrix.
12835 #[test]
12836 fn round_160_slack_zero_matches_round_159_baseline() {
12837 let shapes: &[(u32, u32, u8)] = &[
12838 (32, 32, 4),
12839 (48, 48, 4),
12840 (64, 32, 4),
12841 (32, 64, 4),
12842 (24, 24, 3),
12843 ];
12844 for &(w, h, size_bits) in shapes {
12845 let gradient: Vec<u32> = (0..(w * h) as usize)
12846 .map(|i| {
12847 let x = (i as u32) % w;
12848 let y = (i as u32) / w;
12849 let g = (x + y) & 0x0F;
12850 0xFF00_0000 | (g << 16) | (g << 8) | g
12851 })
12852 .collect();
12853 let stripes: Vec<u32> = (0..(w * h) as usize)
12854 .map(|i| {
12855 let x = (i as u32) % w;
12856 match x % 4 {
12857 0 => 0xFFAA_5500,
12858 1 => 0xFF55_AA00,
12859 2 => 0xFF00_55AA,
12860 _ => 0xFF55_00AA,
12861 }
12862 })
12863 .collect();
12864
12865 for (name, pixels) in [("gradient", &gradient), ("stripes", &stripes)] {
12866 let (r159_img, _, _) = build_predictor_image(pixels, w, h, size_bits);
12867 let (r160_img, _, _) = build_predictor_image_with_slack(pixels, w, h, size_bits, 0);
12868 assert_eq!(
12869 r159_img, r160_img,
12870 "slack=0 sub-image must equal r159 baseline on {name} {w}x{h} \
12871 size_bits={size_bits}"
12872 );
12873 let r159_bytes = encode_with_predictor(pixels, w, h, size_bits, None, w);
12874 let r160_bytes = encode_with_predictor_slack(pixels, w, h, size_bits, None, w, 0);
12875 assert_eq!(
12876 r159_bytes, r160_bytes,
12877 "slack=0 encoded bytes must equal r159 baseline on {name} {w}x{h} \
12878 size_bits={size_bits}"
12879 );
12880 }
12881 }
12882 }
12883
12884 /// Round 160 round-trip correctness: at any slack budget, the
12885 /// slack-cost predictor candidate produces a stream that, when
12886 /// framed and decoded, reproduces the input pixels exactly. The
12887 /// per-block chosen mode changes with slack but the forward
12888 /// transform always derives residuals from the chosen modes and
12889 /// the decoder re-derives the same modes from the sub-image.
12890 #[test]
12891 fn round_160_slack_predictor_round_trips_through_decoder() {
12892 let w = 32u32;
12893 let h = 32u32;
12894 let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12895 let pixels: Vec<u32> = (0..(w * h) as usize)
12896 .map(|i| {
12897 let x = (i as u32) % w;
12898 let y = (i as u32) / w;
12899 let r = (x * 7) & 0xff;
12900 let g = (y * 11) & 0xff;
12901 let b = ((x ^ y) * 3) & 0xff;
12902 0xFF00_0000 | (r << 16) | (g << 8) | b
12903 })
12904 .collect();
12905 let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
12906 for slack in [0, block_pixels, 2 * block_pixels, 8 * block_pixels] {
12907 let stream = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
12908 let header = build_image_header(w, h, true);
12909 let mut payload = header.to_vec();
12910 payload.extend_from_slice(&stream);
12911 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12912 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12913 assert_eq!(
12914 img.pixels(),
12915 pixels.as_slice(),
12916 "round-160 slack={slack} predictor candidate failed end-to-end round-trip"
12917 );
12918 }
12919 }
12920
12921 /// Round 160 non-regression: across a fixture matrix the
12922 /// production `encode_argb_with_predictor_chooser` output is
12923 /// `<=` the chooser's output with slack candidates disabled
12924 /// (i.e. the round-159 chooser). The new slack candidates can
12925 /// only *add* options to the byte-best selection, so they must
12926 /// never increase the chosen output length.
12927 #[test]
12928 fn round_160_chooser_never_regresses_vs_round_159() {
12929 let shapes: &[(u32, u32)] = &[(32, 32), (48, 48), (32, 64), (64, 32), (24, 24)];
12930 for &(w, h) in shapes {
12931 // Three fixtures: smooth gradient, palette stripes, and
12932 // a sparse noise image (low predictor residual mass for
12933 // a few mode-image blocks, high for others — exactly
12934 // the regime where the slack tie-break can pay off).
12935 let gradient: Vec<u32> = (0..(w * h) as usize)
12936 .map(|i| {
12937 let x = (i as u32) % w;
12938 let y = (i as u32) / w;
12939 let g = (x + y) & 0x0F;
12940 0xFF00_0000 | (g << 16) | (g << 8) | g
12941 })
12942 .collect();
12943 let stripes: Vec<u32> = (0..(w * h) as usize)
12944 .map(|i| {
12945 let x = (i as u32) % w;
12946 match x % 4 {
12947 0 => 0xFFAA_5500,
12948 1 => 0xFF55_AA00,
12949 2 => 0xFF00_55AA,
12950 _ => 0xFF55_00AA,
12951 }
12952 })
12953 .collect();
12954 let mut s: u32 = 0xCAFE_BABE;
12955 let noise: Vec<u32> = (0..(w * h) as usize)
12956 .map(|_| {
12957 s ^= s << 13;
12958 s ^= s >> 17;
12959 s ^= s << 5;
12960 0xFF00_0000 | (s & 0x00FF_FFFF)
12961 })
12962 .collect();
12963
12964 for (name, pixels) in [
12965 ("gradient", &gradient),
12966 ("stripes", &stripes),
12967 ("noise", &noise),
12968 ] {
12969 let r159 = encode_argb_with_predictor_chooser_no_r160_slack(pixels, w, h);
12970 let r160 = encode_argb_with_predictor_chooser(pixels, w, h);
12971 assert!(
12972 r160.len() <= r159.len(),
12973 "round-160 chooser regressed on {name} {w}x{h}: r159={} B r160={} B",
12974 r159.len(),
12975 r160.len()
12976 );
12977 // End-to-end round-trip parity on the r160 stream.
12978 let header = build_image_header(w, h, true);
12979 let mut payload = header.to_vec();
12980 payload.extend_from_slice(&r160);
12981 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12982 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12983 assert_eq!(
12984 img.pixels(),
12985 pixels.as_slice(),
12986 "round-160 chooser output failed end-to-end round-trip on \
12987 {name} {w}x{h}"
12988 );
12989 }
12990 }
12991 }
12992
12993 /// Round 160 headline: the slack-cost **predictor candidate**
12994 /// strictly beats the round-159 strict-tie-break predictor
12995 /// candidate on at least one fixture, with the seed, slack
12996 /// budget, and byte savings printed for the round report.
12997 ///
12998 /// The comparison is between the two predictor candidates in
12999 /// isolation, not between the overall chooser outputs: the
13000 /// production chooser composes the predictor candidate with
13001 /// every other transform path (no-tx, subtract-green, color-
13002 /// transform, color-indexing, multi-meta-prefix) and may pick a
13003 /// non-predictor path as best, so the chooser output won't
13004 /// always reflect the slack savings on the predictor candidate
13005 /// alone. The invariant we *prove* here is: on at least one
13006 /// fixture in the sweep, `encode_with_predictor_slack(..,
13007 /// slack > 0, ..)` produces a strictly shorter byte stream
13008 /// than `encode_with_predictor(.., slack = 0, ..)`, which is
13009 /// the byte-cost win the round-160 slack-cost variant is
13010 /// designed to capture. The full chooser also picks up the
13011 /// win whenever the predictor path ends up the byte-best
13012 /// overall.
13013 ///
13014 /// The fixtures are seeded perturbations of a mostly-uniform
13015 /// canvas: small perturbation patches plus a sparse single-
13016 /// pixel noise sprinkle. These are the layouts where the
13017 /// predictor sub-image carries a small number of "almost
13018 /// uniform" mode-image entries that the slack tie-break can
13019 /// collapse onto a single dominant mode at a small residual
13020 /// cost.
13021 #[test]
13022 fn round_160_slack_candidate_strictly_beats_strict_on_some_fixture() {
13023 let w = 128u32;
13024 let h = 128u32;
13025 let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13026 let mut found = false;
13027 let mut best_savings: i64 = 0;
13028 let mut seed_winner: u32 = 0;
13029 let mut slack_winner: u64 = 0;
13030 // Slack sweep: pick a spread of budgets between 1 residual
13031 // unit and 4× block_pixels. The diagnostic phase of round
13032 // 160 development showed that the productive regime starts
13033 // around slack ≥ block_pixels / 4 (16-pixel blocks → slack
13034 // ≥ 64) on the seeded fixtures used here.
13035 let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
13036 let slack_candidates: &[u64] = &[
13037 1,
13038 4,
13039 16,
13040 64,
13041 block_pixels,
13042 2 * block_pixels,
13043 4 * block_pixels,
13044 ];
13045 for seed_init in [
13046 0xCAFE_BABEu32,
13047 0xC0FFEE00,
13048 0xDEAD_BEEF,
13049 0xFACE_F00D,
13050 0xFEED_F00D,
13051 0x1234_5678,
13052 0xABCD_1234,
13053 0x90AB_CDEF,
13054 0x5A5A_5A5A,
13055 0xA5A5_A5A5,
13056 0xBA5E_BA11,
13057 0xB16B_00B5,
13058 0x00DD_BA11,
13059 0xC1AB_AB00,
13060 0xDEAF_BABE,
13061 0xCABB_A6E0,
13062 0x1337_C0DE,
13063 0xABAD_CAFE,
13064 0xBADF_00D0,
13065 0x8BAD_F00D,
13066 ] {
13067 // Mostly-solid canvas with a 1-bit-per-channel noise
13068 // overlay sprinkled at a sparse stride. The overlay is
13069 // small enough that the residual mass added per block
13070 // is in the order of `block_pixels` (matches our chooser
13071 // slack budget) but large enough to push the best-mode
13072 // choice off the all-zero tie in some blocks.
13073 let solid = 0xff60_8050u32;
13074 let mut pixels = vec![solid; (w * h) as usize];
13075 let mut s = seed_init;
13076 // Two perturbation patches of varying sizes to give the
13077 // chooser something to chew on without dominating the
13078 // whole image (the chooser must still see lots of tied
13079 // blocks for the slack tie-break to pay off).
13080 for y in 0..6u32 {
13081 for x in 0..6u32 {
13082 s ^= s << 13;
13083 s ^= s >> 17;
13084 s ^= s << 5;
13085 pixels[(y * w + x) as usize] = (s & 0x0003_0303) | 0xFF60_8050;
13086 }
13087 }
13088 for y in 20..30u32 {
13089 for x in 20..30u32 {
13090 s ^= s << 13;
13091 s ^= s >> 17;
13092 s ^= s << 5;
13093 pixels[(y * w + x) as usize] = (s & 0x0007_0707) | 0xFF60_8050;
13094 }
13095 }
13096 // Sparse single-pixel perturbations scattered across the
13097 // remaining canvas — these are the perturbations that
13098 // tend to push individual blocks just barely off the
13099 // best-mode tie, exposing the slack tie-break opportunity.
13100 for _ in 0..32u32 {
13101 s ^= s << 13;
13102 s ^= s >> 17;
13103 s ^= s << 5;
13104 let px = (s >> 8) % w;
13105 let py = (s >> 16) % h;
13106 pixels[(py * w + px) as usize] = (s & 0x0001_0101) | 0xFF60_8050;
13107 }
13108
13109 // Strict-tie-break baseline (round-159 chooser): the
13110 // slack = 0 predictor candidate at the default
13111 // size_bits. Cache-bits stays at None for a clean
13112 // comparison — the slack candidate is also tested at
13113 // cache_code_bits = None, isolating the effect to the
13114 // §4.1 forward transform.
13115 let strict_bytes = encode_with_predictor(&pixels, w, h, size_bits, None, w);
13116 // Slack sweep: pick the smallest slack-cost predictor
13117 // stream and compare against the strict baseline.
13118 let mut best_slack_bytes = strict_bytes.clone();
13119 let mut best_slack_value: u64 = 0;
13120 for &slack in slack_candidates {
13121 let bytes = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
13122 if bytes.len() < best_slack_bytes.len() {
13123 best_slack_bytes = bytes;
13124 best_slack_value = slack;
13125 }
13126 }
13127 if best_slack_bytes.len() < strict_bytes.len() {
13128 let saved = strict_bytes.len() as i64 - best_slack_bytes.len() as i64;
13129 if saved > best_savings {
13130 best_savings = saved;
13131 seed_winner = seed_init;
13132 slack_winner = best_slack_value;
13133 }
13134 if !found {
13135 found = true;
13136 }
13137 // Round-trip the winning slack stream end-to-end
13138 // through the full framed-WebP path to prove decode
13139 // correctness on the slack-tie-break-modified
13140 // residual stream.
13141 let header = build_image_header(w, h, true);
13142 let mut payload = header.to_vec();
13143 payload.extend_from_slice(&best_slack_bytes);
13144 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13145 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13146 assert_eq!(
13147 img.pixels(),
13148 pixels.as_slice(),
13149 "round-160 strict-beat predictor candidate round-trip mismatch on \
13150 seed=0x{seed_init:08x} slack={best_slack_value}"
13151 );
13152 eprintln!(
13153 "[round-160] slack-cost strict-beat: seed=0x{seed_init:08x}, \
13154 slack={best_slack_value}, strict={} B slack={} B saved={saved} B",
13155 strict_bytes.len(),
13156 best_slack_bytes.len(),
13157 );
13158 }
13159 // Production chooser non-regression: r160 chooser
13160 // (which evaluates both strict and slack predictor
13161 // candidates against every other transform path) is
13162 // always ≤ r159 chooser (which evaluates strict only).
13163 let r159 = encode_argb_with_predictor_chooser_no_r160_slack(&pixels, w, h);
13164 let r160 = encode_argb_with_predictor_chooser(&pixels, w, h);
13165 assert!(
13166 r160.len() <= r159.len(),
13167 "round-160 chooser regressed on seed 0x{seed_init:08x}: \
13168 r159={} B r160={} B",
13169 r159.len(),
13170 r160.len()
13171 );
13172 }
13173 assert!(
13174 found,
13175 "round-160 slack-cost sweep did not produce a single strict byte reduction \
13176 across the seeded fixture set; the new slack candidates never won \
13177 (best_savings={best_savings} on seed=0x{seed_winner:08x} slack={slack_winner})"
13178 );
13179 }
13180
13181 /// Local pre-round-160 copy of `encode_argb_with_predictor_chooser`
13182 /// that omits the round-160 slack-cost predictor candidates. Used
13183 /// by the round-160 non-regression and strict-beat tests as the
13184 /// before-after baseline; the rest of the chooser (no-tx,
13185 /// subtract-green, color-transform, color-indexing, meta-prefix)
13186 /// is re-used verbatim.
13187 fn encode_argb_with_predictor_chooser_no_r160_slack(
13188 pixels: &[u32],
13189 width: u32,
13190 height: u32,
13191 ) -> Vec<u8> {
13192 let mut best = encode_argb_literals_with_width(pixels, width);
13193
13194 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13195 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13196 let pred_block = 1u32 << pred_size_bits;
13197 let ctx_block = 1u32 << ctx_size_bits;
13198
13199 if width >= pred_block && height >= pred_block {
13200 let mut pred_single_block_size_bits: u8 = pred_size_bits;
13201 while pred_single_block_size_bits < 9
13202 && ((1u32 << pred_single_block_size_bits) < width
13203 || (1u32 << pred_single_block_size_bits) < height)
13204 {
13205 pred_single_block_size_bits += 1;
13206 }
13207 let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13208 let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13209 encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13210 })];
13211 if try_pred_single_block {
13212 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13213 encode_with_predictor(
13214 pixels,
13215 width,
13216 height,
13217 pred_single_block_size_bits,
13218 cache_bits,
13219 width,
13220 )
13221 }));
13222 }
13223 for cand in pred_candidates {
13224 if cand.len() < best.len() {
13225 best = cand;
13226 }
13227 }
13228 }
13229
13230 if width >= ctx_block && height >= ctx_block {
13231 let mut single_block_size_bits: u8 = ctx_size_bits;
13232 while single_block_size_bits < 9
13233 && ((1u32 << single_block_size_bits) < width
13234 || (1u32 << single_block_size_bits) < height)
13235 {
13236 single_block_size_bits += 1;
13237 }
13238 let try_single_block = single_block_size_bits != ctx_size_bits;
13239 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13240 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
13241 })];
13242 if try_single_block {
13243 candidates.push(select_best_cache_bits(|cache_bits| {
13244 encode_with_color_transform(
13245 pixels,
13246 width,
13247 height,
13248 single_block_size_bits,
13249 cache_bits,
13250 width,
13251 )
13252 }));
13253 }
13254 for cand in candidates {
13255 if cand.len() < best.len() {
13256 best = cand;
13257 }
13258 }
13259 }
13260
13261 if collect_palette(pixels).is_some() {
13262 let ci_best = select_best_cache_bits(|cache_bits| {
13263 encode_with_color_indexing(pixels, width, height, cache_bits)
13264 .expect("palette feasibility already confirmed")
13265 });
13266 if ci_best.len() < best.len() {
13267 best = ci_best;
13268 }
13269 }
13270
13271 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
13272 if mp_best.len() < best.len() {
13273 best = mp_best;
13274 }
13275 }
13276
13277 best
13278 }
13279
13280 // ---- Round 161 tests: Shannon-entropy bit-cost predictor variant -------
13281
13282 /// Local pre-round-161 copy of `encode_argb_with_predictor_chooser`
13283 /// that omits the round-161 entropy-cost predictor candidates but
13284 /// **keeps** every round-160 slack-cost candidate. Used by the
13285 /// round-161 non-regression and strict-beat tests as the
13286 /// before-after baseline. Mirrors
13287 /// `encode_argb_with_predictor_chooser_no_r160_slack` in shape.
13288 fn encode_argb_with_predictor_chooser_no_r161_entropy(
13289 pixels: &[u32],
13290 width: u32,
13291 height: u32,
13292 ) -> Vec<u8> {
13293 let mut best = encode_argb_literals_with_width(pixels, width);
13294
13295 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13296 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13297 let pred_block = 1u32 << pred_size_bits;
13298 let ctx_block = 1u32 << ctx_size_bits;
13299
13300 if width >= pred_block && height >= pred_block {
13301 let mut pred_single_block_size_bits: u8 = pred_size_bits;
13302 while pred_single_block_size_bits < 9
13303 && ((1u32 << pred_single_block_size_bits) < width
13304 || (1u32 << pred_single_block_size_bits) < height)
13305 {
13306 pred_single_block_size_bits += 1;
13307 }
13308 let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13309 let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13310 encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13311 })];
13312 let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
13313 for slack in [
13314 pred_block_pixels,
13315 2 * pred_block_pixels,
13316 4 * pred_block_pixels,
13317 ] {
13318 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13319 encode_with_predictor_slack(
13320 pixels,
13321 width,
13322 height,
13323 pred_size_bits,
13324 cache_bits,
13325 width,
13326 slack,
13327 )
13328 }));
13329 }
13330 if try_pred_single_block {
13331 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13332 encode_with_predictor(
13333 pixels,
13334 width,
13335 height,
13336 pred_single_block_size_bits,
13337 cache_bits,
13338 width,
13339 )
13340 }));
13341 let single_pred_block_pixels: u64 =
13342 (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
13343 for slack in [
13344 single_pred_block_pixels,
13345 2 * single_pred_block_pixels,
13346 4 * single_pred_block_pixels,
13347 ] {
13348 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13349 encode_with_predictor_slack(
13350 pixels,
13351 width,
13352 height,
13353 pred_single_block_size_bits,
13354 cache_bits,
13355 width,
13356 slack,
13357 )
13358 }));
13359 }
13360 }
13361 for cand in pred_candidates {
13362 if cand.len() < best.len() {
13363 best = cand;
13364 }
13365 }
13366 }
13367
13368 if width >= ctx_block && height >= ctx_block {
13369 let mut single_block_size_bits: u8 = ctx_size_bits;
13370 while single_block_size_bits < 9
13371 && ((1u32 << single_block_size_bits) < width
13372 || (1u32 << single_block_size_bits) < height)
13373 {
13374 single_block_size_bits += 1;
13375 }
13376 let try_single_block = single_block_size_bits != ctx_size_bits;
13377 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13378 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
13379 })];
13380 if try_single_block {
13381 candidates.push(select_best_cache_bits(|cache_bits| {
13382 encode_with_color_transform(
13383 pixels,
13384 width,
13385 height,
13386 single_block_size_bits,
13387 cache_bits,
13388 width,
13389 )
13390 }));
13391 }
13392 for cand in candidates {
13393 if cand.len() < best.len() {
13394 best = cand;
13395 }
13396 }
13397 }
13398
13399 if collect_palette(pixels).is_some() {
13400 let ci_best = select_best_cache_bits(|cache_bits| {
13401 encode_with_color_indexing(pixels, width, height, cache_bits)
13402 .expect("palette feasibility already confirmed")
13403 });
13404 if ci_best.len() < best.len() {
13405 best = ci_best;
13406 }
13407 }
13408
13409 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
13410 if mp_best.len() < best.len() {
13411 best = mp_best;
13412 }
13413 }
13414
13415 best
13416 }
13417
13418 /// Round 161 — [`block_mode_entropy_cost`] reports zero milli-bits
13419 /// on a 1×1 block of pixel `0xff_00_00_00` (the top-left border
13420 /// rule sets `pred = 0xff_00_00_00`, so the residual is zero, the
13421 /// histogram has a single occupied bin per channel and the
13422 /// `c · log2(N/c) = N · log2(1) = 0` per-bin contribution sums to
13423 /// zero). Confirms the entropy summation correctly bottoms-out
13424 /// at the no-residual edge case.
13425 #[test]
13426 fn round_161_block_mode_entropy_cost_zero_on_zero_residual_block() {
13427 let pixels = vec![0xff_00_00_00u32; 1];
13428 for mode in 0u8..=13 {
13429 let cost = block_mode_entropy_cost(&pixels, 1, 1, 0, 0, 1, 1, mode);
13430 assert_eq!(
13431 cost, 0,
13432 "1×1 zero-residual block should produce zero-entropy cost under mode {mode}, got {cost}"
13433 );
13434 }
13435 }
13436
13437 /// Round 161 — on an interior solid-fill block, every mode that
13438 /// produces a *constant* residual (whether zero or non-zero) ties
13439 /// at zero Shannon entropy — Shannon entropy measures **variety**
13440 /// in the residual symbol distribution, not magnitude. This is
13441 /// the key structural difference from the L1 magnitude proxy: L1
13442 /// would penalise mode 0 (which emits constant non-zero residual
13443 /// `0x00_60_80_50` per pixel on a `0xff_60_80_50` solid block),
13444 /// while Shannon entropy correctly treats a constant-residual
13445 /// distribution as zero-cost (a Huffman code over a single-symbol
13446 /// alphabet emits one bit per symbol, which is the theoretical
13447 /// floor and matches the §3.7.2.1.1 single-leaf encoding's
13448 /// near-zero overhead).
13449 ///
13450 /// This test pins down that semantic: on the interior solid
13451 /// block, every neighbour-predicting mode AND mode 0 all sit at
13452 /// zero entropy cost; the chooser then falls through to the
13453 /// lowest-index tie-break (mode 0) or the hint when one is
13454 /// supplied.
13455 #[test]
13456 fn round_161_block_mode_entropy_cost_zero_on_constant_residual_block() {
13457 let w = 8usize;
13458 let h = 8usize;
13459 let pixels = vec![0xff_60_80_50u32; w * h];
13460 // Block [4..8) × [4..8) — interior. Every mode produces a
13461 // constant residual across the block (zero for the
13462 // neighbour-predicting modes; `0x00_60_80_50` for mode 0).
13463 // Constant residual = single-symbol histogram per channel
13464 // = zero Shannon entropy.
13465 for mode in 0u8..=13 {
13466 let cost = block_mode_entropy_cost(&pixels, w, h, 4, 4, 4, 4, mode);
13467 assert_eq!(
13468 cost, 0,
13469 "constant-residual mode {mode} on interior solid block should have zero entropy cost, got {cost}"
13470 );
13471 }
13472 }
13473
13474 /// Round 161 — Shannon entropy cost is strictly monotone in
13475 /// residual variety: a block whose residual histogram is
13476 /// peaked at a single value (zero or non-zero) has lower
13477 /// entropy cost than a block whose residuals scatter across
13478 /// multiple distinct values. This is the property a Huffman
13479 /// code over the residuals would actually minimise — and the
13480 /// L1 magnitude proxy does NOT distinguish (a constant non-
13481 /// zero residual block has the same L1 sum as a scattered
13482 /// block of the same mean magnitude). Confirms the entropy
13483 /// cost adds real signal vs the proxy.
13484 #[test]
13485 fn round_161_entropy_cost_distinguishes_concentrated_from_scattered() {
13486 // 16×16 image with two interior blocks. Concentrated block:
13487 // pure solid grey on the [4..8) × [4..8) corner — mode 1 (L
13488 // predictor) reproduces every interior pixel from its left
13489 // neighbour so every residual is zero. Scattered block:
13490 // checkerboard greys on the [8..12) × [8..12) corner — mode
13491 // 1 produces non-zero residuals alternating across
13492 // horizontal steps, populating multiple histogram bins.
13493 let w = 16usize;
13494 let h = 16usize;
13495 let grey = 0xff_60_80_50u32;
13496 let other = 0xff_70_90_60u32;
13497 let mut pixels = vec![grey; w * h];
13498 // Scatter `other` in a horizontal checkerboard across the
13499 // scattered block region. Use an isolated mutated quadrant
13500 // that doesn't reach the concentrated block; keep a buffer
13501 // row/column of solid grey around the scattered block so
13502 // its L neighbours at the block's left edge are still grey
13503 // (giving a deterministic histogram).
13504 for y in 8..12 {
13505 for x in 8..12 {
13506 if x % 2 == 0 {
13507 pixels[y * w + x] = other;
13508 }
13509 }
13510 }
13511 let concentrated = block_mode_entropy_cost(&pixels, w, h, 4, 4, 4, 4, 1);
13512 let scattered = block_mode_entropy_cost(&pixels, w, h, 8, 8, 4, 4, 1);
13513 assert!(
13514 scattered > concentrated,
13515 "scattered block should have higher entropy cost than concentrated: \
13516 scattered={scattered}, concentrated={concentrated}"
13517 );
13518 assert_eq!(
13519 concentrated, 0,
13520 "concentrated (interior solid) block under mode 1 should have zero-entropy cost, \
13521 got {concentrated}"
13522 );
13523 assert!(
13524 scattered > 0,
13525 "scattered block should have strictly positive entropy cost, got {scattered}"
13526 );
13527 }
13528
13529 /// Round 161 — the entropy chooser's tie-break mechanism mirrors
13530 /// the round-159 strict tie-break: when `prefer_mode`'s entropy
13531 /// cost equals the best, the chooser returns the preferred mode.
13532 /// On an interior solid-fill block, *every* mode produces a
13533 /// constant residual (zero or a fixed colour) and so ties at
13534 /// zero Shannon entropy; the chooser falls back to the lowest-
13535 /// index tie (mode 0) and the hint flips to any preferred mode.
13536 #[test]
13537 fn round_161_pick_block_mode_with_hint_entropy_honours_tie() {
13538 let w = 8usize;
13539 let h = 8usize;
13540 let pixels = vec![0xff_60_80_50u32; w * h];
13541 // Interior [4..8) × [4..8) block — every mode is a constant
13542 // residual (Shannon entropy zero) for the reasons in
13543 // [`round_161_block_mode_entropy_cost_zero_on_constant_residual_block`].
13544 // No hint → lowest mode 0 wins.
13545 let no_hint = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, None);
13546 assert_eq!(no_hint, 0);
13547 // Hint mode 11 → ties at zero → tie-break flips to 11.
13548 let with_hint = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, Some(11));
13549 assert_eq!(with_hint, 11);
13550 // Hint mode 5 → ties at zero → tie-break flips to 5.
13551 let with_hint5 = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, Some(5));
13552 assert_eq!(with_hint5, 5);
13553 }
13554
13555 /// Round 161 — `encode_with_predictor_entropy` round-trips
13556 /// end-to-end through `decode_lossless_image`. Confirms the
13557 /// entropy chooser produces a decodable stream regardless of
13558 /// what cost model picked the modes (the §4.1 forward transform
13559 /// recomputes residuals against whatever mode the sub-image
13560 /// records, and the decoder applies the same inverse against
13561 /// that mode).
13562 #[test]
13563 fn round_161_entropy_predictor_round_trips_through_decoder() {
13564 let w = 32u32;
13565 let h = 32u32;
13566 // Mostly-uniform canvas with two small perturbations + a
13567 // single-pixel sprinkle — same recipe family as the round-
13568 // 160 strict-beat fixture, but smaller for fast test runs.
13569 let mut pixels = vec![0xff_60_80_50u32; (w * h) as usize];
13570 let mut s: u32 = 0xCAFE_BABE;
13571 for y in 2..8u32 {
13572 for x in 4..10u32 {
13573 s ^= s << 13;
13574 s ^= s >> 17;
13575 s ^= s << 5;
13576 pixels[(y * w + x) as usize] = (s & 0x0007_0707) | 0xff60_8050;
13577 }
13578 }
13579 for cache_bits in [None, Some(2u32), Some(8u32)] {
13580 let bytes = encode_with_predictor_entropy(
13581 &pixels,
13582 w,
13583 h,
13584 DEFAULT_PREDICTOR_SIZE_BITS,
13585 cache_bits,
13586 w,
13587 );
13588 let header = build_image_header(w, h, true);
13589 let mut payload = header.to_vec();
13590 payload.extend_from_slice(&bytes);
13591 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13592 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13593 assert_eq!(
13594 img.pixels(),
13595 pixels.as_slice(),
13596 "entropy predictor round-trip mismatch at cache_bits={cache_bits:?}"
13597 );
13598 }
13599 }
13600
13601 /// Round 161 — production chooser must never regress relative to
13602 /// the round-160 baseline. The round-161 entropy candidate is an
13603 /// additional path; the chooser keeps the byte-shortest stream,
13604 /// so adding a candidate cannot lengthen the output.
13605 #[test]
13606 fn round_161_chooser_never_regresses_vs_round_160() {
13607 let shapes: &[(u32, u32)] = &[(16, 16), (32, 32), (48, 48), (64, 32), (32, 64)];
13608 for &(w, h) in shapes {
13609 // Fixture A: solid fill.
13610 let solid = vec![0xff_60_80_50u32; (w * h) as usize];
13611 // Fixture B: low-frequency gradient.
13612 let mut gradient = vec![0u32; (w * h) as usize];
13613 for y in 0..h {
13614 for x in 0..w {
13615 let r = (x * 255 / w.max(1)) as u8;
13616 let g = (y * 255 / h.max(1)) as u8;
13617 gradient[(y * w + x) as usize] =
13618 0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | 0x40;
13619 }
13620 }
13621 // Fixture C: small noise patch on a solid background.
13622 let mut sparse = vec![0xff_70_70_70u32; (w * h) as usize];
13623 let mut s: u32 = 0xDEAD_BEEF ^ (w * h);
13624 for _ in 0..(w * h / 16) {
13625 s ^= s << 13;
13626 s ^= s >> 17;
13627 s ^= s << 5;
13628 let idx = ((s as usize) % sparse.len()) as usize;
13629 sparse[idx] = (s & 0x0003_0303) | 0xff70_7070;
13630 }
13631 for (name, pixels) in &[
13632 ("solid", &solid),
13633 ("gradient", &gradient),
13634 ("sparse", &sparse),
13635 ] {
13636 let r160 = encode_argb_with_predictor_chooser_no_r161_entropy(pixels, w, h);
13637 let r161 = encode_argb_with_predictor_chooser(pixels, w, h);
13638 assert!(
13639 r161.len() <= r160.len(),
13640 "round-161 chooser regressed on {name} {w}x{h}: \
13641 r160={} B r161={} B",
13642 r160.len(),
13643 r161.len()
13644 );
13645 // Confirm decode round-trip on whatever the chooser
13646 // emitted — the chooser may have chosen the entropy
13647 // path or any of the L1 paths.
13648 let header = build_image_header(w, h, true);
13649 let mut payload = header.to_vec();
13650 payload.extend_from_slice(&r161);
13651 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13652 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13653 assert_eq!(
13654 img.pixels(),
13655 pixels.as_slice(),
13656 "round-161 chooser output failed decode round-trip on {name} {w}x{h}"
13657 );
13658 }
13659 }
13660 }
13661
13662 /// Round 161 — sweep seeded fixtures to find at least one input
13663 /// where the entropy-cost predictor candidate strictly beats the
13664 /// best L1-proxy predictor candidate on raw bytes. Proves the
13665 /// entropy cost is doing real work — it's not merely a
13666 /// no-op-aliased duplicate of the round-160 path. The sweep
13667 /// also stress-tests round-trip correctness on every fixture
13668 /// where the entropy path wins.
13669 ///
13670 /// Construction: pre-residualised image families where the per-
13671 /// block mode-cost ordering differs between L1 magnitude and
13672 /// Shannon entropy. The most reliable family is one whose
13673 /// "lowest L1 mode" produces a varied residual histogram while
13674 /// some "slightly-higher L1 mode" produces a concentrated
13675 /// residual histogram — Shannon entropy picks the concentrated
13676 /// mode (faithful to what Huffman codes minimise), L1 picks the
13677 /// magnitude-min mode.
13678 #[test]
13679 fn round_161_entropy_candidate_strictly_beats_l1_on_some_fixture() {
13680 let w = 64u32;
13681 let h = 64u32;
13682 let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13683 let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
13684 let mut found = false;
13685 let mut best_savings: i64 = 0;
13686 let mut seed_winner: u32 = 0;
13687 let mut family_winner: &'static str = "";
13688 // Family A: row-translated tile with a hand-chosen base
13689 // colour. The L predictor (mode 1) reproduces each row's
13690 // base colour and has zero residual on interior pixels —
13691 // but the top-row predict-L rule on the first row leaks a
13692 // varied histogram (each first-row pixel's residual is a
13693 // function of its preceding column's source colour). Mode
13694 // 0 (predict 0xff000000) emits a constant residual equal
13695 // to source per pixel — zero entropy when the image is
13696 // solid, non-zero entropy when scattered. On a scattered
13697 // image mode 1 is L1-best but mode 0 is entropy-best.
13698 for seed_init in [
13699 0xCAFE_BABEu32,
13700 0xC0FFEE00,
13701 0xDEAD_BEEF,
13702 0xFACE_F00D,
13703 0xFEED_F00D,
13704 0x1234_5678,
13705 0xABCD_1234,
13706 0x90AB_CDEF,
13707 0x5A5A_5A5A,
13708 0xA5A5_A5A5,
13709 0xBA5E_BA11,
13710 0xB16B_00B5,
13711 0x00DD_BA11,
13712 0xC1AB_AB00,
13713 0xDEAF_BABE,
13714 0xCABB_A6E0,
13715 0x1337_C0DE,
13716 0xABAD_CAFE,
13717 0xBADF_00D0,
13718 0x8BAD_F00D,
13719 0xFEE1_DEAD,
13720 0xDEFE_C8ED,
13721 0xD15E_A5E0,
13722 0x600D_F00D,
13723 0xDEAD_C0DE,
13724 0xBADC_0DED,
13725 0xCAFE_F00D,
13726 0xC0DE_F00D,
13727 0xDEED_BEEF,
13728 0xBEAD_F00D,
13729 0x8008_5318,
13730 0xD0DE_C0DE,
13731 ] {
13732 // Build a fixture whose per-block mode-cost ordering
13733 // disagrees between L1 and Shannon entropy. The family
13734 // below produces blocks of varying L1-vs-entropy
13735 // disagreement intensity:
13736 //
13737 // Quadrant A (top-left): smooth low-frequency pattern
13738 // where neighbour-predicting modes have low L1 but
13739 // spread their residuals across multiple histogram
13740 // bins (residual varies slightly with position).
13741 // Quadrant B (bottom-right): rare "spike" pixels (1 or
13742 // 2 per block) where mode 0's constant residual
13743 // distribution wins on entropy.
13744 //
13745 // The two quadrants live in separate predictor blocks
13746 // so each contributes independently to whichever mode
13747 // wins on a block-by-block basis.
13748 let mut pixels = vec![0xff_60_80_50u32; (w * h) as usize];
13749 let mut s = seed_init;
13750 // Quadrant A: 32x32 patterned image with column-driven
13751 // gradient and a per-row jitter — produces non-trivial
13752 // residual histograms for every mode, so the L1-vs-
13753 // entropy disagreement frequency goes up.
13754 for y in 0..(h / 2) {
13755 for x in 0..(w / 2) {
13756 s ^= s << 13;
13757 s ^= s >> 17;
13758 s ^= s << 5;
13759 // Column-correlated colour + per-row jitter.
13760 let r = 0x40 + (x as u8 & 0x1f);
13761 let g = 0x60 + ((y as u8) & 0x1f) + ((s & 1) as u8);
13762 let b = 0x30 + ((x as u8 ^ y as u8) & 0x0f);
13763 pixels[(y * w + x) as usize] =
13764 0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | (b as u32);
13765 }
13766 }
13767 // Quadrant B: solid grey with deliberate single-pixel
13768 // spikes at predictable positions. The spikes are
13769 // chosen to land inside a few of the predictor blocks
13770 // so those blocks see a residual distribution with one
13771 // major bin (zero) and one minor bin (the spike). The
13772 // L1 chooser picks the mode that minimises spike
13773 // magnitude; the entropy chooser picks the mode that
13774 // minimises the count of distinct residual bins.
13775 for y in (h / 2)..h {
13776 for x in (w / 2)..w {
13777 s ^= s << 13;
13778 s ^= s >> 17;
13779 s ^= s << 5;
13780 if (s & 0x1f) == 0 {
13781 // Spike: random near-grey perturbation.
13782 let perturb = (s & 0x0f0f_0f0f) | 0xff60_8050;
13783 pixels[(y * w + x) as usize] = perturb;
13784 }
13785 }
13786 }
13787 // Best L1-proxy predictor candidate at default
13788 // size_bits: strict round-159 + round-160 slack sweep.
13789 let strict_bytes = encode_with_predictor(&pixels, w, h, size_bits, None, w);
13790 let mut best_l1_bytes = strict_bytes.clone();
13791 for slack in [block_pixels, 2 * block_pixels, 4 * block_pixels] {
13792 let bytes = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
13793 if bytes.len() < best_l1_bytes.len() {
13794 best_l1_bytes = bytes;
13795 }
13796 }
13797 let entropy_bytes = encode_with_predictor_entropy(&pixels, w, h, size_bits, None, w);
13798 if entropy_bytes.len() < best_l1_bytes.len() {
13799 let saved = best_l1_bytes.len() as i64 - entropy_bytes.len() as i64;
13800 if saved > best_savings {
13801 best_savings = saved;
13802 seed_winner = seed_init;
13803 family_winner = "two-quadrant";
13804 }
13805 if !found {
13806 found = true;
13807 }
13808 // Round-trip the winning entropy stream end-to-end.
13809 let header = build_image_header(w, h, true);
13810 let mut payload = header.to_vec();
13811 payload.extend_from_slice(&entropy_bytes);
13812 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13813 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13814 assert_eq!(
13815 img.pixels(),
13816 pixels.as_slice(),
13817 "round-161 entropy strict-beat predictor candidate round-trip mismatch on \
13818 seed=0x{seed_init:08x}"
13819 );
13820 eprintln!(
13821 "[round-161] entropy strict-beat: seed=0x{seed_init:08x}, \
13822 best_l1={} B entropy={} B saved={saved} B",
13823 best_l1_bytes.len(),
13824 entropy_bytes.len(),
13825 );
13826 }
13827 }
13828 // Family B: hand-crafted "constant non-zero residual"
13829 // fixture — a solid-colour image where mode 0 emits a
13830 // constant residual `source - 0xff000000` per pixel. The
13831 // L1 cost of mode 0 is `Σ |source - black|` per pixel; the
13832 // entropy cost of mode 0 is zero (single-symbol histogram).
13833 // Mode 1 (L predictor) also emits zero residual for
13834 // interior pixels but has non-zero residual at the leftmost
13835 // column. On a small image the per-block winner depends on
13836 // which of these effects dominates.
13837 if !found {
13838 // Build a 16×16 solid image — exactly one predictor
13839 // block at size_bits=4. The L1 cost of mode 0 is huge
13840 // (16² × magnitude); mode 1's cost is small (only the
13841 // leftmost column contributes). L1 picks mode 1.
13842 // Shannon entropy: mode 0 = 0 (constant residual);
13843 // mode 1 = small but non-zero (the leftmost column
13844 // residual). Entropy picks mode 0.
13845 //
13846 // Whether mode 0's predictor stream beats mode 1's
13847 // depends on the §5.x prefix-code overhead vs the
13848 // saved residual mass — not guaranteed, but a
13849 // candidate worth trying.
13850 let w2 = 16u32;
13851 let h2 = 16u32;
13852 let pixels2 = vec![0xff_80_80_80u32; (w2 * h2) as usize];
13853 let l1_bytes = encode_with_predictor(&pixels2, w2, h2, size_bits, None, w2);
13854 let entropy_bytes =
13855 encode_with_predictor_entropy(&pixels2, w2, h2, size_bits, None, w2);
13856 if entropy_bytes.len() < l1_bytes.len() {
13857 let saved = l1_bytes.len() as i64 - entropy_bytes.len() as i64;
13858 best_savings = saved;
13859 family_winner = "solid-grey-16x16";
13860 found = true;
13861 let header = build_image_header(w2, h2, true);
13862 let mut payload = header.to_vec();
13863 payload.extend_from_slice(&entropy_bytes);
13864 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w2, h2).unwrap();
13865 let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13866 assert_eq!(
13867 img.pixels(),
13868 pixels2.as_slice(),
13869 "round-161 entropy strict-beat solid-grey round-trip mismatch"
13870 );
13871 eprintln!(
13872 "[round-161] entropy strict-beat (solid-grey 16x16): \
13873 l1={} B entropy={} B saved={saved} B",
13874 l1_bytes.len(),
13875 entropy_bytes.len(),
13876 );
13877 }
13878 }
13879 assert!(
13880 found,
13881 "round-161 entropy candidate did not produce a single strict byte reduction \
13882 across the seeded fixture set; the entropy cost never won \
13883 (best_savings={best_savings} on seed=0x{seed_winner:08x} family={family_winner})"
13884 );
13885 }
13886
13887 // ---- Round 162 tests: sub-image-aware Shannon-entropy chooser ----------
13888
13889 /// Local pre-round-162 copy of `encode_argb_with_predictor_chooser`
13890 /// that omits the round-162 sub-image-aware lambda sweep but
13891 /// keeps every round-161 entropy candidate. Used as the
13892 /// before-after baseline for the round-162 non-regression and
13893 /// strict-beat tests.
13894 fn encode_argb_with_predictor_chooser_no_r162_subaware(
13895 pixels: &[u32],
13896 width: u32,
13897 height: u32,
13898 ) -> Vec<u8> {
13899 let mut best = encode_argb_literals_with_width(pixels, width);
13900
13901 let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13902 let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13903 let pred_block = 1u32 << pred_size_bits;
13904 let ctx_block = 1u32 << ctx_size_bits;
13905
13906 if width >= pred_block && height >= pred_block {
13907 let mut pred_single_block_size_bits: u8 = pred_size_bits;
13908 while pred_single_block_size_bits < 9
13909 && ((1u32 << pred_single_block_size_bits) < width
13910 || (1u32 << pred_single_block_size_bits) < height)
13911 {
13912 pred_single_block_size_bits += 1;
13913 }
13914 let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13915 let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13916 encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13917 })];
13918 let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
13919 for slack in [
13920 pred_block_pixels,
13921 2 * pred_block_pixels,
13922 4 * pred_block_pixels,
13923 ] {
13924 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13925 encode_with_predictor_slack(
13926 pixels,
13927 width,
13928 height,
13929 pred_size_bits,
13930 cache_bits,
13931 width,
13932 slack,
13933 )
13934 }));
13935 }
13936 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13937 encode_with_predictor_entropy(
13938 pixels,
13939 width,
13940 height,
13941 pred_size_bits,
13942 cache_bits,
13943 width,
13944 )
13945 }));
13946 if try_pred_single_block {
13947 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13948 encode_with_predictor(
13949 pixels,
13950 width,
13951 height,
13952 pred_single_block_size_bits,
13953 cache_bits,
13954 width,
13955 )
13956 }));
13957 let single_pred_block_pixels: u64 =
13958 (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
13959 for slack in [
13960 single_pred_block_pixels,
13961 2 * single_pred_block_pixels,
13962 4 * single_pred_block_pixels,
13963 ] {
13964 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13965 encode_with_predictor_slack(
13966 pixels,
13967 width,
13968 height,
13969 pred_single_block_size_bits,
13970 cache_bits,
13971 width,
13972 slack,
13973 )
13974 }));
13975 }
13976 pred_candidates.push(select_best_cache_bits(|cache_bits| {
13977 encode_with_predictor_entropy(
13978 pixels,
13979 width,
13980 height,
13981 pred_single_block_size_bits,
13982 cache_bits,
13983 width,
13984 )
13985 }));
13986 }
13987 for cand in pred_candidates {
13988 if cand.len() < best.len() {
13989 best = cand;
13990 }
13991 }
13992 }
13993
13994 if width >= ctx_block && height >= ctx_block {
13995 let mut single_block_size_bits: u8 = ctx_size_bits;
13996 while single_block_size_bits < 9
13997 && ((1u32 << single_block_size_bits) < width
13998 || (1u32 << single_block_size_bits) < height)
13999 {
14000 single_block_size_bits += 1;
14001 }
14002 let try_single_block = single_block_size_bits != ctx_size_bits;
14003 let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
14004 encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
14005 })];
14006 if try_single_block {
14007 candidates.push(select_best_cache_bits(|cache_bits| {
14008 encode_with_color_transform(
14009 pixels,
14010 width,
14011 height,
14012 single_block_size_bits,
14013 cache_bits,
14014 width,
14015 )
14016 }));
14017 }
14018 for cand in candidates {
14019 if cand.len() < best.len() {
14020 best = cand;
14021 }
14022 }
14023 }
14024
14025 if collect_palette(pixels).is_some() {
14026 let ci_best = select_best_cache_bits(|cache_bits| {
14027 encode_with_color_indexing(pixels, width, height, cache_bits)
14028 .expect("palette feasibility already confirmed")
14029 });
14030 if ci_best.len() < best.len() {
14031 best = ci_best;
14032 }
14033 }
14034
14035 if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
14036 if mp_best.len() < best.len() {
14037 best = mp_best;
14038 }
14039 }
14040
14041 best
14042 }
14043
14044 /// Round 162 — `sub_image_mode_cost_delta_milli` returns zero when
14045 /// the first symbol is added to an empty histogram: the post-add
14046 /// state is a single-symbol histogram with `H = 0`, so the
14047 /// Shannon mass goes from 0 (degenerate) to 0 (single bin with
14048 /// `c·log2(N/c) = N·log2(1) = 0`).
14049 #[test]
14050 fn round_162_sub_image_mode_cost_delta_zero_on_first_add() {
14051 let hist = [0u32; 14];
14052 for mode in 0u8..=13 {
14053 let delta = sub_image_mode_cost_delta_milli(&hist, 0, mode);
14054 assert_eq!(
14055 delta, 0,
14056 "first symbol add must produce zero Shannon delta; mode={mode} delta={delta}"
14057 );
14058 }
14059 }
14060
14061 /// Round 162 — `sub_image_mode_cost_delta_milli` returns zero when
14062 /// the added symbol equals the only mode already present (still a
14063 /// single-symbol histogram post-add), and a strictly positive
14064 /// delta when the added symbol is *different* from the only mode
14065 /// already present (the histogram grows from one to two bins, so
14066 /// `N·H` grows from `0` to `2·log2(2) - 2·1·log2(1) = 2` bits).
14067 #[test]
14068 fn round_162_sub_image_mode_cost_delta_grows_on_new_symbol() {
14069 // Start with five occurrences of mode 3 already in the
14070 // histogram (single-symbol state, N·H = 0).
14071 let mut hist = [0u32; 14];
14072 hist[3] = 5;
14073 let total = 5u32;
14074
14075 let same = sub_image_mode_cost_delta_milli(&hist, total, 3);
14076 assert_eq!(
14077 same, 0,
14078 "adding same symbol to a single-mode histogram must not grow Shannon mass"
14079 );
14080
14081 let different = sub_image_mode_cost_delta_milli(&hist, total, 7);
14082 assert!(
14083 different > 0,
14084 "adding a new symbol to a single-mode histogram must grow Shannon mass; got 0"
14085 );
14086 // Sanity: the post-add N·H is 6·log2(6) − 5·log2(5) − 1·log2(1)
14087 // ≈ 15.5097 − 11.6096 − 0 ≈ 3.9 bits ≈ 3900 milli-bits.
14088 // Pre-add was 0, so the delta should be roughly 3900 ±1.
14089 assert!(
14090 (3500..=4300).contains(&different),
14091 "expected delta near 3900 milli-bits; got {different}"
14092 );
14093 }
14094
14095 /// Round 162 — `lambda_milli == 0` makes the sub-image-aware
14096 /// chooser byte-identical to the round-161 entropy chooser: every
14097 /// candidate's joint cost equals its residual-only cost (the
14098 /// sub-image term contributes zero), and the tie-break rules
14099 /// match exactly.
14100 #[test]
14101 fn round_162_lambda_zero_byte_identical_to_round_161() {
14102 // Use a 32×32 fixture exercising the per-region path with at
14103 // least four 16×16 blocks worth of sub-image entries.
14104 let w = 32u32;
14105 let h = 32u32;
14106 let mut pixels = vec![0u32; (w * h) as usize];
14107 for y in 0..h as usize {
14108 for x in 0..w as usize {
14109 let r = (x as u8).wrapping_mul(7);
14110 let g = (y as u8).wrapping_mul(11);
14111 let b = ((x + y) as u8).wrapping_mul(13);
14112 pixels[y * w as usize + x] =
14113 0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | (b as u32);
14114 }
14115 }
14116
14117 let r161 = encode_with_predictor_entropy(&pixels, w, h, 4, None, w);
14118 let r162_lambda0 = encode_with_predictor_entropy_subaware(&pixels, w, h, 4, None, w, 0);
14119 assert_eq!(
14120 r161, r162_lambda0,
14121 "lambda_milli == 0 must produce a byte-identical stream to round-161 entropy"
14122 );
14123
14124 // Also covers Some(cache_bits) — the cache path shouldn't
14125 // alter the equivalence.
14126 let r161_cached = encode_with_predictor_entropy(&pixels, w, h, 4, Some(6), w);
14127 let r162_cached_lambda0 =
14128 encode_with_predictor_entropy_subaware(&pixels, w, h, 4, Some(6), w, 0);
14129 assert_eq!(
14130 r161_cached, r162_cached_lambda0,
14131 "lambda_milli == 0 must be byte-identical with cache_bits = Some(6)"
14132 );
14133 }
14134
14135 /// Round 162 — `pick_block_mode_with_hint_entropy_subaware` honours
14136 /// the strict tie-break: when the preferred mode's joint cost
14137 /// equals the best, the chooser returns the preferred mode (so
14138 /// the sub-image keeps the longer mode-run). Mirrors the round-
14139 /// 159 / round-161 tie-break test.
14140 #[test]
14141 fn round_162_pick_block_mode_subaware_honours_tie() {
14142 // Tiny 1×1 block — every mode reduces to the top-left border
14143 // (`pred = 0xff_00_00_00`), so all modes yield zero residual
14144 // entropy and tie at zero. The hint should flip the result.
14145 let pixels = vec![0xff_00_00_00u32; 1];
14146 let hist = [0u32; 14];
14147 let chosen_no_hint = pick_block_mode_with_hint_entropy_subaware(
14148 &pixels, 1, 1, 0, 0, 1, 1, None, &hist, 0, 4_000,
14149 );
14150 assert_eq!(
14151 chosen_no_hint, 0,
14152 "no-hint pick should fall back to lowest-tied mode (= 0)"
14153 );
14154
14155 for hint in 0u8..=13 {
14156 let chosen = pick_block_mode_with_hint_entropy_subaware(
14157 &pixels,
14158 1,
14159 1,
14160 0,
14161 0,
14162 1,
14163 1,
14164 Some(hint),
14165 &hist,
14166 0,
14167 4_000,
14168 );
14169 assert_eq!(
14170 chosen, hint,
14171 "hint {hint} should win on a fully-tied block; got {chosen}"
14172 );
14173 }
14174 }
14175
14176 /// Round 162 — end-to-end round-trip: the sub-image-aware encoder
14177 /// produces a stream the §5.x decoder reconstructs to the
14178 /// original pixels at three lambda settings and two cache-bits
14179 /// settings, across a small fixture with mixed local statistics.
14180 #[test]
14181 fn round_162_subaware_round_trips_through_decoder() {
14182 let w = 32u32;
14183 let h = 32u32;
14184 let mut pixels = vec![0u32; (w * h) as usize];
14185 // Top-left 16×16: gradient. Top-right: noise. Bottom-left:
14186 // solid. Bottom-right: vertical bars. Drives different
14187 // per-block best modes across the four sub-image entries.
14188 for y in 0..h as usize {
14189 for x in 0..w as usize {
14190 let v = match (x < 16, y < 16) {
14191 (true, true) => 0xff_00_00_00 | (((x + y) as u32 * 8) << 8),
14192 (false, true) => {
14193 let seed = (x.wrapping_mul(97) ^ y.wrapping_mul(53)) as u32;
14194 0xff_00_00_00 | ((seed & 0xff) << 16) | (seed & 0xff00)
14195 }
14196 (true, false) => 0xff_80_80_80,
14197 (false, false) => {
14198 if x % 2 == 0 {
14199 0xff_ff_ff_ff
14200 } else {
14201 0xff_00_00_00
14202 }
14203 }
14204 };
14205 pixels[y * w as usize + x] = v;
14206 }
14207 }
14208
14209 for lambda_milli in [1_000u64, 4_000u64, 16_000u64] {
14210 for cache_bits in [None, Some(4u32), Some(8u32)] {
14211 let payload = encode_with_predictor_entropy_subaware(
14212 &pixels,
14213 w,
14214 h,
14215 4,
14216 cache_bits,
14217 w,
14218 lambda_milli,
14219 );
14220 let header = build_image_header(w, h, true);
14221 let mut bytes = header.to_vec();
14222 bytes.extend_from_slice(&payload);
14223 let framed = build::build_webp_file(&bytes, ImageKind::Lossless, w, h).unwrap();
14224 let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14225 assert_eq!(
14226 decoded.pixels(),
14227 pixels.as_slice(),
14228 "round-trip mismatch lambda_milli={lambda_milli} cache_bits={cache_bits:?}"
14229 );
14230 }
14231 }
14232 }
14233
14234 /// Round 162 — the production chooser never regresses against the
14235 /// round-161 baseline: across 5 image shapes × 3 fixture
14236 /// generators, the round-162 chooser output is byte-`<=` the
14237 /// chooser-without-round-162-candidates output, AND every
14238 /// chosen stream round-trips through the decoder bit-exactly.
14239 #[test]
14240 fn round_162_chooser_never_regresses_vs_round_161() {
14241 let shapes: &[(u32, u32)] = &[(16, 16), (24, 32), (32, 24), (48, 48), (64, 32)];
14242 for &(w, h) in shapes {
14243 for fixture_kind in 0..3u32 {
14244 let mut pixels = vec![0u32; (w * h) as usize];
14245 for y in 0..h as usize {
14246 for x in 0..w as usize {
14247 let v = match fixture_kind {
14248 0 => 0xff_00_00_00 | (((x ^ y) as u32 * 3) & 0xff),
14249 1 => {
14250 let seed =
14251 (x.wrapping_mul(2654435761).wrapping_add(y) & 0xff) as u32;
14252 0xff_00_00_00 | (seed << 16) | seed
14253 }
14254 _ => {
14255 if (x + y) % 5 < 2 {
14256 0xff_a0_a0_a0
14257 } else {
14258 0xff_60_60_60
14259 }
14260 }
14261 };
14262 pixels[y * w as usize + x] = v;
14263 }
14264 }
14265
14266 let baseline = encode_argb_with_predictor_chooser_no_r162_subaware(&pixels, w, h);
14267 let r162 = encode_argb_with_predictor_chooser(&pixels, w, h);
14268 assert!(
14269 r162.len() <= baseline.len(),
14270 "round-162 chooser regressed at shape={w}×{h} fixture={fixture_kind}: \
14271 baseline={} B r162={} B",
14272 baseline.len(),
14273 r162.len()
14274 );
14275
14276 // Decode round-trip on the round-162 stream. The
14277 // chooser emits a bare VP8L payload; wrap with the
14278 // image header before framing.
14279 let header = build_image_header(w, h, true);
14280 let mut payload = header.to_vec();
14281 payload.extend_from_slice(&r162);
14282 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
14283 let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14284 assert_eq!(
14285 decoded.pixels(),
14286 pixels.as_slice(),
14287 "round-trip mismatch at shape={w}×{h} fixture={fixture_kind}"
14288 );
14289 }
14290 }
14291 }
14292
14293 /// Round 162 — the *isolated* sub-image-aware predictor candidate
14294 /// (`encode_with_predictor_entropy_subaware`) strictly beats the
14295 /// round-161 isolated entropy candidate
14296 /// (`encode_with_predictor_entropy`) on every smooth-gradient
14297 /// fixture in the sweep. This is the headline empirical result
14298 /// for the round-162 cost model: smooth gradients are the
14299 /// canonical case where many §4.1 sub-image entries can converge
14300 /// onto a small mode set (the gradient predictors all yield
14301 /// near-zero residuals so the sub-image's prefix-code mass
14302 /// dominates total cost). The crossover at the swept lambda
14303 /// values (`64_000` per-sub-image-bit milli-units) is where the
14304 /// sub-image weighting takes off — below that, residual cost
14305 /// dominates and the round-161 chooser already wins.
14306 ///
14307 /// This compares the round-162 and round-161 predictor
14308 /// candidates **in isolation** (same `size_bits = 4`, both
14309 /// running through `apply_forward_predictor` + LZ77 + prefix
14310 /// coding) so the win is attributable to the chooser, not to
14311 /// other paths in the full chooser sweep (subtract-green,
14312 /// single-block predictor, etc.) which may produce an equally-
14313 /// tight stream by a different mechanism. The production chooser
14314 /// adds the round-162 candidate to its sweep and keeps byte-
14315 /// shortest, so even when other paths tie, the round-162 path
14316 /// strictly extends the encoder's option set.
14317 ///
14318 /// Round-trips through the decoder bit-exactly on every winning
14319 /// fixture.
14320 #[test]
14321 fn round_162_subaware_isolated_strictly_beats_round_161_on_some_fixture() {
14322 let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (256, 128), (96, 96), (160, 80)];
14323 let lambda_to_test: u64 = 64_000;
14324 let mut wins = 0u32;
14325 let mut max_savings: i64 = 0;
14326 let mut max_savings_shape: (u32, u32) = (0, 0);
14327 for &(w, h) in shapes {
14328 let mut pixels = vec![0u32; (w * h) as usize];
14329 for y in 0..h {
14330 for x in 0..w {
14331 let r = (x * 255 / w.max(1)) as u8;
14332 let g = (y * 255 / h.max(1)) as u8;
14333 pixels[(y * w + x) as usize] =
14334 0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | 0x40;
14335 }
14336 }
14337 let r161 = encode_with_predictor_entropy(&pixels, w, h, 4, None, w);
14338 let r162 =
14339 encode_with_predictor_entropy_subaware(&pixels, w, h, 4, None, w, lambda_to_test);
14340 // r162 may tie r161 on some shapes (the chosen mode set
14341 // already coincides), but it must never regress — the
14342 // sub-image-aware cost is a strict generalisation of the
14343 // round-161 cost.
14344 assert!(
14345 r162.len() <= r161.len(),
14346 "round-162 isolated candidate REGRESSED on gradient {w}x{h}: \
14347 r161={} B r162={} B",
14348 r161.len(),
14349 r162.len()
14350 );
14351 let saved = r161.len() as i64 - r162.len() as i64;
14352 if r162.len() < r161.len() {
14353 wins += 1;
14354 if saved > max_savings {
14355 max_savings = saved;
14356 max_savings_shape = (w, h);
14357 }
14358 // Verify round-trip on the winning stream.
14359 let header = build_image_header(w, h, true);
14360 let mut payload = header.to_vec();
14361 payload.extend_from_slice(&r162);
14362 let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
14363 let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14364 assert_eq!(
14365 decoded.pixels(),
14366 pixels.as_slice(),
14367 "round-trip mismatch on gradient strict-beat {w}x{h}"
14368 );
14369 eprintln!(
14370 "[round-162] isolated strict-beat (gradient {w}x{h}, lambda={lambda_to_test}): \
14371 r161={} B r162={} B saved={saved} B ({:.1}% reduction)",
14372 r161.len(),
14373 r162.len(),
14374 100.0 * saved as f64 / r161.len() as f64
14375 );
14376 } else {
14377 eprintln!(
14378 "[round-162] tie (gradient {w}x{h}, lambda={lambda_to_test}): \
14379 r161={} B r162={} B (no regression)",
14380 r161.len(),
14381 r162.len()
14382 );
14383 }
14384 }
14385 // Require strict wins on a majority of the gradient sweep —
14386 // proves the round-162 cost model is doing real work, not
14387 // just degenerating to the round-161 chooser everywhere.
14388 assert!(
14389 wins >= 3,
14390 "round-162 isolated candidate strictly beat round-161 on only {wins}/{} gradient \
14391 fixtures; expected at least 3 strict wins to demonstrate the sub-image cost is \
14392 doing real work",
14393 shapes.len()
14394 );
14395 eprintln!(
14396 "[round-162] isolated sub-image-aware: {wins}/{} gradient fixtures strict-won; \
14397 headline savings = {max_savings} B on {}x{}",
14398 shapes.len(),
14399 max_savings_shape.0,
14400 max_savings_shape.1
14401 );
14402 }
14403}