Skip to main content

oxideav_webp/
vp8l_encode.rs

1//! VP8L (WebP-Lossless) §3.8 / §3.7 *encoder*.
2//!
3//! This is the writer counterpart of the round-99..111 decoder stack. The
4//! decoder ([`crate::vp8l_transform::decode_lossless`]) walks a VP8L chunk
5//! payload — §3.4 image-header, §3.8.2 transform list, §3.8.3 image data
6//! (color-cache-info, meta-prefix, prefix-codes, LZ77-coded image) — and
7//! produces ARGB pixels. This module produces a VP8L chunk payload from
8//! ARGB pixels, taking the simplest end-to-end path the spec admits:
9//!
10//! * **§3.8.2 optional subtract-green transform** — as of round 120 the
11//!   encoder evaluates both the no-transform and subtract-green paths and
12//!   emits whichever is smaller. The subtract-green transform (`%b1 %b10`
13//!   in the §3.8.2 grammar; transform type 2 per §3.5 Table 1) carries
14//!   no body bits and subtracts the green channel from red and blue
15//!   before the entropy stage, lowering per-pixel red/blue entropy on
16//!   natural images (the spec's §3.5.3 motivation: "this transform is
17//!   redundant, as it can be modeled using the color transform, but since
18//!   there is no additional data here, the subtract green transform can
19//!   be coded using fewer bits"). The other three transforms (predictor
20//!   / color / color-indexing) get their own forward passes in later
21//!   rounds.
22//! * **§5.2.1 / §5.2.3 color cache** — as of round 121 the encoder
23//!   evaluates a color cache alongside the no-cache path and emits
24//!   whichever is smaller. As of round 148 the chooser sweeps every
25//!   §5.2.3 `cache_code_bits ∈ [1..11]` per the spec's allowed range
26//!   (2..=2048-entry caches) and picks the smallest stream, rather
27//!   than the round-121 fixed 256-entry choice. When the cache is
28//!   enabled, the §3.8.3 `color-cache-info` field becomes
29//!   `%b1 code_bits` (1-bit flag + 4-bit `code_bits`), the GREEN
30//!   alphabet grows to `256 + 24 + (1 << code_bits)` symbols, and
31//!   each repeat of a previously-inserted ARGB literal is emitted as
32//!   a §5.2.3 color-cache code `256 + 24 + index` instead of four
33//!   separate ARGB-channel literals.
34//!   Cache state is maintained per §5.2.3: every emitted pixel — literal
35//!   *and* every pixel covered by a §5.2.2 backward-reference copy — is
36//!   re-inserted at its hashed slot
37//!   (`(0x1e35a7bd * argb) >> (32 - code_bits)`). The chooser cross-
38//!   products with subtract-green so the encoder picks the best of
39//!   `(no-tx | subtract-green) × (no-cache | cache)`; on uncorrelated /
40//!   non-repeating content the no-cache no-tx path wins and is kept.
41//! * **Single §3.7.2.2 meta-prefix code** — `meta-prefix` is `%b0`, so one
42//!   [`crate::meta_prefix::PrefixCodeGroup`] of five prefix codes applies
43//!   to the whole image.
44//! * **Literal-only §3.8.3 image data** — every pixel is a §3.7.3 ARGB
45//!   literal (green via prefix code #1, red/blue/alpha via #2/#3/#4). No
46//!   LZ77 backward references are emitted by [`encode_argb_literals`], so
47//!   the distance prefix code (#5) is the single-symbol-0 form the §3.7.2.1.1
48//!   note sanctions ("empty prefix codes can be coded as those containing a
49//!   single symbol 0").
50//!
51//! The result, wrapped by [`encode_webp_lossless`] in the §2.4 RIFF/WEBP
52//! framing (via [`crate::build`]), decodes back to the exact input pixels
53//! through [`crate::decode_webp`] — a pixel-exact round trip.
54//!
55//! ## §3.7.2 prefix-code construction
56//!
57//! For each of the five symbol alphabets the encoder:
58//!
59//! 1. counts symbol frequencies over the data it will emit;
60//! 2. builds a length-limited (≤ [`MAX_CODE_LENGTH`]) canonical
61//!    Huffman code-length assignment from those frequencies
62//!    ([`build_code_lengths`]);
63//! 3. writes the code lengths to the stream with the §3.7.2.1.2 *normal
64//!    code length code* (or the trivial single-symbol form), then writes
65//!    each symbol with the canonical code derived from the lengths.
66//!
67//! The canonical code assignment ([`canonical_codes`]) is the identical
68//! `(length, value)`-ordered rule the decoder's
69//! [`crate::vp8l_prefix::PrefixCode`] reads, so a code emitted here
70//! decodes there bit-for-bit.
71//!
72//! ## §5.2.2 LZ77 backward-reference matching
73//!
74//! As of round 119, [`encode_argb_literals`] runs an optional §5.2.2
75//! backward-reference pass before emitting the image data. A hash-chain
76//! matcher ([`Lz77Matcher`]) finds repeated pixel runs; each run of
77//! `length >= MIN_MATCH` pixels at scan-line distance `D` is emitted as a
78//! §5.2.2 *length + distance code* pair instead of `length` separate ARGB
79//! literals, compressing repetitive images. The match's length is encoded
80//! via the GREEN alphabet's length-prefix symbols (`256 + prefix_code`).
81//!
82//! As of round 130 the encoder picks the **smaller** of two distance-code
83//! forms per backward reference:
84//!
85//! 1. The *scan-line* encoding `distance_code = D + NUM_DISTANCE_MAP_CODES`
86//!    (always valid, was the round-119 default).
87//! 2. Any §5.2.2 *distance map* code `c ∈ 1..=120` whose
88//!    `(xi, yi) = DISTANCE_MAP[c-1]` satisfies `max(xi + yi*W, 1) == D` for
89//!    the image width `W`. These small codes feed the §5.2.2 distance
90//!    prefix code through low-prefix slots (codes `1..=4` use 0 extra bits,
91//!    code `5` uses 1 extra bit) instead of the high-prefix slots that
92//!    `D + 120` for typical row distances would fall into.
93//!
94//! The reconstruction in
95//! [`crate::vp8l_decode::distance_code_to_pixel_distance`] is identical for
96//! both forms (`xi + yi*W` clamped to 1), so round-trips remain bit-exact.
97//! Photo-like content with vertical correlation (every scan-line referring
98//! to the row above) sees a dramatic improvement: a row-distance match on
99//! a 256-wide image goes from prefix 16 (8-ish bits Huffman + 7 extra) to
100//! prefix 0 (1–4 bits Huffman + 0 extra), shrinking the per-match cost by
101//! ~10 bits. The width-aware helper is
102//! [`pixel_distance_to_distance_code`]; the round-119 scan-line-only
103//! form is still used as the chooser's fallback whenever no distance-map
104//! code matches.
105//!
106//! The inverse of the §5.2.2 prefix-value transform ([`value_to_prefix`])
107//! splits a length/distance into its prefix code and extra bits, the exact
108//! counterpart of the decoder's [`crate::vp8l_decode::read_lz77_value`].
109//!
110//! The literal-only path is still available via [`encode_argb_literals_only`]
111//! (used by the size-reduction comparison test); the default
112//! [`encode_argb_literals`] entry point chooses the LZ77 path.
113//!
114//! As of round 163 the matcher applies **four-position lazy matching
115//! with a diminishing-returns guard**: after finding a match
116//! `(L_a, _)` at `pos`, the encoder also probes `pos + 1`, `pos + 2`,
117//! and `pos + 3` (the round-158 depth-3 contract), and then — only
118//! when the running best across those four positions is still shorter
119//! than [`DEPTH4_GUARD_THRESHOLD`] — also probes `pos + 4`. Whichever
120//! of the candidate start positions yields the strictly longest match
121//! wins; the pixels skipped to reach the chosen start are emitted as
122//! literals. The depth-4 guard captures the empirical observation
123//! that once the depth-3 best already covers a length-`THRESHOLD` run,
124//! a fourth-order swap is almost never able to amortise the four
125//! literals it would cost — the depth-4 probe is gated to avoid
126//! spending hash-chain inserts and a `find` call when its expected
127//! marginal payoff is small. This still recovers fourth-order traps
128//! where the leading match at `pos..=pos + 3` is short. The decoder
129//! output is bit-identical for any input — only the token *partition*
130//! shifts (by up to four pixels) — so round-trips remain bit-exact
131//! under any input. See [`tokenize_lz77_inner`] for the shared
132//! `lazy_depth: u32`-toggled implementation (`0` strict-greedy r155
133//! baseline, `1` r156 depth-1, `2` r157 depth-2, `3` r158 depth-3,
134//! `4` r163 guarded depth-4, now the production default).
135//!
136//! ## §4.1 spatial-predictor forward transform
137//!
138//! The encoder also evaluates the §4.1 predictor transform path: the
139//! image is divided into `(1 << DEFAULT_PREDICTOR_SIZE_BITS)`-pixel
140//! square blocks; each block picks the prediction mode `0..=13` that
141//! minimises a residual-magnitude proxy (sum of per-channel
142//! `|residual|` folded onto `[-128, 127]`) over the block's pixels.
143//! As of round 159, the chooser also threads an
144//! **entropy-image-aware tie-break** through the per-block walk:
145//! when multiple modes tie on residual cost, the chooser prefers
146//! the mode chosen by the *previous neighbour* block (left-of in
147//! the current row, or top-of for the left-column blocks). The
148//! predictor sub-image is written as a §7.2 `entropy-coded-image`,
149//! so adjacent blocks carrying the same mode value reduce that
150//! sub-image's symbol entropy and the bytes the writer emits for
151//! it; this matches RFC 9649 §3.5's "transform data can be decided
152//! based on entropy minimization" note. The residuals themselves
153//! are unchanged on tie-equal swaps (the cost was already minimal),
154//! so decoded pixels stay bit-identical. As of round 160 the
155//! chooser also evaluates a **slack-cost variant** of the
156//! tie-break — see [`pick_block_mode_with_hint_slack`] — that
157//! accepts the preferred neighbour mode at a small additive
158//! `slack` budget above the otherwise-best cost, trading a small
159//! residual increase for a strict drop in the sub-image's symbol
160//! entropy. The slack variant is one of four predictor candidates
161//! the production chooser builds per `size_bits` (slack ∈
162//! `{0, block_pixels, 2·block_pixels, 4·block_pixels}`), and the
163//! byte-shortest stream wins — so the slack candidates can only
164//! add options to the chooser's selection set and never regress.
165//! The sub-resolution predictor image is written as a §7.2
166//! `predictor-image = 3BIT entropy-coded-image` and the per-pixel
167//! residuals are then handed to the standard
168//! `spatially-coded-image` writer. As of round 155 the chooser
169//! sweeps two `size_bits` values for the §4.1 predictor: the
170//! default 16×16-pixel blocks (per-region predictor-mode
171//! granularity, good for images whose best-mode varies spatially)
172//! and a maximal single-block transform whose `size_bits` is large
173//! enough that the entire image collapses to one mode (`1 << size`
174//! ≥ max(width, height), so the sub-image is at most 1×1 — the
175//! cheapest possible §4.1 header). Each predictor `size_bits`
176//! candidate uses the round-148 cache-bits sweep (§5.2.3
177//! `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline)
178//! and is cross-compared against the no-tx / subtract-green
179//! candidates; the smallest stream wins. On smooth gradients with
180//! strong spatial correlation, the predictor path's per-pixel
181//! residual entropy is much lower than the raw pixels' entropy,
182//! more than paying for the predictor-image overhead.
183//!
184//! ## §3.5.2 / §4.2 color-transform forward pass
185//!
186//! As of round 147 the encoder also evaluates the §3.5.2 / §4.2
187//! color transform: the image is divided into
188//! `(1 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS)`-pixel square blocks; each
189//! block picks a `(green_to_red, green_to_blue, red_to_blue)` triple
190//! that minimises a residual-magnitude proxy on the red and blue
191//! channels (the green channel is untouched per §3.5.2). The
192//! per-axis sweep is exact because the cost decomposes additively
193//! across channels: `red_residual` depends only on `green_to_red`,
194//! `blue_residual` depends additively on `(green_to_blue,
195//! red_to_blue)`, so the three axes can be optimised independently
196//! over a small candidate grid (see [`CTE_AXIS_CANDIDATES`]). The
197//! sub-resolution color image is written as a §7.2
198//! `color-image = 3BIT entropy-coded-image` (re-using
199//! `write_entropy_coded_image_literals`) and the per-pixel residuals
200//! are then handed to the standard `spatially-coded-image` writer.
201//! Each color-transform `size_bits` candidate uses the round-148
202//! cache-bits sweep (§5.2.3 `cache_code_bits ∈ [1..11]` plus the
203//! disabled-cache baseline) and is cross-compared against the no-tx,
204//! subtract-green, and §4.1 predictor candidates; the smallest stream
205//! wins. On natural images with red/green and blue/green correlation,
206//! the color-transform path concentrates the red/blue residuals near
207//! zero, shrinking the per-channel Huffman codes and further reducing
208//! the chosen stream's size on top of the §4.1 predictor pass.
209//!
210//! ## §4.4 color-indexing transform encoder
211//!
212//! As of round 150 the encoder also evaluates the §4.4 color-indexing
213//! transform: an O(N) palette probe walks `pixels` and bails out
214//! early at >256 unique ARGB values; below that threshold a sorted
215//! palette is built (sorted ARGB-numerically so the §4.4
216//! subtraction-coded color-table deltas concentrate near zero), each
217//! pixel is replaced by its palette index, and indices are bundled
218//! into one byte per the §4.4 table (`width_bits = 3 / 2 / 1 / 0`
219//! for palettes of 1..=2 / 3..=4 / 5..=16 / 17..=256 entries —
220//! packing 8 / 4 / 2 / 1 indices into each green byte respectively).
221//! The bundled image is then handed to the standard
222//! `spatially-coded-image` writer at the subsampled `packed_width =
223//! DIV_ROUND_UP(width, 1 << width_bits)`. The color-indexing
224//! candidate uses the round-148 cache-bits sweep (§5.2.3
225//! `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline) and
226//! is cross-compared against every other candidate; the smallest
227//! stream wins. On palette-ish content (icons, line art, screen
228//! captures) the index-bundling drops the entropy stage's symbol
229//! count by 2..8×, more than paying for the small subtraction-coded
230//! palette-write overhead.
231//!
232//! ## What this module does NOT do
233//!
234//! * No multi-meta-prefix (§6.2.2 entropy image). All candidates use
235//!   a single prefix-code group for the entire image.
236//! * No `oxideav-core` runtime dependency — this module compiles under
237//!   `--no-default-features`.
238
239use crate::build::{self, ImageKind};
240
241/// The largest code length a VP8L canonical prefix code may use (§3.7.2.1.2
242/// stores literal code lengths in `[0..15]`). Mirrors
243/// [`crate::vp8l_prefix::MAX_CODE_LENGTH`].
244pub const MAX_CODE_LENGTH: usize = 15;
245
246/// §3.7.2.1.2 `kCodeLengthCodes`: the 19-symbol code-length-code alphabet.
247pub const NUM_CODE_LENGTH_CODES: usize = 19;
248
249/// §3.7.2.1.2 `kCodeLengthCodeOrder`: the order the (up to 19)
250/// code-length-code lengths are transmitted in. Identical to the decoder's
251/// [`crate::vp8l_prefix::CODE_LENGTH_CODE_ORDER`].
252pub const CODE_LENGTH_CODE_ORDER: [usize; NUM_CODE_LENGTH_CODES] = [
253    17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
254];
255
256/// Errors raised while encoding a VP8L image.
257#[derive(Debug, Clone, PartialEq, Eq)]
258pub enum EncodeError {
259    /// The caller passed an empty pixel buffer, or one whose length does
260    /// not match `width * height * 4`.
261    PixelBufferMismatch {
262        /// Bytes the caller supplied.
263        got: usize,
264        /// Bytes expected (`width * height * 4`).
265        expected: usize,
266    },
267    /// `width` or `height` was zero, or exceeded the §3.4 14-bit field
268    /// maximum of 16384.
269    InvalidDimensions {
270        /// The offending width.
271        width: u32,
272        /// The offending height.
273        height: u32,
274    },
275    /// The RIFF/WEBP framing builder rejected the assembled payload.
276    Build(build::BuildError),
277}
278
279impl From<build::BuildError> for EncodeError {
280    fn from(e: build::BuildError) -> Self {
281        Self::Build(e)
282    }
283}
284
285impl core::fmt::Display for EncodeError {
286    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
287        match self {
288            Self::PixelBufferMismatch { got, expected } => write!(
289                f,
290                "VP8L encode: pixel buffer is {got} bytes, expected {expected} (width*height*4)"
291            ),
292            Self::InvalidDimensions { width, height } => write!(
293                f,
294                "VP8L encode: invalid dimensions {width}x{height} (must be 1..=16384)"
295            ),
296            Self::Build(e) => write!(f, "VP8L encode: RIFF/WEBP framing: {e}"),
297        }
298    }
299}
300
301impl std::error::Error for EncodeError {}
302
303/// §3.4 14-bit `width - 1` / `height - 1` field maximum (1-based 16384).
304const MAX_DIMENSION: u32 = 1 << 14;
305
306/// Least-significant-bit-first bit writer over a growing byte buffer.
307///
308/// The exact inverse of [`crate::vp8l_stream::BitReader`]: bits are packed
309/// LSB-first within each byte and bytes accumulate in stream order. A
310/// multi-bit write lays the value's bit 0 down first, so a subsequent
311/// `read_bits(n)` returns it unchanged.
312#[derive(Debug, Default, Clone)]
313pub struct BitWriter {
314    bytes: Vec<u8>,
315    bit_pos: usize,
316}
317
318impl BitWriter {
319    /// Create an empty bit writer positioned at bit 0.
320    pub fn new() -> Self {
321        Self::default()
322    }
323
324    /// The number of bits written so far.
325    pub fn bit_position(&self) -> usize {
326        self.bit_pos
327    }
328
329    /// Write the low `n` bits of `value` (0 ≤ `n` ≤ 32) LSB-first.
330    ///
331    /// Writing 0 bits is a no-op (mirrors the reader's `read_bits(0)`).
332    pub fn write_bits(&mut self, value: u32, n: usize) {
333        debug_assert!(n <= 32, "write_bits supports up to 32 bits");
334        let mut value = value;
335        for _ in 0..n {
336            let byte_idx = self.bit_pos >> 3;
337            if byte_idx >= self.bytes.len() {
338                self.bytes.push(0);
339            }
340            let bit = (value & 1) as u8;
341            self.bytes[byte_idx] |= bit << (self.bit_pos & 7);
342            self.bit_pos += 1;
343            value >>= 1;
344        }
345    }
346
347    /// Write a single bit.
348    pub fn write_bit(&mut self, bit: bool) {
349        self.write_bits(bit as u32, 1);
350    }
351
352    /// Consume the writer and return the packed bytes (the final partial
353    /// byte is zero-padded in its high bits).
354    pub fn into_bytes(self) -> Vec<u8> {
355        self.bytes
356    }
357}
358
359/// Build a length-limited (≤ [`MAX_CODE_LENGTH`]) canonical Huffman
360/// code-length assignment for an alphabet of `freqs.len()` symbols.
361///
362/// Returns a `Vec<u8>` of code lengths, one per symbol (0 = symbol unused).
363/// The construction guarantees the §3.7.2 completeness invariant the
364/// decoder enforces — the Kraft sum of `2^-len` over used symbols equals
365/// exactly one — for every input with at least two used symbols, and it
366/// produces the §3.7.2.1.2 single-leaf form (one symbol at length 1) for an
367/// input with exactly one used symbol.
368///
369/// The algorithm is a textbook Huffman build, followed by a
370/// length-limiting pass that caps any over-long code at
371/// [`MAX_CODE_LENGTH`] while re-balancing so the Kraft sum stays at
372/// exactly one. For the small alphabets and pixel counts this encoder
373/// targets, the cap is rarely hit; the pass is correctness insurance,
374/// not an optimization.
375///
376/// The merge loop exploits the classic two-queue property instead of a
377/// heap: with the leaves sorted ascending by `(frequency, symbol)` once
378/// up front, every internal node is created with a frequency no smaller
379/// than any previously created one, so a plain FIFO of internal nodes
380/// stays sorted by `(frequency, creation order)` for free. Each merge
381/// step then takes the two smallest nodes by comparing the two queue
382/// fronts in O(1) — preferring the leaf on a frequency tie, because the
383/// tie-break order ranks every leaf (ascending symbol) before every
384/// internal node (creation order). This reproduces, merge for merge, the
385/// exact `(freq, order)` pop sequence the previous min-heap build used,
386/// so the emitted length tables are bit-identical; only the cost drops
387/// (O(n log n) sort + O(n) merge, versus 3(n-1) heap operations of
388/// O(log n) swaps each).
389pub fn build_code_lengths(freqs: &[u32]) -> Vec<u8> {
390    let n = freqs.len();
391    let mut lengths = vec![0u8; n];
392
393    // Collect used symbols.
394    let used: Vec<usize> = (0..n).filter(|&s| freqs[s] > 0).collect();
395    match used.len() {
396        0 => return lengths, // empty code; caller encodes single-symbol-0.
397        1 => {
398            // §3.7.2.1.2 single-leaf: one symbol marked length 1.
399            lengths[used[0]] = 1;
400            return lengths;
401        }
402        _ => {}
403    }
404
405    // Huffman build. Nodes 0..n are leaves; internal nodes n.. are
406    // appended in creation order. A parent array recovers the depth
407    // (= code length) of each leaf afterwards.
408    let m = used.len();
409
410    // Leaf queue: `(freq << 32) | symbol` keys, sorted ascending. The
411    // packed key makes the sort a single-u64 comparison while encoding
412    // exactly the `(freq, ascending symbol)` tie-break the merge needs
413    // (`freq` is `u32`, so the shift is exact, and a symbol index always
414    // fits the low half).
415    let mut leaves: Vec<u64> = used
416        .iter()
417        .map(|&s| ((freqs[s] as u64) << 32) | s as u64)
418        .collect();
419    leaves.sort_unstable();
420
421    // Internal-node FIFO: frequencies only; internal node `i` has node
422    // index `n + i`. `u32::MAX` marks "no parent yet" (only the root
423    // keeps it, and the root's slot is never read back).
424    let mut inode_freq: Vec<u64> = Vec::with_capacity(m - 1);
425    let mut parent: Vec<u32> = vec![u32::MAX; n + m - 1];
426
427    /// Take the smallest remaining node by `(freq, tie-break order)`:
428    /// the front leaf wins ties because leaves rank before internal
429    /// nodes in the tie-break order.
430    fn take_min(
431        leaves: &[u64],
432        li: &mut usize,
433        inode_freq: &[u64],
434        ii: &mut usize,
435        n: usize,
436    ) -> (usize, u64) {
437        let use_leaf = if *li < leaves.len() {
438            *ii >= inode_freq.len() || (leaves[*li] >> 32) <= inode_freq[*ii]
439        } else {
440            false
441        };
442        if use_leaf {
443            let key = leaves[*li];
444            *li += 1;
445            ((key & 0xffff_ffff) as usize, key >> 32)
446        } else {
447            let node = n + *ii;
448            let freq = inode_freq[*ii];
449            *ii += 1;
450            (node, freq)
451        }
452    }
453
454    let mut li = 0usize; // leaf cursor
455    let mut ii = 0usize; // internal-node cursor
456    for _ in 0..m - 1 {
457        let (a_node, a_freq) = take_min(&leaves, &mut li, &inode_freq, &mut ii, n);
458        let (b_node, b_freq) = take_min(&leaves, &mut li, &inode_freq, &mut ii, n);
459        let new_node = n + inode_freq.len();
460        parent[a_node] = new_node as u32;
461        parent[b_node] = new_node as u32;
462        inode_freq.push(a_freq + b_freq);
463    }
464
465    // Recover each leaf's depth top-down: an internal node's parent is
466    // always created later (larger index), so a single reverse pass over
467    // the internal nodes settles every internal depth, and each leaf is
468    // then one deeper than its (always internal) parent.
469    let mut internal_depth = vec![0u32; m - 1];
470    for i in (0..m - 1).rev() {
471        let p = parent[n + i];
472        if p != u32::MAX {
473            internal_depth[i] = internal_depth[p as usize - n] + 1;
474        }
475    }
476    let mut max_len = 0usize;
477    for &s in &used {
478        let depth = internal_depth[parent[s] as usize - n] as usize + 1;
479        // A single internal-node tree (two leaves) gives depth 1; never 0
480        // here because used.len() >= 2.
481        lengths[s] = depth as u8;
482        max_len = max_len.max(depth);
483    }
484
485    if max_len > MAX_CODE_LENGTH {
486        limit_code_lengths(&mut lengths, &used);
487    }
488
489    lengths
490}
491
492/// Cap every code length at [`MAX_CODE_LENGTH`] while keeping the Kraft sum
493/// exactly 1, using the standard "move a too-long leaf up and lengthen a
494/// short leaf to compensate" rebalancing pass.
495///
496/// This is the approach a length-limited Huffman post-pass uses when a
497/// pathological frequency distribution would otherwise need codes longer
498/// than the format allows. It produces a *valid* (complete) code that is at
499/// most marginally sub-optimal; exactness of the round trip is unaffected
500/// because the decoder reconstructs pixels from whatever complete code the
501/// lengths describe.
502fn limit_code_lengths(lengths: &mut [u8], used: &[usize]) {
503    limit_code_lengths_to(lengths, used, MAX_CODE_LENGTH);
504}
505
506/// As [`limit_code_lengths`], but caps every code length at the
507/// caller-supplied `max_len` rather than [`MAX_CODE_LENGTH`].
508///
509/// The §3.7.2.1.2 *code-length-code* (the meta-code that transmits the
510/// literal length table) writes each of its own lengths in a **3-bit**
511/// on-wire field, so its lengths must not exceed `7` — a constraint
512/// tighter than the 15-bit `MAX_CODE_LENGTH` ceiling that applies to the
513/// literal codes themselves. A skewed enough CLC frequency histogram
514/// (one length value vastly more common than the rest) makes the plain
515/// Huffman build assign a length-8-or-more code to a rare CLC symbol;
516/// without this cap the 3-bit field silently truncates it, corrupting the
517/// table into an incomplete (Kraft < 1) code the decoder rejects. Capping
518/// the CLC at 7 with a Kraft re-balance keeps the on-wire table valid.
519///
520/// `max_len <= MAX_CODE_LENGTH` is required (the Kraft arithmetic uses
521/// `2^max_len` as the common denominator).
522fn limit_code_lengths_to(lengths: &mut [u8], used: &[usize], max_len: usize) {
523    debug_assert!((1..=MAX_CODE_LENGTH).contains(&max_len));
524    // Clamp.
525    for &s in used {
526        if lengths[s] as usize > max_len {
527            lengths[s] = max_len as u8;
528        }
529    }
530    // Kraft sum over denominator 2^max_len.
531    let full: i64 = 1i64 << max_len;
532    let kraft = |lengths: &[u8]| -> i64 {
533        let mut k = 0i64;
534        for &s in used {
535            let l = lengths[s] as usize;
536            if l > 0 {
537                k += 1i64 << (max_len - l);
538            }
539        }
540        k
541    };
542    // If over-subscribed (sum > 1), lengthen the deepest (largest-length,
543    // i.e. cheapest-to-lengthen) leaves until the sum drops to 1.
544    //
545    // Selection rule being reproduced: the historical per-step rescan
546    // walked all of `used` and kept the LAST `used`-order symbol among
547    // those sharing the largest current length below the cap (the
548    // `l >= best_len` comparison kept updating on ties). Two facts turn
549    // that O(n)-per-adjustment rescan into an O(1)-per-adjustment bucket
550    // drain with the identical pick sequence:
551    //
552    // 1. A bucket per length, filled in one pass over `used`, holds each
553    //    bucket's symbols in `used` order — so the back of the highest
554    //    non-empty bucket IS the rescan's pick.
555    // 2. Once a pick is lengthened from `l` to `l + 1 < MAX`, it is
556    //    strictly the unique deepest eligible leaf (everything else is
557    //    `<= l`), so the rescan re-picks the same symbol every step
558    //    until it reaches MAX (leaving the eligible set) or the sum
559    //    reaches 1. Driving the popped symbol upward in place therefore
560    //    replays the original step sequence exactly; no eligible bucket
561    //    ever gains a member while the pass is still running.
562    let mut k = kraft(lengths);
563    if k > full {
564        let mut buckets: Vec<Vec<usize>> = vec![Vec::new(); max_len];
565        for &s in used {
566            let l = lengths[s] as usize;
567            if l < max_len {
568                buckets[l].push(s);
569            }
570        }
571        // Bucket 0 is included for parity with the historical rescan,
572        // which treated a (theoretical) zero-length used symbol as
573        // eligible; the §3.7.2 build never produces one for a used
574        // symbol, so the bucket is empty in practice.
575        'over: for l0 in (0..max_len).rev() {
576            while k > full {
577                let Some(s) = buckets[l0].pop() else { break };
578                // Lengthening from `l` to `l + 1` swaps the Kraft term
579                // `2^(max-l)` for `2^(max-l-1)`, i.e. removes exactly
580                // `2^(max-l-1)` — same integer a full recompute would
581                // give.
582                let mut l = l0;
583                while k > full && l < max_len {
584                    l += 1;
585                    lengths[s] = l as u8;
586                    k -= 1i64 << (max_len - l);
587                }
588            }
589            if k <= full {
590                break 'over;
591            }
592        }
593    }
594    // If under-subscribed (sum < 1), shorten the deepest leaves until the
595    // sum reaches 1.
596    while k < full {
597        let mut target: Option<usize> = None;
598        let mut best_len = 0u8;
599        for &s in used {
600            let l = lengths[s];
601            if l > 1 && l >= best_len {
602                best_len = l;
603                target = Some(s);
604            }
605        }
606        match target {
607            Some(s) => {
608                // Shortening `s` from `l` to `l - 1` swaps `2^(max-l)` for
609                // `2^(max-l+1)`, i.e. adds exactly `2^(max-l)` — again the
610                // same integer a full recompute would give.
611                let l = lengths[s] as usize;
612                lengths[s] -= 1;
613                k += 1i64 << (max_len - l);
614            }
615            None => break,
616        }
617    }
618}
619
620/// Maximum on-wire length for a §3.7.2.1.2 code-length-code symbol: the
621/// CLC lengths are each written in a 3-bit field, so they range `[0..7]`.
622const MAX_CLC_CODE_LENGTH: usize = 7;
623
624/// Build the §3.7.2.1.2 code-length-code (CLC) lengths for a literal
625/// length table, capped at [`MAX_CLC_CODE_LENGTH`] so every length fits
626/// the 3-bit on-wire field. The plain Huffman build can assign a CLC
627/// symbol a length of 8 or more on a skewed histogram; this wrapper
628/// re-balances any such over-long code back under 7 while keeping the
629/// table complete, so both [`write_normal_code_lengths`] and
630/// [`normal_form_bits`] see the same valid lengths.
631fn build_clc_code_lengths(clc_freq: &[u32]) -> Vec<u8> {
632    let mut clc_lengths = build_code_lengths(clc_freq);
633    if clc_lengths
634        .iter()
635        .any(|&l| l as usize > MAX_CLC_CODE_LENGTH)
636    {
637        let used: Vec<usize> = (0..clc_freq.len()).filter(|&s| clc_freq[s] > 0).collect();
638        limit_code_lengths_to(&mut clc_lengths, &used, MAX_CLC_CODE_LENGTH);
639    }
640    clc_lengths
641}
642
643/// Build the canonical code values for a per-symbol length table.
644///
645/// Returns `codes[s]` = the canonical code value for symbol `s` (only
646/// meaningful where `lengths[s] > 0`). The assignment is the same DEFLATE
647/// canonical rule the decoder's [`crate::vp8l_prefix::PrefixCode`] reads:
648/// symbols ordered by `(length, value)`, codes assigned sequentially, read
649/// most-significant-bit-first within a code.
650pub fn canonical_codes(lengths: &[u8]) -> Vec<u32> {
651    let mut bl_count = [0u32; MAX_CODE_LENGTH + 1];
652    for &l in lengths {
653        if l > 0 {
654            bl_count[l as usize] += 1;
655        }
656    }
657    let mut next_code = [0u32; MAX_CODE_LENGTH + 2];
658    let mut code = 0u32;
659    for len in 1..=MAX_CODE_LENGTH {
660        code = (code + bl_count[len - 1]) << 1;
661        next_code[len] = code;
662    }
663    let mut codes = vec![0u32; lengths.len()];
664    let mut assign = next_code;
665    // Indexed by code length to assign sequential canonical codes; mirrors
666    // the decoder's `(length, value)`-ordered assignment.
667    #[allow(clippy::needless_range_loop)]
668    for len in 1..=MAX_CODE_LENGTH {
669        for (sym, &l) in lengths.iter().enumerate() {
670            if l as usize == len {
671                codes[sym] = assign[len];
672                assign[len] += 1;
673            }
674        }
675    }
676    codes
677}
678
679/// §5.2.2: split a length/distance `value` (≥ 1) into its *prefix code* and
680/// *extra bits*, the exact inverse of the decoder's
681/// [`crate::vp8l_decode::read_lz77_value`].
682///
683/// Returns `(prefix_code, extra_bits, extra_value)` where:
684///
685/// * `prefix_code` is the entropy-coded symbol (a GREEN length symbol is
686///   `256 + prefix_code`; a distance symbol is `prefix_code` directly),
687/// * `extra_bits` is how many raw bits follow the prefix code,
688/// * `extra_value` is the value those `extra_bits` carry (LSB-first, as the
689///   decoder's `ReadBits` consumes them).
690///
691/// The decoder reconstructs `value` as:
692///
693/// ```text
694/// if prefix_code < 4 { value = prefix_code + 1 }
695/// else {
696///     extra_bits = (prefix_code - 2) >> 1
697///     offset = (2 + (prefix_code & 1)) << extra_bits
698///     value = offset + extra_value + 1
699/// }
700/// ```
701///
702/// so feeding `extra_value` back through that formula yields `value`.
703pub fn value_to_prefix(value: u32) -> (u32, u32, u32) {
704    debug_assert!(value >= 1, "LZ77 length/distance values are 1-based");
705    if value <= 4 {
706        // prefix_code = value - 1; no extra bits (the `< 4` decoder branch).
707        return (value - 1, 0, 0);
708    }
709    // value >= 5. Find the prefix code p (>= 4) whose range
710    // [offset+1, offset + 2^extra_bits] contains `value`, where
711    // extra_bits = (p - 2) >> 1 and offset = (2 + (p & 1)) << extra_bits.
712    //
713    // Equivalently: let v0 = value - 1 (>= 4). The high bit of v0 selects
714    // the magnitude; the next bit selects the (p & 1) parity sub-band.
715    let v0 = value - 1; // >= 4
716                        // `msb` = floor(log2(v0)) >= 2.
717    let msb = 31 - v0.leading_zeros();
718    let extra_bits = msb - 1;
719    // Parity bit: the bit just below the MSB distinguishes the two
720    // sub-bands offset = 2<<e (parity 0) vs offset = 3<<e (parity 1).
721    let parity = (v0 >> (msb - 1)) & 1;
722    let prefix_code = 2 * extra_bits + 2 + parity;
723    let offset = (2 + parity) << extra_bits;
724    let extra_value = value - offset - 1;
725    debug_assert!(extra_value < (1u32 << extra_bits));
726    (prefix_code, extra_bits, extra_value)
727}
728
729/// A built prefix code ready for symbol emission: per-symbol length + code.
730#[derive(Debug, Clone)]
731struct WriteCode {
732    lengths: Vec<u8>,
733    codes: Vec<u32>,
734    /// `Some(sym)` when this is the single-leaf form (one symbol, length 1).
735    single: Option<usize>,
736}
737
738impl WriteCode {
739    /// Build a [`WriteCode`] from symbol frequencies over an alphabet of
740    /// `alphabet_size` symbols.
741    fn from_freqs(freqs: &[u32]) -> Self {
742        let used: Vec<usize> = (0..freqs.len()).filter(|&s| freqs[s] > 0).collect();
743        let single = if used.len() == 1 { Some(used[0]) } else { None };
744        let lengths = build_code_lengths(freqs);
745        let codes = canonical_codes(&lengths);
746        Self {
747            lengths,
748            codes,
749            single,
750        }
751    }
752
753    /// An *empty* code: encoded per §3.7.2.1.1's note as a single symbol 0.
754    /// Used for the distance code when no backward references are emitted.
755    fn empty(alphabet_size: usize) -> Self {
756        let mut freqs = vec![0u32; alphabet_size];
757        freqs[0] = 1;
758        Self::from_freqs(&freqs)
759    }
760
761    /// Emit one symbol's code to `w` (MSB-first within the code, matching
762    /// the canonical assignment the decoder reads). For the single-leaf
763    /// form this writes nothing (reading consumes no bits).
764    fn write_symbol(&self, w: &mut BitWriter, symbol: usize) {
765        if self.single.is_some() {
766            return; // single-leaf code: 0 bits.
767        }
768        let len = self.lengths[symbol] as usize;
769        let code = self.codes[symbol];
770        // The decoder reads MSB-first within the code, so emit the high bit
771        // first. write_bits is LSB-first, so reverse the `len` low bits.
772        for i in 0..len {
773            let bit = (code >> (len - 1 - i)) & 1;
774            w.write_bits(bit, 1);
775        }
776    }
777
778    /// Write this code's per-symbol lengths to `w`, picking the cheaper
779    /// of the two §3.7.2.1 forms.
780    ///
781    /// The §3.7.2.1.1 *simple code length code* can only represent length
782    /// tables with 1 or 2 symbols at length 1 (every other symbol
783    /// implicitly absent). When that constraint holds, `write_code_lengths`
784    /// computes the precise bit-cost of both forms and picks the smaller.
785    /// Otherwise it falls back to the §3.7.2.1.2 *normal code length code*.
786    fn write_code_lengths(&self, w: &mut BitWriter) {
787        if let Some(simple) = self.as_simple_form() {
788            // Two trivial cases the simple form can carry — compare
789            // bit-costs and pick the cheaper.
790            let simple_bits = simple_form_bits(&simple);
791            let normal_bits = normal_form_bits(&self.lengths);
792            if simple_bits <= normal_bits {
793                write_simple_code_lengths(w, &simple);
794                return;
795            }
796        }
797        write_normal_code_lengths(w, &self.lengths);
798    }
799
800    /// If this code's length table is encodable with the §3.7.2.1.1 simple
801    /// form (1 or 2 symbols at length 1, all others 0), return the symbol
802    /// list `[symbol0]` or `[symbol0, symbol1]`. Otherwise return `None`.
803    fn as_simple_form(&self) -> Option<Vec<usize>> {
804        let used: Vec<(usize, u8)> = self
805            .lengths
806            .iter()
807            .enumerate()
808            .filter_map(|(s, &l)| if l != 0 { Some((s, l)) } else { None })
809            .collect();
810        // Simple form requires 1 or 2 used symbols, each at length 1.
811        // §3.7.2.1.1: "code length 1. All other prefix code lengths are
812        // implicitly zeros."
813        if used.is_empty() || used.len() > 2 {
814            return None;
815        }
816        if used.iter().any(|&(_, l)| l != 1) {
817            return None;
818        }
819        // §3.7.2.1.1 first symbol is coded with 1 or 8 bits, so it must
820        // fit in [0..255]; second symbol always 8 bits, [0..255]. Anything
821        // beyond 255 can only be sent via the normal form.
822        if used.iter().any(|&(s, _)| s > 255) {
823            return None;
824        }
825        Some(used.iter().map(|&(s, _)| s).collect())
826    }
827}
828
829/// Precise bit-cost of the §3.7.2.1.1 *simple code length code* for the
830/// given symbol list (1 or 2 entries, each in `[0..255]`).
831///
832/// Layout per §3.7.2.1.1:
833/// * 1 flag bit (`1` = simple)
834/// * 1 bit `num_symbols - 1`
835/// * 1 bit `is_first_8bits` (chooses 1-bit vs 8-bit width for symbol0)
836/// * `1 + 7 * is_first_8bits` bits for `symbol0`
837/// * if `num_symbols == 2`: 8 bits for `symbol1`
838fn simple_form_bits(symbols: &[usize]) -> usize {
839    debug_assert!(symbols.len() == 1 || symbols.len() == 2);
840    let is_first_8bits = symbols[0] > 1;
841    // Per spec: the second symbol, when present, is always 8 bits.
842    let s0_width = if is_first_8bits { 8 } else { 1 };
843    let s1_width = if symbols.len() == 2 { 8 } else { 0 };
844    // 1 (flag) + 1 (num_symbols-1) + 1 (is_first_8bits) + s0 + s1.
845    3 + s0_width + s1_width
846}
847
848/// Precise bit-cost of [`write_normal_code_lengths`] for `lengths`.
849///
850/// Mirrors `write_normal_code_lengths` exactly so the chooser is
851/// self-consistent: any change in normal-form layout there must reflect
852/// here.
853fn normal_form_bits(lengths: &[u8]) -> usize {
854    // CLC frequencies are the histogram of length values 0..=15 in the
855    // literal length table.
856    let mut clc_freq = [0u32; NUM_CODE_LENGTH_CODES];
857    for &l in lengths {
858        clc_freq[l as usize] += 1;
859    }
860    let clc_lengths = build_clc_code_lengths(&clc_freq);
861
862    // Locate the highest-ordered CLC symbol that has a non-zero length.
863    let mut max_order_used = 0usize;
864    for (order_idx, &pos) in CODE_LENGTH_CODE_ORDER.iter().enumerate() {
865        if clc_lengths[pos] != 0 {
866            max_order_used = order_idx;
867        }
868    }
869    let num_code_lengths = (max_order_used + 1).max(4);
870
871    // §3.7.2.1.2 header tax: 1 flag + 4 num_code_lengths + 3*num_code_lengths
872    // CLC lengths + 1 max_symbol gate.
873    let mut bits = 1 + 4 + 3 * num_code_lengths + 1;
874
875    // Per-symbol body: when the CLC collapses to a single non-zero
876    // length (single-leaf CLC), the decoder consumes 0 bits per symbol
877    // and the writer emits nothing. Otherwise emit the canonical code for
878    // each literal length value.
879    let used_clc: Vec<usize> = (0..NUM_CODE_LENGTH_CODES)
880        .filter(|&s| clc_freq[s] > 0)
881        .collect();
882    if used_clc.len() > 1 {
883        for &l in lengths {
884            bits += clc_lengths[l as usize] as usize;
885        }
886    }
887    bits
888}
889
890/// Write a per-symbol length table with the §3.7.2.1.1 *simple code
891/// length code*.
892///
893/// Only valid for `symbols.len()` in `[1, 2]`, each symbol in `[0..255]`,
894/// each implicitly at code length 1. The caller is responsible for
895/// checking applicability via [`WriteCode::as_simple_form`].
896fn write_simple_code_lengths(w: &mut BitWriter, symbols: &[usize]) {
897    debug_assert!(symbols.len() == 1 || symbols.len() == 2);
898    debug_assert!(symbols.iter().all(|&s| s <= 255));
899
900    // §3.7.2.1.1 flag: 1 selects the simple form.
901    w.write_bit(true);
902    // num_symbols = ReadBits(1) + 1, so write `num_symbols - 1`.
903    w.write_bits((symbols.len() as u32) - 1, 1);
904    // §3.7.2.1.1: "is_first_8bits ... range [0..1] or [0..255]". Choose
905    // the 1-bit form when symbol0 fits in [0..1], else the 8-bit form.
906    let is_first_8bits = symbols[0] > 1;
907    w.write_bits(if is_first_8bits { 1 } else { 0 }, 1);
908    let s0_width = if is_first_8bits { 8 } else { 1 };
909    w.write_bits(symbols[0] as u32, s0_width);
910    if symbols.len() == 2 {
911        // §3.7.2.1.1: "The second symbol, if present, is always assumed
912        // to be in the range [0..255] and coded using 8 bits."
913        w.write_bits(symbols[1] as u32, 8);
914    }
915}
916
917/// Write a per-symbol length table with the §3.7.2.1.2 *normal code length
918/// code*.
919///
920/// The encoder uses the general (non-run-length) form: it transmits one
921/// code-length-code symbol per literal length. To keep the code-length-code
922/// itself trivially decodable, every length value `0..=15` that actually
923/// occurs is given a code-length-code symbol; the CLC is built from the
924/// frequencies of those length values. Runs (codes 16/17/18) are not
925/// emitted — the literal length sequence is sent verbatim, which the
926/// decoder's `read_normal_code_lengths` handles as the `0..=15` literal
927/// branch.
928fn write_normal_code_lengths(w: &mut BitWriter, lengths: &[u8]) {
929    // §3.7.2.1.2: the code-length-code is itself a prefix code over the
930    // 19-symbol alphabet {0..15 literal lengths, 16 repeat, 17/18 zero
931    // runs}. We only emit symbols 0..=15 (no runs), so the CLC alphabet is
932    // those length values that occur in `lengths`.
933    let mut clc_freq = [0u32; NUM_CODE_LENGTH_CODES];
934    for &l in lengths {
935        clc_freq[l as usize] += 1;
936    }
937    let clc_lengths = build_clc_code_lengths(&clc_freq);
938    let clc_codes = canonical_codes(&clc_lengths);
939
940    // num_code_lengths: how many CLC lengths we transmit, in
941    // kCodeLengthCodeOrder. We must transmit enough leading entries to
942    // cover the highest-ordered CLC symbol that has a non-zero length.
943    let mut max_order_used = 0usize;
944    for (order_idx, &pos) in CODE_LENGTH_CODE_ORDER.iter().enumerate() {
945        if clc_lengths[pos] != 0 {
946            max_order_used = order_idx;
947        }
948    }
949    // §3.7.2.1.2: num_code_lengths = 4 + ReadBits(4), range [4..19].
950    let num_code_lengths = (max_order_used + 1).max(4);
951
952    // normal flag bit.
953    w.write_bit(false);
954    // num_code_lengths - 4 in 4 bits.
955    w.write_bits((num_code_lengths - 4) as u32, 4);
956    // The CLC lengths, 3 bits each, in kCodeLengthCodeOrder.
957    for &pos in CODE_LENGTH_CODE_ORDER.iter().take(num_code_lengths) {
958        w.write_bits(clc_lengths[pos] as u32, 3);
959    }
960    // max_symbol gate: ReadBits(1) == 0 → max_symbol = alphabet_size, i.e.
961    // read all `lengths.len()` entries. We always emit the full table.
962    w.write_bit(false);
963
964    // Whether the CLC is a single-leaf code (one length value occurs):
965    // write_symbol then emits 0 bits, and the decoder's CLC reader returns
966    // that lone symbol for every read — which is exactly the literal length
967    // we want, repeated for every symbol. Build a tiny symbol writer.
968    let clc_single = {
969        let used: Vec<usize> = (0..NUM_CODE_LENGTH_CODES)
970            .filter(|&s| clc_freq[s] > 0)
971            .collect();
972        if used.len() == 1 {
973            Some(used[0])
974        } else {
975            None
976        }
977    };
978
979    // Emit one CLC symbol per literal length (the `0..=15` branch).
980    for &l in lengths {
981        let sym = l as usize;
982        if clc_single.is_some() {
983            continue; // single-leaf CLC: 0 bits per symbol.
984        }
985        let code = clc_codes[sym];
986        let len = clc_lengths[sym] as usize;
987        for i in 0..len {
988            let bit = (code >> (len - 1 - i)) & 1;
989            w.write_bits(bit, 1);
990        }
991    }
992}
993
994/// Smallest backward-reference run (in pixels) the matcher will emit. A
995/// match of fewer than this many pixels rarely pays for the length +
996/// distance prefix codes versus emitting the pixels as literals, so short
997/// runs stay literal.
998pub const MIN_MATCH: usize = 3;
999
1000/// Largest backward-reference run the §5.2.2 length prefix coding admits
1001/// (the spec note: "The maximum backward reference length is limited to
1002/// 4096."). A longer repeat is split into consecutive matches.
1003pub const MAX_MATCH: usize = 4096;
1004
1005/// Number of low bits of the rolling pixel hash → hash-chain head buckets.
1006/// `1 << HASH_BITS` heads; collisions are resolved by walking the chain.
1007const HASH_BITS: usize = 14;
1008/// Cap on chain steps walked per position, bounding the matcher's worst
1009/// case on adversarial inputs while keeping the common-case match quality.
1010const MAX_CHAIN: usize = 64;
1011
1012/// A single emitted token in the §5.2.2 LZ77 stream: either a raw ARGB
1013/// pixel (a §5.2.1 literal), a §5.2.3 color-cache reference, or a
1014/// §5.2.2 backward-reference copy.
1015#[derive(Debug, Clone, Copy, PartialEq, Eq)]
1016enum Token {
1017    /// A §5.2.1 ARGB literal pixel (encoded as four channel symbols).
1018    Literal(u32),
1019    /// A §5.2.3 color-cache reference. `index` is the resolved
1020    /// cache slot (the green symbol on the wire is
1021    /// `256 + 24 + index`).
1022    CacheRef {
1023        /// The hashed cache index (`0..color_cache_size`).
1024        index: u32,
1025    },
1026    /// A §5.2.2 backward reference: copy `length` pixels from `distance`
1027    /// pixels back in scan-line order.
1028    Copy {
1029        /// Copy length in pixels (`MIN_MATCH..=MAX_MATCH`).
1030        length: usize,
1031        /// Scan-line pixel distance back to the copy source (`>= 1`).
1032        distance: usize,
1033    },
1034}
1035
1036/// §5.2.2 hash-chain matcher over a scan-line ARGB pixel buffer.
1037///
1038/// Hashes 4-pixel windows into `1 << HASH_BITS` buckets and chains every
1039/// position sharing a hash, so a match search at position `p` walks only
1040/// positions that begin with the same 4-pixel hash. This is the standard
1041/// LZ77 greedy match structure; it finds repeated pixel runs without ever
1042/// consulting any external implementation — the only correctness contract
1043/// is that an emitted `Copy { length, distance }` is reproducible by the
1044/// decoder's §5.2.2 copy loop, which it is for any `1 <= distance <= p` and
1045/// `length <= remaining`.
1046struct Lz77Matcher<'a> {
1047    pixels: &'a [u32],
1048    head: Vec<i32>,
1049    prev: Vec<i32>,
1050}
1051
1052impl<'a> Lz77Matcher<'a> {
1053    /// Build a matcher over `pixels` with empty hash chains.
1054    fn new(pixels: &'a [u32]) -> Self {
1055        Self {
1056            pixels,
1057            head: vec![-1; 1 << HASH_BITS],
1058            prev: vec![-1; pixels.len()],
1059        }
1060    }
1061
1062    /// Hash the 4-pixel window starting at `pos` (callers guarantee
1063    /// `pos + 4 <= pixels.len()`). A simple multiplicative mix over the
1064    /// four ARGB words, folded into `HASH_BITS` bits.
1065    fn hash(&self, pos: usize) -> usize {
1066        let p = self.pixels;
1067        let mut h = 0u32;
1068        for k in 0..4 {
1069            h = h.wrapping_mul(0x9e37_79b1).wrapping_add(p[pos + k]);
1070        }
1071        (h >> (32 - HASH_BITS)) as usize
1072    }
1073
1074    /// Insert `pos` at the head of its hash bucket's chain.
1075    fn insert(&mut self, pos: usize) {
1076        if pos + 4 > self.pixels.len() {
1077            return;
1078        }
1079        let h = self.hash(pos);
1080        self.prev[pos] = self.head[h];
1081        self.head[h] = pos as i32;
1082    }
1083
1084    /// Find the longest match for the window at `pos`, returning
1085    /// `Some((length, distance))` when a run of `>= MIN_MATCH` pixels is
1086    /// found. Walks at most [`MAX_CHAIN`] chain links.
1087    ///
1088    /// The matcher hashes 4-pixel windows, so a match search requires
1089    /// `pos + 4 <= pixels.len()`. The tail of the image (fewer than 4
1090    /// pixels remaining) is always emitted as literals.
1091    fn find(&self, pos: usize) -> Option<(usize, usize)> {
1092        let p = self.pixels;
1093        let n = p.len();
1094        if pos + 4 > n {
1095            return None;
1096        }
1097        let max_len = (n - pos).min(MAX_MATCH);
1098        let h = self.hash(pos);
1099        let mut cand = self.head[h];
1100        let mut best_len = 0usize;
1101        let mut best_dist = 0usize;
1102        let mut steps = 0usize;
1103        while cand >= 0 && steps < MAX_CHAIN {
1104            let c = cand as usize;
1105            // Candidates were all inserted at positions < pos.
1106            let mut len = 0usize;
1107            while len < max_len && p[c + len] == p[pos + len] {
1108                len += 1;
1109            }
1110            if len > best_len {
1111                best_len = len;
1112                best_dist = pos - c;
1113                if len >= max_len {
1114                    break;
1115                }
1116            }
1117            cand = self.prev[c];
1118            steps += 1;
1119        }
1120        if best_len >= MIN_MATCH {
1121            Some((best_len, best_dist))
1122        } else {
1123            None
1124        }
1125    }
1126}
1127
1128/// Run the §5.2.2 hash-chain matcher over `pixels`, producing the
1129/// token stream (literals + backward-reference copies) the entropy
1130/// stage emits. Every `Copy` token has `1 <= distance <= position` and
1131/// `MIN_MATCH <= length <= MAX_MATCH`, so the decoder's §5.2.2 copy
1132/// loop reproduces the exact pixels.
1133///
1134/// As of round 158 the matcher applies **three-position lazy matching**:
1135/// when the matcher finds a match `(len_a, _)` at `pos`, the encoder
1136/// also probes `pos + 1` (depth-1), `pos + 2` (depth-2), and `pos + 3`
1137/// (depth-3). The longest of `(len_a, len_b, len_c, len_d)` wins; ties
1138/// resolve to the earliest position (preserving the strict-greater
1139/// semantics introduced in round 156). When the depth-3 match `len_d`
1140/// is the unique longest, the encoder emits *three* literals (at
1141/// `pos`, `pos + 1`, `pos + 2`) and takes the longer match starting
1142/// at `pos + 3`. This costs at most three extra hash-chain walks per
1143/// match attempt and extends the round-157 two-position lazy recovery
1144/// to the third-order trap: a short match at each of `pos`, `pos + 1`,
1145/// `pos + 2` together blocking a strictly longer match at `pos + 3`.
1146/// The reconstructed pixels are bit-identical to the strict-greedy,
1147/// depth-1, and depth-2 partitions for any input — only the token
1148/// *partition* shifts by up to three pixels — so round-trips remain
1149/// bit-exact and the existing test suite continues to pass.
1150fn tokenize_lz77(pixels: &[u32]) -> Vec<Token> {
1151    tokenize_lz77_inner(pixels, LAZY_DEPTH_DEFAULT)
1152}
1153
1154/// Production lazy-match depth used by [`tokenize_lz77`]. Round 156
1155/// set this to 1 (single-position look-ahead); round 157 bumped it to
1156/// 2 (two-position look-ahead); round 158 bumped it to 3 (three-
1157/// position look-ahead); round 163 bumps it to 4 (four-position look-
1158/// ahead with a [`DEPTH4_GUARD_THRESHOLD`] diminishing-returns guard).
1159/// A value of 0 reproduces the r155 strict-greedy partition.
1160const LAZY_DEPTH_DEFAULT: u32 = 4;
1161
1162/// Round-163 diminishing-returns guard for the depth-4 probe. The
1163/// depth-4 `find(pos + 4)` call (plus the `matcher.insert(pos + 3)`
1164/// bookkeeping that gives it a fair shot at including `pos..=pos + 3`
1165/// in its window) is only executed when the running best length
1166/// across the depth-1/2/3 probes is strictly less than this
1167/// threshold. Once the depth-3 best already covers a length-
1168/// `THRESHOLD` run, swapping to a depth-4 alternative would have to
1169/// strictly exceed that length while paying for four literals
1170/// (`pixels[pos..pos + 4]`); the empirical pay-off shrinks rapidly
1171/// past the threshold and is rarely big enough to recover the
1172/// literal-emission cost in the entropy stage. Tuned to a conservative
1173/// value (`6`) so the guard only suppresses depth-4 work when the
1174/// running best is already comfortably above the four-literal break-
1175/// even line. At `THRESHOLD = u32::MAX` the depth-4 probe still
1176/// honours the `best_len > MIN_MATCH` floor (see
1177/// [`tokenize_lz77_inner`]); at `THRESHOLD = 0` (or below
1178/// `MIN_MATCH + 1 = 4`) the depth-4 probe never fires. The A/B
1179/// regression test [`round_163_depth4_guard_suppresses_long_run_swap`]
1180/// exercises the guard's switching boundary.
1181const DEPTH4_GUARD_THRESHOLD: u32 = 6;
1182
1183/// Implementation of [`tokenize_lz77`] with an explicit `lazy_depth`
1184/// toggle. Values:
1185///
1186/// * `0` — strict-greedy r155 partition (no look-ahead). Always emits
1187///   the match found at `pos`.
1188/// * `1` — round-156 single-position lazy partition: probe `pos + 1`,
1189///   swap to a strictly-longer match starting there.
1190/// * `2` — round-157 two-position lazy partition: also probe
1191///   `pos + 2`, swap to a strictly-longer match starting there (the
1192///   `pos + 2` match must strictly beat both `pos` and `pos + 1`).
1193/// * `3` — round-158 three-position lazy partition: also probe
1194///   `pos + 3`, swap to a strictly-longer match starting there (the
1195///   `pos + 3` match must strictly beat the running best across
1196///   `pos`, `pos + 1`, and `pos + 2`).
1197/// * `4` — round-163 guarded four-position lazy partition: also
1198///   probes `pos + 4`, but **only when** the running best across the
1199///   first four positions is strictly greater than [`MIN_MATCH`]
1200///   (`MIN_MATCH = 3`, so `best_len >= 4`) AND strictly less than
1201///   [`DEPTH4_GUARD_THRESHOLD`]. The `> MIN_MATCH` floor ensures the
1202///   pre-inserted `pos + 3` position is always covered by the chosen
1203///   match's range, so the next iteration's `find` never sees its
1204///   own position in the chain. When the guard fires, the `pos + 4`
1205///   match must strictly beat the running best.
1206///
1207/// Values `>= 4` are clamped to `4`. The A/B regression tests
1208/// in this module use `0`, `1`, `2`, and `3` to compare against the
1209/// r155, r156, r157, and r158 baselines.
1210fn tokenize_lz77_inner(pixels: &[u32], lazy_depth: u32) -> Vec<Token> {
1211    let n = pixels.len();
1212    let mut matcher = Lz77Matcher::new(pixels);
1213    let mut tokens = Vec::new();
1214    let mut pos = 0usize;
1215    let depth = lazy_depth.min(4);
1216    while pos < n {
1217        if let Some((len_a, dist_a)) = matcher.find(pos) {
1218            // Lazy lookahead. The matcher's hash chains do not yet
1219            // include `pos` (matches at `pos` only reference positions
1220            // strictly before `pos`), so to give the `pos + 1` probe a
1221            // fair shot at a match that *includes* the pixel at `pos`
1222            // we insert `pos` into the chains before the look-ahead
1223            // `find`. Likewise, the `pos + 2` probe needs both `pos`
1224            // and `pos + 1` in the chains, and the `pos + 3` probe
1225            // needs `pos`, `pos + 1`, and `pos + 2` all in. The
1226            // bookkeeping at the tail of each branch skips
1227            // re-inserting any positions that the lookahead probes
1228            // already inserted.
1229            let mut best_len = len_a;
1230            let mut best_dist = dist_a;
1231            let mut best_start = pos; // pixel index where the match begins
1232            let inserted_pos = depth >= 1 && len_a < MAX_MATCH && pos + 1 < n;
1233            if inserted_pos {
1234                matcher.insert(pos);
1235                if let Some((len_b, dist_b)) = matcher.find(pos + 1) {
1236                    if len_b > best_len {
1237                        best_len = len_b;
1238                        best_dist = dist_b;
1239                        best_start = pos + 1;
1240                    }
1241                }
1242            }
1243            // Depth-2 probe: only meaningful if depth allows it, the
1244            // current best match is short enough to be worth
1245            // attempting to displace, and `pos + 2` is in range. We
1246            // also require `pos + 1` to be inserted so the `pos + 2`
1247            // window can reference it; the depth-1 probe already
1248            // inserted `pos`.
1249            let inserted_pos1 = depth >= 2 && best_len < MAX_MATCH && pos + 2 < n;
1250            if inserted_pos1 {
1251                matcher.insert(pos + 1);
1252                if let Some((len_c, dist_c)) = matcher.find(pos + 2) {
1253                    if len_c > best_len {
1254                        best_len = len_c;
1255                        best_dist = dist_c;
1256                        best_start = pos + 2;
1257                    }
1258                }
1259            }
1260            // Depth-3 probe: only meaningful if depth allows it, the
1261            // running best match is short enough to be worth
1262            // attempting to displace, and `pos + 3` is in range. We
1263            // also require `pos + 2` to be inserted so the `pos + 3`
1264            // window can reference it; the depth-1 / depth-2 probes
1265            // already inserted `pos` and `pos + 1`.
1266            let inserted_pos2 = depth >= 3 && best_len < MAX_MATCH && pos + 3 < n;
1267            if inserted_pos2 {
1268                matcher.insert(pos + 2);
1269                if let Some((len_d, dist_d)) = matcher.find(pos + 3) {
1270                    if len_d > best_len {
1271                        best_len = len_d;
1272                        best_dist = dist_d;
1273                        best_start = pos + 3;
1274                    }
1275                }
1276            }
1277            // Depth-4 probe (round 163): only meaningful if depth
1278            // allows it, the running best match is short enough to be
1279            // worth attempting to displace, `pos + 4` is in range,
1280            // AND the round-163 diminishing-returns guard fires
1281            // (`best_len < DEPTH4_GUARD_THRESHOLD`). The guard skips
1282            // the depth-4 work when the depth-3 best is already
1283            // comfortably above the four-literal break-even line.
1284            //
1285            // Additional **lower-bound** floor: the depth-4 probe pre-
1286            // inserts `pos + 3` into the matcher chain so the `find(pos
1287            // + 4)` window can reference it. That pre-insert must be
1288            // covered by the chosen match's range `[best_start,
1289            // best_start + best_len)` — otherwise the next iteration's
1290            // `pos` (= `best_start + best_len`) could equal `pos + 3`,
1291            // and `find(pos + 3)` would see itself in the chain and
1292            // return distance `0`. We avoid that corner by gating on
1293            // `best_len > MIN_MATCH` (i.e., `best_len >= 4`): with
1294            // `best_start == pos` the match end is at least `pos + 4 >
1295            // pos + 3`, covering the pre-insert. The depth-3 best of
1296            // exactly 3 pixels (`= MIN_MATCH`) is short enough that
1297            // the depth-4 probe is rarely worth it anyway, so the
1298            // floor costs almost nothing on the matcher's behaviour.
1299            //
1300            // We also require `pos + 3` to be inserted so the `pos + 4`
1301            // window can reference it; the depth-1 / depth-2 / depth-3
1302            // probes already inserted `pos`, `pos + 1`, and `pos + 2`.
1303            let inserted_pos3 = depth >= 4
1304                && best_len > MIN_MATCH
1305                && best_len < MAX_MATCH
1306                && (best_len as u32) < DEPTH4_GUARD_THRESHOLD
1307                && pos + 4 < n;
1308            if inserted_pos3 {
1309                matcher.insert(pos + 3);
1310                if let Some((len_e, dist_e)) = matcher.find(pos + 4) {
1311                    if len_e > best_len {
1312                        best_len = len_e;
1313                        best_dist = dist_e;
1314                        best_start = pos + 4;
1315                    }
1316                }
1317            }
1318
1319            // Emit literals for any pixels skipped by the chosen
1320            // lazy starting position, then the chosen match.
1321            for &skipped in &pixels[pos..best_start] {
1322                tokens.push(Token::Literal(skipped));
1323            }
1324            tokens.push(Token::Copy {
1325                length: best_len,
1326                distance: best_dist,
1327            });
1328
1329            // Hash-chain bookkeeping. Insert every covered position
1330            // into the chains so later matches can reference inside
1331            // the just-copied run; skip positions that the lookahead
1332            // probes already inserted.
1333            //
1334            // Pre-inserted positions (so far): `pos` if `inserted_pos`,
1335            // `pos + 1` if `inserted_pos1`, `pos + 2` if `inserted_pos2`,
1336            // `pos + 3` if `inserted_pos3` (round 163). The chosen
1337            // match covers `[best_start, best_start + best_len)`. Walk
1338            // that range and only `insert` the positions that are not
1339            // already in the chains.
1340            let end = best_start + best_len;
1341            let mut q = pos;
1342            while q < end {
1343                let already_in = (q == pos && inserted_pos)
1344                    || (q == pos + 1 && inserted_pos1)
1345                    || (q == pos + 2 && inserted_pos2)
1346                    || (q == pos + 3 && inserted_pos3);
1347                if q >= best_start && !already_in {
1348                    matcher.insert(q);
1349                }
1350                q += 1;
1351            }
1352            pos = end;
1353        } else {
1354            tokens.push(Token::Literal(pixels[pos]));
1355            matcher.insert(pos);
1356            pos += 1;
1357        }
1358    }
1359    tokens
1360}
1361
1362/// Allowed range for the §5.2.3 `color_cache_code_bits` field: an
1363/// enabled cache has `code_bits ∈ [1, 11]`, giving a cache size of
1364/// `2..=2048` entries. Mirrors
1365/// [`crate::meta_prefix::COLOR_CACHE_BITS_MIN`] /
1366/// [`crate::meta_prefix::COLOR_CACHE_BITS_MAX`].
1367pub const COLOR_CACHE_BITS_MIN: u32 = 1;
1368/// See [`COLOR_CACHE_BITS_MIN`].
1369pub const COLOR_CACHE_BITS_MAX: u32 = 11;
1370
1371/// The default `color_cache_code_bits` the chooser evaluates when a
1372/// caller asks for a single representative cache size (e.g. test
1373/// fixtures, the `encode_argb_literals_color_cache` direct entry).
1374/// Eight bits gives a 256-entry cache — a middle-of-range value that
1375/// works reasonably well across the §5.2.3 `[1..11]` range.
1376///
1377/// The production chooser ([`encode_argb_literals_with_width`] and
1378/// [`encode_argb_with_predictor_chooser`]) no longer uses this single
1379/// value: as of round 148 it sweeps every `cache_code_bits ∈ [1..11]`
1380/// per the §5.2.3 range and emits the smallest stream. See
1381/// [`select_best_cache_bits`].
1382pub const DEFAULT_COLOR_CACHE_BITS: u32 = 8;
1383
1384/// §5.2.3 color-cache helper used by the encoder. Mirrors the decoder's
1385/// [`crate::vp8l_decode::ColorCache`] semantics: an array of
1386/// `1 << code_bits` ARGB entries, all initialized to zero, with a
1387/// hashed lookup `(0x1e35a7bd * argb) >> (32 - code_bits)`.
1388///
1389/// The encoder maintains the cache in stream order — exactly as the
1390/// decoder will when re-walking the emitted symbols — so a slot's
1391/// state matches between writer and reader at every bit position. A
1392/// §5.2.3 `CacheRef { index }` token is emitted *only* when
1393/// `lookup(index) == Some(argb)` at the moment the token is produced;
1394/// the decoder will read the same index and produce the same ARGB.
1395#[derive(Debug, Clone)]
1396struct EncoderColorCache {
1397    code_bits: u32,
1398    entries: Vec<u32>,
1399}
1400
1401impl EncoderColorCache {
1402    /// Allocate a fresh `1 << code_bits`-entry cache. `code_bits` must
1403    /// be in `[COLOR_CACHE_BITS_MIN, COLOR_CACHE_BITS_MAX]`; debug
1404    /// builds assert.
1405    fn new(code_bits: u32) -> Self {
1406        debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&code_bits));
1407        Self {
1408            code_bits,
1409            entries: vec![0u32; 1usize << code_bits],
1410        }
1411    }
1412
1413    /// `1 << code_bits` — the §5.2.3 cache size.
1414    #[cfg(test)]
1415    fn size(&self) -> usize {
1416        self.entries.len()
1417    }
1418
1419    /// §5.2.3: `(0x1e35a7bd * argb) >> (32 - code_bits)`. Identical to
1420    /// the decoder's [`crate::vp8l_decode::ColorCache::hash`].
1421    fn hash(&self, argb: u32) -> usize {
1422        (crate::vp8l_decode::COLOR_CACHE_HASH_MULTIPLIER.wrapping_mul(argb)
1423            >> (32 - self.code_bits)) as usize
1424    }
1425
1426    /// `true` when the slot for `argb`'s hash currently holds `argb`
1427    /// itself — i.e. emitting a `CacheRef { index: hash(argb) }`
1428    /// token would round-trip to the same pixel on decode.
1429    fn contains(&self, argb: u32) -> Option<usize> {
1430        let idx = self.hash(argb);
1431        if self.entries[idx] == argb {
1432            Some(idx)
1433        } else {
1434            None
1435        }
1436    }
1437
1438    /// Insert `argb` at its hashed slot (§5.2.3: every emitted pixel,
1439    /// literal or covered by a backward reference, is re-inserted).
1440    fn insert(&mut self, argb: u32) {
1441        let idx = self.hash(argb);
1442        self.entries[idx] = argb;
1443    }
1444}
1445
1446/// Second-pass §5.2.3 cache-aware token rewrite.
1447///
1448/// Walks `tokens` in stream order, maintaining the cache exactly as
1449/// the decoder will. When a `Literal(argb)` matches the cache's
1450/// current slot for `argb`, the literal is rewritten to a
1451/// `CacheRef { index }` token so the decoder can re-read it from the
1452/// cache. Backward-reference copies pass through unchanged; the
1453/// covered pixels are inserted into the cache (spec §5.2.3) so later
1454/// repeats can refer back to them via cache codes.
1455///
1456/// `pixels` provides the underlying pixel sequence for backward
1457/// references (needed to know which colors a `Copy` token covers so
1458/// the cache state stays in sync).
1459fn cacheify_tokens(tokens: &[Token], pixels: &[u32], code_bits: u32) -> Vec<Token> {
1460    let mut cache = EncoderColorCache::new(code_bits);
1461    let mut out = Vec::with_capacity(tokens.len());
1462    let mut pos = 0usize;
1463    for &tok in tokens {
1464        match tok {
1465            Token::Literal(argb) => {
1466                if let Some(idx) = cache.contains(argb) {
1467                    out.push(Token::CacheRef { index: idx as u32 });
1468                } else {
1469                    out.push(Token::Literal(argb));
1470                }
1471                cache.insert(argb);
1472                pos += 1;
1473            }
1474            Token::CacheRef { .. } => {
1475                // Caller should not pre-emit cache refs into the
1476                // input stream; keep tokens we don't recognise as
1477                // literals from the matcher's output verbatim.
1478                out.push(tok);
1479                pos += 1;
1480            }
1481            Token::Copy { length, distance } => {
1482                out.push(tok);
1483                // Mirror the decoder's §5.2.3 invariant: every pixel
1484                // covered by a backward-reference copy is inserted in
1485                // stream order. The source pixels live at
1486                // `pos - distance .. pos - distance + length` in
1487                // `pixels`; the destination at `pos .. pos + length`
1488                // would be identical (copies always reproduce source
1489                // bytes), so we read directly off the source slice.
1490                let src_start = pos - distance;
1491                for i in 0..length {
1492                    let argb = pixels[src_start + i];
1493                    cache.insert(argb);
1494                }
1495                pos += length;
1496            }
1497        }
1498    }
1499    debug_assert_eq!(
1500        pos,
1501        pixels.len(),
1502        "cacheify_tokens: token stream covered {pos} of {} pixels",
1503        pixels.len()
1504    );
1505    out
1506}
1507
1508/// The five per-symbol frequency tables for one prefix-code group: green
1509/// (literals + §5.2.2 length symbols + §5.2.3 cache indices), red, blue,
1510/// alpha, and distance.
1511struct Frequencies {
1512    green: Vec<u32>,
1513    red: Vec<u32>,
1514    blue: Vec<u32>,
1515    alpha: Vec<u32>,
1516    distance: Vec<u32>,
1517}
1518
1519/// Legacy §5.2.2 *scan-line* distance encoding (`distance_code = D + 120`).
1520///
1521/// The decoder's [`crate::vp8l_decode::distance_code_to_pixel_distance`]
1522/// maps any `distance_code > 120` straight back to `distance_code - 120 == D`,
1523/// so this is always a valid round-trip. Retained as the unit-test reference
1524/// (so the round-130 chooser can be measured against the round-119 baseline)
1525/// — production paths use [`pixel_distance_to_distance_code`], which picks
1526/// the smaller of the scan-line code and any matching distance-map code.
1527#[cfg(test)]
1528fn distance_to_code(distance: usize) -> u32 {
1529    distance as u32 + crate::vp8l_decode::NUM_DISTANCE_MAP_CODES as u32
1530}
1531
1532/// §5.2.2 distance-code chooser: pick the smaller of the scan-line code
1533/// (`D + 120`) and any §5.2.2 distance-map code `c ∈ 1..=120` that
1534/// reconstructs `D` for the given `image_width`.
1535///
1536/// A distance-map entry `(xi, yi)` at index `c-1` reconstructs to
1537/// `max(xi + yi * image_width, 1)` per the decoder's
1538/// [`crate::vp8l_decode::distance_code_to_pixel_distance`]. The chooser
1539/// returns the **smallest** raw code that reconstructs to `distance` —
1540/// smaller raw codes feed [`value_to_prefix`] through low-prefix slots
1541/// (codes `1..=4` use 0 extra bits; code `5` uses 1 extra bit; …), which
1542/// then enter the distance prefix-code's Huffman tree with the highest
1543/// frequencies and the shortest emitted lengths.
1544///
1545/// # Smallest-code early-out
1546///
1547/// Map codes occupy `1..=120` and the scan-line fallback is
1548/// `distance + 120 >= 121`, so **any** matching map entry is strictly
1549/// smaller than the fallback. Because the entries are visited in
1550/// ascending code order (`idx + 1`), the *first* entry whose
1551/// reconstruction equals `distance` is, by construction, the smallest
1552/// valid code — no later entry (higher code) and not the fallback can
1553/// beat it. The scan therefore returns on the first match instead of
1554/// continuing through all 120 entries. When no entry matches it falls
1555/// through to the scan-line code. This preserves the exact same chosen
1556/// code as a full scan with a smallest-code tie-break, so the emitted
1557/// bytes are unchanged.
1558///
1559/// The reconstruction is identical to the legacy scan-line form, so the
1560/// decoder produces the exact same pixel distance and the round-trip
1561/// stays bit-exact.
1562///
1563/// Panics in debug builds when `distance == 0` (callers guarantee
1564/// `1 <= distance <= position` per §5.2.2's backward-reference invariant).
1565pub fn pixel_distance_to_distance_code(distance: usize, image_width: u32) -> u32 {
1566    debug_assert!(distance >= 1, "§5.2.2 distance must be >= 1");
1567    let width_i32 = image_width as i32;
1568    for (idx, &(xi, yi)) in crate::vp8l_decode::DISTANCE_MAP.iter().enumerate() {
1569        // The decoder computes `xi + yi * W` and clamps to 1. Match the
1570        // exact reconstruction so we never emit a code that would resolve
1571        // to a different distance.
1572        let raw = xi + yi * width_i32;
1573        let mapped = if raw < 1 { 1 } else { raw as usize };
1574        if mapped == distance {
1575            // First match is the smallest code (entries are in ascending
1576            // code order) and always < the scan-line fallback (>= 121),
1577            // so return immediately.
1578            return (idx + 1) as u32;
1579        }
1580    }
1581    distance as u32 + crate::vp8l_decode::NUM_DISTANCE_MAP_CODES as u32
1582}
1583
1584/// Accumulate the per-symbol frequencies for a token stream so the entropy
1585/// stage can build length-optimal prefix codes before emitting.
1586///
1587/// `color_cache_size` is `1 << color_cache_code_bits` (0 when the cache
1588/// is disabled). It extends the GREEN alphabet to
1589/// `256 + 24 + color_cache_size` per §6.2.3 so a `CacheRef { index }`
1590/// token's wire symbol `256 + 24 + index` is in range.
1591///
1592/// `image_width` is needed to feed [`pixel_distance_to_distance_code`] so
1593/// the frequency table matches the prefix codes the emit loop will choose
1594/// for each backward reference. Passing `1` (the legacy width-less form)
1595/// disables the §5.2.2 distance-map optimisation — only codes 1..=8 can
1596/// possibly match at width 1, so all row-style matches fall back to the
1597/// scan-line `D + 120` form.
1598fn count_frequencies(tokens: &[Token], color_cache_size: usize, image_width: u32) -> Frequencies {
1599    let green_alphabet = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + color_cache_size;
1600    let mut freqs = Frequencies {
1601        green: vec![0u32; green_alphabet],
1602        red: vec![0u32; 256],
1603        blue: vec![0u32; 256],
1604        alpha: vec![0u32; 256],
1605        distance: vec![0u32; 40],
1606    };
1607    for &tok in tokens {
1608        match tok {
1609            Token::Literal(p) => {
1610                let a = ((p >> 24) & 0xff) as usize;
1611                let r = ((p >> 16) & 0xff) as usize;
1612                let g = ((p >> 8) & 0xff) as usize;
1613                let b = (p & 0xff) as usize;
1614                freqs.green[g] += 1;
1615                freqs.red[r] += 1;
1616                freqs.blue[b] += 1;
1617                freqs.alpha[a] += 1;
1618            }
1619            Token::CacheRef { index } => {
1620                // §5.2.3: GREEN symbol is `256 + 24 + index`.
1621                let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + index as usize;
1622                debug_assert!(sym < green_alphabet);
1623                freqs.green[sym] += 1;
1624            }
1625            Token::Copy { length, distance } => {
1626                // §5.2.2: length is a GREEN symbol `256 + length_prefix`.
1627                let (len_prefix, _, _) = value_to_prefix(length as u32);
1628                freqs.green[256 + len_prefix as usize] += 1;
1629                // Distance prefix code (#5). Width-aware chooser picks the
1630                // smaller of scan-line `D + 120` and any §5.2.2 distance-map
1631                // code reconstructing to `D` for `image_width`.
1632                let raw_code = pixel_distance_to_distance_code(distance, image_width);
1633                let (dist_prefix, _, _) = value_to_prefix(raw_code);
1634                freqs.distance[dist_prefix as usize] += 1;
1635            }
1636        }
1637    }
1638    freqs
1639}
1640
1641/// Emit a length/distance `value` to `w`: the entropy-coded prefix symbol
1642/// via `code`, then its `extra_bits` raw bits LSB-first (matching the
1643/// decoder's `ReadBits`). `symbol_base` is added to the prefix code before
1644/// the entropy lookup (256 for GREEN length symbols, 0 for distances).
1645fn write_lz77_value(w: &mut BitWriter, code: &WriteCode, symbol_base: usize, value: u32) {
1646    let (prefix, extra_bits, extra_value) = value_to_prefix(value);
1647    code.write_symbol(w, symbol_base + prefix as usize);
1648    if extra_bits > 0 {
1649        w.write_bits(extra_value, extra_bits as usize);
1650    }
1651}
1652
1653/// §3.5.3 / §3.8.2 *forward* subtract-green transform: subtract the green
1654/// channel from red and blue per pixel, in place. The exact inverse of
1655/// [`crate::vp8l_transform::inverse_subtract_green`], so re-applying the
1656/// decoder's inverse pass after entropy decode restores the original
1657/// pixels byte-for-byte.
1658///
1659/// Spec arithmetic: `red := (red - green) & 0xff`,
1660/// `blue := (blue - green) & 0xff` (the §3.5.3 inverse is `+ green & 0xff`,
1661/// so subtracting on the encode side and adding back on the decode side is
1662/// a perfect round trip modulo 256).
1663pub fn apply_subtract_green(pixels: &mut [u32]) {
1664    for px in pixels.iter_mut() {
1665        let a = (*px >> 24) & 0xff;
1666        let r = (*px >> 16) & 0xff;
1667        let g = (*px >> 8) & 0xff;
1668        let b = *px & 0xff;
1669        let r_new = r.wrapping_sub(g) & 0xff;
1670        let b_new = b.wrapping_sub(g) & 0xff;
1671        *px = (a << 24) | (r_new << 16) | (g << 8) | b_new;
1672    }
1673}
1674
1675// ---- §4.1 spatial-predictor forward transform (encoder side) ----
1676
1677/// `DIV_ROUND_UP(num, den)` from §4.1 (`((num) + (den) - 1) / (den)`).
1678#[inline]
1679fn predictor_div_round_up(num: u32, den: u32) -> u32 {
1680    num.div_ceil(den)
1681}
1682
1683/// Per-channel `(a + b) / 2` (`Average2` from §4.1).
1684#[inline]
1685fn predictor_average2(a: u32, b: u32) -> u32 {
1686    let f = |sh: u32| -> u32 {
1687        let ca = (a >> sh) & 0xff;
1688        let cb = (b >> sh) & 0xff;
1689        (ca + cb) / 2
1690    };
1691    (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1692}
1693
1694/// `Clamp(a)` from §4.1: saturate `a` to `[0, 255]`.
1695#[inline]
1696fn predictor_clamp(a: i32) -> i32 {
1697    a.clamp(0, 255)
1698}
1699
1700/// §4.1 `ClampAddSubtractFull(a, b, c)` = `Clamp(a + b - c)` per channel.
1701#[inline]
1702fn predictor_clamp_add_subtract_full(a: u32, b: u32, c: u32) -> u32 {
1703    let f = |sh: u32| -> u32 {
1704        let ca = ((a >> sh) & 0xff) as i32;
1705        let cb = ((b >> sh) & 0xff) as i32;
1706        let cc = ((c >> sh) & 0xff) as i32;
1707        predictor_clamp(ca + cb - cc) as u32
1708    };
1709    (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1710}
1711
1712/// §4.1 `ClampAddSubtractHalf(a, b)` = `Clamp(a + (a - b) / 2)` per
1713/// channel.
1714#[inline]
1715fn predictor_clamp_add_subtract_half(a: u32, b: u32) -> u32 {
1716    let f = |sh: u32| -> u32 {
1717        let ca = ((a >> sh) & 0xff) as i32;
1718        let cb = ((b >> sh) & 0xff) as i32;
1719        predictor_clamp(ca + (ca - cb) / 2) as u32
1720    };
1721    (f(24) << 24) | (f(16) << 16) | (f(8) << 8) | f(0)
1722}
1723
1724/// §4.1 `Select(L, T, TL)` — whichever of `L` / `T` is closer
1725/// (per-channel Manhattan distance) to the `L + T - TL` estimate.
1726#[inline]
1727fn predictor_select(l: u32, t: u32, tl: u32) -> u32 {
1728    let ach = |x: u32| ((x >> 24) & 0xff) as i32;
1729    let rch = |x: u32| ((x >> 16) & 0xff) as i32;
1730    let gch = |x: u32| ((x >> 8) & 0xff) as i32;
1731    let bch = |x: u32| (x & 0xff) as i32;
1732
1733    let p_a = ach(l) + ach(t) - ach(tl);
1734    let p_r = rch(l) + rch(t) - rch(tl);
1735    let p_g = gch(l) + gch(t) - gch(tl);
1736    let p_b = bch(l) + bch(t) - bch(tl);
1737
1738    let p_l =
1739        (p_a - ach(l)).abs() + (p_r - rch(l)).abs() + (p_g - gch(l)).abs() + (p_b - bch(l)).abs();
1740    let p_t =
1741        (p_a - ach(t)).abs() + (p_r - rch(t)).abs() + (p_g - gch(t)).abs() + (p_b - bch(t)).abs();
1742
1743    if p_l < p_t {
1744        l
1745    } else {
1746        t
1747    }
1748}
1749
1750/// Compute the §4.1 prediction for `mode ∈ 0..=13` given the four
1751/// reconstructed-pixel neighbours.
1752///
1753/// Identical formula to the decoder's
1754/// `crate::vp8l_transform::inverse_predictor` `predict` helper — kept
1755/// as a separate copy here because the encoder is built (and tested)
1756/// independently of the decoder's transform module.
1757fn predictor_predict(mode: u8, l: u32, t: u32, tr: u32, tl: u32) -> u32 {
1758    match mode {
1759        0 => 0xff00_0000,
1760        1 => l,
1761        2 => t,
1762        3 => tr,
1763        4 => tl,
1764        5 => predictor_average2(predictor_average2(l, tr), t),
1765        6 => predictor_average2(l, tl),
1766        7 => predictor_average2(l, t),
1767        8 => predictor_average2(tl, t),
1768        9 => predictor_average2(t, tr),
1769        10 => predictor_average2(predictor_average2(l, tl), predictor_average2(t, tr)),
1770        11 => predictor_select(l, t, tl),
1771        12 => predictor_clamp_add_subtract_full(l, t, tl),
1772        13 => predictor_clamp_add_subtract_half(predictor_average2(l, t), tl),
1773        // §4.1 only defines [0..13]. An out-of-range mode produces the
1774        // top-left's solid-black prediction, matching the decoder.
1775        _ => 0xff00_0000,
1776    }
1777}
1778
1779/// Per-channel residual `(original - pred) mod 256`. The inverse of
1780/// the decoder's `add_pred` (`residual + pred mod 256 = original`),
1781/// so re-applying the §4.1 inverse predictor recovers `original`
1782/// exactly.
1783///
1784/// **Round-224 SWAR experiment — closure-of-four body retained.**
1785/// The decoder-side `add_pred` was rewritten in round 170 as a
1786/// two-pair SWAR (`(x & 0x00ff_00ff).wrapping_add(...)` /
1787/// `(x & 0xff00_ff00).wrapping_add(...)`) because addition does not
1788/// propagate carry across the zero "guard" bytes when the summand has
1789/// its high bit masked out. Subtraction is asymmetric: a borrow at the
1790/// low byte of a lane DOES propagate through the zero guard byte and
1791/// corrupts the adjacent lane, so the mirror rewrite needs to bias
1792/// the minuend with a `0x0100` guard per lane (`(orig & 0x00ff_00ff)
1793/// | 0x0100_0100`) to suppress underflow before the subtract, with a
1794/// final `& 0x00ff_00ff` mask to clear the guard. We measured both
1795/// forms in round 224 against the new `predictor_subtract_256x256`
1796/// bench: **34.1 µs (closure-of-four) → 40.5 µs (biased SWAR), a
1797/// +18.4% regression.** AArch64 NEON auto-vectorisation of the four
1798/// sequential per-byte `wrapping_sub` calls is tighter than the
1799/// explicit biased-SWAR pattern at this call site. Same shape as the
1800/// round-194 BENCHMARKS footnote that recorded a regression for a
1801/// `clamp_add_subtract_*` (mode 12) per-channel `to_le_bytes()` +
1802/// `i16` byte-loop attempt — the closure-of-four `i32` body remains
1803/// the right starting point on this target until a true 16-byte
1804/// `std::simd` formulation can amortise the lane-bias cost across
1805/// multiple pixels per iteration (mirroring the `to_rgba_simd`
1806/// precedent under the `simd` feature).
1807#[inline]
1808pub fn predictor_subtract(original: u32, pred: u32) -> u32 {
1809    let a = ((original >> 24) & 0xff).wrapping_sub((pred >> 24) & 0xff) & 0xff;
1810    let r = ((original >> 16) & 0xff).wrapping_sub((pred >> 16) & 0xff) & 0xff;
1811    let g = ((original >> 8) & 0xff).wrapping_sub((pred >> 8) & 0xff) & 0xff;
1812    let b = (original & 0xff).wrapping_sub(pred & 0xff) & 0xff;
1813    (a << 24) | (r << 16) | (g << 8) | b
1814}
1815
1816/// Cost proxy used to pick a block's predictor mode: the sum of
1817/// per-pixel per-channel `|residual|` over the block, where `|x|`
1818/// folds the mod-256 residual onto `[-128, 127]` (a value `x ∈ [0,
1819/// 255]` representing `(original - pred) mod 256` has true magnitude
1820/// `min(x, 256 - x)`).
1821///
1822/// Sum-of-magnitudes is a standard zero-cost proxy for the entropy
1823/// of the residual histogram: lower magnitudes peak the histogram
1824/// near zero, which a Huffman code over the residual symbols
1825/// compresses well. Using the folded magnitude correctly rewards
1826/// modes that produce both small-positive and small-negative
1827/// residuals (e.g. `0xff` = `-1 mod 256`, magnitude 1).
1828#[inline]
1829fn residual_magnitude(residual: u32) -> u32 {
1830    let fold = |v: u32| -> u32 {
1831        let v = v & 0xff;
1832        if v <= 128 {
1833            v
1834        } else {
1835            256 - v
1836        }
1837    };
1838    fold(residual >> 24) + fold(residual >> 16) + fold(residual >> 8) + fold(residual)
1839}
1840
1841/// §4.1 border-aware prediction at `(x, y)`. Mirrors
1842/// [`crate::vp8l_transform::inverse_predictor`]: top-left is solid
1843/// black `0xff000000`; top row predicts L; left column predicts T;
1844/// rightmost column uses the row's leftmost pixel as TR; otherwise
1845/// `predictor_predict(mode, L, T, TR, TL)`.
1846///
1847/// `pixels` is the `width × height` ARGB source (read-only — the
1848/// encoder predicts against the *originals*, since the decoder
1849/// reconstructs pixels equal to those originals).
1850fn predictor_at(pixels: &[u32], width: usize, x: usize, y: usize, mode: u8) -> u32 {
1851    if x == 0 && y == 0 {
1852        return 0xff00_0000;
1853    }
1854    let idx = y * width + x;
1855    if y == 0 {
1856        return pixels[idx - 1];
1857    }
1858    if x == 0 {
1859        return pixels[idx - width];
1860    }
1861    let l = pixels[idx - 1];
1862    let t = pixels[idx - width];
1863    let tl = pixels[idx - width - 1];
1864    let tr = if x == width - 1 {
1865        pixels[idx - width - (width - 1)]
1866    } else {
1867        pixels[idx - width + 1]
1868    };
1869    predictor_predict(mode, l, t, tr, tl)
1870}
1871
1872/// Per-pixel residual consumer for [`for_each_block_residual`].
1873///
1874/// `pixel` receives each in-bounds block pixel's §4.1 mod-256
1875/// residual in raster order; `row_end` runs after the last pixel of
1876/// each block row and returns whether the walk should continue.
1877///
1878/// Pruning at row granularity (instead of per pixel, as the
1879/// pre-round-280 chooser loops did) is pick-identical: a pruned
1880/// walk's partial cost is only ever compared `>= cap` by the caller,
1881/// and per-pixel contributions are non-negative, so any partial sum
1882/// that prunes implies the full sum would also have compared
1883/// `>= cap`. Coarsening the prune lets the interior pixel loop run
1884/// branch-free (auto-vectorisable) at the cost of at most one block
1885/// row of extra work on a pruned mode.
1886trait ResidualSink {
1887    fn pixel(&mut self, residual: u32);
1888    fn row_end(&mut self) -> bool;
1889}
1890
1891/// Walk every in-bounds pixel of the block `[x0, x0+bw) × [y0,
1892/// y0+bh)` of the `width × height` image in raster order, feeding
1893/// each pixel's §4.1 residual (`predictor_subtract(original,
1894/// prediction)`) to `sink`. Interior predictions come from
1895/// `predict(l, t, tr, tl)`; border pixels follow the §4.1 border
1896/// rules (top-left → solid black, top row → L, left column → T,
1897/// rightmost column → the §4.1 TR wraparound) — pixel-for-pixel
1898/// identical to a [`predictor_at`] + [`predictor_subtract`] walk,
1899/// but with the border branch chain hoisted out of the inner loop
1900/// (round-180 decoder precedent) and the predictor monomorphised in
1901/// by the caller so the per-pixel 14-way mode dispatch disappears.
1902#[inline]
1903#[allow(clippy::too_many_arguments)]
1904fn walk_block_residuals<P, S>(
1905    pixels: &[u32],
1906    width: usize,
1907    height: usize,
1908    x0: usize,
1909    y0: usize,
1910    bw: usize,
1911    bh: usize,
1912    predict: P,
1913    sink: &mut S,
1914) where
1915    P: Fn(u32, u32, u32, u32) -> u32,
1916    S: ResidualSink,
1917{
1918    let y_end = (y0 + bh).min(height);
1919    let x_end = (x0 + bw).min(width);
1920    if x0 >= x_end || y0 >= y_end {
1921        return;
1922    }
1923    let mut y = y0;
1924    if y == 0 {
1925        // Top row: (0, 0) predicts solid black, the rest predict L.
1926        let mut x = x0;
1927        if x == 0 {
1928            sink.pixel(predictor_subtract(pixels[0], 0xff00_0000));
1929            x = 1;
1930        }
1931        while x < x_end {
1932            sink.pixel(predictor_subtract(pixels[x], pixels[x - 1]));
1933            x += 1;
1934        }
1935        if !sink.row_end() {
1936            return;
1937        }
1938        y = 1;
1939    }
1940    // The §4.1 right-column TR wraparound only applies when the block
1941    // reaches the image's right edge.
1942    let interior_end = if x_end == width { width - 1 } else { x_end };
1943    while y < y_end {
1944        let row = y * width;
1945        let mut x = x0;
1946        if x == 0 {
1947            // Left column predicts T.
1948            sink.pixel(predictor_subtract(pixels[row], pixels[row - width]));
1949            x = 1;
1950        }
1951        while x < interior_end {
1952            let idx = row + x;
1953            let l = pixels[idx - 1];
1954            let t = pixels[idx - width];
1955            let tl = pixels[idx - width - 1];
1956            let tr = pixels[idx - width + 1];
1957            sink.pixel(predictor_subtract(pixels[idx], predict(l, t, tr, tl)));
1958            x += 1;
1959        }
1960        if x < x_end {
1961            // x == width - 1: §4.1 TR wraparound.
1962            let idx = row + x;
1963            let l = pixels[idx - 1];
1964            let t = pixels[idx - width];
1965            let tl = pixels[idx - width - 1];
1966            let tr = pixels[idx - width - (width - 1)];
1967            sink.pixel(predictor_subtract(pixels[idx], predict(l, t, tr, tl)));
1968        }
1969        if !sink.row_end() {
1970            return;
1971        }
1972        y += 1;
1973    }
1974}
1975
1976/// Run [`walk_block_residuals`] with the §4.1 predictor for `mode`
1977/// monomorphised into the walk, so the mode dispatch runs once per
1978/// block instead of once per pixel. Out-of-range modes predict solid
1979/// black, matching [`predictor_predict`].
1980#[inline]
1981#[allow(clippy::too_many_arguments)]
1982fn for_each_block_residual<S: ResidualSink>(
1983    pixels: &[u32],
1984    width: usize,
1985    height: usize,
1986    x0: usize,
1987    y0: usize,
1988    bw: usize,
1989    bh: usize,
1990    mode: u8,
1991    sink: &mut S,
1992) {
1993    macro_rules! walk {
1994        ($p:expr) => {
1995            walk_block_residuals(pixels, width, height, x0, y0, bw, bh, $p, sink)
1996        };
1997    }
1998    match mode {
1999        1 => walk!(|l, _, _, _| l),
2000        2 => walk!(|_, t, _, _| t),
2001        3 => walk!(|_, _, tr, _| tr),
2002        4 => walk!(|_, _, _, tl| tl),
2003        5 => walk!(|l, t, tr, _| predictor_average2(predictor_average2(l, tr), t)),
2004        6 => walk!(|l, _, _, tl| predictor_average2(l, tl)),
2005        7 => walk!(|l, t, _, _| predictor_average2(l, t)),
2006        8 => walk!(|_, t, _, tl| predictor_average2(tl, t)),
2007        9 => walk!(|_, t, tr, _| predictor_average2(t, tr)),
2008        10 => walk!(|l, t, tr, tl| predictor_average2(
2009            predictor_average2(l, tl),
2010            predictor_average2(t, tr)
2011        )),
2012        11 => walk!(|l, t, _, tl| predictor_select(l, t, tl)),
2013        12 => walk!(|l, t, _, tl| predictor_clamp_add_subtract_full(l, t, tl)),
2014        13 => walk!(|l, t, _, tl| predictor_clamp_add_subtract_half(predictor_average2(l, t), tl)),
2015        // Mode 0 and §4.1-undefined modes both predict solid black.
2016        _ => walk!(|_, _, _, _| 0xff00_0000),
2017    }
2018}
2019
2020/// [`ResidualSink`] accumulating the folded-L1 [`residual_magnitude`]
2021/// cost proxy, pruning at row granularity once the running sum
2022/// reaches `cap` (see the trait docs for why row-granular pruning is
2023/// pick-identical to the pre-round-280 per-pixel early-out).
2024struct MagnitudeCostSink {
2025    cost: u64,
2026    cap: u64,
2027}
2028
2029impl ResidualSink for MagnitudeCostSink {
2030    #[inline]
2031    fn pixel(&mut self, residual: u32) {
2032        self.cost += residual_magnitude(residual) as u64;
2033    }
2034    #[inline]
2035    fn row_end(&mut self) -> bool {
2036        self.cost < self.cap
2037    }
2038}
2039
2040/// [`ResidualSink`] filling the per-channel residual byte histograms
2041/// [`block_mode_entropy_cost`] feeds its Shannon sum. Never prunes:
2042/// the histograms must be complete before the entropy is meaningful.
2043struct ResidualHistogramSink {
2044    hist: [[u32; 256]; 4],
2045    n: u32,
2046}
2047
2048impl ResidualSink for ResidualHistogramSink {
2049    #[inline]
2050    fn pixel(&mut self, residual: u32) {
2051        self.hist[0][((residual >> 24) & 0xff) as usize] += 1;
2052        self.hist[1][((residual >> 16) & 0xff) as usize] += 1;
2053        self.hist[2][((residual >> 8) & 0xff) as usize] += 1;
2054        self.hist[3][(residual & 0xff) as usize] += 1;
2055        self.n += 1;
2056    }
2057    #[inline]
2058    fn row_end(&mut self) -> bool {
2059        true
2060    }
2061}
2062
2063/// Pick the §4.1 mode `0..=13` that minimises the residual cost
2064/// proxy over the rectangular block `[x0, x0+bw) × [y0, y0+bh)` of
2065/// the `width × height` image. Border rules per
2066/// [`predictor_at`].
2067///
2068/// On ties (multiple modes producing equal magnitude sums) the
2069/// lowest mode wins, which makes the chooser deterministic.
2070///
2071/// This is the no-hint entry point — equivalent to calling
2072/// [`pick_block_mode_with_hint`] with `prefer_mode = None`. The
2073/// production caller [`build_predictor_image`] uses the
2074/// hint-aware variant; the no-hint form is retained for the
2075/// in-module tie-breaker tests.
2076#[cfg(test)]
2077fn pick_block_mode(
2078    pixels: &[u32],
2079    width: usize,
2080    height: usize,
2081    x0: usize,
2082    y0: usize,
2083    bw: usize,
2084    bh: usize,
2085) -> u8 {
2086    pick_block_mode_with_hint(pixels, width, height, x0, y0, bw, bh, None)
2087}
2088
2089/// Compute the §4.1 residual-cost proxy for a single mode over
2090/// the rectangular block `[x0, x0+bw) × [y0, y0+bh)`. Walks every
2091/// in-bounds pixel without an early-out so the caller can use the
2092/// result as an authoritative tie-break reference.
2093///
2094/// This is the same per-mode sum the main chooser computes inside
2095/// [`pick_block_mode_with_hint`], factored out so the entropy-
2096/// image-aware tie-breaker can evaluate the preferred neighbour
2097/// mode exactly once and re-use the value to decide whether a
2098/// post-walk swap is allowed.
2099#[allow(clippy::too_many_arguments)]
2100fn block_mode_cost(
2101    pixels: &[u32],
2102    width: usize,
2103    height: usize,
2104    x0: usize,
2105    y0: usize,
2106    bw: usize,
2107    bh: usize,
2108    mode: u8,
2109) -> u64 {
2110    block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, u64::MAX)
2111}
2112
2113/// [`block_mode_cost`] with a pruning `cap`: once the running cost
2114/// reaches `cap` at a block-row boundary the walk stops and the
2115/// partial sum is returned. Callers only compare a pruned return
2116/// value `>= cap` (the residual magnitudes are non-negative, so a
2117/// partial sum at or above `cap` proves the full sum is too), which
2118/// keeps mode picks identical to an uncapped walk.
2119#[allow(clippy::too_many_arguments)]
2120fn block_mode_cost_capped(
2121    pixels: &[u32],
2122    width: usize,
2123    height: usize,
2124    x0: usize,
2125    y0: usize,
2126    bw: usize,
2127    bh: usize,
2128    mode: u8,
2129    cap: u64,
2130) -> u64 {
2131    let mut sink = MagnitudeCostSink { cost: 0, cap };
2132    for_each_block_residual(pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
2133    sink.cost
2134}
2135
2136/// Hint-aware variant of [`pick_block_mode`]: picks the §4.1 mode
2137/// minimising the residual cost proxy, and on ties prefers
2138/// `prefer_mode` over the otherwise-lowest tied mode.
2139///
2140/// `prefer_mode = Some(m)` directs the tie-break: when `m`'s cost
2141/// equals the lowest cost found across all 14 modes, the chooser
2142/// returns `m` instead of the lowest-indexed tied mode. When
2143/// `prefer_mode = None` (or `prefer_mode = Some(m)` with `m`
2144/// strictly worse than another mode), the lowest-tied-mode behaviour
2145/// is preserved exactly.
2146///
2147/// Round 159: [`build_predictor_image`] passes the left neighbour
2148/// block's chosen mode (or the top neighbour at the left edge of
2149/// the predictor image) as the hint. The §3.5 RFC 9649 note
2150/// "transform data can be decided based on entropy minimization"
2151/// motivates this: residual-cost-equal modes encode different
2152/// values into the predictor sub-image, and the sub-image is
2153/// written as an `entropy-coded-image` (§7.2) so reducing its
2154/// symbol entropy directly shrinks the output stream. The
2155/// residuals themselves do not change (this is a strict tie-break),
2156/// so decode round-trips are unaffected.
2157#[allow(clippy::too_many_arguments)]
2158fn pick_block_mode_with_hint(
2159    pixels: &[u32],
2160    width: usize,
2161    height: usize,
2162    x0: usize,
2163    y0: usize,
2164    bw: usize,
2165    bh: usize,
2166    prefer_mode: Option<u8>,
2167) -> u8 {
2168    let mut best_mode: u8 = 0;
2169    let mut best_cost = u64::MAX;
2170    for mode in 0u8..=13 {
2171        // The cap prunes modes already worse than the current best at
2172        // block-row granularity; a pruned partial sum is `>= best_cost`
2173        // so the `cost < best_cost` update below stays pick-identical
2174        // to a full walk.
2175        let cost = block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, best_cost);
2176        if cost < best_cost {
2177            best_cost = cost;
2178            best_mode = mode;
2179        }
2180    }
2181    // Round 159 entropy-image-aware tie-breaker. If the caller
2182    // supplied a preferred mode (typically the left or top neighbour
2183    // block's chosen mode) and the preferred mode's full cost ties
2184    // with `best_cost`, swap to the preferred mode so the predictor
2185    // sub-image carries a longer run of identical mode values. The
2186    // residual stream produced by the main image's forward transform
2187    // is unchanged (the cost is equal), so decode round-trips are
2188    // bit-identical.
2189    if let Some(m) = prefer_mode {
2190        if m != best_mode {
2191            let cost = block_mode_cost(pixels, width, height, x0, y0, bw, bh, m);
2192            if cost == best_cost {
2193                best_mode = m;
2194            }
2195        }
2196    }
2197    best_mode
2198}
2199
2200/// Round 160 *slack-cost* variant of [`pick_block_mode_with_hint`].
2201///
2202/// Where the round-159 strict tie-break only swaps to the preferred
2203/// mode when its residual cost is **exactly equal** to the best,
2204/// this variant also accepts the preferred mode when its cost is
2205/// within an additive `slack` budget of the best. RFC 9649 §3.5
2206/// authorises the encoder to "decide \[transform data\] based on
2207/// entropy minimization", and the slack budget formalises the
2208/// trade-off: a small per-pixel-magnitude increase in the §4.1
2209/// residual stream may be acceptable when it strictly reduces the
2210/// entropy of the §7.2 predictor sub-image (longer run of identical
2211/// mode values → fewer distinct prefix-code symbols → fewer bytes
2212/// emitted for the sub-image).
2213///
2214/// This is no longer a residual-cost-neutral swap: the residuals
2215/// produced by the main image's forward transform **do change** on
2216/// a slack-accepted swap. Decode round-trips are still bit-correct
2217/// (the residuals are recomputed against the chosen mode at
2218/// `apply_forward_predictor` time, and the decoder applies the same
2219/// mode in reverse), but pixel-level decode equivalence between two
2220/// encoder runs at different slack budgets is **not** preserved —
2221/// only end-to-end image round-trip equivalence is.
2222///
2223/// The encoder protects itself from regressions by building both the
2224/// `slack = 0` (strict, round-159 baseline) and `slack > 0`
2225/// predictor candidates and keeping the strictly-smaller encoded
2226/// stream — so a slack candidate that hurts overall byte cost on
2227/// some input is simply not chosen.
2228#[allow(clippy::too_many_arguments)]
2229fn pick_block_mode_with_hint_slack(
2230    pixels: &[u32],
2231    width: usize,
2232    height: usize,
2233    x0: usize,
2234    y0: usize,
2235    bw: usize,
2236    bh: usize,
2237    prefer_mode: Option<u8>,
2238    slack: u64,
2239) -> u8 {
2240    let mut best_mode: u8 = 0;
2241    let mut best_cost = u64::MAX;
2242    for mode in 0u8..=13 {
2243        // Row-granular prune against the current best; pick-identical
2244        // to a full walk (see `block_mode_cost_capped`).
2245        let cost = block_mode_cost_capped(pixels, width, height, x0, y0, bw, bh, mode, best_cost);
2246        if cost < best_cost {
2247            best_cost = cost;
2248            best_mode = mode;
2249        }
2250    }
2251    // Round-160 slack-cost tie-break: accept the preferred neighbour
2252    // mode when its cost is within `slack` of the best cost. The
2253    // slack budget lets the encoder trade a small residual increase
2254    // for a predictor-sub-image entropy drop. `slack == 0` recovers
2255    // the round-159 strict tie-break behaviour exactly.
2256    if let Some(m) = prefer_mode {
2257        if m != best_mode {
2258            let cost = block_mode_cost(pixels, width, height, x0, y0, bw, bh, m);
2259            if cost <= best_cost.saturating_add(slack) {
2260                best_mode = m;
2261            }
2262        }
2263    }
2264    best_mode
2265}
2266
2267/// Build the §4.1 sub-resolution *predictor image*: one ARGB pixel
2268/// per `(1 << size_bits)`-pixel-square block of the main image, with
2269/// the chosen mode stored in the green channel (alpha/red/blue
2270/// fixed at 0xff / 0 / 0 — the decoder only reads the green channel
2271/// via `inverse_predictor`'s `green(predictor_image[...])`).
2272///
2273/// Returns `(predictor_image, transform_width, transform_height)`.
2274/// `transform_width = DIV_ROUND_UP(width, 1 << size_bits)` and
2275/// `transform_height = DIV_ROUND_UP(height, 1 << size_bits)`, per
2276/// §4.1.
2277///
2278/// Round 159: each block consults
2279/// [`pick_block_mode_with_hint`] with the immediately-prior
2280/// block's chosen mode as the preferred tie-break — left neighbour
2281/// in the current row, or the top neighbour for blocks in the left
2282/// column (no neighbour for the top-left block). This is a strict
2283/// tie-break: when the preferred mode's residual cost equals the
2284/// otherwise-lowest cost, the neighbour's value is chosen so the
2285/// predictor sub-image carries longer runs of identical modes,
2286/// dropping the sub-image's entropy and the bytes the
2287/// `entropy-coded-image` writer emits for it. Residual values are
2288/// unchanged on cost-equal swaps, so decoded pixels are
2289/// bit-identical to the round-158 baseline.
2290fn build_predictor_image(
2291    pixels: &[u32],
2292    width: u32,
2293    height: u32,
2294    size_bits: u8,
2295) -> (Vec<u32>, u32, u32) {
2296    let block = 1u32 << size_bits;
2297    let tw = predictor_div_round_up(width, block);
2298    let th = predictor_div_round_up(height, block);
2299    let mut img = Vec::with_capacity((tw * th) as usize);
2300    let w = width as usize;
2301    let h = height as usize;
2302    let bsz = block as usize;
2303    // Track the previous row's chosen modes so the left-column
2304    // blocks can fall back to a top neighbour. Each slot is `None`
2305    // while building the very first row.
2306    let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2307    for by in 0..th as usize {
2308        let mut left_mode: Option<u8> = None;
2309        for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2310            let x0 = bx * bsz;
2311            let y0 = by * bsz;
2312            // Preferred tie-break: left neighbour (current row) if
2313            // present, else top neighbour (previous row). The
2314            // top-left block (by == 0 && bx == 0) gets no hint and
2315            // falls back to the lowest-tied-mode default.
2316            let prefer = left_mode.or(*top_slot);
2317            let mode = pick_block_mode_with_hint(pixels, w, h, x0, y0, bsz, bsz, prefer);
2318            // Pack mode into the green channel; opaque alpha and
2319            // zeroed red/blue keep the sub-image visually inert and
2320            // match the channel the decoder reads.
2321            img.push(0xff00_0000 | ((mode as u32) << 8));
2322            left_mode = Some(mode);
2323            *top_slot = Some(mode);
2324        }
2325    }
2326    (img, tw, th)
2327}
2328
2329/// Round-160 *slack-cost* variant of [`build_predictor_image`].
2330///
2331/// Identical structure to `build_predictor_image`, but routes every
2332/// per-block mode choice through [`pick_block_mode_with_hint_slack`]
2333/// with the caller-supplied `slack` budget. `slack == 0` recovers
2334/// `build_predictor_image` exactly. Larger `slack` values let the
2335/// preferred neighbour mode win even at a small residual-cost
2336/// increase, trading per-pixel residual mass against the §7.2
2337/// predictor-sub-image's symbol entropy.
2338///
2339/// Round-trip correctness is unaffected by `slack`: the forward
2340/// transform later re-derives residuals against the chosen modes,
2341/// and the decoder's inverse pass uses the same modes from the
2342/// sub-image, so the decoded image always equals the input.
2343///
2344/// The encoder chooser builds both `slack == 0` and `slack > 0`
2345/// candidates and keeps the shortest, so a slack candidate that
2346/// hurts overall byte cost on a given input is simply not chosen.
2347fn build_predictor_image_with_slack(
2348    pixels: &[u32],
2349    width: u32,
2350    height: u32,
2351    size_bits: u8,
2352    slack: u64,
2353) -> (Vec<u32>, u32, u32) {
2354    let block = 1u32 << size_bits;
2355    let tw = predictor_div_round_up(width, block);
2356    let th = predictor_div_round_up(height, block);
2357    let mut img = Vec::with_capacity((tw * th) as usize);
2358    let w = width as usize;
2359    let h = height as usize;
2360    let bsz = block as usize;
2361    let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2362    for by in 0..th as usize {
2363        let mut left_mode: Option<u8> = None;
2364        for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2365            let x0 = bx * bsz;
2366            let y0 = by * bsz;
2367            let prefer = left_mode.or(*top_slot);
2368            let mode =
2369                pick_block_mode_with_hint_slack(pixels, w, h, x0, y0, bsz, bsz, prefer, slack);
2370            img.push(0xff00_0000 | ((mode as u32) << 8));
2371            left_mode = Some(mode);
2372            *top_slot = Some(mode);
2373        }
2374    }
2375    (img, tw, th)
2376}
2377
2378/// Round 161 *Shannon-entropy bit-cost* per-mode cost function.
2379///
2380/// Where [`block_mode_cost`] sums the folded L1 magnitude of the
2381/// per-pixel residual as a *proxy* for Huffman bit cost, this
2382/// function computes the actual lower-bound bit cost a Huffman code
2383/// over the residual byte distribution would emit:
2384///
2385/// 1. Build the per-channel `[u32; 256]` histogram of the block's
2386///    mod-256 residuals against the candidate `mode`.
2387/// 2. Compute the Shannon entropy `H = -Σ (c/N) · log2(c/N)` over
2388///    each channel's histogram (zero-count bins contribute zero).
2389/// 3. Sum `N · H` across channels — this is the lower-bound bit
2390///    count a per-symbol Huffman code over those residuals would
2391///    emit (the encoder's actual prefix coder is within ~1 bit of
2392///    this bound per symbol, so the bit-count *ordering* between
2393///    modes is faithful even though absolute counts differ by O(1)
2394///    per symbol).
2395///
2396/// The cost is returned as a fixed-point u64 in units of
2397/// **milli-bits** (1 bit = 1000 units) so comparisons stay exact
2398/// without floats leaking into the chooser's tie-break logic. The
2399/// quantisation rounds to the nearest milli-bit which is finer
2400/// than any Huffman code's per-symbol cost, so two modes that
2401/// would tie in floating-point also tie in the quantised cost.
2402///
2403/// Walks every in-bounds pixel without an early-out (unlike
2404/// [`block_mode_cost`]'s magnitude proxy which can prune): the
2405/// per-channel histograms must be complete before the entropy
2406/// sum is meaningful.
2407#[allow(clippy::too_many_arguments)]
2408fn block_mode_entropy_cost(
2409    pixels: &[u32],
2410    width: usize,
2411    height: usize,
2412    x0: usize,
2413    y0: usize,
2414    bw: usize,
2415    bh: usize,
2416    mode: u8,
2417) -> u64 {
2418    let mut sink = ResidualHistogramSink {
2419        hist: [[0u32; 256]; 4],
2420        n: 0,
2421    };
2422    for_each_block_residual(pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
2423    let hist = sink.hist;
2424    let n = sink.n;
2425    if n == 0 {
2426        return 0;
2427    }
2428    // Σ_channels Σ_b c·log2(N/c) milli-bits, with c·log2(N/c) =
2429    // c·(log2(N) − log2(c)). Float arithmetic is fine here: the
2430    // result is rounded to nearest milli-bit before u64 cast, so
2431    // bit-for-bit determinism holds across platforms with IEEE-754
2432    // ln(). The Shannon expansion picks `log2(N/c)` rather than
2433    // `−log2(c/N)` to keep the per-bin operand non-negative (zero
2434    // when c = N, growing as c shrinks) which is friendly to the
2435    // accumulator.
2436    let n_f = n as f64;
2437    let log2_n = n_f.log2();
2438    let mut milli_bits: f64 = 0.0;
2439    for channel_hist in &hist {
2440        for &count in channel_hist.iter() {
2441            if count == 0 {
2442                continue;
2443            }
2444            let c_f = count as f64;
2445            // Per-bin contribution to N·H: c·log2(N/c).
2446            milli_bits += c_f * (log2_n - c_f.log2());
2447        }
2448    }
2449    // Scale to milli-bits and round to nearest.
2450    (milli_bits * 1000.0 + 0.5) as u64
2451}
2452
2453/// Round 161 *Shannon-entropy bit-cost* variant of
2454/// [`pick_block_mode_with_hint`].
2455///
2456/// Picks the §4.1 mode minimising [`block_mode_entropy_cost`] — a
2457/// true Huffman lower-bound bit cost rather than the L1 magnitude
2458/// proxy the round-159/160 chooser uses. The entropy bit-cost
2459/// correctly distinguishes a "near-zero with two outliers"
2460/// residual distribution (low L1, but the outliers force long
2461/// Huffman codes for the two distinct outlier values) from a
2462/// "spread of small values" distribution (slightly higher L1, but
2463/// more concentrated histogram → lower Huffman cost). The L1
2464/// proxy treats them as comparable; the entropy cost reflects
2465/// what the §5.x prefix-code writer will actually emit.
2466///
2467/// The hint mechanism mirrors [`pick_block_mode_with_hint`]: when
2468/// `prefer_mode = Some(m)` and `m`'s entropy cost equals the
2469/// chooser's best, the chooser returns `m` so the predictor sub-
2470/// image carries longer runs of identical mode values (§7.2
2471/// `entropy-coded-image` shrinks).
2472///
2473/// This is a strict tie-break: residual values are unchanged on
2474/// cost-equal swaps, so decode round-trips are bit-identical
2475/// across `prefer_mode` choices. End-to-end the encoder builds
2476/// both the L1-proxy and entropy-cost candidates and keeps the
2477/// shortest stream, so the entropy candidate cannot regress
2478/// against the L1 path — see [`encode_argb_with_predictor_chooser`].
2479#[allow(clippy::too_many_arguments)]
2480fn pick_block_mode_with_hint_entropy(
2481    pixels: &[u32],
2482    width: usize,
2483    height: usize,
2484    x0: usize,
2485    y0: usize,
2486    bw: usize,
2487    bh: usize,
2488    prefer_mode: Option<u8>,
2489) -> u8 {
2490    let mut best_mode: u8 = 0;
2491    let mut best_cost = u64::MAX;
2492    for mode in 0u8..=13 {
2493        let cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, mode);
2494        if cost < best_cost {
2495            best_cost = cost;
2496            best_mode = mode;
2497        }
2498    }
2499    // Round-159-style strict tie-break under the entropy cost.
2500    if let Some(m) = prefer_mode {
2501        if m != best_mode {
2502            let cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, m);
2503            if cost == best_cost {
2504                best_mode = m;
2505            }
2506        }
2507    }
2508    best_mode
2509}
2510
2511/// Round 161 *Shannon-entropy bit-cost* variant of
2512/// [`build_predictor_image`].
2513///
2514/// Identical structure to `build_predictor_image`, but routes every
2515/// per-block mode choice through [`pick_block_mode_with_hint_entropy`]
2516/// — replacing the round-159 L1-magnitude proxy with a true Huffman
2517/// lower-bound bit cost. The strict-tie-break hint mechanism is
2518/// preserved: the left neighbour (or top neighbour at the left
2519/// edge) is the preferred mode on cost-equal swaps.
2520///
2521/// Round-trip correctness is unaffected by the cost model choice:
2522/// the forward transform later re-derives residuals against the
2523/// chosen modes, and the decoder's inverse pass uses the same modes
2524/// from the sub-image, so the decoded image always equals the input.
2525///
2526/// The encoder chooser keeps both the L1-proxy candidates (round-
2527/// 159/160) and the entropy candidate and emits the shortest
2528/// stream, so a fixture on which the L1 proxy is genuinely better
2529/// is simply not regressed against.
2530fn build_predictor_image_entropy(
2531    pixels: &[u32],
2532    width: u32,
2533    height: u32,
2534    size_bits: u8,
2535) -> (Vec<u32>, u32, u32) {
2536    let block = 1u32 << size_bits;
2537    let tw = predictor_div_round_up(width, block);
2538    let th = predictor_div_round_up(height, block);
2539    let mut img = Vec::with_capacity((tw * th) as usize);
2540    let w = width as usize;
2541    let h = height as usize;
2542    let bsz = block as usize;
2543    let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2544    for by in 0..th as usize {
2545        let mut left_mode: Option<u8> = None;
2546        for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2547            let x0 = bx * bsz;
2548            let y0 = by * bsz;
2549            let prefer = left_mode.or(*top_slot);
2550            let mode = pick_block_mode_with_hint_entropy(pixels, w, h, x0, y0, bsz, bsz, prefer);
2551            img.push(0xff00_0000 | ((mode as u32) << 8));
2552            left_mode = Some(mode);
2553            *top_slot = Some(mode);
2554        }
2555    }
2556    (img, tw, th)
2557}
2558
2559/// Round 162 — milli-bit Shannon delta for adding one occurrence of
2560/// `mode` to a running sub-image mode histogram with current counts
2561/// `hist[0..14]` and total `total`.
2562///
2563/// Returns `(N_new · H_new − N_old · H_old)` in milli-bits, where
2564/// `H = −Σ p·log2(p)` over the 14-bin mode distribution. This is the
2565/// **exact** marginal Shannon contribution of one extra `mode`
2566/// occurrence to the sub-image's symbol entropy mass — the same
2567/// `Σ c·log2(N/c)` form [`block_mode_entropy_cost`] uses, applied to
2568/// the sub-image's green-channel mode distribution rather than the
2569/// per-block residual byte histogram.
2570///
2571/// At the floor (`hist` all zero, `total == 0`) the delta is zero:
2572/// adding the first symbol moves the system from a degenerate
2573/// no-symbol state to a single-symbol histogram with `H = 0`. The
2574/// first **subsequent** occurrence of a *different* mode does grow
2575/// the mass (now two distinct symbols, total = 2 → `N·H = 2`). The
2576/// formula stays well-defined at every step because the post-add
2577/// histogram always has `total + 1 ≥ 1` and all bins with `c == 0`
2578/// are skipped from the sum.
2579///
2580/// Used by [`pick_block_mode_with_hint_entropy_subaware`] to charge a
2581/// per-block mode candidate not only for its own residual entropy
2582/// but also for its marginal contribution to the §7.2 predictor
2583/// sub-image's prefix-code mass — making the chooser sub-image-
2584/// aware in a way the round-159 hint and round-160 slack budget were
2585/// not (those mechanisms only acted on local neighbour identity,
2586/// without any global accounting of the sub-image's distribution
2587/// shape).
2588fn sub_image_mode_cost_delta_milli(hist: &[u32; 14], total: u32, mode: u8) -> u64 {
2589    debug_assert!(mode < 14);
2590    // Compute Σ c·log2(N/c) before and after; the delta is the
2591    // marginal Shannon mass in bits, scaled to milli-bits and
2592    // rounded to nearest u64. Float arithmetic is fine here for the
2593    // same reason as `block_mode_entropy_cost`: the rounding step
2594    // makes the result bit-for-bit deterministic across IEEE-754
2595    // log2 implementations to within ±1 milli-bit, which is finer
2596    // than any per-symbol cost ordering.
2597    let n_old = total as f64;
2598    let n_new = (total + 1) as f64;
2599    let log2_n_old = if total > 0 { n_old.log2() } else { 0.0 };
2600    let log2_n_new = n_new.log2();
2601    let mut mass_old: f64 = 0.0;
2602    let mut mass_new: f64 = 0.0;
2603    for (m, &c) in hist.iter().enumerate() {
2604        let c_after = if m == mode as usize { c + 1 } else { c };
2605        if c > 0 {
2606            let c_f = c as f64;
2607            mass_old += c_f * (log2_n_old - c_f.log2());
2608        }
2609        if c_after > 0 {
2610            let c_f = c_after as f64;
2611            mass_new += c_f * (log2_n_new - c_f.log2());
2612        }
2613    }
2614    let delta = (mass_new - mass_old).max(0.0);
2615    (delta * 1000.0 + 0.5) as u64
2616}
2617
2618/// Round 162 — *sub-image-aware* Shannon-entropy bit-cost variant of
2619/// [`pick_block_mode_with_hint_entropy`].
2620///
2621/// Picks the §4.1 mode minimising the **joint** cost
2622///
2623/// ```text
2624///     cost(m) = block_mode_entropy_cost(..., m)
2625///             + (lambda_milli * sub_image_mode_cost_delta_milli(hist, total, m)) / 1000
2626/// ```
2627///
2628/// where the first term is the per-block residual entropy (same
2629/// metric the round-161 chooser uses) and the second term is the
2630/// marginal §7.2 predictor sub-image cost — the bits the
2631/// `entropy-coded-image` writer will emit for this mode value given
2632/// the sub-image's running distribution shape. `lambda_milli` is the
2633/// per-sub-image-bit weight, in milli-units (so `lambda_milli = 1000`
2634/// weights one sub-image bit equal to one residual bit). Larger
2635/// lambda biases the chooser toward modes that reuse already-popular
2636/// values in the sub-image; `lambda_milli == 0` recovers the round-
2637/// 161 entropy-only chooser exactly (no sub-image weighting at all).
2638///
2639/// The round-159 strict tie-break hint is preserved: when
2640/// `prefer_mode = Some(m)` and `m`'s joint cost equals the chooser's
2641/// best, the chooser returns `m` so the sub-image keeps the longer
2642/// run of identical mode values. The hint check uses the same joint
2643/// cost (residual + lambda · sub-image delta) the main sweep uses,
2644/// so the tie semantics stay self-consistent.
2645///
2646/// Round-trip correctness is unaffected by the cost model choice:
2647/// the forward transform later re-derives residuals against the
2648/// chosen modes, and the decoder's inverse pass uses the same modes
2649/// from the sub-image, so the decoded image always equals the input.
2650///
2651/// The encoder protects itself from regressions by building both the
2652/// round-161 (sub-image-unaware) and round-162 (sub-image-aware at
2653/// multiple lambda values) predictor candidates and keeping the
2654/// shortest stream — so a fixture on which the sub-image weighting
2655/// hurts overall byte cost is simply not chosen.
2656#[allow(clippy::too_many_arguments)]
2657fn pick_block_mode_with_hint_entropy_subaware(
2658    pixels: &[u32],
2659    width: usize,
2660    height: usize,
2661    x0: usize,
2662    y0: usize,
2663    bw: usize,
2664    bh: usize,
2665    prefer_mode: Option<u8>,
2666    sub_image_hist: &[u32; 14],
2667    sub_image_total: u32,
2668    lambda_milli: u64,
2669) -> u8 {
2670    let mut best_mode: u8 = 0;
2671    let mut best_cost = u64::MAX;
2672    for mode in 0u8..=13 {
2673        let residual_cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, mode);
2674        let sub_delta = sub_image_mode_cost_delta_milli(sub_image_hist, sub_image_total, mode);
2675        // lambda_milli is "per-sub-image-bit weight in milli-units".
2676        // sub_delta is already in milli-bits. Multiply and divide by
2677        // 1000 to keep the whole expression in milli-bit units.
2678        let weighted_sub = sub_delta.saturating_mul(lambda_milli) / 1000;
2679        let cost = residual_cost.saturating_add(weighted_sub);
2680        if cost < best_cost {
2681            best_cost = cost;
2682            best_mode = mode;
2683        }
2684    }
2685    if let Some(m) = prefer_mode {
2686        if m != best_mode {
2687            let residual_cost = block_mode_entropy_cost(pixels, width, height, x0, y0, bw, bh, m);
2688            let sub_delta = sub_image_mode_cost_delta_milli(sub_image_hist, sub_image_total, m);
2689            let weighted_sub = sub_delta.saturating_mul(lambda_milli) / 1000;
2690            let cost = residual_cost.saturating_add(weighted_sub);
2691            if cost == best_cost {
2692                best_mode = m;
2693            }
2694        }
2695    }
2696    best_mode
2697}
2698
2699/// Round 162 *sub-image-aware* variant of
2700/// [`build_predictor_image_entropy`].
2701///
2702/// Identical structure to `build_predictor_image_entropy`, but routes
2703/// every per-block mode choice through
2704/// [`pick_block_mode_with_hint_entropy_subaware`] with a running
2705/// histogram of the sub-image's mode values chosen so far. `lambda_milli`
2706/// is the per-sub-image-bit weight (see
2707/// [`pick_block_mode_with_hint_entropy_subaware`] for the unit). The
2708/// round-159 strict-tie-break hint mechanism is preserved: the left
2709/// neighbour (or top neighbour at the left edge) is the preferred
2710/// mode on joint-cost-equal swaps.
2711///
2712/// `lambda_milli == 0` is byte-identical to
2713/// `build_predictor_image_entropy` (the sub-image term contributes
2714/// zero to every candidate). Larger `lambda_milli` biases the
2715/// chooser toward modes that reuse already-popular values in the
2716/// sub-image.
2717///
2718/// Round-trip correctness is unaffected: the decoder reads the
2719/// chosen modes from the sub-image; the forward transform recomputes
2720/// residuals against them. The chooser's joint-cost choice only
2721/// shifts which mode is recorded per block — never the decode
2722/// reconstruction path.
2723fn build_predictor_image_entropy_subaware(
2724    pixels: &[u32],
2725    width: u32,
2726    height: u32,
2727    size_bits: u8,
2728    lambda_milli: u64,
2729) -> (Vec<u32>, u32, u32) {
2730    let block = 1u32 << size_bits;
2731    let tw = predictor_div_round_up(width, block);
2732    let th = predictor_div_round_up(height, block);
2733    let mut img = Vec::with_capacity((tw * th) as usize);
2734    let w = width as usize;
2735    let h = height as usize;
2736    let bsz = block as usize;
2737    let mut prev_row: Vec<Option<u8>> = vec![None; tw as usize];
2738    let mut hist = [0u32; 14];
2739    let mut total: u32 = 0;
2740    for by in 0..th as usize {
2741        let mut left_mode: Option<u8> = None;
2742        for (bx, top_slot) in prev_row.iter_mut().enumerate() {
2743            let x0 = bx * bsz;
2744            let y0 = by * bsz;
2745            let prefer = left_mode.or(*top_slot);
2746            let mode = pick_block_mode_with_hint_entropy_subaware(
2747                pixels,
2748                w,
2749                h,
2750                x0,
2751                y0,
2752                bsz,
2753                bsz,
2754                prefer,
2755                &hist,
2756                total,
2757                lambda_milli,
2758            );
2759            img.push(0xff00_0000 | ((mode as u32) << 8));
2760            left_mode = Some(mode);
2761            *top_slot = Some(mode);
2762            hist[mode as usize] += 1;
2763            total += 1;
2764        }
2765    }
2766    (img, tw, th)
2767}
2768
2769/// Round 305 — per-block predictor-mode selection strategy for the §4.1
2770/// predictor sub-image, used to parameterise which cost model the stacked
2771/// §3.5 transform chains build their predictor sub-image with.
2772///
2773/// The single-transform predictor path
2774/// ([`encode_argb_with_predictor_chooser`]) already sweeps every one of
2775/// these strategies and keeps the byte-shortest stream (rounds 159–162).
2776/// The stacked chains added in rounds 302–304
2777/// ([`encode_with_color_transform_predictor`],
2778/// [`encode_with_color_transform_subtract_green_predictor`],
2779/// [`encode_with_color_indexing_predictor`]) were bootstrapped with only
2780/// the [`PredictorSubImageStrategy::L1`] proxy chooser. Threading this
2781/// strategy through them lets the chooser try the entropy and
2782/// sub-image-aware entropy cost models over the *transform-decorrelated*
2783/// image those chains feed the predictor — exactly the residual the
2784/// predictor sub-image actually sees — and keep whichever is smallest.
2785///
2786/// Round-trip correctness is independent of the strategy: every variant
2787/// only changes *which §4.1 mode is recorded per block* in the sub-image;
2788/// the forward transform recomputes residuals against the chosen modes and
2789/// the decoder reads the same modes back, so the reconstruction is
2790/// bit-identical regardless of strategy. The chooser keeps the strategy
2791/// solely on a byte-cost basis, so a strategy that hurts on a given input
2792/// is simply not selected — the path strictly extends the encoder's option
2793/// set without ever regressing the L1 baseline.
2794#[derive(Clone, Copy, Debug, PartialEq, Eq)]
2795enum PredictorSubImageStrategy {
2796    /// Round-159 folded-L1 magnitude proxy chooser
2797    /// ([`build_predictor_image`]).
2798    L1,
2799    /// Round-161 Shannon-entropy bit-cost chooser
2800    /// ([`build_predictor_image_entropy`]).
2801    Entropy,
2802    /// Round-162 sub-image-aware Shannon-entropy chooser
2803    /// ([`build_predictor_image_entropy_subaware`]) at the given
2804    /// `lambda_milli` per-sub-image-bit weight.
2805    EntropySubaware { lambda_milli: u64 },
2806}
2807
2808/// Round 305 — build a §4.1 predictor sub-image under the given
2809/// [`PredictorSubImageStrategy`]. Dispatches to the round-159 / round-161 /
2810/// round-162 builders, all of which share the
2811/// `(pixels, width, height, size_bits) -> (sub_image, tw, th)` shape, so
2812/// the stacked chains can pick a cost model uniformly. See
2813/// [`PredictorSubImageStrategy`] for the round-trip invariance argument.
2814fn build_predictor_image_strategy(
2815    pixels: &[u32],
2816    width: u32,
2817    height: u32,
2818    size_bits: u8,
2819    strategy: PredictorSubImageStrategy,
2820) -> (Vec<u32>, u32, u32) {
2821    match strategy {
2822        PredictorSubImageStrategy::L1 => build_predictor_image(pixels, width, height, size_bits),
2823        PredictorSubImageStrategy::Entropy => {
2824            build_predictor_image_entropy(pixels, width, height, size_bits)
2825        }
2826        PredictorSubImageStrategy::EntropySubaware { lambda_milli } => {
2827            build_predictor_image_entropy_subaware(pixels, width, height, size_bits, lambda_milli)
2828        }
2829    }
2830}
2831
2832/// Round 306 — the predictor-sub-image strategies the stacked §3.5 chains
2833/// sweep. The L1 proxy (the rounds 302–304 baseline) leads so a tie keeps
2834/// the historical choice; the round-161 plain entropy chooser follows; then
2835/// the round-162 sub-image-aware entropy chooser across the **full lambda
2836/// sweep** the single-transform predictor path
2837/// ([`encode_argb_with_predictor_chooser`]) has carried since round 162.
2838///
2839/// Round 305 bootstrapped the stacked chains with only a single mid-range
2840/// sub-image-aware lambda (`16_000`); the single-transform path instead
2841/// sweeps four weights — `4_000` / `16_000` / `64_000` / `256_000`
2842/// milli-per-bit — straddling the empirically-observed residual-vs-
2843/// sub-image cost crossover (~`64_000`) on smooth transform-decorrelated
2844/// content. Below the crossover the residual cost dominates and a low
2845/// lambda barely perturbs the round-161 choice; above it the §7.2
2846/// sub-image's prefix-code mass dominates and a high lambda converges the
2847/// mode set into longer runs, shrinking the sub-image header. Threading the
2848/// same four weights through the stacked chains lets each chain land on the
2849/// crossover its own *transform-decorrelated* residual exhibits rather than
2850/// the one fixed mid-range guess.
2851///
2852/// The chooser keeps the byte-shortest stream across all six strategies, so
2853/// the wider sweep is strictly non-regressing against both the L1 baseline
2854/// and the round-305 single-lambda setting. Round-trip output is unchanged
2855/// by the strategy: lambda only biases *which §4.1 mode is recorded* per
2856/// block; the forward transform recomputes residuals against the chosen
2857/// modes and the decoder reads the same modes back.
2858const STACKED_PREDICTOR_STRATEGIES: [PredictorSubImageStrategy; 6] = [
2859    PredictorSubImageStrategy::L1,
2860    PredictorSubImageStrategy::Entropy,
2861    PredictorSubImageStrategy::EntropySubaware {
2862        lambda_milli: 4_000,
2863    },
2864    PredictorSubImageStrategy::EntropySubaware {
2865        lambda_milli: 16_000,
2866    },
2867    PredictorSubImageStrategy::EntropySubaware {
2868        lambda_milli: 64_000,
2869    },
2870    PredictorSubImageStrategy::EntropySubaware {
2871        lambda_milli: 256_000,
2872    },
2873];
2874
2875/// Apply the §4.1 *forward* predictor transform: for each pixel,
2876/// replace it with the per-channel mod-256 residual `(original -
2877/// pred)`. `pred` is computed from the **source** (un-modified)
2878/// pixels — see [`predictor_at`] — so the decoder's inverse pass
2879/// (which uses already-reconstructed pixels equal to those source
2880/// pixels) recovers the originals exactly.
2881///
2882/// Writes residuals into `dst` (`width * height` long). `src` is
2883/// the un-modified source. `predictor_image` / `transform_width` /
2884/// `size_bits` describe the sub-resolution mode image. Per §4.1's
2885/// border rules the top-left predicts solid black, the top row
2886/// predicts L, the left column predicts T, the rightmost column
2887/// uses the row's leftmost pixel as TR; interior pixels read their
2888/// mode from the predictor image's green channel.
2889fn apply_forward_predictor(
2890    src: &[u32],
2891    dst: &mut [u32],
2892    width: u32,
2893    height: u32,
2894    predictor_image: &[u32],
2895    transform_width: u32,
2896    size_bits: u8,
2897) {
2898    if width == 0 || height == 0 {
2899        return;
2900    }
2901    let w = width as usize;
2902    let h = height as usize;
2903    for y in 0..h {
2904        for x in 0..w {
2905            let idx = y * w + x;
2906            // Interior pixels read their block mode from the
2907            // sub-resolution predictor image; border rules in
2908            // `predictor_at` ignore the mode for top-row /
2909            // left-column / top-left pixels.
2910            let mode = if x == 0 || y == 0 {
2911                0
2912            } else {
2913                let bx = (x as u32) >> size_bits;
2914                let by = (y as u32) >> size_bits;
2915                let block_index = (by * transform_width + bx) as usize;
2916                ((predictor_image[block_index] >> 8) & 0xff) as u8
2917            };
2918            let pred = predictor_at(src, w, x, y, mode);
2919            dst[idx] = predictor_subtract(src[idx], pred);
2920        }
2921    }
2922}
2923
2924/// Default §4.1 `size_bits` value the encoder picks for the
2925/// predictor sub-image: `4` → 16×16 pixel blocks. Smaller blocks
2926/// give finer mode granularity (better residual savings) at the
2927/// cost of a larger predictor sub-image (4× the entries for each
2928/// `size_bits` decrement). 16×16 is a reasonable middle ground for
2929/// the typical encoder workloads here; the spec admits `2..=9`
2930/// (`block` sizes 4..=512). As of round 155 the chooser also
2931/// evaluates a maximal single-block candidate by promoting
2932/// `size_bits` until `1 << size_bits ≥ max(width, height)`, so the
2933/// default value here only sets the per-region granularity floor;
2934/// see [`encode_argb_with_predictor_chooser`].
2935const DEFAULT_PREDICTOR_SIZE_BITS: u8 = 4;
2936
2937/// Encode `pixels` taking the §4.1 spatial predictor path: pick a
2938/// per-block predictor mode minimising the residual magnitude,
2939/// transform the pixels to residuals, then encode the residuals via
2940/// the standard `spatially-coded-image` shape — wrapped by an
2941/// `optional-transform` whose first entry is the §4.1 predictor
2942/// transform (header bit `%b1` + transform type `Predictor = 0` +
2943/// 3-bit `size_bits - 2` + the sub-resolution predictor image as an
2944/// `entropy-coded-image`).
2945///
2946/// The chooser composes with `cache_code_bits`: when `Some(bits)` a
2947/// §5.2.3 color cache of that size is built over the residual
2948/// stream's literal tokens.
2949///
2950/// **NB:** the predictor transform requires at least a 2-pixel
2951/// dimension on the side being predicted (a 1-pixel image triggers
2952/// the §4.1 top-left-only border rule, so the transform body cannot
2953/// produce a meaningful residual). The caller should fall back to
2954/// the no-transform candidate for trivially small images.
2955fn encode_with_predictor(
2956    pixels: &[u32],
2957    width: u32,
2958    height: u32,
2959    size_bits: u8,
2960    cache_code_bits: Option<u32>,
2961    image_width: u32,
2962) -> Vec<u8> {
2963    let mut w = BitWriter::new();
2964
2965    // ---- §3.8.2 / §7.2 optional-transform: predictor-tx ----
2966    // present bit `%b1`.
2967    w.write_bit(true);
2968    // transform type `Predictor = 0`, 2 bits.
2969    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
2970    // 3-bit `size_bits - 2` (decoder adds 2 back per §4.1).
2971    debug_assert!((2..=9).contains(&size_bits));
2972    w.write_bits((size_bits - 2) as u32, 3);
2973
2974    // Build the sub-resolution predictor image then write it as an
2975    // entropy-coded-image per §7.2 `predictor-image = 3BIT
2976    // entropy-coded-image`.
2977    let (predictor_image, tw, _th) = build_predictor_image(pixels, width, height, size_bits);
2978    write_entropy_coded_image_literals(&mut w, &predictor_image);
2979
2980    // End of optional-transform list (`%b0`).
2981    w.write_bit(false);
2982
2983    // ---- Forward-transform the main image into residuals ----
2984    let mut residuals = vec![0u32; pixels.len()];
2985    apply_forward_predictor(
2986        pixels,
2987        &mut residuals,
2988        width,
2989        height,
2990        &predictor_image,
2991        tw,
2992        size_bits,
2993    );
2994
2995    // ---- Tokenise + emit the residual spatially-coded-image ----
2996    let mut tokens = tokenize_lz77(&residuals);
2997    if let Some(bits) = cache_code_bits {
2998        tokens = cacheify_tokens(&tokens, &residuals, bits);
2999    }
3000    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3001
3002    w.into_bytes()
3003}
3004
3005/// Round-160 *slack-cost* variant of [`encode_with_predictor`].
3006///
3007/// Same wire shape as `encode_with_predictor`, but the §4.1
3008/// predictor sub-image is built via
3009/// [`build_predictor_image_with_slack`] with the caller-supplied
3010/// `slack` budget. `slack == 0` produces a byte-identical stream
3011/// to `encode_with_predictor`.
3012///
3013/// `slack > 0` permits the chooser to swap to the preferred
3014/// neighbour mode at a small residual-cost increase, with the goal
3015/// of dropping the predictor sub-image's symbol entropy. The
3016/// chooser at [`encode_argb_with_predictor_chooser`] always
3017/// compares the slack candidates against `slack == 0`, so a slack
3018/// budget that hurts overall byte cost on a given input is
3019/// non-selecting (the strict candidate wins on byte length).
3020fn encode_with_predictor_slack(
3021    pixels: &[u32],
3022    width: u32,
3023    height: u32,
3024    size_bits: u8,
3025    cache_code_bits: Option<u32>,
3026    image_width: u32,
3027    slack: u64,
3028) -> Vec<u8> {
3029    let mut w = BitWriter::new();
3030
3031    w.write_bit(true);
3032    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3033    debug_assert!((2..=9).contains(&size_bits));
3034    w.write_bits((size_bits - 2) as u32, 3);
3035
3036    let (predictor_image, tw, _th) =
3037        build_predictor_image_with_slack(pixels, width, height, size_bits, slack);
3038    write_entropy_coded_image_literals(&mut w, &predictor_image);
3039
3040    w.write_bit(false);
3041
3042    let mut residuals = vec![0u32; pixels.len()];
3043    apply_forward_predictor(
3044        pixels,
3045        &mut residuals,
3046        width,
3047        height,
3048        &predictor_image,
3049        tw,
3050        size_bits,
3051    );
3052
3053    let mut tokens = tokenize_lz77(&residuals);
3054    if let Some(bits) = cache_code_bits {
3055        tokens = cacheify_tokens(&tokens, &residuals, bits);
3056    }
3057    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3058
3059    w.into_bytes()
3060}
3061
3062/// Round-161 *Shannon-entropy bit-cost* variant of
3063/// [`encode_with_predictor`].
3064///
3065/// Same wire shape as `encode_with_predictor`, but the §4.1
3066/// predictor sub-image is built via [`build_predictor_image_entropy`]
3067/// — replacing the per-block L1-magnitude proxy with a true Huffman
3068/// lower-bound bit cost on the per-channel residual histogram. The
3069/// chooser hint mechanism (strict tie-break favouring the
3070/// neighbour's mode) is preserved.
3071///
3072/// `encode_argb_with_predictor_chooser` always compares this
3073/// candidate against the L1-proxy candidates (round-159 strict tie-
3074/// break and round-160 slack variants), so on fixtures where the L1
3075/// proxy genuinely wins, the entropy candidate is non-selecting.
3076fn encode_with_predictor_entropy(
3077    pixels: &[u32],
3078    width: u32,
3079    height: u32,
3080    size_bits: u8,
3081    cache_code_bits: Option<u32>,
3082    image_width: u32,
3083) -> Vec<u8> {
3084    let mut w = BitWriter::new();
3085
3086    w.write_bit(true);
3087    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3088    debug_assert!((2..=9).contains(&size_bits));
3089    w.write_bits((size_bits - 2) as u32, 3);
3090
3091    let (predictor_image, tw, _th) =
3092        build_predictor_image_entropy(pixels, width, height, size_bits);
3093    write_entropy_coded_image_literals(&mut w, &predictor_image);
3094
3095    w.write_bit(false);
3096
3097    let mut residuals = vec![0u32; pixels.len()];
3098    apply_forward_predictor(
3099        pixels,
3100        &mut residuals,
3101        width,
3102        height,
3103        &predictor_image,
3104        tw,
3105        size_bits,
3106    );
3107
3108    let mut tokens = tokenize_lz77(&residuals);
3109    if let Some(bits) = cache_code_bits {
3110        tokens = cacheify_tokens(&tokens, &residuals, bits);
3111    }
3112    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3113
3114    w.into_bytes()
3115}
3116
3117/// Round 162 — *sub-image-aware* Shannon-entropy bit-cost predictor
3118/// path. Identical to [`encode_with_predictor_entropy`] but routes
3119/// the sub-image construction through
3120/// [`build_predictor_image_entropy_subaware`] with `lambda_milli` as
3121/// the per-sub-image-bit weight for the joint cost.
3122///
3123/// `lambda_milli == 0` is byte-identical to
3124/// [`encode_with_predictor_entropy`] (the sub-image term contributes
3125/// zero to every per-block choice, so the chooser falls back to the
3126/// round-161 entropy chooser).
3127///
3128/// `encode_argb_with_predictor_chooser` always compares the round-
3129/// 162 candidates (multiple lambda settings) against every round-159
3130/// / round-160 / round-161 candidate, so on fixtures where sub-
3131/// image weighting hurts overall byte cost, the round-162 candidate
3132/// is non-selecting and the path strictly extends the encoder's
3133/// option set rather than redirecting it.
3134fn encode_with_predictor_entropy_subaware(
3135    pixels: &[u32],
3136    width: u32,
3137    height: u32,
3138    size_bits: u8,
3139    cache_code_bits: Option<u32>,
3140    image_width: u32,
3141    lambda_milli: u64,
3142) -> Vec<u8> {
3143    let mut w = BitWriter::new();
3144
3145    w.write_bit(true);
3146    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
3147    debug_assert!((2..=9).contains(&size_bits));
3148    w.write_bits((size_bits - 2) as u32, 3);
3149
3150    let (predictor_image, tw, _th) =
3151        build_predictor_image_entropy_subaware(pixels, width, height, size_bits, lambda_milli);
3152    write_entropy_coded_image_literals(&mut w, &predictor_image);
3153
3154    w.write_bit(false);
3155
3156    let mut residuals = vec![0u32; pixels.len()];
3157    apply_forward_predictor(
3158        pixels,
3159        &mut residuals,
3160        width,
3161        height,
3162        &predictor_image,
3163        tw,
3164        size_bits,
3165    );
3166
3167    let mut tokens = tokenize_lz77(&residuals);
3168    if let Some(bits) = cache_code_bits {
3169        tokens = cacheify_tokens(&tokens, &residuals, bits);
3170    }
3171    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3172
3173    w.into_bytes()
3174}
3175
3176// ---- §3.5.2 / §4.2 forward color-transform encoder ------------------
3177
3178/// §3.5.2 `ColorTransformDelta(t, c)` = `(int8(t) * int8(c)) >> 5`,
3179/// with `t` and `c` interpreted as signed 8-bit two's-complement values.
3180/// Identical formula to the decoder's
3181/// [`crate::vp8l_transform::color_transform_delta`] — kept local so this
3182/// module compiles under `--no-default-features` (which the decoder also
3183/// satisfies, but the helper is `pub(crate)`-private to that file).
3184///
3185/// Only the low 8 bits of the result are meaningful per §3.5.2
3186/// ("only the lowest 8 bits are used from the result"); the wider `i32`
3187/// return type lets callers fold it into a signed pixel computation
3188/// before masking.
3189#[inline]
3190fn color_xfrm_delta(t: u8, c: u8) -> i32 {
3191    let ts = t as i8 as i32;
3192    let cs = c as i8 as i32;
3193    (ts * cs) >> 5
3194}
3195
3196/// §3.5.2 *forward* color-transform on one pixel.
3197///
3198/// Subtracts the three color-transform deltas from `red` and `blue`
3199/// (green is untouched per §3.5.2). The arguments mirror the §3.5.2
3200/// `ColorTransform()` C signature: the per-block element is unpacked
3201/// into `(green_to_red, green_to_blue, red_to_blue)`. Returns the
3202/// encoded `(new_red, new_blue)` as low 8-bit residuals. The §3.5.2
3203/// red argument to the third delta is the *original* `red` (not the
3204/// post-green-to-red residual), matching the spec's encoder pseudo-
3205/// code; the decoder's inverse adds the same delta back using its
3206/// reconstructed `tmp_red & 0xff`, which by symmetry equals the
3207/// original red, so the round-trip is bit-exact.
3208#[inline]
3209fn forward_color_pixel(
3210    r: u8,
3211    g: u8,
3212    b: u8,
3213    green_to_red: u8,
3214    green_to_blue: u8,
3215    red_to_blue: u8,
3216) -> (u8, u8) {
3217    let mut tmp_red = r as i32;
3218    let mut tmp_blue = b as i32;
3219    tmp_red -= color_xfrm_delta(green_to_red, g);
3220    tmp_blue -= color_xfrm_delta(green_to_blue, g);
3221    tmp_blue -= color_xfrm_delta(red_to_blue, r);
3222    ((tmp_red & 0xff) as u8, (tmp_blue & 0xff) as u8)
3223}
3224
3225/// §3.5.2 color-transform candidate values swept by [`pick_block_cte`]
3226/// for each of the three `(green_to_red, green_to_blue, red_to_blue)`
3227/// axes.
3228///
3229/// Each value is an 8-bit two's-complement integer. With the §3.5.2
3230/// fixed-point interpretation (`>> 5` divides by 32), a value of 32
3231/// corresponds to a slope of 1 in the corresponding channel; the
3232/// listed entries span `[-96, 96]` with fine resolution `±4` near
3233/// zero (where most natural-image channel correlations sit, e.g. a
3234/// slope of 1/3 ≈ 10.7 fixed-point) coarsening to `±16` further out.
3235/// Including 0 ("no transform") guarantees the per-axis chooser never
3236/// picks a CTE worse than the no-correlation baseline on that axis.
3237///
3238/// 25 candidates × 3 axes = 75 cost evaluations per block (with the
3239/// per-axis greedy in `pick_block_cte` being exact because the cost
3240/// decomposes additively across the red and blue channels — green is
3241/// untouched, the red channel depends only on `green_to_red`, and the
3242/// blue channel depends additively on `(green_to_blue, red_to_blue)`).
3243const CTE_AXIS_CANDIDATES: [u8; 25] = [
3244    0xa0, // -96
3245    0xb0, // -80
3246    0xc0, // -64
3247    0xd0, // -48
3248    0xe0, // -32
3249    0xe8, // -24
3250    0xec, // -20
3251    0xf0, // -16
3252    0xf4, // -12
3253    0xf8, //  -8
3254    0xfc, //  -4
3255    0xfe, //  -2
3256    0x00, //   0
3257    0x02, //   2
3258    0x04, //   4
3259    0x08, //   8
3260    0x0c, //  12
3261    0x10, //  16
3262    0x14, //  20
3263    0x18, //  24
3264    0x20, //  32
3265    0x30, //  48
3266    0x40, //  64
3267    0x50, //  80
3268    0x60, //  96
3269];
3270
3271/// Per-channel folded-magnitude cost: same residual-magnitude proxy
3272/// [`residual_magnitude`] uses for the §4.1 predictor, but on a single
3273/// 8-bit channel — `min(v, 256 - v)`. Lower magnitudes peak the
3274/// histogram near zero, which the per-channel Huffman codes compress
3275/// better.
3276#[inline]
3277fn channel_magnitude(v: u32) -> u32 {
3278    let v = v & 0xff;
3279    if v <= 128 {
3280        v
3281    } else {
3282        256 - v
3283    }
3284}
3285
3286/// §3.5.2: pick the `(green_to_red, green_to_blue, red_to_blue)`
3287/// element that minimises the residual-magnitude cost on the
3288/// rectangular block `[x0, x0+bw) × [y0, y0+bh)` of the
3289/// `width × height` image.
3290///
3291/// The cost decomposes additively across channels (green is untouched
3292/// by §3.5.2, red depends only on `green_to_red`, blue depends on
3293/// `green_to_blue + red_to_blue`), so a per-axis greedy sweep over
3294/// [`CTE_AXIS_CANDIDATES`] is exact:
3295///
3296/// 1. For each `gtr` candidate, sum `|red - delta(gtr, green)| & 0xff`
3297///    folded onto `[-128, 127]` over the block's pixels; keep the
3298///    smallest.
3299/// 2. For each `gtb` candidate, sum
3300///    `|blue - delta(gtb, green)| & 0xff` folded similarly.
3301/// 3. For each `rtb` candidate, sum
3302///    `|(blue - delta(best_gtb, green)) - delta(rtb, red)| & 0xff`.
3303///
3304/// On ties the candidate appearing earlier in
3305/// [`CTE_AXIS_CANDIDATES`] wins, which makes the chooser deterministic.
3306///
3307/// Public so the `pick_block_cte` criterion bench can drive the
3308/// chooser walk directly (same shelf as [`predictor_subtract`] /
3309/// [`apply_subtract_green`]); encoder callers go through
3310/// `build_color_image`.
3311pub fn pick_block_cte(
3312    pixels: &[u32],
3313    width: usize,
3314    height: usize,
3315    x0: usize,
3316    y0: usize,
3317    bw: usize,
3318    bh: usize,
3319) -> (u8, u8, u8) {
3320    // Gather the block's per-pixel channel triples once.
3321    let mut samples: Vec<(u8, u8, u8)> = Vec::with_capacity(bw * bh);
3322    for dy in 0..bh {
3323        let y = y0 + dy;
3324        if y >= height {
3325            break;
3326        }
3327        for dx in 0..bw {
3328            let x = x0 + dx;
3329            if x >= width {
3330                break;
3331            }
3332            let px = pixels[y * width + x];
3333            let r = ((px >> 16) & 0xff) as u8;
3334            let g = ((px >> 8) & 0xff) as u8;
3335            let b = (px & 0xff) as u8;
3336            samples.push((r, g, b));
3337        }
3338    }
3339    if samples.is_empty() {
3340        return (0, 0, 0);
3341    }
3342
3343    // Axis 1: green → red. The red residual is
3344    // `(red - delta(gtr, green)) & 0xff`, independent of gtb and rtb.
3345    let best_gtr = sweep_cte_axis(&samples, |gtr, r, g, _b| {
3346        channel_magnitude((r as i32 - color_xfrm_delta(gtr, g)) as u32)
3347    });
3348
3349    // Axis 2: green → blue. The intermediate blue residual is
3350    // `(blue - delta(gtb, green)) & 0xff`, independent of rtb. We
3351    // evaluate the GREEN→BLUE contribution alone here; the joint
3352    // (gtb, rtb) choice is exact because the red-to-blue delta is
3353    // additive in `rtb` and depends only on the original red.
3354    let best_gtb = sweep_cte_axis(&samples, |gtb, _r, g, b| {
3355        channel_magnitude((b as i32 - color_xfrm_delta(gtb, g)) as u32)
3356    });
3357
3358    // Axis 3: red → blue. Fold the now-fixed green→blue delta into
3359    // each pixel's intermediate blue, then sweep rtb.
3360    let best_rtb = sweep_cte_axis(&samples, |rtb, r, g, b| {
3361        let inter = b as i32 - color_xfrm_delta(best_gtb, g);
3362        channel_magnitude((inter - color_xfrm_delta(rtb, r)) as u32)
3363    });
3364
3365    (best_gtr, best_gtb, best_rtb)
3366}
3367
3368/// One per-axis greedy sweep of [`pick_block_cte`]: evaluate every
3369/// [`CTE_AXIS_CANDIDATES`] entry's summed per-sample cost and return
3370/// the candidate with the smallest sum (earliest entry wins ties).
3371///
3372/// The prune that used to run per sample (`cost >= best` → abandon
3373/// the candidate) is checked at [`CTE_PRUNE_CHUNK`]-sample
3374/// granularity instead, the same despecialisation the round-280
3375/// §4.1 chooser walker applied at block-row granularity: the
3376/// interior chunk loop carries no data-dependent exit, so the
3377/// monomorphised `cost` closure body auto-vectorises. Pick-identical
3378/// by the round-280 argument — per-sample contributions are
3379/// non-negative, so a partial sum reaching `>= best` implies the
3380/// full sum also compares `>= best`, and a candidate that now runs
3381/// to completion instead of pruning yields its exact full sum, which
3382/// is still `>= best` and therefore still loses; completed sums and
3383/// the strict-`<` tie-break are unchanged. Worst case is one extra
3384/// chunk of work per pruned candidate.
3385///
3386/// The per-chunk partial fits `u32`: each [`channel_magnitude`] is
3387/// `<= 128`, so a chunk sums to `<= 128 * CTE_PRUNE_CHUNK`.
3388#[inline]
3389fn sweep_cte_axis(samples: &[(u8, u8, u8)], cost_of: impl Fn(u8, u8, u8, u8) -> u32) -> u8 {
3390    let mut best: u8 = 0;
3391    let mut best_cost = u64::MAX;
3392    for &cand in &CTE_AXIS_CANDIDATES {
3393        let mut cost = 0u64;
3394        for chunk in samples.chunks(CTE_PRUNE_CHUNK) {
3395            let mut partial = 0u32;
3396            for &(r, g, b) in chunk {
3397                partial += cost_of(cand, r, g, b);
3398            }
3399            cost += partial as u64;
3400            if cost >= best_cost {
3401                break;
3402            }
3403        }
3404        if cost < best_cost {
3405            best_cost = cost;
3406            best = cand;
3407        }
3408    }
3409    best
3410}
3411
3412/// Sample granularity of the [`sweep_cte_axis`] prune check. 32
3413/// samples is two 16-pixel block rows at the encoder-default
3414/// `size_bits = 4` — small enough that a hopeless candidate is
3415/// abandoned after ~12% of a 16×16 block, large enough for the
3416/// branch-free interior loop to amortise the check.
3417const CTE_PRUNE_CHUNK: usize = 32;
3418
3419/// Cost model the §4.2 per-block color-transform-element chooser uses
3420/// to compare [`CTE_AXIS_CANDIDATES`] on each axis.
3421///
3422/// The two strategies sweep the *same* candidate grid with the *same*
3423/// per-axis greedy decomposition — only the per-axis scoring differs:
3424///
3425/// * [`ColorTransformStrategy::L1`] sums the folded per-channel
3426///   residual magnitude ([`channel_magnitude`]) over the block — the
3427///   round-147 proxy `pick_block_cte` has carried since the color
3428///   transform landed.
3429/// * [`ColorTransformStrategy::Entropy`] scores each candidate by the
3430///   Shannon lower-bound bit cost of the resulting per-channel
3431///   residual histogram ([`channel_residual_entropy_milli`]) — the
3432///   §4.2 analogue of the round-161 §4.1 predictor entropy chooser.
3433///   RFC 9649 §3.5 authorises the choice ("transform data can be
3434///   decided based on entropy minimization"); the entropy cost is the
3435///   metric the §5.x per-channel prefix codes actually minimise, so it
3436///   distinguishes a near-zero-with-outliers residual (low L1, but the
3437///   outliers force long codes) from a concentrated spread (slightly
3438///   higher L1, but a cheaper histogram) where the L1 proxy cannot.
3439///
3440/// The per-axis greedy stays exact under either model because the red
3441/// channel depends only on `green_to_red`, the blue channel depends
3442/// only on `(green_to_blue, red_to_blue)`, and red / blue carry
3443/// independent §5.x prefix codes — so red entropy minimises over
3444/// `green_to_red` alone, and the blue pair is chosen greedily
3445/// (`green_to_blue` first, then `red_to_blue` folding in the fixed
3446/// `green_to_blue` delta) exactly as the L1 path already does.
3447#[derive(Clone, Copy, PartialEq, Eq)]
3448enum ColorTransformStrategy {
3449    L1,
3450    Entropy,
3451}
3452
3453/// Shannon lower-bound bit cost (in milli-bits, rounded to nearest) of
3454/// a single 8-bit residual channel's 256-bin histogram.
3455///
3456/// `Σ_b c·log2(N/c)` with the same `log2(N) − log2(c)` expansion and
3457/// nearest-milli-bit rounding [`block_mode_entropy_cost`] uses, so the
3458/// §4.2 entropy chooser is byte-deterministic on the same terms as the
3459/// §4.1 predictor entropy chooser.
3460fn channel_residual_entropy_milli(hist: &[u32; 256]) -> u64 {
3461    let n: u32 = hist.iter().sum();
3462    if n == 0 {
3463        return 0;
3464    }
3465    let n_f = n as f64;
3466    let log2_n = n_f.log2();
3467    let mut milli_bits: f64 = 0.0;
3468    for &count in hist.iter() {
3469        if count == 0 {
3470            continue;
3471        }
3472        let c_f = count as f64;
3473        milli_bits += c_f * (log2_n - c_f.log2());
3474    }
3475    (milli_bits * 1000.0 + 0.5) as u64
3476}
3477
3478/// Entropy-cost analogue of [`sweep_cte_axis`]: pick the
3479/// [`CTE_AXIS_CANDIDATES`] entry whose resulting residual histogram
3480/// has the smallest [`channel_residual_entropy_milli`].
3481///
3482/// `residual_of` maps `(candidate, r, g, b)` to the post-transform
3483/// 8-bit residual the candidate produces for the sample, exactly as
3484/// the L1 closures do; here it feeds a histogram rather than a folded
3485/// magnitude. Earliest entry wins ties, matching [`sweep_cte_axis`],
3486/// so the two strategies share a tie-break rule.
3487#[inline]
3488fn sweep_cte_axis_entropy(
3489    samples: &[(u8, u8, u8)],
3490    residual_of: impl Fn(u8, u8, u8, u8) -> u8,
3491) -> u8 {
3492    let mut best: u8 = 0;
3493    let mut best_cost = u64::MAX;
3494    for &cand in &CTE_AXIS_CANDIDATES {
3495        let mut hist = [0u32; 256];
3496        for &(r, g, b) in samples {
3497            hist[residual_of(cand, r, g, b) as usize] += 1;
3498        }
3499        let cost = channel_residual_entropy_milli(&hist);
3500        if cost < best_cost {
3501            best_cost = cost;
3502            best = cand;
3503        }
3504    }
3505    best
3506}
3507
3508/// §3.5.2 entropy-cost color-transform-element chooser — the
3509/// [`ColorTransformStrategy::Entropy`] counterpart of [`pick_block_cte`].
3510///
3511/// Same per-axis greedy and same residual decomposition as the L1
3512/// chooser; only the per-axis scoring is the Shannon histogram bit
3513/// cost. Returns `(green_to_red, green_to_blue, red_to_blue)`.
3514fn pick_block_cte_entropy(
3515    pixels: &[u32],
3516    width: usize,
3517    height: usize,
3518    x0: usize,
3519    y0: usize,
3520    bw: usize,
3521    bh: usize,
3522) -> (u8, u8, u8) {
3523    let mut samples: Vec<(u8, u8, u8)> = Vec::with_capacity(bw * bh);
3524    for dy in 0..bh {
3525        let y = y0 + dy;
3526        if y >= height {
3527            break;
3528        }
3529        for dx in 0..bw {
3530            let x = x0 + dx;
3531            if x >= width {
3532                break;
3533            }
3534            let px = pixels[y * width + x];
3535            let r = ((px >> 16) & 0xff) as u8;
3536            let g = ((px >> 8) & 0xff) as u8;
3537            let b = (px & 0xff) as u8;
3538            samples.push((r, g, b));
3539        }
3540    }
3541    if samples.is_empty() {
3542        return (0, 0, 0);
3543    }
3544
3545    // Axis 1: green → red residual `(red - delta(gtr, green)) & 0xff`.
3546    let best_gtr = sweep_cte_axis_entropy(&samples, |gtr, r, g, _b| {
3547        ((r as i32 - color_xfrm_delta(gtr, g)) & 0xff) as u8
3548    });
3549    // Axis 2: green → blue intermediate residual.
3550    let best_gtb = sweep_cte_axis_entropy(&samples, |gtb, _r, g, b| {
3551        ((b as i32 - color_xfrm_delta(gtb, g)) & 0xff) as u8
3552    });
3553    // Axis 3: red → blue, folding the fixed green→blue delta in first.
3554    let best_rtb = sweep_cte_axis_entropy(&samples, |rtb, r, g, b| {
3555        let inter = b as i32 - color_xfrm_delta(best_gtb, g);
3556        ((inter - color_xfrm_delta(rtb, r)) & 0xff) as u8
3557    });
3558
3559    (best_gtr, best_gtb, best_rtb)
3560}
3561
3562/// Build the §3.5.2 sub-resolution *color image*: one ARGB pixel per
3563/// `(1 << size_bits)`-pixel-square block of the main image, with the
3564/// chosen [`ColorTransformElement`] packed per §3.5.2 ("each
3565/// `ColorTransformElement` 'cte' is treated as a pixel in a
3566/// subresolution image whose alpha component is 255, red component is
3567/// `cte.red_to_blue`, green component is `cte.green_to_blue`, and
3568/// blue component is `cte.green_to_red`").
3569///
3570/// Returns `(color_image, transform_width, transform_height)`. The
3571/// dimensions follow the §4.2 `DIV_ROUND_UP` rule, identical to the
3572/// §4.1 predictor image's.
3573fn build_color_image(
3574    pixels: &[u32],
3575    width: u32,
3576    height: u32,
3577    size_bits: u8,
3578    strategy: ColorTransformStrategy,
3579) -> (Vec<u32>, u32, u32) {
3580    let block = 1u32 << size_bits;
3581    let tw = predictor_div_round_up(width, block);
3582    let th = predictor_div_round_up(height, block);
3583    let mut img = Vec::with_capacity((tw * th) as usize);
3584    let w = width as usize;
3585    let h = height as usize;
3586    let bsz = block as usize;
3587    for by in 0..th as usize {
3588        for bx in 0..tw as usize {
3589            let x0 = bx * bsz;
3590            let y0 = by * bsz;
3591            let (gtr, gtb, rtb) = match strategy {
3592                ColorTransformStrategy::L1 => pick_block_cte(pixels, w, h, x0, y0, bsz, bsz),
3593                ColorTransformStrategy::Entropy => {
3594                    pick_block_cte_entropy(pixels, w, h, x0, y0, bsz, bsz)
3595                }
3596            };
3597            // Pack the CTE into one ARGB pixel exactly as §3.5.2
3598            // specifies: alpha=255, red=red_to_blue, green=green_to_blue,
3599            // blue=green_to_red. The decoder unpacks it in
3600            // `crate::vp8l_transform::inverse_color` via the same
3601            // channel-name mapping.
3602            let argb = 0xff00_0000 | ((rtb as u32) << 16) | ((gtb as u32) << 8) | (gtr as u32);
3603            img.push(argb);
3604        }
3605    }
3606    (img, tw, th)
3607}
3608
3609/// Apply the §3.5.2 *forward* color transform: for each pixel, look up
3610/// the per-block element from `color_image` (with the §3.5.2 channel
3611/// layout) and rewrite the red and blue channels via
3612/// [`forward_color_pixel`]. Green and alpha are passed through.
3613///
3614/// Writes the transformed pixels into `dst` (`width * height` long).
3615/// `src` is the un-modified source; the encoder transforms against the
3616/// originals because the decoder reconstructs identical originals
3617/// channel-by-channel (the inverse adds back the same per-block delta).
3618fn apply_forward_color(
3619    src: &[u32],
3620    dst: &mut [u32],
3621    width: u32,
3622    height: u32,
3623    color_image: &[u32],
3624    transform_width: u32,
3625    size_bits: u8,
3626) {
3627    if width == 0 || height == 0 {
3628        return;
3629    }
3630    let w = width as usize;
3631    let h = height as usize;
3632    for y in 0..h {
3633        for x in 0..w {
3634            let idx = y * w + x;
3635            let bx = (x as u32) >> size_bits;
3636            let by = (y as u32) >> size_bits;
3637            let block_index = (by * transform_width + bx) as usize;
3638            let cte = color_image[block_index];
3639            // §3.5.2 channel mapping: red=red_to_blue, green=green_to_blue,
3640            // blue=green_to_red.
3641            let red_to_blue = ((cte >> 16) & 0xff) as u8;
3642            let green_to_blue = ((cte >> 8) & 0xff) as u8;
3643            let green_to_red = (cte & 0xff) as u8;
3644
3645            let px = src[idx];
3646            let a = ((px >> 24) & 0xff) as u8;
3647            let r = ((px >> 16) & 0xff) as u8;
3648            let g = ((px >> 8) & 0xff) as u8;
3649            let b = (px & 0xff) as u8;
3650            let (new_r, new_b) =
3651                forward_color_pixel(r, g, b, green_to_red, green_to_blue, red_to_blue);
3652            dst[idx] =
3653                ((a as u32) << 24) | ((new_r as u32) << 16) | ((g as u32) << 8) | (new_b as u32);
3654        }
3655    }
3656}
3657
3658/// Default §3.5.2 `size_bits` value the encoder picks for the color
3659/// sub-image: `4` → 16×16 pixel blocks, matching
3660/// [`DEFAULT_PREDICTOR_SIZE_BITS`]. The spec admits `2..=9`
3661/// (`block` sizes 4..=512); finer blocks give better per-block CTE
3662/// fitting at the cost of a larger color sub-image. 16×16 is a
3663/// reasonable middle ground for the typical encoder workloads here.
3664const DEFAULT_COLOR_TRANSFORM_SIZE_BITS: u8 = 4;
3665
3666/// Encode `pixels` taking the §3.5.2 / §4.2 color-transform path: pick
3667/// a per-block `(green_to_red, green_to_blue, red_to_blue)` triple,
3668/// forward-transform the red and blue channels into the per-block
3669/// residuals, then encode the residuals via the standard
3670/// `spatially-coded-image` shape — wrapped by an `optional-transform`
3671/// whose first entry is the §4.2 color transform (header bit `%b1` +
3672/// transform type `Color = 1` + 3-bit `size_bits - 2` + the sub-
3673/// resolution color image as an `entropy-coded-image`).
3674///
3675/// The chooser composes with `cache_code_bits`: when `Some(bits)` a
3676/// §5.2.3 color cache of that size is built over the residual stream's
3677/// literal tokens.
3678///
3679/// **NB:** the color transform requires at least a `1 << size_bits`-
3680/// pixel side on both dimensions so the sub-resolution image has more
3681/// than one block; smaller images fall back to the no-transform
3682/// candidates.
3683fn encode_with_color_transform(
3684    pixels: &[u32],
3685    width: u32,
3686    height: u32,
3687    size_bits: u8,
3688    cache_code_bits: Option<u32>,
3689    image_width: u32,
3690) -> Vec<u8> {
3691    encode_with_color_transform_strategy(
3692        pixels,
3693        width,
3694        height,
3695        size_bits,
3696        cache_code_bits,
3697        image_width,
3698        ColorTransformStrategy::L1,
3699    )
3700}
3701
3702/// `size_bits` + `cache_code_bits` + per-block CTE [`ColorTransformStrategy`]
3703/// variant of [`encode_with_color_transform`]. The chooser sweeps both
3704/// strategies and keeps the byte-shortest stream (round 308), so the
3705/// entropy chooser cannot regress against the L1 baseline. Output is
3706/// round-trip-identical regardless of strategy: the cost model only
3707/// changes which per-block CTE is *recorded*, and the decoder's §4.2
3708/// inverse re-applies whatever CTE the sub-image carries.
3709fn encode_with_color_transform_strategy(
3710    pixels: &[u32],
3711    width: u32,
3712    height: u32,
3713    size_bits: u8,
3714    cache_code_bits: Option<u32>,
3715    image_width: u32,
3716    strategy: ColorTransformStrategy,
3717) -> Vec<u8> {
3718    let mut w = BitWriter::new();
3719
3720    // ---- §3.8.2 / §7.2 optional-transform: color-tx ----
3721    // present bit `%b1`.
3722    w.write_bit(true);
3723    // transform type `Color = 1`, 2 bits.
3724    w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
3725    // 3-bit `size_bits - 2` (decoder adds 2 back per §3.5.2).
3726    debug_assert!((2..=9).contains(&size_bits));
3727    w.write_bits((size_bits - 2) as u32, 3);
3728
3729    // Build the sub-resolution color image then write it as an
3730    // entropy-coded-image per §7.2 `color-image = 3BIT
3731    // entropy-coded-image`.
3732    let (color_image, tw, _th) = build_color_image(pixels, width, height, size_bits, strategy);
3733    write_entropy_coded_image_literals(&mut w, &color_image);
3734
3735    // End of optional-transform list (`%b0`).
3736    w.write_bit(false);
3737
3738    // ---- Forward-transform the main image ----
3739    let mut residuals = vec![0u32; pixels.len()];
3740    apply_forward_color(
3741        pixels,
3742        &mut residuals,
3743        width,
3744        height,
3745        &color_image,
3746        tw,
3747        size_bits,
3748    );
3749
3750    // ---- Tokenise + emit the residual spatially-coded-image ----
3751    let mut tokens = tokenize_lz77(&residuals);
3752    if let Some(bits) = cache_code_bits {
3753        tokens = cacheify_tokens(&tokens, &residuals, bits);
3754    }
3755    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
3756
3757    w.into_bytes()
3758}
3759
3760// ---- §4.4 color-indexing transform encoder --------------------------
3761
3762/// §4.4 upper bound on the color-table size that triggers the
3763/// color-indexing transform: the spec describes the inverse with an
3764/// 8-bit on-wire `color_table_size = ReadBits(8) + 1`, so the legal
3765/// range is `1..=256` unique ARGB colors.
3766const MAX_PALETTE_SIZE: usize = 256;
3767
3768/// Scan `pixels` for unique ARGB values and, if the count is below
3769/// [`MAX_PALETTE_SIZE`], return a `(palette, index_of)` pair:
3770///
3771/// * `palette` — the unique ARGB values, sorted numerically. Sorting
3772///   maximises the per-component delta correlation the §4.4
3773///   subtraction-coded color table feeds to the entropy stage:
3774///   adjacent palette entries share similar ARGB bits, so the deltas
3775///   `palette[i] - palette[i-1]` (per-channel, mod 256) concentrate
3776///   near zero — the histogram shape Huffman codes shrink best.
3777///
3778/// * `index_of` — a lookup map from ARGB pixel value to its position
3779///   in `palette`, used by [`pack_indices_into_bundled_image`] to
3780///   replace each pixel with its index.
3781///
3782/// Returns `None` as soon as the unique-color count exceeds
3783/// [`MAX_PALETTE_SIZE`] (the §4.4 on-wire limit), so the early-exit
3784/// cost on photo-like images is bounded.
3785fn collect_palette(pixels: &[u32]) -> Option<(Vec<u32>, std::collections::HashMap<u32, u32>)> {
3786    use std::collections::HashSet;
3787    let mut set: HashSet<u32> = HashSet::new();
3788    for &p in pixels {
3789        set.insert(p);
3790        if set.len() > MAX_PALETTE_SIZE {
3791            return None;
3792        }
3793    }
3794    let mut palette: Vec<u32> = set.into_iter().collect();
3795    palette.sort_unstable();
3796    let mut map: std::collections::HashMap<u32, u32> =
3797        std::collections::HashMap::with_capacity(palette.len());
3798    for (i, &c) in palette.iter().enumerate() {
3799        map.insert(c, i as u32);
3800    }
3801    Some((palette, map))
3802}
3803
3804/// §4.4 *subtraction-encode* a color table in place — the inverse of
3805/// the decoder's [`crate::vp8l_transform::inverse_color_table`].
3806///
3807/// The decoder reconstructs `color_table[i] = color_table[i-1] +
3808/// color_table[i]` (per-channel mod 256), so the encoder emits
3809/// `color_table[i] - color_table[i-1]` (per-channel mod 256) for
3810/// `i >= 1`, leaving `color_table[0]` unchanged. Deltas walk
3811/// back-to-front so each cell still sees the original (pre-encoded)
3812/// previous value at the moment of subtraction.
3813fn forward_color_table(color_table: &mut [u32]) {
3814    if color_table.len() < 2 {
3815        return;
3816    }
3817    for i in (1..color_table.len()).rev() {
3818        let cur = color_table[i];
3819        let prev = color_table[i - 1];
3820        let a = ((cur >> 24) & 0xff).wrapping_sub((prev >> 24) & 0xff) & 0xff;
3821        let r = ((cur >> 16) & 0xff).wrapping_sub((prev >> 16) & 0xff) & 0xff;
3822        let g = ((cur >> 8) & 0xff).wrapping_sub((prev >> 8) & 0xff) & 0xff;
3823        let b = (cur & 0xff).wrapping_sub(prev & 0xff) & 0xff;
3824        color_table[i] = (a << 24) | (r << 16) | (g << 8) | b;
3825    }
3826}
3827
3828/// §4.4 *forward* pixel bundling: replace each ARGB pixel by its
3829/// palette `index`, packing 1/2/4/8 indices into one byte's-worth of
3830/// green channel per the §4.4 LSB-first packing rule. Other channels
3831/// are zeroed (alpha 0, red 0, blue 0) — the decoder reads only the
3832/// green channel via `inverse_color_indexing`.
3833///
3834/// `width_bits` is the value the shared §4.4 threshold table
3835/// [`crate::vp8l_transform::color_indexing_width_bits`] returns for
3836/// the palette size. `packed_width = DIV_ROUND_UP(width,
3837/// 1 << width_bits)` — the new image width fed to the §3 image
3838/// stream.
3839///
3840/// Returns the `packed_width * height` ARGB buffer the
3841/// `spatially-coded-image` writer feeds to the entropy stage. The
3842/// inverse `inverse_color_indexing` reconstructs the original
3843/// `width * height` ARGB image when given this buffer and the
3844/// (un-subtraction-encoded) palette.
3845fn pack_indices_into_bundled_image(
3846    pixels: &[u32],
3847    index_of: &std::collections::HashMap<u32, u32>,
3848    width: u32,
3849    height: u32,
3850    width_bits: u8,
3851) -> (Vec<u32>, u32) {
3852    let count = 1u32 << width_bits;
3853    let bits_per_index = if width_bits == 0 { 8 } else { 8 / count };
3854    let packed_width = width.div_ceil(count);
3855    let pw = packed_width as usize;
3856    let w = width as usize;
3857    let h = height as usize;
3858    let mut out = vec![0u32; pw * h];
3859    for y in 0..h {
3860        for x in 0..w {
3861            let idx = *index_of
3862                .get(&pixels[y * w + x])
3863                .expect("collect_palette covered every pixel");
3864            let packed_x = x / count as usize;
3865            let sub = x % count as usize;
3866            let shift = sub * bits_per_index as usize;
3867            let bits = (idx & ((1u32 << bits_per_index) - 1)) << shift;
3868            out[y * pw + packed_x] |= bits << 8; // pack into the green channel.
3869        }
3870    }
3871    (out, packed_width)
3872}
3873
3874/// Encode `pixels` taking the §4.4 color-indexing transform path:
3875/// build the unique-color palette, replace every pixel with its
3876/// palette index (bundled per the §4.4 `width_bits` rule when the
3877/// palette has ≤16 entries), then emit the bundled-width image via
3878/// the standard `spatially-coded-image` shape — wrapped by an
3879/// `optional-transform` whose first entry is the §4.4 color-indexing
3880/// transform.
3881///
3882/// Wire format produced (§3.8.2 / §7.2 grammar):
3883///
3884/// ```text
3885/// optional-transform =
3886///   %b1                               -- transform present
3887///   %b11                              -- type ColorIndexing = 3
3888///   8BIT                              -- color_table_size - 1
3889///   entropy-coded-image               -- the subtraction-encoded palette,
3890///                                       written at width = color_table_size,
3891///                                       height = 1
3892///   %b0                               -- end of optional-transform list
3893/// spatially-coded-image               -- packed indices at packed_width
3894/// ```
3895///
3896/// Returns `None` when the palette size exceeds [`MAX_PALETTE_SIZE`]
3897/// (the §4.4 on-wire limit), so the chooser can skip this candidate
3898/// in O(N) on photo-like content. The chooser composes with
3899/// `cache_code_bits`: when `Some(bits)` a §5.2.3 color cache of that
3900/// size is built over the packed-index stream's literal tokens.
3901fn encode_with_color_indexing(
3902    pixels: &[u32],
3903    width: u32,
3904    height: u32,
3905    cache_code_bits: Option<u32>,
3906) -> Option<Vec<u8>> {
3907    let (palette, index_of) = collect_palette(pixels)?;
3908    if palette.is_empty() {
3909        return None;
3910    }
3911
3912    // §4.4 threshold table — single shared copy.
3913    let width_bits = crate::vp8l_transform::color_indexing_width_bits(palette.len());
3914    let (packed_image, packed_width) =
3915        pack_indices_into_bundled_image(pixels, &index_of, width, height, width_bits);
3916
3917    let mut w = BitWriter::new();
3918
3919    // ---- §3.8.2 / §7.2 optional-transform: color-indexing-tx ----
3920    // Header bit `%b1` (transform present).
3921    w.write_bit(true);
3922    // Transform type `ColorIndexing = 3` (2 bits, LSB-first → value 3
3923    // matches the spec's `%b11` MSB-first ABNF when read through
3924    // `ReadBits(2)`).
3925    w.write_bits(crate::vp8l_stream::TransformType::ColorIndexing as u32, 2);
3926    // 8-bit `color_table_size - 1` (decoder adds 1 back per §4.4).
3927    debug_assert!((1..=MAX_PALETTE_SIZE).contains(&palette.len()));
3928    w.write_bits((palette.len() - 1) as u32, 8);
3929
3930    // Color table = an entropy-coded-image at width = color_table_size,
3931    // height = 1. The on-wire palette is subtraction-encoded; the
3932    // decoder applies `inverse_color_table` to reverse it.
3933    let mut subtraction_encoded = palette.clone();
3934    forward_color_table(&mut subtraction_encoded);
3935    write_entropy_coded_image_literals(&mut w, &subtraction_encoded);
3936
3937    // End of optional-transform list (`%b0`).
3938    w.write_bit(false);
3939
3940    // ---- Spatially-coded-image at the *subsampled* width ------------
3941    // After §4.4, `image_width` is `DIV_ROUND_UP(width, 1 <<
3942    // width_bits)`; that is the width the entropy stage threads
3943    // through the §5.2.2 distance-code chooser. Pixel values are the
3944    // packed-green-channel bytes whose red/blue/alpha channels are
3945    // identically zero, so the per-channel Huffman codes for those
3946    // three channels collapse to a 1-symbol prefix code each (almost
3947    // free header overhead).
3948    let mut tokens = tokenize_lz77(&packed_image);
3949    if let Some(bits) = cache_code_bits {
3950        tokens = cacheify_tokens(&tokens, &packed_image, bits);
3951    }
3952    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, packed_width);
3953
3954    Some(w.into_bytes())
3955}
3956
3957/// Encode `pixels` with the §4.4 color-indexing transform **chained**
3958/// with the §4.1 spatial predictor transform on the bundled-index
3959/// image.
3960///
3961/// RFC 9649 §3.5 allows up to four transforms to be stacked in one
3962/// `optional-transform` list (each used at most once); the inverse
3963/// transforms are applied "in the reverse order that they are read
3964/// from the bitstream, that is, last one first." The bundled palette
3965/// indices the §4.4 transform produces live entirely in the green
3966/// channel and run in long spatially-coherent stretches on palette
3967/// content (icons, line art, screen captures); a §4.1 predictor pass
3968/// over that bundled image turns those runs into near-zero residuals,
3969/// shrinking the entropy stage further than either transform alone.
3970///
3971/// ## Wire / inverse ordering
3972///
3973/// The two transforms are written **color-indexing first, predictor
3974/// second**:
3975///
3976/// ```text
3977/// optional-transform =
3978///   %b1 %b11 8BIT entropy-coded-image   -- §4.4 color-indexing-tx (palette)
3979///   %b1 %b00 3BIT entropy-coded-image   -- §4.1 predictor-tx (sub-image)
3980///   %b0                                 -- end of optional-transform list
3981/// spatially-coded-image                 -- predictor residuals over the
3982///                                          packed indices, at packed_width
3983/// ```
3984///
3985/// The decoder reads color-indexing first, which subsamples the width
3986/// it threads into the predictor body (`transform_width =
3987/// DIV_ROUND_UP(packed_width, block)`) and into the main image; it then
3988/// applies the inverses in reverse read order — inverse-predictor over
3989/// the packed-index image first (recovering the bundled indices), then
3990/// inverse-color-indexing (un-bundling back to the full-width ARGB
3991/// pixels). This is exactly the order
3992/// [`crate::vp8l_transform::decode_lossless`] already implements for a
3993/// stacked list, so no decoder change is required.
3994///
3995/// The predictor sub-image is built over the **packed** image at
3996/// `packed_width × height`; the predictor's modes therefore decorrelate
3997/// adjacent bundled-index bytes, not the original pixels.
3998///
3999/// Returns `None` when the palette is infeasible (`> MAX_PALETTE_SIZE`
4000/// unique colors) or the packed image is too small for the predictor
4001/// transform to carry a meaningful body (needs at least one full
4002/// `block × block` square, i.e. `packed_width >= block && height >=
4003/// block`). In those cases the single-transform color-indexing
4004/// candidate already covers the input.
4005///
4006/// The chooser composes with `cache_code_bits` over the residual
4007/// stream's literal tokens, identically to the single-transform paths.
4008fn encode_with_color_indexing_predictor(
4009    pixels: &[u32],
4010    width: u32,
4011    height: u32,
4012    size_bits: u8,
4013    cache_code_bits: Option<u32>,
4014    pred_strategy: PredictorSubImageStrategy,
4015) -> Option<Vec<u8>> {
4016    let (palette, index_of) = collect_palette(pixels)?;
4017    if palette.is_empty() {
4018        return None;
4019    }
4020
4021    // §4.4 bundle the indices into the green channel at the subsampled
4022    // width — the same step the single-transform color-indexing path
4023    // takes.
4024    let width_bits = crate::vp8l_transform::color_indexing_width_bits(palette.len());
4025    let (packed_image, packed_width) =
4026        pack_indices_into_bundled_image(pixels, &index_of, width, height, width_bits);
4027
4028    // The §4.1 predictor needs at least one full block square at the
4029    // packed width; otherwise its sub-image is pure overhead and the
4030    // single-transform color-indexing candidate is strictly cheaper.
4031    let block = 1u32 << size_bits;
4032    if packed_width < block || height < block {
4033        return None;
4034    }
4035
4036    let mut w = BitWriter::new();
4037
4038    // ---- Transform #1 (read first): §4.4 color-indexing-tx ----------
4039    w.write_bit(true);
4040    w.write_bits(crate::vp8l_stream::TransformType::ColorIndexing as u32, 2);
4041    debug_assert!((1..=MAX_PALETTE_SIZE).contains(&palette.len()));
4042    w.write_bits((palette.len() - 1) as u32, 8);
4043    let mut subtraction_encoded = palette.clone();
4044    forward_color_table(&mut subtraction_encoded);
4045    write_entropy_coded_image_literals(&mut w, &subtraction_encoded);
4046
4047    // ---- Transform #2 (read second): §4.1 predictor-tx --------------
4048    // Built over the *packed* index image at `packed_width × height`.
4049    // The decoder will have subsampled `current_width` to `packed_width`
4050    // after reading transform #1, so the `transform_width` it derives
4051    // for this body — `DIV_ROUND_UP(packed_width, block)` — matches the
4052    // `tw` produced here.
4053    w.write_bit(true);
4054    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4055    debug_assert!((2..=9).contains(&size_bits));
4056    w.write_bits((size_bits - 2) as u32, 3);
4057    // Round 305: build the predictor sub-image over the *packed index*
4058    // image under `pred_strategy`. The chooser sweeps L1 / entropy /
4059    // sub-image-aware and keeps the byte-shortest.
4060    let (predictor_image, tw, _th) = build_predictor_image_strategy(
4061        &packed_image,
4062        packed_width,
4063        height,
4064        size_bits,
4065        pred_strategy,
4066    );
4067    write_entropy_coded_image_literals(&mut w, &predictor_image);
4068
4069    // End of optional-transform list (`%b0`).
4070    w.write_bit(false);
4071
4072    // ---- Forward-transform the packed image into residuals ----------
4073    let mut residuals = vec![0u32; packed_image.len()];
4074    apply_forward_predictor(
4075        &packed_image,
4076        &mut residuals,
4077        packed_width,
4078        height,
4079        &predictor_image,
4080        tw,
4081        size_bits,
4082    );
4083
4084    // ---- Spatially-coded-image of the residuals at packed_width -----
4085    let mut tokens = tokenize_lz77(&residuals);
4086    if let Some(bits) = cache_code_bits {
4087        tokens = cacheify_tokens(&tokens, &residuals, bits);
4088    }
4089    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, packed_width);
4090
4091    Some(w.into_bytes())
4092}
4093
4094/// Encode `pixels` with the §4.2 cross-color transform **chained** with
4095/// the §4.1 spatial predictor transform, the stacked pair the spec
4096/// targets at photo / natural-image content.
4097///
4098/// RFC 9649 §3.5 allows up to four transforms to be stacked in one
4099/// `optional-transform` list (each used at most once); the inverse
4100/// transforms are applied "in the reverse order that they are read from
4101/// the bitstream, that is, last one first." On photo content the §4.2
4102/// color transform first removes the inter-channel correlation (rewriting
4103/// red and blue as residuals against green per the per-block
4104/// `ColorTransformElement`); a §4.1 spatial-predictor pass over the
4105/// color-decorrelated image then removes the *spatial* correlation that
4106/// survives in each channel, driving the residuals the entropy stage sees
4107/// closer to zero than either transform alone.
4108///
4109/// ## Wire / inverse ordering
4110///
4111/// The two transforms are written **color-transform first, predictor
4112/// second**:
4113///
4114/// ```text
4115/// optional-transform =
4116///   %b1 %b01 3BIT entropy-coded-image   -- §4.2 color-tx (color sub-image)
4117///   %b1 %b00 3BIT entropy-coded-image   -- §4.1 predictor-tx (sub-image)
4118///   %b0                                 -- end of optional-transform list
4119/// spatially-coded-image                 -- predictor residuals over the
4120///                                          color-transformed image
4121/// ```
4122///
4123/// Neither transform subsamples the width, so both sub-image bodies and
4124/// the main image run at the full canvas `width`. The decoder reads
4125/// color-transform first, predictor second, then applies the inverses in
4126/// reverse read order — inverse-predictor first (recovering the
4127/// color-transformed image), then inverse-color (recovering the original
4128/// ARGB pixels). This is exactly the order
4129/// [`crate::vp8l_transform::decode_lossless`] already implements for a
4130/// stacked list, so no decoder change is required.
4131///
4132/// The predictor sub-image is built over the **color-transformed** image,
4133/// so its per-block modes decorrelate the color residuals, not the raw
4134/// pixels.
4135///
4136/// `size_bits` is shared by both transforms (each writes its own 3-bit
4137/// `size_bits - 2` header). The caller is responsible for gating on
4138/// `width >= block && height >= block` so both sub-images carry at least
4139/// one full block square; the chooser does this before calling.
4140fn encode_with_color_transform_predictor(
4141    pixels: &[u32],
4142    width: u32,
4143    height: u32,
4144    size_bits: u8,
4145    cache_code_bits: Option<u32>,
4146    pred_strategy: PredictorSubImageStrategy,
4147) -> Vec<u8> {
4148    let mut w = BitWriter::new();
4149
4150    // ---- Transform #1 (read first): §4.2 color-tx -------------------
4151    w.write_bit(true);
4152    w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
4153    debug_assert!((2..=9).contains(&size_bits));
4154    w.write_bits((size_bits - 2) as u32, 3);
4155    let (color_image, ctw, _cth) =
4156        build_color_image(pixels, width, height, size_bits, ColorTransformStrategy::L1);
4157    write_entropy_coded_image_literals(&mut w, &color_image);
4158
4159    // Forward the §4.2 color transform over the originals so the
4160    // predictor below sees the color-decorrelated image.
4161    let mut color_transformed = vec![0u32; pixels.len()];
4162    apply_forward_color(
4163        pixels,
4164        &mut color_transformed,
4165        width,
4166        height,
4167        &color_image,
4168        ctw,
4169        size_bits,
4170    );
4171
4172    // ---- Transform #2 (read second): §4.1 predictor-tx --------------
4173    // Built over the color-transformed image at full `width × height`.
4174    // The decoder will have left `current_width` at `width` after the
4175    // color transform (color-tx does not subsample), so the
4176    // `transform_width` it derives matches `ptw` here. The predictor
4177    // sub-image is built under `pred_strategy` (round 305) — the
4178    // chooser sweeps L1 / entropy / sub-image-aware over this
4179    // color-decorrelated residual and keeps the byte-shortest.
4180    w.write_bit(true);
4181    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4182    w.write_bits((size_bits - 2) as u32, 3);
4183    let (predictor_image, ptw, _pth) =
4184        build_predictor_image_strategy(&color_transformed, width, height, size_bits, pred_strategy);
4185    write_entropy_coded_image_literals(&mut w, &predictor_image);
4186
4187    // End of optional-transform list (`%b0`).
4188    w.write_bit(false);
4189
4190    // ---- Forward-transform the color-transformed image into residuals
4191    let mut residuals = vec![0u32; color_transformed.len()];
4192    apply_forward_predictor(
4193        &color_transformed,
4194        &mut residuals,
4195        width,
4196        height,
4197        &predictor_image,
4198        ptw,
4199        size_bits,
4200    );
4201
4202    // ---- Spatially-coded-image of the residuals at full width -------
4203    let mut tokens = tokenize_lz77(&residuals);
4204    if let Some(bits) = cache_code_bits {
4205        tokens = cacheify_tokens(&tokens, &residuals, bits);
4206    }
4207    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, width);
4208
4209    w.into_bytes()
4210}
4211
4212/// Encode `pixels` with a **three-transform** §3.5 stack: §4.2 cross-color
4213/// (read first) → §4.3 subtract-green (read second) → §4.1 spatial
4214/// predictor (read third), chained over one `optional-transform` list.
4215///
4216/// RFC 9649 §3.5 permits up to four transforms stacked in one list, each
4217/// used at most once, with the inverses applied "in the reverse order that
4218/// they are read from the bitstream, that is, last one first." This
4219/// candidate is the natural three-axis extension of the round-303
4220/// color-transform + predictor pair: after the §4.2 per-block color
4221/// transform has removed the *modeled* inter-channel correlation (rewriting
4222/// red / blue as residuals against green per the per-block
4223/// `ColorTransformElement`), a header-free §4.3 subtract-green pass removes
4224/// the *uniform* red/blue-vs-green correlation that survives the per-block
4225/// model (the CTE multipliers are coarse 3.5-bit fixed-point values, so a
4226/// residual green-correlated component routinely remains), and a §4.1
4227/// predictor pass then removes the spatial correlation left in each channel.
4228/// The entropy stage therefore sees residuals driven closer to zero than any
4229/// one- or two-transform path achieves alone, on content where all three
4230/// correlation axes carry mass.
4231///
4232/// ## Wire / inverse ordering
4233///
4234/// ```text
4235/// optional-transform =
4236///   %b1 %b01 3BIT entropy-coded-image   -- §4.2 color-tx (color sub-image)
4237///   %b1 %b10                            -- §4.3 subtract-green (no data)
4238///   %b1 %b00 3BIT entropy-coded-image   -- §4.1 predictor-tx (sub-image)
4239///   %b0                                 -- end of optional-transform list
4240/// spatially-coded-image                 -- predictor residuals over the
4241///                                          color- + subtract-green-
4242///                                          transformed image
4243/// ```
4244///
4245/// The §4.3 subtract-green transform carries **no** transform data (just
4246/// the `%b1 %b10` presence + type bits), exactly as §4.3 specifies. None of
4247/// the three transforms subsamples the width, so both sub-image bodies and
4248/// the main image run at the full canvas `width`. The decoder reads color
4249/// first, subtract-green second, predictor third, then applies the inverses
4250/// last-read-first — inverse-predictor (recovering the color- +
4251/// subtract-green-transformed image), inverse-subtract-green (recovering the
4252/// color-transformed image), inverse-color (recovering the originals). This
4253/// is exactly the generic reverse-read-order chain
4254/// [`crate::vp8l_transform::decode_lossless`] already applies, so no decoder
4255/// change is required.
4256///
4257/// The predictor sub-image is built over the **color- + subtract-green-
4258/// transformed** image, so its per-block modes decorrelate that residual,
4259/// not the raw pixels.
4260///
4261/// `size_bits` is shared by the §4.2 color and §4.1 predictor transforms
4262/// (the §4.3 subtract-green transform has no `size_bits`); each writes its
4263/// own 3-bit `size_bits - 2` header. The caller gates on
4264/// `width >= block && height >= block` so both sub-images carry at least one
4265/// full block square.
4266fn encode_with_color_transform_subtract_green_predictor(
4267    pixels: &[u32],
4268    width: u32,
4269    height: u32,
4270    size_bits: u8,
4271    cache_code_bits: Option<u32>,
4272    pred_strategy: PredictorSubImageStrategy,
4273) -> Vec<u8> {
4274    let mut w = BitWriter::new();
4275
4276    // ---- Transform #1 (read first): §4.2 color-tx -------------------
4277    w.write_bit(true);
4278    w.write_bits(crate::vp8l_stream::TransformType::Color as u32, 2);
4279    debug_assert!((2..=9).contains(&size_bits));
4280    w.write_bits((size_bits - 2) as u32, 3);
4281    let (color_image, ctw, _cth) =
4282        build_color_image(pixels, width, height, size_bits, ColorTransformStrategy::L1);
4283    write_entropy_coded_image_literals(&mut w, &color_image);
4284
4285    // Forward the §4.2 color transform over the originals.
4286    let mut transformed = vec![0u32; pixels.len()];
4287    apply_forward_color(
4288        pixels,
4289        &mut transformed,
4290        width,
4291        height,
4292        &color_image,
4293        ctw,
4294        size_bits,
4295    );
4296
4297    // ---- Transform #2 (read second): §4.3 subtract-green ------------
4298    // Header-free: just the presence bit + the 2-bit type. The forward
4299    // pass rewrites red/blue against green in place over the
4300    // color-transformed image.
4301    w.write_bit(true);
4302    w.write_bits(crate::vp8l_stream::TransformType::SubtractGreen as u32, 2);
4303    apply_subtract_green(&mut transformed);
4304
4305    // ---- Transform #3 (read third): §4.1 predictor-tx --------------
4306    // Built over the color- + subtract-green-transformed image at full
4307    // `width × height`. Neither earlier transform subsampled the width,
4308    // so the decoder still has `current_width == width` here and the
4309    // `transform_width` it derives matches `ptw`. The predictor
4310    // sub-image is built under `pred_strategy` (round 305).
4311    w.write_bit(true);
4312    w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
4313    w.write_bits((size_bits - 2) as u32, 3);
4314    let (predictor_image, ptw, _pth) =
4315        build_predictor_image_strategy(&transformed, width, height, size_bits, pred_strategy);
4316    write_entropy_coded_image_literals(&mut w, &predictor_image);
4317
4318    // End of optional-transform list (`%b0`).
4319    w.write_bit(false);
4320
4321    // ---- Forward-transform the transformed image into residuals -----
4322    let mut residuals = vec![0u32; transformed.len()];
4323    apply_forward_predictor(
4324        &transformed,
4325        &mut residuals,
4326        width,
4327        height,
4328        &predictor_image,
4329        ptw,
4330        size_bits,
4331    );
4332
4333    // ---- Spatially-coded-image of the residuals at full width -------
4334    let mut tokens = tokenize_lz77(&residuals);
4335    if let Some(bits) = cache_code_bits {
4336        tokens = cacheify_tokens(&tokens, &residuals, bits);
4337    }
4338    write_spatially_coded_image(&mut w, &tokens, cache_code_bits, width);
4339
4340    w.into_bytes()
4341}
4342
4343// ---- §6.2.2 multi-meta-prefix (entropy-image) encoder ----------------
4344
4345/// Default `prefix_bits` candidate the §6.2.2 multi-meta-prefix
4346/// chooser sweeps. Each value gives a block side of `1 << prefix_bits`
4347/// pixels — larger blocks mean fewer of them (cheap entropy image,
4348/// fewer prefix-code groups) but coarser per-region adaptation; smaller
4349/// blocks mean finer adaptation but a larger entropy-image overhead.
4350/// The sweep across `[4, 5, 6, 7]` gives 16/32/64/128-pixel blocks,
4351/// which span the useful range for the dimensions this crate targets
4352/// (typical lossless WebP fixtures are 16..512 pixels per side).
4353///
4354/// The spec admits `prefix_bits ∈ [2..9]` (i.e. 4..512-pixel blocks);
4355/// the chooser narrows that to four values rather than the full eight
4356/// because the very smallest (4-pixel) blocks rarely beat the
4357/// single-group baseline (the entropy image grows quadratically with
4358/// `1 / block_side`) and the largest (256/512-pixel) blocks are
4359/// useless on the smaller images this candidate targets.
4360const META_PREFIX_BITS_SWEEP: [u8; 4] = [4, 5, 6, 7];
4361
4362/// Largest number of prefix-code groups the §6.2.2 chooser will form.
4363/// Each group costs five additional code-length tables in the stream
4364/// header (~30..120 bits per code), so the chooser only pays the
4365/// overhead when the per-group savings on the LZ77 stream beat the
4366/// header cost. Capping at 4 keeps the chooser's wall-time bounded
4367/// while covering the per-region adaptation that pays for itself on
4368/// natural images (where the per-quadrant statistics diverge enough to
4369/// justify separate codes).
4370const MAX_META_GROUPS: u32 = 4;
4371
4372// ---- §6.2.2 histogram-distance block clusterer -------------------------
4373//
4374// Spec context (RFC 9649 §3.7.2.2 / WebP Lossless §6.2.2): the §5.2 LZ77
4375// + prefix-code-group decoder selects one of `num_prefix_groups` groups
4376// per pixel block. The encoder gets to choose how to *partition* the
4377// image's blocks into groups — the spec only constrains the on-wire
4378// representation (an `entropy-coded-image` whose green+red channels
4379// carry the per-block meta-prefix code).
4380//
4381// The right partition collects blocks whose alphabet-symbol histograms
4382// (green, red, blue, alpha + LZ77 length / distance) match closely, so
4383// each group's shared §6.2 prefix code can compact those symbols
4384// efficiently. A direct symbol-histogram clusterer would have to
4385// pre-tokenise to see which symbols each block produces, which puts a
4386// hard constraint on the matcher (`tokenize_lz77` runs *after* the
4387// clusterer here). We use a pixel-domain proxy instead: a coarse
4388// per-channel RGB histogram. Blocks whose pixel-value distributions
4389// agree at bin resolution will, in expectation, produce closely-matched
4390// literal-symbol frequencies, which is exactly what drives §6.2's
4391// per-group code cost.
4392
4393/// Bin shift collapsing the 256-value channel range into a coarser
4394/// histogram for clustering. `BIN_SHIFT = 4` → 16 bins per channel.
4395///
4396/// The smaller the shift the finer the discrimination but the more
4397/// per-block memory + per-iteration arithmetic; 4 keeps the per-block
4398/// feature vector at 48 `u32` slots (16 × 3 channels) which is small
4399/// enough to scan repeatedly in Lloyd's iteration but large enough to
4400/// distinguish meaningfully different per-region distributions on
4401/// natural-image inputs.
4402const CLUSTER_BIN_SHIFT: u32 = 4;
4403/// Number of histogram bins per channel after [`CLUSTER_BIN_SHIFT`]:
4404/// `256 >> CLUSTER_BIN_SHIFT`.
4405const CLUSTER_BINS_PER_CHANNEL: usize = 256 >> CLUSTER_BIN_SHIFT;
4406/// Channels included in the feature vector. We histogram red / green /
4407/// blue; alpha is omitted because most lossless WebP payloads carry an
4408/// opaque alpha and a uniform-`0xff` alpha bin contributes no signal.
4409const CLUSTER_NUM_CHANNELS: usize = 3;
4410/// Length of one block's feature vector: `bins-per-channel × channels`.
4411const CLUSTER_FEATURE_DIM: usize = CLUSTER_BINS_PER_CHANNEL * CLUSTER_NUM_CHANNELS;
4412
4413/// Maximum Lloyd's-algorithm iteration count. On the diagnostic
4414/// fixtures the assignment settles in 2–3 passes; the cap bounds the
4415/// chooser's wall-time on pathological inputs (the outer chooser will
4416/// often discard this candidate anyway).
4417const CLUSTER_MAX_ITERATIONS: u32 = 8;
4418
4419/// Build the per-block coarse RGB histogram feature vectors.
4420///
4421/// The feature layout per block is three contiguous channel chunks:
4422/// red bins, then green bins, then blue bins, each of length
4423/// [`CLUSTER_BINS_PER_CHANNEL`]. Counts are left raw (not normalised)
4424/// because all blocks of the same `block_side` see the same pixel
4425/// count, so L1 distance between any two block vectors is directly
4426/// comparable. Boundary blocks (where `block_side` doesn't divide
4427/// `width` / `height` evenly) have smaller pixel counts, so their
4428/// vector magnitudes are correspondingly smaller — the L1 metric
4429/// stays meaningful because both sides of every comparison are
4430/// pulled from the same fixed-size bin grid.
4431fn histogram_block_features(
4432    pixels: &[u32],
4433    width: u32,
4434    height: u32,
4435    prefix_bits: u8,
4436) -> (Vec<u32>, usize) {
4437    let block_side = 1u32 << prefix_bits;
4438    let blocks_wide = width.div_ceil(block_side) as usize;
4439    let blocks_high = height.div_ceil(block_side) as usize;
4440    let block_count = blocks_wide * blocks_high;
4441    let mut features = vec![0u32; block_count * CLUSTER_FEATURE_DIM];
4442
4443    let row_stride = width as usize;
4444    let bs = block_side as usize;
4445    for y in 0..height as usize {
4446        let block_row = y / bs;
4447        for x in 0..width as usize {
4448            let block_col = x / bs;
4449            let block_index = block_row * blocks_wide + block_col;
4450            let pixel = pixels[y * row_stride + x];
4451            let r_bin = (((pixel >> 16) & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4452            let g_bin = (((pixel >> 8) & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4453            let b_bin = ((pixel & 0xff) >> CLUSTER_BIN_SHIFT) as usize;
4454            let base = block_index * CLUSTER_FEATURE_DIM;
4455            features[base + r_bin] += 1;
4456            features[base + CLUSTER_BINS_PER_CHANNEL + g_bin] += 1;
4457            features[base + 2 * CLUSTER_BINS_PER_CHANNEL + b_bin] += 1;
4458        }
4459    }
4460    (features, block_count)
4461}
4462
4463/// L1 (sum-of-absolute-differences) distance between two
4464/// `CLUSTER_FEATURE_DIM`-length count vectors. Symmetric and integer-
4465/// valued; zero iff every bin matches exactly.
4466fn histogram_l1(a: &[u32], b: &[u32]) -> u64 {
4467    debug_assert_eq!(a.len(), CLUSTER_FEATURE_DIM);
4468    debug_assert_eq!(b.len(), CLUSTER_FEATURE_DIM);
4469    let mut sum: u64 = 0;
4470    for i in 0..CLUSTER_FEATURE_DIM {
4471        let ai = a[i];
4472        let bi = b[i];
4473        sum += ai.abs_diff(bi) as u64;
4474    }
4475    sum
4476}
4477
4478/// Deterministic centroid seeding by farthest-from-already-chosen rule
4479/// (a k-means++-style maximum-minimum-distance variant with no
4480/// randomness so identical inputs always produce identical seeds).
4481///
4482/// Starts with block 0 as the first centroid, then repeatedly picks
4483/// the block whose minimum L1 distance to the already-chosen set is
4484/// the largest. Returns the chosen block indices. If at some step no
4485/// remaining block has positive distance to every chosen centroid
4486/// (i.e. it duplicates one already in the set), the seeding stops
4487/// early — the caller treats a list shorter than `num_groups` as a
4488/// signal that the input cannot be split that finely.
4489fn seed_cluster_centroids(features: &[u32], block_count: usize, num_groups: u32) -> Vec<usize> {
4490    let target = num_groups as usize;
4491    debug_assert!(target >= 1 && target <= block_count);
4492    let mut picks: Vec<usize> = Vec::with_capacity(target);
4493    picks.push(0);
4494    while picks.len() < target {
4495        let mut champion_block = 0usize;
4496        let mut champion_min_dist: u64 = 0;
4497        for cand in 0..block_count {
4498            if picks.contains(&cand) {
4499                continue;
4500            }
4501            let cand_vec = &features[cand * CLUSTER_FEATURE_DIM..(cand + 1) * CLUSTER_FEATURE_DIM];
4502            let mut nearest: u64 = u64::MAX;
4503            for &p in &picks {
4504                let pick_vec = &features[p * CLUSTER_FEATURE_DIM..(p + 1) * CLUSTER_FEATURE_DIM];
4505                let d = histogram_l1(cand_vec, pick_vec);
4506                if d < nearest {
4507                    nearest = d;
4508                }
4509            }
4510            if nearest > champion_min_dist {
4511                champion_min_dist = nearest;
4512                champion_block = cand;
4513            }
4514        }
4515        if champion_min_dist == 0 {
4516            // No more distinguishable centroids remain.
4517            break;
4518        }
4519        picks.push(champion_block);
4520    }
4521    picks
4522}
4523
4524/// Partition the image's `prefix_bits`-aligned blocks into at most
4525/// `num_groups` clusters by coarse-RGB-histogram L1 distance, returning
4526/// one meta-prefix code per block in scan-line order.
4527///
4528/// The returned codes are always *compact*: they form the contiguous
4529/// range `0..actual_groups - 1` with no gaps. Per RFC 9649 §3.7.2.2.2
4530/// the entropy image's `num_prefix_groups` is derived as
4531/// `max(entropy image) + 1`, so a gap (an empty group sitting between
4532/// used ones) would force the encoder to emit an unused prefix-code
4533/// group and pay its code-length-table cost for no benefit.
4534///
4535/// Returns `vec![0; block_count]` (a single-group degenerate) when:
4536///
4537/// * `num_groups == 1` (caller asked for one group),
4538/// * `block_count <= 1` (the entropy image holds at most one block, so
4539///   there is no partition to make),
4540/// * seeding cannot find `≥ 2` distinguishable centroids (e.g. all
4541///   blocks have identical histograms), or
4542/// * Lloyd's iteration converges to a single non-empty cluster after
4543///   the compaction pass.
4544///
4545/// The caller's chooser uses the degenerate path as a signal to fall
4546/// through to the single-group baseline rather than paying the
4547/// multi-group meta-prefix header overhead.
4548///
4549/// **Determinism.** Two calls with the same `(pixels, width, height,
4550/// prefix_bits, num_groups)` always produce the same `Vec<u16>` — the
4551/// seeding rule, the Lloyd loop's tie-break (lowest-index centroid
4552/// wins on equal-distance), and the compaction pass are all
4553/// deterministic.
4554///
4555/// Exposed `pub` (like [`pick_block_cte`]) so the `meta_prefix_cluster`
4556/// criterion bench can drive this §6.2.2 entropy-image kernel in
4557/// isolation. Not part of the crate's documented stable surface.
4558pub fn cluster_blocks_by_histogram_distance(
4559    pixels: &[u32],
4560    width: u32,
4561    height: u32,
4562    prefix_bits: u8,
4563    num_groups: u32,
4564) -> Vec<u16> {
4565    debug_assert!(num_groups >= 1);
4566    let (features, block_count) = histogram_block_features(pixels, width, height, prefix_bits);
4567    if num_groups == 1 || block_count <= 1 {
4568        return vec![0u16; block_count];
4569    }
4570
4571    let seeds = seed_cluster_centroids(&features, block_count, num_groups);
4572    if seeds.len() < 2 {
4573        return vec![0u16; block_count];
4574    }
4575    let cluster_k = seeds.len();
4576
4577    // Centroids are stored as running sums of assigned-block feature
4578    // vectors so the update step amortises the per-bin sum across all
4579    // assigned blocks in O(block_count × feat_dim). The per-cluster
4580    // assignment count divides the sum on demand to materialise the
4581    // average for the L1 step.
4582    let mut centroid_sums: Vec<u64> = vec![0u64; cluster_k * CLUSTER_FEATURE_DIM];
4583    let mut centroid_counts: Vec<u64> = vec![1u64; cluster_k];
4584    for (slot, &block_idx) in seeds.iter().enumerate() {
4585        let src = &features[block_idx * CLUSTER_FEATURE_DIM..(block_idx + 1) * CLUSTER_FEATURE_DIM];
4586        for (i, &v) in src.iter().enumerate() {
4587            centroid_sums[slot * CLUSTER_FEATURE_DIM + i] = v as u64;
4588        }
4589    }
4590
4591    let mut assignment: Vec<u16> = vec![0u16; block_count];
4592    let mut centroid_view: Vec<u32> = vec![0u32; CLUSTER_FEATURE_DIM];
4593
4594    for _pass in 0..CLUSTER_MAX_ITERATIONS {
4595        // Assignment step: reassign each block to the nearest centroid.
4596        let mut any_change = false;
4597        for b in 0..block_count {
4598            let block_vec = &features[b * CLUSTER_FEATURE_DIM..(b + 1) * CLUSTER_FEATURE_DIM];
4599            let mut best_group: u16 = 0;
4600            let mut best_dist: u64 = u64::MAX;
4601            for ci in 0..cluster_k {
4602                let divisor = centroid_counts[ci].max(1);
4603                for i in 0..CLUSTER_FEATURE_DIM {
4604                    let raw = centroid_sums[ci * CLUSTER_FEATURE_DIM + i];
4605                    centroid_view[i] = (raw / divisor) as u32;
4606                }
4607                let d = histogram_l1(block_vec, &centroid_view);
4608                if d < best_dist {
4609                    best_dist = d;
4610                    best_group = ci as u16;
4611                }
4612            }
4613            if assignment[b] != best_group {
4614                assignment[b] = best_group;
4615                any_change = true;
4616            }
4617        }
4618        if !any_change {
4619            break;
4620        }
4621
4622        // Update step: rebuild centroid sums + counts from the new
4623        // assignment.
4624        for slot in centroid_sums.iter_mut() {
4625            *slot = 0;
4626        }
4627        for slot in centroid_counts.iter_mut() {
4628            *slot = 0;
4629        }
4630        for b in 0..block_count {
4631            let ci = assignment[b] as usize;
4632            let block_vec = &features[b * CLUSTER_FEATURE_DIM..(b + 1) * CLUSTER_FEATURE_DIM];
4633            let base = ci * CLUSTER_FEATURE_DIM;
4634            for (i, &v) in block_vec.iter().enumerate() {
4635                centroid_sums[base + i] += v as u64;
4636            }
4637            centroid_counts[ci] += 1;
4638        }
4639    }
4640
4641    // Compaction: map the (possibly sparse) assigned group IDs onto
4642    // the contiguous range `0..used - 1`. First-seen-in-scan-order
4643    // wins, so the output is deterministic.
4644    let mut remap: Vec<i32> = vec![-1; cluster_k];
4645    let mut next_id: u16 = 0;
4646    for slot in assignment.iter_mut() {
4647        let group = *slot as usize;
4648        if remap[group] < 0 {
4649            remap[group] = next_id as i32;
4650            next_id += 1;
4651        }
4652        *slot = remap[group] as u16;
4653    }
4654    if next_id < 2 {
4655        return vec![0u16; block_count];
4656    }
4657    assignment
4658}
4659
4660/// §6.2.2 per-pixel group selector backed by a flat block-index map.
4661/// Mirrors the decoder's [`crate::vp8l_decode::MetaPrefixIndex`] but
4662/// owns its data so the encoder can build/inspect it without going
4663/// through the decoder type.
4664struct EncoderMetaIndex {
4665    prefix_bits: u8,
4666    block_width: u32,
4667    /// Per-block meta-prefix code in scan-line order, `block_width *
4668    /// block_height` entries.
4669    codes: Vec<u16>,
4670}
4671
4672impl EncoderMetaIndex {
4673    /// §6.2.2 group selection for pixel `(x, y)`:
4674    /// `codes[(y >> prefix_bits) * block_width + (x >> prefix_bits)]`.
4675    fn group_for(&self, x: u32, y: u32) -> u16 {
4676        let bx = x >> self.prefix_bits;
4677        let by = y >> self.prefix_bits;
4678        self.codes[(by * self.block_width + bx) as usize]
4679    }
4680
4681    /// §6.2.2 `num_prefix_groups = max(entropy image) + 1`.
4682    fn num_groups(&self) -> u32 {
4683        self.codes
4684            .iter()
4685            .copied()
4686            .max()
4687            .map(|c| c as u32 + 1)
4688            .unwrap_or(1)
4689    }
4690
4691    /// Build the entropy-image ARGB pixel buffer the §6.2.2 entropy
4692    /// image is decoded from. Per §6.2.2, the meta-prefix code is the
4693    /// red+green channels of the entropy pixel: `(meta_code >> 8) &
4694    /// 0xffff` — i.e. the low 8 bits of `meta_code` go into the green
4695    /// channel and the next 8 bits into the red channel. Other channels
4696    /// (alpha, blue) are zero.
4697    fn entropy_image_argb(&self) -> Vec<u32> {
4698        self.codes
4699            .iter()
4700            .map(|&c| {
4701                let lo = (c & 0xff) as u32; // green
4702                let hi = ((c >> 8) & 0xff) as u32; // red
4703                (hi << 16) | (lo << 8)
4704            })
4705            .collect()
4706    }
4707}
4708
4709/// Split `tokens` into one bucket per group. The LZ77 token stream was
4710/// generated globally over the whole image, so each token's group is
4711/// determined by the position of the *first* pixel it emits — for a
4712/// `Literal` / `CacheRef` that's a single-pixel position; for a
4713/// `Copy { length, distance }` it's the position of the copy's *start*
4714/// pixel. The §6.2.3 decode loop selects the group per *symbol*, so we
4715/// emit each token's symbols entirely under that single group's prefix
4716/// codes (matching the decoder's group-per-symbol contract, which is
4717/// also group-per-token because each token contributes one indexed
4718/// position via the next-undefined-pixel cursor).
4719///
4720/// Returns a `(group_token_lists, group_pixel_positions)` pair where
4721/// `group_token_lists[i]` is the ordered tokens belonging to group `i`
4722/// and `group_pixel_positions[i]` is the parallel list of starting
4723/// pixel positions (used as a sanity check during `count_frequencies`).
4724fn split_tokens_by_group(
4725    tokens: &[Token],
4726    index: &EncoderMetaIndex,
4727    width: u32,
4728    num_groups: u32,
4729) -> Vec<Vec<Token>> {
4730    let mut buckets: Vec<Vec<Token>> = vec![Vec::new(); num_groups as usize];
4731    let mut pos = 0usize;
4732    let w = width as usize;
4733    for &tok in tokens {
4734        let x = (pos % w) as u32;
4735        let y = (pos / w) as u32;
4736        let g = index.group_for(x, y) as usize;
4737        debug_assert!(g < buckets.len());
4738        buckets[g].push(tok);
4739        let consumed = match tok {
4740            Token::Literal(_) | Token::CacheRef { .. } => 1usize,
4741            Token::Copy { length, .. } => length,
4742        };
4743        pos += consumed;
4744    }
4745    buckets
4746}
4747
4748/// Build the encoder-side per-group [`WriteCode`] tables: for each
4749/// group, count its token-bucket frequencies and Huffman-build the
4750/// five §6.2 prefix codes. The GREEN alphabet size is the same across
4751/// groups (`256 + 24 + color_cache_size`) so the on-wire prefix code
4752/// layouts are uniformly sized; the per-group frequency *distributions*
4753/// differ, which is exactly the point — each group gets a code tailored
4754/// to the bucket it represents.
4755///
4756/// Empty-bucket handling: when a group's bucket has zero tokens (the
4757/// clusterer assigned a block group_id that ends up unused after the
4758/// LZ77 matcher's emission cursor walked past it), every per-channel
4759/// frequency table is all-zero. The standard `WriteCode::from_freqs`
4760/// would yield an incomplete (Kraft-sum-zero) code the decoder
4761/// rejects with §6.2.1's "incomplete" error. We mirror
4762/// `write_prefix_codes_and_tokens`'s empty-distance handling for every
4763/// channel in that degenerate case: emit the §3.7.2.1.1 single-symbol-0
4764/// form, which decodes to a valid (one-leaf) code the bucket will
4765/// never actually exercise.
4766fn build_group_codes(
4767    buckets: &[Vec<Token>],
4768    color_cache_size: usize,
4769    image_width: u32,
4770) -> Vec<[WriteCode; 5]> {
4771    let green_alphabet = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + color_cache_size;
4772    buckets
4773        .iter()
4774        .map(|bucket| {
4775            let freqs = count_frequencies(bucket, color_cache_size, image_width);
4776            // `empty(N)` produces a valid one-leaf code over an
4777            // alphabet of size `N` (the §3.7.2.1.1 single-symbol-0
4778            // form). For each channel, fall back to it when no
4779            // symbols were emitted in this bucket — the decoder
4780            // accepts the resulting one-leaf code without ever
4781            // consuming a symbol from it.
4782            let green = if freqs.green.iter().any(|&f| f > 0) {
4783                WriteCode::from_freqs(&freqs.green)
4784            } else {
4785                WriteCode::empty(green_alphabet)
4786            };
4787            let red = if freqs.red.iter().any(|&f| f > 0) {
4788                WriteCode::from_freqs(&freqs.red)
4789            } else {
4790                WriteCode::empty(256)
4791            };
4792            let blue = if freqs.blue.iter().any(|&f| f > 0) {
4793                WriteCode::from_freqs(&freqs.blue)
4794            } else {
4795                WriteCode::empty(256)
4796            };
4797            let alpha = if freqs.alpha.iter().any(|&f| f > 0) {
4798                WriteCode::from_freqs(&freqs.alpha)
4799            } else {
4800                WriteCode::empty(256)
4801            };
4802            let dist = if freqs.distance.iter().any(|&f| f > 0) {
4803                WriteCode::from_freqs(&freqs.distance)
4804            } else {
4805                WriteCode::empty(40)
4806            };
4807            [green, red, blue, alpha, dist]
4808        })
4809        .collect()
4810}
4811
4812/// Try encoding `pixels` with the §6.2.2 multi-meta-prefix path:
4813///
4814/// 1. Cluster the image's `prefix_bits`-aligned blocks into `num_groups`
4815///    groups by coarse-RGB-histogram L1 distance (see
4816///    [`cluster_blocks_by_histogram_distance`]). Blocks whose pixel-
4817///    value distributions agree at bin resolution end up in the same
4818///    group and share a single five-code prefix-code group.
4819/// 2. Tokenise the image via the standard §5.2.2 LZ77 matcher
4820///    (`tokenize_lz77`), optionally cacheifying with `cache_code_bits`.
4821/// 3. Split tokens into per-group buckets, build per-group prefix codes,
4822///    and emit the §3.8.3 image data with:
4823///      * `%b0` (no §3.8.2 transforms in this candidate),
4824///      * `color-cache-info` (`%b0` or `%b1 4BIT`),
4825///      * `meta-prefix = %b1` + 3-bit `prefix_bits - 2`,
4826///      * the entropy image as an `entropy-coded-image` body via
4827///        [`write_entropy_coded_image_literals`],
4828///      * `num_groups` prefix-code groups (5 prefix codes each),
4829///      * the LZ77 token stream emitted with the group selected per
4830///        pixel block.
4831///
4832/// Returns `None` when the candidate is degenerate (image too small
4833/// for the requested block side; clustering collapsed to one group).
4834/// The chooser must fall back to the single-group path in those cases.
4835fn encode_with_meta_prefix(
4836    pixels: &[u32],
4837    width: u32,
4838    height: u32,
4839    prefix_bits: u8,
4840    num_groups: u32,
4841    cache_code_bits: Option<u32>,
4842    image_width: u32,
4843) -> Option<Vec<u8>> {
4844    debug_assert!((2..=9).contains(&prefix_bits));
4845    debug_assert!((1..=MAX_META_GROUPS).contains(&num_groups));
4846
4847    let block_side = 1u32 << prefix_bits;
4848    // The §6.2.2 entropy image is `DIV_ROUND_UP(image_width, block_side)`
4849    // × `DIV_ROUND_UP(image_height, block_side)`. We need at least two
4850    // blocks for a multi-group split to be possible.
4851    let pw = width.div_ceil(block_side);
4852    let ph = height.div_ceil(block_side);
4853    if (pw * ph) < num_groups {
4854        return None;
4855    }
4856
4857    let codes =
4858        cluster_blocks_by_histogram_distance(pixels, width, height, prefix_bits, num_groups);
4859    let index = EncoderMetaIndex {
4860        prefix_bits,
4861        block_width: pw,
4862        codes,
4863    };
4864    let actual_groups = index.num_groups();
4865    if actual_groups < 2 {
4866        // Clustering collapsed — no point paying the meta-prefix overhead.
4867        return None;
4868    }
4869
4870    // Build the LZ77 token stream globally (matches the
4871    // single-group path's token sequence; the group selection happens
4872    // per *symbol* during emission, not per *match*).
4873    let mut tokens = tokenize_lz77(pixels);
4874    if let Some(bits) = cache_code_bits {
4875        tokens = cacheify_tokens(&tokens, pixels, bits);
4876    }
4877
4878    let buckets = split_tokens_by_group(&tokens, &index, width, actual_groups);
4879    let cache_size = cache_code_bits.map(|b| 1usize << b).unwrap_or(0);
4880    let group_codes = build_group_codes(&buckets, cache_size, image_width);
4881
4882    let mut w = BitWriter::new();
4883
4884    // §3.8.2 optional-transform list: empty (no transforms in this
4885    // candidate). Future revisions can stack §4.1 / §4.2 / §4.4 atop
4886    // the multi-prefix path; for now we keep the candidate small.
4887    w.write_bit(false);
4888
4889    // §3.8.3 / §7.3 spatially-coded-image:
4890    //   color-cache-info meta-prefix data
4891    //
4892    // color-cache-info: `%b0` (no cache) or `%b1 4BIT` (enabled).
4893    if let Some(bits) = cache_code_bits {
4894        debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&bits));
4895        w.write_bit(true);
4896        w.write_bits(bits, 4);
4897    } else {
4898        w.write_bit(false);
4899    }
4900    // meta-prefix: `%b1` (multi-group).
4901    w.write_bit(true);
4902    // §6.2.2 `prefix_bits = ReadBits(3) + 2`.
4903    w.write_bits((prefix_bits - 2) as u32, 3);
4904
4905    // §6.2.2 entropy image, written as an `entropy-coded-image`
4906    // (color-cache-info=%b0 + single prefix-code group + LZ77 data).
4907    // The §6.2.2 entropy pixels carry `(meta_code >> 8) & 0xffff` in
4908    // red+green; the literal-only writer feeds the decoder's
4909    // `decode_entropy_coded_image` path exactly.
4910    let entropy_image = index.entropy_image_argb();
4911    write_entropy_coded_image_literals(&mut w, &entropy_image);
4912
4913    // §6.2.2 `num_prefix_groups` prefix-code groups, in canonical
4914    // group-index order (group 0 first, then group 1, …).
4915    for group in &group_codes {
4916        for code in group.iter() {
4917            code.write_code_lengths(&mut w);
4918        }
4919    }
4920
4921    // §6.2.3 LZ77 emission: walk tokens in original order, look up the
4922    // group for each token's *start* pixel, and emit its symbols with
4923    // that group's prefix codes. This matches the decoder's
4924    // group-per-symbol contract — the decoder picks the group for
4925    // each pixel from the meta-prefix index, which is constant across
4926    // every symbol contributing to a single token (literal,
4927    // cache-ref, or backward-reference copy whose covered pixels all
4928    // fall in the same block as the start pixel, ensured by the
4929    // block-aligned tokenisation that the chooser feeds the matcher;
4930    // see `bucket_aligns_with_decoder_groups_test`).
4931    let mut pos = 0usize;
4932    let w_pixels = width as usize;
4933    for &tok in &tokens {
4934        let x = (pos % w_pixels) as u32;
4935        let y = (pos / w_pixels) as u32;
4936        let g = index.group_for(x, y) as usize;
4937        let codes = &group_codes[g];
4938        let green_code = &codes[0];
4939        let red_code = &codes[1];
4940        let blue_code = &codes[2];
4941        let alpha_code = &codes[3];
4942        let dist_code = &codes[4];
4943        match tok {
4944            Token::Literal(p) => {
4945                let a = ((p >> 24) & 0xff) as usize;
4946                let r = ((p >> 16) & 0xff) as usize;
4947                let g_ch = ((p >> 8) & 0xff) as usize;
4948                let b = (p & 0xff) as usize;
4949                green_code.write_symbol(&mut w, g_ch);
4950                red_code.write_symbol(&mut w, r);
4951                blue_code.write_symbol(&mut w, b);
4952                alpha_code.write_symbol(&mut w, a);
4953                pos += 1;
4954            }
4955            Token::CacheRef { index: ix } => {
4956                debug_assert!(cache_size > 0, "CacheRef requires an enabled cache");
4957                let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + ix as usize;
4958                green_code.write_symbol(&mut w, sym);
4959                pos += 1;
4960            }
4961            Token::Copy { length, distance } => {
4962                write_lz77_value(&mut w, green_code, 256, length as u32);
4963                let raw_code = pixel_distance_to_distance_code(distance, image_width);
4964                write_lz77_value(&mut w, dist_code, 0, raw_code);
4965                pos += length;
4966            }
4967        }
4968    }
4969
4970    Some(w.into_bytes())
4971}
4972
4973/// Encode an ARGB image to a VP8L *image-stream* (the bytes that follow the
4974/// §3.4 5-byte image-header), running the §5.2.2 LZ77 backward-reference
4975/// matcher so repeated pixel runs compress.
4976///
4977/// As of round 120, the encoder also evaluates the §3.5.3 / §3.8.2
4978/// **subtract-green transform** and emits whichever of the two paths is
4979/// smaller. The transform header costs only three bits (`%b1 %b10`), so on
4980/// natural images where the green-correlated red/blue channels shrink the
4981/// per-channel entropy, subtract-green is a near-free compression win. On
4982/// images where the transform doesn't help (or hurts), the no-transform
4983/// path is kept.
4984///
4985/// `pixels` is `width * height` ARGB values in scan-line order, each
4986/// `(alpha << 24) | (red << 16) | (green << 8) | blue` — the same layout
4987/// [`crate::vp8l_decode::DecodedImage::pixels`] produces. The returned
4988/// bytes, prefixed with the image-header and wrapped in RIFF/WEBP framing,
4989/// decode back to `pixels` exactly.
4990pub fn encode_argb_literals(pixels: &[u32]) -> Vec<u8> {
4991    // Width-less entry: feed `image_width = 1`, which disables the §5.2.2
4992    // distance-map chooser (no map entry reconstructs to a "row" distance
4993    // when the row is a single pixel wide). Production callers go through
4994    // [`encode_argb_literals_with_width`] via [`encode_vp8l_payload`] so
4995    // the optimisation is wired for `.webp` output.
4996    encode_argb_literals_with_width(pixels, 1)
4997}
4998
4999/// Width-aware variant of [`encode_argb_literals`]: same 2×2
5000/// `(no-tx | subtract-green) × (no-cache | cache)` chooser, but each
5001/// candidate threads `image_width` into [`encode_tokens`] so the
5002/// §5.2.2 distance-map optimisation is exercised. The production
5003/// `.webp` path ([`encode_vp8l_payload`] → [`encode_webp_lossless`] /
5004/// [`encode_vp8l_argb`]) uses this entry; the no-width
5005/// [`encode_argb_literals`] is retained for test callers that exercise
5006/// the entropy stage without spatial structure.
5007pub fn encode_argb_literals_with_width(pixels: &[u32], image_width: u32) -> Vec<u8> {
5008    debug_assert!(image_width >= 1);
5009    // For each `(subtract_green)` choice, evaluate the no-cache
5010    // baseline plus every §5.2.3 `cache_code_bits ∈ [1..11]` and keep
5011    // the smallest stream per the round-148 sweep. The §5.2.3 cache
5012    // size is `1 << code_bits` (2..=2048 entries), so different
5013    // payloads peak at different sizes: small-palette images favour
5014    // narrow caches (less header overhead for the same hit-rate);
5015    // large-palette photo-like images favour wider caches (fewer hash
5016    // collisions). Sweeping is the only way to pick the best per
5017    // payload without an analytical model.
5018    let mut best = select_best_cache_bits(|cache_bits| {
5019        encode_literals_with_options(pixels, false, cache_bits, image_width)
5020    });
5021    let sg_best = select_best_cache_bits(|cache_bits| {
5022        encode_literals_with_options(pixels, true, cache_bits, image_width)
5023    });
5024    if sg_best.len() < best.len() {
5025        best = sg_best;
5026    }
5027    best
5028}
5029
5030/// Sweep §5.2.3 `cache_code_bits ∈ [1..11]` plus the disabled-cache
5031/// (`None`) baseline for an encoder candidate, returning the smallest
5032/// stream the closure produced.
5033///
5034/// `build_with_cache` takes the candidate `cache_code_bits` (`None`
5035/// = disable, `Some(bits)` = enable with the given size) and returns
5036/// the encoded bytes for that choice. The function calls
5037/// `build_with_cache` 12 times: once with `None` and once per value
5038/// in [`COLOR_CACHE_BITS_MIN`]..=[`COLOR_CACHE_BITS_MAX`], i.e. the
5039/// full §5.2.3 `[1..11]` range a compliant decoder accepts.
5040///
5041/// The §5.2.3 cache size is `1 << code_bits`, so the optimum varies
5042/// per payload:
5043///
5044/// * **Disabled** wins on uncorrelated noise (every "hit" is a hash
5045///   collision; the §3.8.3 `color-cache-info` `%b1 4BIT` header costs
5046///   five bits the no-cache path doesn't pay; the GREEN alphabet
5047///   stays at `256 + 24 = 280` symbols rather than growing to
5048///   `256 + 24 + cache_size`).
5049/// * **Narrow caches** (`code_bits` 1..4 → 2..16 entries) win on
5050///   payloads with a tiny effective palette where a 256-entry cache
5051///   wastes alphabet width on slots that never see a hit.
5052/// * **Wide caches** (`code_bits` 9..11 → 512..2048 entries) win on
5053///   photo-like images with hundreds of distinct colors where hash
5054///   collisions in a 256-entry cache prevent a hit.
5055///
5056/// Note that the §3.7.2 prefix code's alphabet length is exactly
5057/// `256 + 24 + (1 << code_bits)`, so a wider cache also widens every
5058/// emitted code-length-table entry; the trade-off between hit rate
5059/// and alphabet overhead is non-monotonic, which is why the chooser
5060/// sweeps the full range instead of using a single heuristic value.
5061fn select_best_cache_bits<F>(mut build_with_cache: F) -> Vec<u8>
5062where
5063    F: FnMut(Option<u32>) -> Vec<u8>,
5064{
5065    let mut best = build_with_cache(None);
5066    for bits in COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX {
5067        let cand = build_with_cache(Some(bits));
5068        if cand.len() < best.len() {
5069            best = cand;
5070        }
5071    }
5072    best
5073}
5074
5075/// Encode `pixels` with explicit knobs: optionally apply the §3.5.3 /
5076/// §3.8.2 subtract-green transform, optionally enable a §5.2.3 color
5077/// cache with the given `code_bits` (`None` disables it). The
5078/// implementation runs the §5.2.2 LZ77 matcher, then (if a cache is
5079/// requested) rewrites literal tokens into §5.2.3 cache references in
5080/// stream order, then emits the §3.8.3 image stream.
5081fn encode_literals_with_options(
5082    pixels: &[u32],
5083    subtract_green: bool,
5084    cache_code_bits: Option<u32>,
5085    image_width: u32,
5086) -> Vec<u8> {
5087    let mut working = pixels.to_vec();
5088    if subtract_green {
5089        apply_subtract_green(&mut working);
5090    }
5091    let mut tokens = tokenize_lz77(&working);
5092    if let Some(bits) = cache_code_bits {
5093        tokens = cacheify_tokens(&tokens, &working, bits);
5094    }
5095    encode_tokens(&tokens, subtract_green, cache_code_bits, image_width)
5096}
5097
5098/// Encode an ARGB image with the literal-only, no-transform path: every
5099/// pixel becomes a §5.2.1 ARGB literal and no §3.8.2 transform is written.
5100/// Retained as the baseline the round-119 size-reduction test compares the
5101/// LZ77 path against; [`encode_argb_literals`] is the default entry point.
5102pub fn encode_argb_literals_only(pixels: &[u32]) -> Vec<u8> {
5103    let tokens: Vec<Token> = pixels.iter().map(|&p| Token::Literal(p)).collect();
5104    // Literal-only stream emits no Copy tokens, so `image_width` is
5105    // unused by the entropy stage; pass 1 as the trivial value.
5106    encode_tokens(&tokens, false, None, 1)
5107}
5108
5109/// Encode an ARGB image forcing the §3.5.3 / §3.8.2 subtract-green
5110/// transform on, regardless of whether it shrinks the stream. Used by the
5111/// round-120 size-reduction comparison test to measure the transform's
5112/// effect on a natural-image-like fixture; production callers use
5113/// [`encode_argb_literals`] which picks the smaller of the two paths.
5114pub fn encode_argb_literals_subtract_green(pixels: &[u32]) -> Vec<u8> {
5115    let mut sg_pixels = pixels.to_vec();
5116    apply_subtract_green(&mut sg_pixels);
5117    let tokens = tokenize_lz77(&sg_pixels);
5118    // Width-less test entry: pass 1 (the chooser falls back to scan-line).
5119    encode_tokens(&tokens, true, None, 1)
5120}
5121
5122/// Encode an ARGB image forcing a §5.2.3 color cache on (size
5123/// `1 << cache_code_bits`), with no §3.8.2 transform. Used by the
5124/// round-121 size-reduction comparison test to isolate the cache's
5125/// effect from the subtract-green chooser; production callers use
5126/// [`encode_argb_literals`] which picks the smallest of the four
5127/// path combinations.
5128pub fn encode_argb_literals_color_cache(pixels: &[u32], cache_code_bits: u32) -> Vec<u8> {
5129    debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&cache_code_bits));
5130    // Width-less test entry: pass 1 (the chooser falls back to scan-line).
5131    encode_literals_with_options(pixels, false, Some(cache_code_bits), 1)
5132}
5133
5134/// Shared entropy stage: from a §5.2.2 token stream, build the five prefix
5135/// codes and emit the §3.8.3 image data (optional-transform header,
5136/// color-cache-info, meta-prefix, the five prefix-code length tables, then
5137/// the LZ77-coded image).
5138///
5139/// `subtract_green` controls the §3.8.2 transform header: `false` emits a
5140/// single `%b0` terminator (no transform); `true` emits `%b1 %b10 %b0` —
5141/// the subtract-green transform (type 2, bodyless) followed by the end-of-
5142/// list terminator.
5143///
5144/// `color_cache_code_bits` controls the §5.2.3 `color-cache-info` field:
5145/// `None` emits `%b0` (no cache); `Some(bits)` emits `%b1 4BIT` with the
5146/// caller-supplied `code_bits ∈ [1, 11]`. The token stream must already
5147/// reflect the choice — `CacheRef` tokens are only meaningful when the
5148/// cache is enabled.
5149///
5150/// `image_width` is the §3.4 image width the encoded stream describes;
5151/// it feeds [`pixel_distance_to_distance_code`] for the §5.2.2 distance
5152/// chooser so backward references whose scan-line distance equals
5153/// `xi + yi*image_width` for some distance-map entry get the smaller
5154/// distance code. Pass `1` to retain the round-119 scan-line-only
5155/// behaviour (no map codes match at width 1 for typical distances).
5156fn encode_tokens(
5157    tokens: &[Token],
5158    subtract_green: bool,
5159    color_cache_code_bits: Option<u32>,
5160    image_width: u32,
5161) -> Vec<u8> {
5162    let mut w = BitWriter::new();
5163
5164    // §3.8.2 optional-transform.
5165    if subtract_green {
5166        // Present-bit `%b1`, then 2-bit TransformType `SubtractGreen` (value
5167        // 2 in LSB-first bit order: bit0=0, bit1=1 — matches the spec's
5168        // `%b10` MSB-first notation when read through the LSB-first
5169        // `ReadBits(2)`). No body for subtract-green per §3.5.3 / §3.8.2.
5170        w.write_bit(true);
5171        w.write_bits(crate::vp8l_stream::TransformType::SubtractGreen as u32, 2);
5172    }
5173    // End-of-list terminator.
5174    w.write_bit(false);
5175
5176    write_spatially_coded_image(&mut w, tokens, color_cache_code_bits, image_width);
5177
5178    w.into_bytes()
5179}
5180
5181/// Write the §3.8.3 / §7.3 `spatially-coded-image` body — everything
5182/// after the §3.8.2 / §7.2 `optional-transform` terminator: the
5183/// `color-cache-info` bit(s), the `meta-prefix` bit (always `%b0` here
5184/// — single prefix-code group), the five prefix codes, and the
5185/// LZ77-coded image.
5186///
5187/// This is the writer counterpart of
5188/// [`crate::vp8l_decode::decode_argb`] for the single-meta-prefix
5189/// case, and the same body the §4.1 / §4.2 transform encoders wrap
5190/// after writing their own optional-transform header(s) (the
5191/// transform headers and any sub-resolution image bodies are written
5192/// by the caller; this function only emits the trailing
5193/// `spatially-coded-image`).
5194fn write_spatially_coded_image(
5195    w: &mut BitWriter,
5196    tokens: &[Token],
5197    color_cache_code_bits: Option<u32>,
5198    image_width: u32,
5199) {
5200    // §3.8.3 spatially-coded-image = color-cache-info meta-prefix data.
5201    // color-cache-info: `%b0` (no cache) or `%b1 4BIT` (enabled).
5202    let color_cache_size = match color_cache_code_bits {
5203        Some(bits) => {
5204            debug_assert!((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).contains(&bits));
5205            w.write_bit(true);
5206            w.write_bits(bits, 4);
5207            1usize << bits
5208        }
5209        None => {
5210            w.write_bit(false);
5211            0
5212        }
5213    };
5214    // meta-prefix: `%b0` (single prefix-code group).
5215    w.write_bit(false);
5216
5217    write_prefix_codes_and_tokens(w, tokens, color_cache_size, image_width);
5218}
5219
5220/// Write an §7.3 `entropy-coded-image` (color-cache-info + data) of
5221/// `pixels.len()` ARGB pixels in scan-line order, using a
5222/// literal-only encoding with NO color cache and NO LZ77 matching.
5223///
5224/// This is the body shape required for the §4.1 predictor image and
5225/// the §4.2 color-transform image (per §7.2 ABNF: `predictor-image =
5226/// 3BIT ; sub-pixel code / entropy-coded-image`). The decoder reads
5227/// it via [`crate::vp8l_decode::decode_entropy_coded_image`].
5228///
5229/// Sub-resolution transform images are tiny (one ARGB pixel per
5230/// `block_width × block_height` block of the main image), so the
5231/// per-pixel overhead of the §5.2.2 LZ77 / §5.2.3 cache machinery
5232/// rarely pays off — the literal-only path is the smallest write for
5233/// these bodies in practice.
5234fn write_entropy_coded_image_literals(w: &mut BitWriter, pixels: &[u32]) {
5235    // color-cache-info = `%b0` (no cache).
5236    w.write_bit(false);
5237
5238    let tokens: Vec<Token> = pixels.iter().map(|&p| Token::Literal(p)).collect();
5239    // `image_width = 1` is the trivial value (no Copy tokens are
5240    // emitted by a literal-only stream, so the distance-code chooser
5241    // is unused). `color_cache_size = 0` disables the cache alphabet.
5242    write_prefix_codes_and_tokens(w, &tokens, 0, 1);
5243}
5244
5245/// Shared `data = prefix-codes lz77-coded-image` writer (§3.8.3 /
5246/// §7.3). Builds the five §3.7.2 prefix codes from token
5247/// frequencies, writes their code lengths in green/red/blue/alpha/
5248/// distance order, then emits the token stream.
5249fn write_prefix_codes_and_tokens(
5250    w: &mut BitWriter,
5251    tokens: &[Token],
5252    color_cache_size: usize,
5253    image_width: u32,
5254) {
5255    // Build the five prefix codes from token frequencies. The GREEN
5256    // alphabet covers literals (`< 256`), the §5.2.2 length prefix
5257    // symbols (`256 + length_prefix`), and (when the cache is enabled)
5258    // the §5.2.3 cache indices (`256 + 24 + index`). The distance
5259    // alphabet (40 codes) is exercised only when the matcher emitted at
5260    // least one copy.
5261    let freqs = count_frequencies(tokens, color_cache_size, image_width);
5262    let green_code = WriteCode::from_freqs(&freqs.green);
5263    let red_code = WriteCode::from_freqs(&freqs.red);
5264    let blue_code = WriteCode::from_freqs(&freqs.blue);
5265    let alpha_code = WriteCode::from_freqs(&freqs.alpha);
5266    // Prefix #5 (distance): if no backward references were emitted, the
5267    // frequency table is all-zero → `from_freqs` yields the empty code,
5268    // which `WriteCode` serialises as the §3.7.2.1.1 single-symbol-0 form.
5269    let dist_code = if freqs.distance.iter().any(|&f| f > 0) {
5270        WriteCode::from_freqs(&freqs.distance)
5271    } else {
5272        WriteCode::empty(40)
5273    };
5274
5275    // data = prefix-codes lz77-coded-image.
5276    // prefix-code-group = 5 prefix codes, in bitstream order:
5277    // green, red, blue, alpha, distance.
5278    green_code.write_code_lengths(w);
5279    red_code.write_code_lengths(w);
5280    blue_code.write_code_lengths(w);
5281    alpha_code.write_code_lengths(w);
5282    dist_code.write_code_lengths(w);
5283
5284    // lz77-coded-image: each token is either a §5.2.1 ARGB literal
5285    // (channel order green, red, blue, alpha), a §5.2.3 color-cache
5286    // reference (a single GREEN symbol), or a §5.2.2 length + distance
5287    // backward reference.
5288    for &tok in tokens {
5289        match tok {
5290            Token::Literal(p) => {
5291                let a = ((p >> 24) & 0xff) as usize;
5292                let r = ((p >> 16) & 0xff) as usize;
5293                let g = ((p >> 8) & 0xff) as usize;
5294                let b = (p & 0xff) as usize;
5295                green_code.write_symbol(w, g);
5296                red_code.write_symbol(w, r);
5297                blue_code.write_symbol(w, b);
5298                alpha_code.write_symbol(w, a);
5299            }
5300            Token::CacheRef { index } => {
5301                // §5.2.3: GREEN symbol is `256 + 24 + index`. Red /
5302                // blue / alpha are not transmitted; the decoder
5303                // recovers the full ARGB from the cache slot.
5304                debug_assert!(color_cache_size > 0, "CacheRef requires an enabled cache");
5305                let sym = 256 + crate::vp8l_decode::NUM_LENGTH_PREFIX_CODES + index as usize;
5306                green_code.write_symbol(w, sym);
5307            }
5308            Token::Copy { length, distance } => {
5309                // §5.2.2: length via a GREEN length symbol (base 256), then
5310                // distance via prefix code #5 (base 0). The chooser must
5311                // agree with `count_frequencies` so the prefix-code Huffman
5312                // tree we built actually contains the prefix slot we look up.
5313                write_lz77_value(w, &green_code, 256, length as u32);
5314                let raw_code = pixel_distance_to_distance_code(distance, image_width);
5315                write_lz77_value(w, &dist_code, 0, raw_code);
5316            }
5317        }
5318    }
5319}
5320
5321/// Build the §3.4 / §7.1 5-byte VP8L image-header.
5322///
5323/// `0x2F` signature + 14-bit `(width-1)` + 14-bit `(height-1)` +
5324/// `alpha_is_used` bit + 3-bit `version` (0). The exact inverse of
5325/// [`crate::vp8l_chunk::WebpLosslessChunk::from_payload`]'s header peek.
5326fn build_image_header(width: u32, height: u32, alpha_is_used: bool) -> [u8; 5] {
5327    let packed: u32 =
5328        ((width - 1) & 0x3FFF) | (((height - 1) & 0x3FFF) << 14) | ((alpha_is_used as u32) << 28);
5329    // version is 0 → bits 29..31 stay zero.
5330    [
5331        crate::vp8l_chunk::VP8L_SIGNATURE,
5332        (packed & 0xFF) as u8,
5333        ((packed >> 8) & 0xFF) as u8,
5334        ((packed >> 16) & 0xFF) as u8,
5335        ((packed >> 24) & 0xFF) as u8,
5336    ]
5337}
5338
5339/// Encode an interleaved 8-bit RGBA image to a complete RIFF/WEBP file
5340/// carrying a §2.6 simple-lossless `VP8L` chunk.
5341///
5342/// `rgba` is `width * height * 4` bytes in scan-line order, each pixel
5343/// `[R, G, B, A]` — the `oxideav_core::PixelFormat::Rgba` layout
5344/// [`crate::DecodedWebp::rgba`] uses. The returned file decodes back to the
5345/// same RGBA bytes through [`crate::decode_webp`], a pixel-exact round trip.
5346///
5347/// The encoder takes the simplest spec-conformant path: no §3.8.2
5348/// transform, no §3.8.3 color cache, a single meta-prefix code, and a
5349/// literal-only image (no LZ77 backward references). The §3.7.2 prefix
5350/// codes are built per-image from the pixel data.
5351pub fn encode_webp_lossless(rgba: &[u8], width: u32, height: u32) -> Result<Vec<u8>, EncodeError> {
5352    if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
5353        return Err(EncodeError::InvalidDimensions { width, height });
5354    }
5355    let expected = (width as usize) * (height as usize) * 4;
5356    if rgba.len() != expected {
5357        return Err(EncodeError::PixelBufferMismatch {
5358            got: rgba.len(),
5359            expected,
5360        });
5361    }
5362
5363    // Repack RGBA → ARGB and detect whether alpha is non-trivial.
5364    let mut pixels = Vec::with_capacity(rgba.len() / 4);
5365    let mut alpha_is_used = false;
5366    for px in rgba.chunks_exact(4) {
5367        let (r, g, b, a) = (px[0] as u32, px[1] as u32, px[2] as u32, px[3] as u32);
5368        if a != 0xff {
5369            alpha_is_used = true;
5370        }
5371        pixels.push((a << 24) | (r << 16) | (g << 8) | b);
5372    }
5373
5374    let payload = encode_vp8l_payload(&pixels, width, height, alpha_is_used);
5375
5376    // §2.4 / §2.6 RIFF/WEBP framing around the VP8L payload.
5377    let file = build::build_webp_file(&payload, ImageKind::Lossless, width, height)?;
5378    Ok(file)
5379}
5380
5381/// Validate `width`/`height` against the §3.4 14-bit field range and check
5382/// that an ARGB pixel slice carries exactly `width * height` pixels.
5383///
5384/// Shared by the bare-bitstream [`encode_vp8l_argb`] / [`encode_vp8l_argb_with`]
5385/// entry points. Returns the §3.7.2.1.1 "pixel buffer is N, expected M"
5386/// mismatch error using `pixels.len() * 4` so the byte counts match the
5387/// RGBA-flavoured [`encode_webp_lossless`] error.
5388fn validate_argb(pixels: &[u32], width: u32, height: u32) -> Result<(), EncodeError> {
5389    if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
5390        return Err(EncodeError::InvalidDimensions { width, height });
5391    }
5392    let expected = (width as usize) * (height as usize);
5393    if pixels.len() != expected {
5394        return Err(EncodeError::PixelBufferMismatch {
5395            got: pixels.len() * 4,
5396            expected: expected * 4,
5397        });
5398    }
5399    Ok(())
5400}
5401
5402/// Assemble the bare §2.6 / §3.4 `VP8L` chunk **payload** for an ARGB image:
5403/// the 5-byte §3.4 image-header followed by the §3.8.3 image stream.
5404///
5405/// `pixels` is `width * height` ARGB values in scan-line order, each
5406/// `(alpha << 24) | (red << 16) | (green << 8) | blue`. `alpha_is_used`
5407/// becomes the §3.4 `alpha_is_used` header bit. This is the inner payload a
5408/// `VP8L` chunk wraps — *not* a RIFF/WEBP file. Callers wanting the framed
5409/// file use [`encode_webp_lossless`] / [`encode_vp8l_argb_with_metadata`].
5410fn encode_vp8l_payload(pixels: &[u32], width: u32, height: u32, alpha_is_used: bool) -> Vec<u8> {
5411    // Production path: thread the actual image width so the §5.2.2
5412    // distance-map chooser can swap row-style scan-line codes for
5413    // small distance-map codes (round 130).
5414    let stream = encode_argb_with_predictor_chooser(pixels, width, height);
5415    let header = build_image_header(width, height, alpha_is_used);
5416    let mut payload = Vec::with_capacity(header.len() + stream.len());
5417    payload.extend_from_slice(&header);
5418    payload.extend_from_slice(&stream);
5419    payload
5420}
5421
5422/// Width × height-aware super-chooser: evaluates the four
5423/// `(no-tx | subtract-green) × (no-cache | cache)` candidates plus
5424/// (as of round 155) two §4.1 spatial-predictor `size_bits`
5425/// candidates, two §3.5.2 / §4.2 color-transform `size_bits`
5426/// candidates, and (as of round 150) one §4.4 color-indexing
5427/// candidate when the unique-color count fits in the §4.4
5428/// 256-entry table, each with the round-148 §5.2.3
5429/// `cache_code_bits ∈ [1..11]` sweep plus the disabled-cache
5430/// baseline. Returns the smallest of the resulting streams.
5431///
5432/// The block-based transform-bearing candidates (§4.1 predictor,
5433/// §4.2 color) are only considered when both dimensions are at least
5434/// `1 << size_bits` (otherwise the sub-resolution transform image
5435/// collapses to a single block with no useful per-block resolution).
5436/// The §4.4 color-indexing candidate has no per-block size_bits and
5437/// is gated solely on palette feasibility (≤ 256 unique colors);
5438/// for smaller images or photo-like content the existing
5439/// no-transform / subtract-green chooser remains the only path.
5440fn encode_argb_with_predictor_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
5441    let mut best = encode_argb_literals_with_width(pixels, width);
5442
5443    // The §4.1 predictor and §4.2 color transform pay off once the
5444    // image is at least one block wide AND tall, so each block
5445    // carries some real per-block residual mass. For images smaller
5446    // than a block, the chooser skips both transforms (the no-tx /
5447    // subtract-green paths are strictly cheaper in that regime — no
5448    // transform header, no sub-image bytes).
5449    let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
5450    let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
5451    let pred_block = 1u32 << pred_size_bits;
5452    let ctx_block = 1u32 << ctx_size_bits;
5453
5454    if width >= pred_block && height >= pred_block {
5455        // Round 155: sweep two `size_bits` values for the §4.1
5456        // spatial predictor, mirroring the §4.2 color-transform shape
5457        // below. The default (16-pixel blocks → per-region predictor-
5458        // mode granularity, good for images whose local statistics
5459        // change across regions) is paired with a maximal single-block
5460        // transform whose `size_bits` is large enough that the entire
5461        // image collapses into one mode (1 sub-image pixel → 4-byte
5462        // sub-image overhead, the cheapest possible §4.1 header). Per
5463        // RFC 9649 §4.1 `size_bits` ranges over `[2..=9]` (`block`
5464        // sizes 4..=512); the maximal value here is whatever `2..=9`
5465        // makes the sub-image at most 1×1. Single-block is best on
5466        // images whose local statistics agree everywhere (one
5467        // dominant predictor mode does the entire image, so the per-
5468        // region mode-image's bits are pure overhead); per-region
5469        // wins on images whose best-mode varies spatially.
5470        let mut pred_single_block_size_bits: u8 = pred_size_bits;
5471        while pred_single_block_size_bits < 9
5472            && ((1u32 << pred_single_block_size_bits) < width
5473                || (1u32 << pred_single_block_size_bits) < height)
5474        {
5475            pred_single_block_size_bits += 1;
5476        }
5477        // Deduplicate when the per-region and single-block size_bits
5478        // collapse onto the same value (small images).
5479        let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
5480        // Round 148: per `size_bits`, sweep §5.2.3
5481        // `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline
5482        // (was hardcoded at `DEFAULT_COLOR_CACHE_BITS = 8`).
5483        let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
5484            encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
5485        })];
5486        // Round 160: add §4.1 slack-cost tie-break candidates.
5487        // `slack > 0` lets the per-block chooser swap to the
5488        // preferred-neighbour mode at a small residual-cost
5489        // increase, dropping the §7.2 predictor-sub-image's symbol
5490        // entropy. The slack budget is expressed in residual-
5491        // magnitude units summed across the whole block, so it
5492        // scales linearly with the block's pixel count to stay a
5493        // bounded per-pixel quantity. Two slack settings (1× and 2×
5494        // the pixel count) are tried; the chooser picks the
5495        // shortest stream and is therefore non-regressing relative
5496        // to the strict-tie-break (slack = 0) baseline.
5497        let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
5498        for slack in [
5499            pred_block_pixels,
5500            2 * pred_block_pixels,
5501            4 * pred_block_pixels,
5502        ] {
5503            pred_candidates.push(select_best_cache_bits(|cache_bits| {
5504                encode_with_predictor_slack(
5505                    pixels,
5506                    width,
5507                    height,
5508                    pred_size_bits,
5509                    cache_bits,
5510                    width,
5511                    slack,
5512                )
5513            }));
5514        }
5515        // Round 161: add the Shannon-entropy bit-cost candidate at
5516        // the per-region `size_bits`. Per-block mode is chosen by
5517        // a true Huffman lower-bound bit cost on the residual byte
5518        // histogram rather than the L1-magnitude proxy used by the
5519        // round-159/160 candidates. RFC 9649 §3.5 authorises the
5520        // choice ("transform data can be decided based on entropy
5521        // minimization"); the entropy cost replaces the proxy with
5522        // the actual metric Huffman codes minimise. The chooser
5523        // keeps both the entropy and L1 candidates and emits the
5524        // byte-shortest stream so the round-161 path cannot
5525        // regress against the round-160 baseline.
5526        pred_candidates.push(select_best_cache_bits(|cache_bits| {
5527            encode_with_predictor_entropy(pixels, width, height, pred_size_bits, cache_bits, width)
5528        }));
5529        // Round 162: add the *sub-image-aware* Shannon-entropy
5530        // candidate at the per-region `size_bits` across a small
5531        // lambda sweep. Per-block mode is chosen on a joint cost
5532        // that adds the §7.2 predictor sub-image's marginal Shannon
5533        // bit-cost contribution (weighted by lambda) to the round-
5534        // 161 per-block residual entropy. Where the round-159 hint
5535        // and round-160 slack budget act only on local neighbour
5536        // identity, the round-162 chooser accounts for the running
5537        // sub-image distribution globally. `lambda_milli = 0`
5538        // recovers the round-161 chooser exactly; the swept values
5539        // here weight one sub-image bit at 1×, 4×, 16× a residual
5540        // bit (a 16×16 block contains 256 residual symbols per
5541        // channel — so even modest sub-image weighting can pay back
5542        // through longer mode-runs in the sub-image's prefix code).
5543        // The chooser keeps the byte-shortest stream so the round-
5544        // 162 path cannot regress against the round-161 baseline.
5545        //
5546        // The lambda sweep targets the empirically-observed cost
5547        // crossover on smooth-gradient fixtures (~64000 milli-per-
5548        // bit): below that, the residual cost dominates and the
5549        // round-161 chooser already wins; above that, the sub-
5550        // image's mass dominates and converging the mode set pays
5551        // back through a much smaller §7.2 prefix-code header.
5552        for lambda_milli in [4_000u64, 16_000u64, 64_000u64, 256_000u64] {
5553            pred_candidates.push(select_best_cache_bits(|cache_bits| {
5554                encode_with_predictor_entropy_subaware(
5555                    pixels,
5556                    width,
5557                    height,
5558                    pred_size_bits,
5559                    cache_bits,
5560                    width,
5561                    lambda_milli,
5562                )
5563            }));
5564        }
5565        if try_pred_single_block {
5566            pred_candidates.push(select_best_cache_bits(|cache_bits| {
5567                encode_with_predictor(
5568                    pixels,
5569                    width,
5570                    height,
5571                    pred_single_block_size_bits,
5572                    cache_bits,
5573                    width,
5574                )
5575            }));
5576            // Round-160 slack-cost candidates also at the single-
5577            // block size_bits. A single block has one predictor-
5578            // image entry, so the slack-cost variant degenerates to
5579            // the strict variant at this `size_bits` (no neighbour
5580            // hint exists to fire); the candidate is still
5581            // evaluated to keep the sweep regular, but its
5582            // contribution to the byte-best win comes through the
5583            // per-region size_bits.
5584            let single_pred_block_pixels: u64 =
5585                (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
5586            for slack in [
5587                single_pred_block_pixels,
5588                2 * single_pred_block_pixels,
5589                4 * single_pred_block_pixels,
5590            ] {
5591                pred_candidates.push(select_best_cache_bits(|cache_bits| {
5592                    encode_with_predictor_slack(
5593                        pixels,
5594                        width,
5595                        height,
5596                        pred_single_block_size_bits,
5597                        cache_bits,
5598                        width,
5599                        slack,
5600                    )
5601                }));
5602            }
5603            // Round 161: also evaluate the Shannon-entropy candidate
5604            // at the single-block size_bits. With one block the hint
5605            // mechanism never fires (no neighbour exists) and the
5606            // entropy chooser degenerates to "pick the mode whose
5607            // single-block residual histogram has the lowest Huffman
5608            // bit cost" — still a strict improvement over the L1
5609            // proxy on fixtures whose distribution skews the
5610            // ordering between the two metrics.
5611            pred_candidates.push(select_best_cache_bits(|cache_bits| {
5612                encode_with_predictor_entropy(
5613                    pixels,
5614                    width,
5615                    height,
5616                    pred_single_block_size_bits,
5617                    cache_bits,
5618                    width,
5619                )
5620            }));
5621        }
5622        for cand in pred_candidates {
5623            if cand.len() < best.len() {
5624                best = cand;
5625            }
5626        }
5627    }
5628
5629    if width >= ctx_block && height >= ctx_block {
5630        // Sweep two `size_bits` values for the color transform: the
5631        // default (16-pixel blocks → per-region CTE granularity, good
5632        // for varying-correlation natural images) and a maximal
5633        // single-block transform whose `size_bits` is large enough
5634        // that the entire image collapses into one CTE (1 sub-image
5635        // pixel → 4-byte sub-image overhead, the cheapest possible
5636        // header). Single-block is best for high-noise images with
5637        // a single dominant channel correlation; per-region wins on
5638        // images whose correlation varies spatially.
5639        let mut single_block_size_bits: u8 = ctx_size_bits;
5640        while single_block_size_bits < 9
5641            && ((1u32 << single_block_size_bits) < width
5642                || (1u32 << single_block_size_bits) < height)
5643        {
5644            single_block_size_bits += 1;
5645        }
5646        // Deduplicate when the per-region and single-block size_bits
5647        // collapse onto the same value (small images).
5648        let try_single_block = single_block_size_bits != ctx_size_bits;
5649        // Round 148: per `size_bits`, sweep §5.2.3
5650        // `cache_code_bits ∈ [1..11]` plus the disabled-cache baseline
5651        // (was hardcoded at `DEFAULT_COLOR_CACHE_BITS = 8`).
5652        let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
5653            encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
5654        })];
5655        // Round 308: §4.2 entropy-cost per-block CTE candidate at the
5656        // per-region `size_bits`. Where the L1 chooser above scores
5657        // each candidate by the folded residual magnitude, this one
5658        // scores by the Shannon lower-bound bit cost of the per-channel
5659        // residual histogram — the §4.2 analogue of the round-161 §4.1
5660        // predictor entropy chooser (RFC 9649 §3.5 authorises deciding
5661        // transform data by entropy minimization). The chooser keeps
5662        // the byte-shortest stream, so this candidate cannot regress
5663        // against the L1 path, and round-trip output is identical
5664        // regardless of which CTE the cost model records.
5665        candidates.push(select_best_cache_bits(|cache_bits| {
5666            encode_with_color_transform_strategy(
5667                pixels,
5668                width,
5669                height,
5670                ctx_size_bits,
5671                cache_bits,
5672                width,
5673                ColorTransformStrategy::Entropy,
5674            )
5675        }));
5676        if try_single_block {
5677            candidates.push(select_best_cache_bits(|cache_bits| {
5678                encode_with_color_transform(
5679                    pixels,
5680                    width,
5681                    height,
5682                    single_block_size_bits,
5683                    cache_bits,
5684                    width,
5685                )
5686            }));
5687            // Round 308: entropy-cost CTE candidate at the single-block
5688            // `size_bits`. With one block the histogram is the whole
5689            // image's per-channel residual distribution, so the entropy
5690            // metric selects the single CTE whose red / blue residual
5691            // streams carry the cheapest §5.x prefix codes.
5692            candidates.push(select_best_cache_bits(|cache_bits| {
5693                encode_with_color_transform_strategy(
5694                    pixels,
5695                    width,
5696                    height,
5697                    single_block_size_bits,
5698                    cache_bits,
5699                    width,
5700                    ColorTransformStrategy::Entropy,
5701                )
5702            }));
5703        }
5704        for cand in candidates {
5705            if cand.len() < best.len() {
5706                best = cand;
5707            }
5708        }
5709
5710        // Round 303: §3.5 stacked-transform candidate — §4.2 cross-color
5711        // chained with the §4.1 predictor over the color-transformed
5712        // image, the pair the spec targets at photo / natural-image
5713        // content. The color transform decorrelates red / blue against
5714        // green; the predictor then removes the spatial correlation that
5715        // survives in each channel, so the entropy stage sees residuals
5716        // closer to zero than either transform alone. The candidate is
5717        // non-regressing (kept only when strictly smaller than the running
5718        // best) and reuses the same `width >= ctx_block && height >=
5719        // ctx_block` gate (both stacked sub-images need at least one full
5720        // block square). Two `size_bits` are swept — the default
5721        // per-region granularity and a maximal single-block header —
5722        // each across the round-148 cache-bits sweep.
5723        let mut ctp_size_bits = vec![ctx_size_bits];
5724        if try_single_block {
5725            ctp_size_bits.push(single_block_size_bits);
5726        }
5727        // Round 305: sweep the predictor-sub-image strategy (L1 /
5728        // entropy / sub-image-aware) over the color-decorrelated
5729        // residual the chain feeds the predictor. Non-regressing — the
5730        // byte-shortest candidate is kept.
5731        for &sb in &ctp_size_bits {
5732            for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5733                let cand = select_best_cache_bits(|cache_bits| {
5734                    encode_with_color_transform_predictor(
5735                        pixels,
5736                        width,
5737                        height,
5738                        sb,
5739                        cache_bits,
5740                        pred_strategy,
5741                    )
5742                });
5743                if cand.len() < best.len() {
5744                    best = cand;
5745                }
5746            }
5747        }
5748
5749        // Round 304: §3.5 *three-transform* stacked candidate — §4.2
5750        // cross-color → §4.3 subtract-green → §4.1 predictor, the natural
5751        // three-axis extension of the round-303 color + predictor pair.
5752        // The per-block §4.2 color transform removes the modeled
5753        // inter-channel correlation; a header-free §4.3 subtract-green pass
5754        // then removes the uniform red/blue-vs-green correlation that
5755        // survives the coarse per-block CTE multipliers; a §4.1 predictor
5756        // pass removes the spatial correlation left in each channel. RFC
5757        // 9649 §3.5 permits up to four transforms stacked (each used once)
5758        // with inverses applied last-read-first; the decoder's generic
5759        // reverse-read-order chain already handles this list, so no decoder
5760        // change is required. The candidate is non-regressing (kept only
5761        // when strictly smaller than the running best) and reuses the same
5762        // `width >= ctx_block && height >= ctx_block` gate, swept over the
5763        // default per-region and maximal single-block `size_bits` each
5764        // across the round-148 cache-bits sweep.
5765        // Round 305: sweep the predictor-sub-image strategy here too.
5766        for &sb in &ctp_size_bits {
5767            for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5768                let cand = select_best_cache_bits(|cache_bits| {
5769                    encode_with_color_transform_subtract_green_predictor(
5770                        pixels,
5771                        width,
5772                        height,
5773                        sb,
5774                        cache_bits,
5775                        pred_strategy,
5776                    )
5777                });
5778                if cand.len() < best.len() {
5779                    best = cand;
5780                }
5781            }
5782        }
5783    }
5784
5785    // Round 150: §4.4 color-indexing transform candidate. Considered
5786    // unconditionally (no per-block size_bits to sweep): a single
5787    // O(N) palette probe decides feasibility, so the path is cheap
5788    // to skip on photo-like content. On palette-ish images (icons,
5789    // line art, screen captures) the bundled-index stream shrinks
5790    // the §5 image data dramatically (a 4-color image packs 4 pixels
5791    // per byte at width_bits=2, giving the entropy stage 1/4 the
5792    // symbols to code), more than paying for the palette-write
5793    // overhead.
5794    if collect_palette(pixels).is_some() {
5795        let ci_best = select_best_cache_bits(|cache_bits| {
5796            encode_with_color_indexing(pixels, width, height, cache_bits)
5797                .expect("palette feasibility already confirmed")
5798        });
5799        if ci_best.len() < best.len() {
5800            best = ci_best;
5801        }
5802
5803        // Round 302: §3.5 stacked-transform candidate — §4.4
5804        // color-indexing chained with the §4.1 predictor over the
5805        // bundled-index image. On palette content the bundled green-
5806        // channel indices run in long spatially-coherent stretches, so
5807        // a predictor pass over them drives the residuals toward zero
5808        // and shrinks the entropy stage below the single-transform
5809        // color-indexing path. The candidate is non-regressing: it is
5810        // only kept when strictly smaller than the running best, and it
5811        // self-skips (returns `None`) when the packed image is too
5812        // small to carry a predictor block. Two `size_bits` are swept —
5813        // the default per-region granularity and a maximal single-block
5814        // header — each across the round-148 cache-bits sweep.
5815        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
5816        let mut ci_pred_single_block: u8 = pred_size_bits;
5817        while ci_pred_single_block < 9
5818            && ((1u32 << ci_pred_single_block) < width || (1u32 << ci_pred_single_block) < height)
5819        {
5820            ci_pred_single_block += 1;
5821        }
5822        let mut ci_pred_size_bits = vec![pred_size_bits];
5823        if ci_pred_single_block != pred_size_bits {
5824            ci_pred_size_bits.push(ci_pred_single_block);
5825        }
5826        // Round 305: sweep the predictor-sub-image strategy (L1 /
5827        // entropy / sub-image-aware) over the packed-index residual the
5828        // chain feeds the predictor. Non-regressing — kept only when
5829        // strictly smaller than the running best.
5830        for &sb in &ci_pred_size_bits {
5831            for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
5832                let mut got_candidate = false;
5833                let cand = select_best_cache_bits(|cache_bits| {
5834                    match encode_with_color_indexing_predictor(
5835                        pixels,
5836                        width,
5837                        height,
5838                        sb,
5839                        cache_bits,
5840                        pred_strategy,
5841                    ) {
5842                        Some(bytes) => {
5843                            got_candidate = true;
5844                            bytes
5845                        }
5846                        // Packed image too small for this `size_bits`. Emit
5847                        // a sentinel longer than the running best so the
5848                        // cache sweep discards it; `got_candidate` stays
5849                        // false and the outer comparison is skipped.
5850                        None => vec![0u8; best.len() + 1],
5851                    }
5852                });
5853                if got_candidate && cand.len() < best.len() {
5854                    best = cand;
5855                }
5856            }
5857        }
5858    }
5859
5860    // Round 151: §6.2.2 multi-meta-prefix (entropy-image) candidate.
5861    // Sweeps a small set of `(prefix_bits, num_groups)` combinations,
5862    // each paired with the round-148 `cache_code_bits ∈ [1..11]` plus
5863    // disabled-cache baseline; whichever is smallest is compared
5864    // against the running `best`. The candidate is only built when
5865    // the image is large enough to contain `num_groups` blocks at the
5866    // current `prefix_bits` (the `encode_with_meta_prefix` helper
5867    // returns `None` otherwise). Multi-group encoding pays for itself
5868    // on images whose per-region statistics diverge (e.g. natural
5869    // images with sky-vs-foreground contrast, screenshots with
5870    // distinct UI regions) where separate per-region Huffman codes
5871    // shrink the LZ77 stream by more than the entropy-image +
5872    // additional code-length-table overhead.
5873    if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
5874        if mp_best.len() < best.len() {
5875            best = mp_best;
5876        }
5877    }
5878
5879    best
5880}
5881
5882/// Sweep every `(prefix_bits, num_groups, cache_code_bits)` combination
5883/// the §6.2.2 multi-meta-prefix candidate admits and return the smallest
5884/// resulting stream, or `None` if no `(prefix_bits, num_groups)` pair
5885/// produced a non-degenerate stream (i.e. the image was too small for any
5886/// multi-block split, or every clustering collapsed to a single group).
5887fn sweep_meta_prefix_candidate(pixels: &[u32], width: u32, height: u32) -> Option<Vec<u8>> {
5888    let mut best: Option<Vec<u8>> = None;
5889    for &prefix_bits in META_PREFIX_BITS_SWEEP.iter() {
5890        for num_groups in 2..=MAX_META_GROUPS {
5891            // Per-(prefix_bits, num_groups), sweep the cache sizes;
5892            // some shapes are degenerate (None returned). Track the
5893            // best non-degenerate candidate.
5894            let mut shape_best: Option<Vec<u8>> = None;
5895            for cache_opt in
5896                std::iter::once(None).chain((COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).map(Some))
5897            {
5898                if let Some(cand) = encode_with_meta_prefix(
5899                    pixels,
5900                    width,
5901                    height,
5902                    prefix_bits,
5903                    num_groups,
5904                    cache_opt,
5905                    width,
5906                ) {
5907                    match &shape_best {
5908                        Some(s) if s.len() <= cand.len() => {}
5909                        _ => shape_best = Some(cand),
5910                    }
5911                }
5912            }
5913            if let Some(cand) = shape_best {
5914                match &best {
5915                    Some(b) if b.len() <= cand.len() => {}
5916                    _ => best = Some(cand),
5917                }
5918            }
5919        }
5920    }
5921    best
5922}
5923
5924/// Encode an ARGB image to a **bare** §2.6 / §3.4 `VP8L` bitstream — the
5925/// chunk payload (image-header + image stream), with **no** RIFF/WEBP
5926/// wrapper.
5927///
5928/// `pixels` is `width * height` ARGB values in scan-line order, each
5929/// `(alpha << 24) | (red << 16) | (green << 8) | blue`. The `alpha_is_used`
5930/// §3.4 header bit is auto-detected: it is set iff any pixel's alpha byte is
5931/// not `0xff`. Use [`encode_vp8l_argb_with`] to force the bit explicitly.
5932///
5933/// The output is the exact byte sequence
5934/// [`crate::vp8l_chunk::WebpLosslessChunk::bitstream`] returns for a framed
5935/// file — i.e. wrapping it in `build_chunk(fourcc::VP8L, ..)` (or
5936/// [`build::build_webp_file`] with [`ImageKind::Lossless`]) yields a complete
5937/// `.webp`. Encoding path matches [`encode_webp_lossless`]: no §3.8.2
5938/// transform, no §3.8.3 color cache, single meta-prefix code, literal-only.
5939pub fn encode_vp8l_argb(pixels: &[u32], width: u32, height: u32) -> Result<Vec<u8>, EncodeError> {
5940    let alpha_is_used = pixels.iter().any(|&p| (p >> 24) & 0xff != 0xff);
5941    encode_vp8l_argb_with(pixels, width, height, alpha_is_used)
5942}
5943
5944/// Encode an ARGB image to a bare §2.6 / §3.4 `VP8L` bitstream with the
5945/// §3.4 `alpha_is_used` header bit set **explicitly** by the caller.
5946///
5947/// Identical to [`encode_vp8l_argb`] but with a fixed (non-auto-detected)
5948/// `alpha_is_used`. A caller that already knows whether the image carries
5949/// alpha — e.g. one decoding the §2.7.1 `VP8X` `L` flag — avoids the
5950/// per-pixel scan. Setting `alpha_is_used = true` on a fully-opaque image is
5951/// permitted (a decoder reconstructs the same opaque pixels); setting it
5952/// `false` on an image with non-opaque pixels still round-trips because the
5953/// alpha values are carried in the §3.7.3 ARGB literals regardless of the
5954/// header bit.
5955pub fn encode_vp8l_argb_with(
5956    pixels: &[u32],
5957    width: u32,
5958    height: u32,
5959    alpha_is_used: bool,
5960) -> Result<Vec<u8>, EncodeError> {
5961    validate_argb(pixels, width, height)?;
5962    Ok(encode_vp8l_payload(pixels, width, height, alpha_is_used))
5963}
5964
5965#[cfg(test)]
5966mod tests {
5967    use super::*;
5968    use crate::vp8l_prefix::PrefixCode;
5969    use crate::vp8l_stream::BitReader;
5970
5971    // ---- BitWriter ----
5972
5973    #[test]
5974    fn bit_writer_round_trips_through_bit_reader() {
5975        let mut w = BitWriter::new();
5976        w.write_bits(0b101, 3);
5977        w.write_bits(0xABCD, 16);
5978        w.write_bit(true);
5979        let bytes = w.into_bytes();
5980        let mut r = BitReader::new(&bytes);
5981        assert_eq!(r.read_bits(3).unwrap(), 0b101);
5982        assert_eq!(r.read_bits(16).unwrap(), 0xABCD);
5983        assert!(r.read_bit().unwrap());
5984    }
5985
5986    // ---- canonical code construction ----
5987
5988    #[test]
5989    fn code_lengths_single_symbol_is_length_one() {
5990        let mut freq = vec![0u32; 8];
5991        freq[3] = 10;
5992        let lengths = build_code_lengths(&freq);
5993        assert_eq!(lengths[3], 1);
5994        assert_eq!(lengths.iter().filter(|&&l| l != 0).count(), 1);
5995    }
5996
5997    #[test]
5998    fn code_lengths_two_symbols_length_one_each() {
5999        let mut freq = vec![0u32; 4];
6000        freq[1] = 5;
6001        freq[2] = 5;
6002        let lengths = build_code_lengths(&freq);
6003        assert_eq!(lengths[1], 1);
6004        assert_eq!(lengths[2], 1);
6005    }
6006
6007    #[test]
6008    fn code_lengths_kraft_sum_is_one() {
6009        // A skewed distribution that produces varied lengths.
6010        let freq = vec![100u32, 1, 1, 1, 50, 25, 4, 2];
6011        let lengths = build_code_lengths(&freq);
6012        let mut k = 0f64;
6013        for &l in &lengths {
6014            if l > 0 {
6015                k += 2f64.powi(-(l as i32));
6016            }
6017        }
6018        assert!((k - 1.0).abs() < 1e-9, "Kraft sum {k} != 1");
6019    }
6020
6021    /// Round 303: the §3.7.2.1.2 code-length-code lengths are written in a
6022    /// 3-bit on-wire field, so they must never exceed 7. A skewed CLC
6023    /// frequency histogram (one length value far more common than the rest)
6024    /// drives the plain Huffman build to assign a length-8+ code to a rare
6025    /// CLC symbol; `build_clc_code_lengths` must re-balance it back under 8
6026    /// while keeping the table complete (Kraft sum exactly 1). Without the
6027    /// cap the 3-bit field silently truncated the over-long length to 0,
6028    /// corrupting the table into an incomplete code the decoder rejects.
6029    #[test]
6030    fn clc_code_lengths_capped_at_seven_and_complete() {
6031        // Histogram that drives the plain build past length 7: one
6032        // dominant length value plus a long tail of rare ones, exactly the
6033        // shape that produces a deep Huffman tree.
6034        let clc_freq: Vec<u32> = vec![
6035            1, 100_000, 1, 50_000, 25_000, 12_000, 6_000, 3_000, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
6036        ];
6037        // The plain build does produce an over-long (>7) code here, so the
6038        // cap path is genuinely exercised.
6039        let plain = build_code_lengths(&clc_freq);
6040        assert!(
6041            plain.iter().any(|&l| l as usize > MAX_CLC_CODE_LENGTH),
6042            "test premise: plain build must exceed the 3-bit CLC ceiling"
6043        );
6044
6045        let capped = build_clc_code_lengths(&clc_freq);
6046        assert!(
6047            capped.iter().all(|&l| l as usize <= MAX_CLC_CODE_LENGTH),
6048            "CLC lengths must all fit the 3-bit field: {capped:?}"
6049        );
6050        // Still a complete code (Kraft sum == 1 over the used symbols).
6051        let mut k = 0f64;
6052        for &l in &capped {
6053            if l > 0 {
6054                k += 2f64.powi(-(l as i32));
6055            }
6056        }
6057        assert!((k - 1.0).abs() < 1e-9, "capped CLC Kraft sum {k} != 1");
6058    }
6059
6060    #[test]
6061    fn built_code_decodes_through_prefix_reader() {
6062        // Build a code, emit symbols with it, and decode with the
6063        // round-104 reader to confirm bit-exact agreement.
6064        let freq = vec![40u32, 10, 5, 5, 1, 0, 0, 0];
6065        let code = WriteCode::from_freqs(&freq);
6066        let mut w = BitWriter::new();
6067        code.write_code_lengths(&mut w);
6068        // Emit symbols 0,1,2,3,4 in sequence.
6069        let seq = [0usize, 1, 2, 3, 4, 0, 0, 1];
6070        for &s in &seq {
6071            code.write_symbol(&mut w, s);
6072        }
6073        let bytes = w.into_bytes();
6074        let mut r = BitReader::new(&bytes);
6075        let decoded = PrefixCode::read(&mut r, freq.len()).unwrap();
6076        for &s in &seq {
6077            assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, s);
6078        }
6079    }
6080
6081    #[test]
6082    fn empty_distance_code_is_single_symbol_zero() {
6083        let code = WriteCode::empty(40);
6084        let mut w = BitWriter::new();
6085        code.write_code_lengths(&mut w);
6086        let bytes = w.into_bytes();
6087        let mut r = BitReader::new(&bytes);
6088        let decoded = PrefixCode::read(&mut r, 40).unwrap();
6089        assert_eq!(decoded.single_symbol(), Some(0));
6090    }
6091
6092    // ---- §3.7.2.1.1 simple code length code chooser ----
6093
6094    /// `WriteCode::as_simple_form` rejects any table that the simple form
6095    /// cannot represent verbatim: length > 1, symbol > 255, more than two
6096    /// used symbols, all-zeros table.
6097    #[test]
6098    fn simple_form_rejects_tables_outside_3_7_2_1_1_constraints() {
6099        // Three symbols → too many for simple form.
6100        let mut freq = vec![0u32; 8];
6101        freq[0] = 1;
6102        freq[1] = 1;
6103        freq[2] = 1;
6104        let three_sym = WriteCode::from_freqs(&freq);
6105        assert!(three_sym.as_simple_form().is_none());
6106
6107        // All-zero / empty alphabet → as_simple_form returns None
6108        // (encoder handles the empty case via `WriteCode::empty`).
6109        let lengths_empty = vec![0u8; 16];
6110        let codes_empty = canonical_codes(&lengths_empty);
6111        let empty_code = WriteCode {
6112            lengths: lengths_empty,
6113            codes: codes_empty,
6114            single: None,
6115        };
6116        assert!(empty_code.as_simple_form().is_none());
6117
6118        // Symbol > 255 → simple form's 8-bit symbol field can't carry it.
6119        let mut freq_big = vec![0u32; 300];
6120        freq_big[280] = 1;
6121        let beyond_255 = WriteCode::from_freqs(&freq_big);
6122        assert!(beyond_255.as_simple_form().is_none());
6123
6124        // Length > 1 → cannot be the simple form (every present symbol
6125        // must be at length 1).
6126        let mixed_lengths = vec![0u8, 2, 2, 1];
6127        let mixed_codes = canonical_codes(&mixed_lengths);
6128        let mixed = WriteCode {
6129            lengths: mixed_lengths,
6130            codes: mixed_codes,
6131            single: None,
6132        };
6133        assert!(mixed.as_simple_form().is_none());
6134    }
6135
6136    /// `WriteCode::as_simple_form` accepts the two qualifying shapes
6137    /// (1 used symbol or 2 used symbols, each at length 1).
6138    #[test]
6139    fn simple_form_accepts_one_or_two_length_one_symbols() {
6140        let mut freq1 = vec![0u32; 16];
6141        freq1[7] = 1;
6142        let one = WriteCode::from_freqs(&freq1);
6143        assert_eq!(one.as_simple_form(), Some(vec![7]));
6144
6145        let mut freq2 = vec![0u32; 16];
6146        freq2[3] = 4;
6147        freq2[12] = 4;
6148        let two = WriteCode::from_freqs(&freq2);
6149        assert_eq!(two.as_simple_form(), Some(vec![3, 12]));
6150    }
6151
6152    /// §3.7.2.1.1 exact bit-cost layout: 1 flag + 1 num + 1 width + s0 + s1.
6153    /// `simple_form_bits` must match the bytes [`write_simple_code_lengths`]
6154    /// actually emits.
6155    #[test]
6156    fn simple_form_bits_matches_written_layout() {
6157        // 1 symbol, symbol0 in [0..1] → is_first_8bits = 0 → 1-bit symbol.
6158        // Total = 1 + 1 + 1 + 1 = 4 bits.
6159        assert_eq!(simple_form_bits(&[1]), 4);
6160        // 1 symbol, symbol0 = 7 (> 1) → is_first_8bits = 1 → 8-bit symbol.
6161        // Total = 1 + 1 + 1 + 8 = 11 bits.
6162        assert_eq!(simple_form_bits(&[7]), 11);
6163        // 2 symbols, symbol0 = 0 (fits in 1 bit), symbol1 = 50.
6164        // Total = 1 + 1 + 1 + 1 + 8 = 12 bits.
6165        assert_eq!(simple_form_bits(&[0, 50]), 12);
6166        // 2 symbols, symbol0 = 200 (> 1) → 8 bits; symbol1 = 100 → 8 bits.
6167        // Total = 1 + 1 + 1 + 8 + 8 = 19 bits.
6168        assert_eq!(simple_form_bits(&[200, 100]), 19);
6169
6170        // Round-trip the byte count against an actual writer.
6171        let mut w = BitWriter::new();
6172        write_simple_code_lengths(&mut w, &[200, 100]);
6173        // 19 bits → 3 bytes (24 bits, padded). Confirm the writer's
6174        // bit-position is exactly 19.
6175        let pos_bits = w.bit_position();
6176        assert_eq!(pos_bits, 19);
6177    }
6178
6179    /// The chooser switches to the simple form for a 1-symbol distance
6180    /// code (saves ~14 bits over the normal-form single-leaf path).
6181    #[test]
6182    fn chooser_prefers_simple_form_for_empty_distance_code() {
6183        let code = WriteCode::empty(40);
6184        // Confirm normal form would have been more expensive than simple.
6185        let normal_bits = normal_form_bits(&code.lengths);
6186        let simple = code.as_simple_form().expect("empty(40) is simple-form");
6187        let simple_bits = simple_form_bits(&simple);
6188        assert!(
6189            simple_bits < normal_bits,
6190            "expected simple form (= {simple_bits} bits) to beat normal form (= {normal_bits} bits) for empty distance code"
6191        );
6192
6193        // Now drive write_code_lengths and confirm the leading flag bit is
6194        // 1 (the simple-form selector per §3.7.2.1).
6195        let mut w = BitWriter::new();
6196        code.write_code_lengths(&mut w);
6197        let bytes = w.into_bytes();
6198        let mut r = BitReader::new(&bytes);
6199        assert!(
6200            r.read_bit().expect("flag bit"),
6201            "chooser must select simple form (flag bit = 1) for the empty distance code"
6202        );
6203    }
6204
6205    /// `write_code_lengths` round-trips through the decoder for both
6206    /// branches of the chooser: a 1-symbol code (simple form) and a
6207    /// 4-symbol code (normal form).
6208    #[test]
6209    fn chooser_round_trips_through_decoder_on_both_branches() {
6210        // ---- 1-symbol path: simple form ----
6211        let mut freq = vec![0u32; 16];
6212        freq[9] = 7;
6213        let code1 = WriteCode::from_freqs(&freq);
6214        let mut w1 = BitWriter::new();
6215        code1.write_code_lengths(&mut w1);
6216        let bytes1 = w1.into_bytes();
6217        let mut r1 = BitReader::new(&bytes1);
6218        let decoded1 = PrefixCode::read(&mut r1, 16).expect("decode simple form");
6219        assert_eq!(
6220            decoded1.single_symbol(),
6221            Some(9),
6222            "decoder must recover the single-leaf symbol from the simple form"
6223        );
6224
6225        // ---- 4-symbol path: normal form ----
6226        let freq4 = vec![10u32, 4, 2, 1, 0, 0, 0, 0];
6227        let code4 = WriteCode::from_freqs(&freq4);
6228        let mut w4 = BitWriter::new();
6229        code4.write_code_lengths(&mut w4);
6230        // Emit a representative symbol sequence and round-trip it.
6231        let seq = [0usize, 1, 2, 3, 0, 0, 1, 2];
6232        for &s in &seq {
6233            code4.write_symbol(&mut w4, s);
6234        }
6235        let bytes4 = w4.into_bytes();
6236        let mut r4 = BitReader::new(&bytes4);
6237        let decoded4 = PrefixCode::read(&mut r4, 8).expect("decode normal form");
6238        for &s in &seq {
6239            assert_eq!(
6240                decoded4.read_symbol(&mut r4).expect("symbol") as usize,
6241                s,
6242                "round-trip mismatch on normal-form code"
6243            );
6244        }
6245    }
6246
6247    /// On a 1×1 opaque image the encoder produces 5 prefix codes
6248    /// (G/R/B/A + distance) and every one of them is the single-leaf
6249    /// case (one length-1 symbol, all others zero). Before round 149 the
6250    /// chooser had only the normal-form path, paying ≥ 58 bits per code
6251    /// to send the length table even though the per-symbol body
6252    /// collapses to zero. The simple-form path costs at most 11 bits
6253    /// (1-symbol header + 8-bit value), so the round-149 chooser flips
6254    /// all five codes and shrinks the encoded file by a large fraction
6255    /// on this baseline fixture.
6256    #[test]
6257    fn round_149_simple_form_shrinks_1x1_lossless_baseline() {
6258        let rgba = [0x12, 0x34, 0x56, 0xff];
6259        let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6260        eprintln!("round-149 1x1 lossless byte count: {}", file.len());
6261
6262        // Round-trip confirms the chosen stream still decodes.
6263        let decoded = crate::decode_webp(&file).unwrap();
6264        assert_eq!(decoded.frames[0].rgba, rgba);
6265
6266        // Round-148 baseline for this fixture was 174 bytes (5 prefix
6267        // codes × ≥ 58 bits each, plus container envelope). Round 149
6268        // lands at 32 bytes — a >80% reduction. Assert a conservative
6269        // strict-beat below the round-148 size.
6270        assert!(
6271            file.len() <= 48,
6272            "expected round-149 simple-form chooser to bring the 1×1 baseline well under the round-148 174-byte size; got {}",
6273            file.len()
6274        );
6275    }
6276
6277    /// Same chooser-shrink check on a 16×16 gradient. The chooser
6278    /// trade-off here applies to many of the candidate streams the
6279    /// super-chooser races: each pays substantially less header tax on
6280    /// its prefix codes when the alphabet collapses to one or two
6281    /// length-1 symbols (single-pixel column, alpha-uniform images,
6282    /// solid-color blocks, the bulk of small synthetic fixtures).
6283    #[test]
6284    fn round_149_simple_form_shrinks_synthetic_fixtures() {
6285        // 32×32 solid gray — every channel emits one literal value
6286        // repeated 1024 times. Each of the 4 literal prefix codes is a
6287        // single-leaf code → all four flip to the simple form.
6288        let mut solid = Vec::new();
6289        for _ in 0..1024 {
6290            solid.extend_from_slice(&[0x80, 0x80, 0x80, 0xff]);
6291        }
6292        let file_solid = encode_webp_lossless(&solid, 32, 32).unwrap();
6293        eprintln!("round-149 32×32 solid: {}", file_solid.len());
6294        assert!(
6295            file_solid.len() <= 100,
6296            "round-149 32×32 solid should land far below the round-148 174-byte size; got {}",
6297            file_solid.len()
6298        );
6299
6300        // 8×8 with 2 alpha values, single literal triple — RGB codes
6301        // single-leaf (one value each), alpha code two-symbol (0x80 and
6302        // 0xff). Two-symbol case may pick simple or normal depending on
6303        // the cost — the chooser picks whichever is cheaper.
6304        let mut alpha = Vec::new();
6305        for y in 0..8u32 {
6306            for x in 0..8u32 {
6307                let a = if (x + y) % 2 == 0 { 0xff } else { 0x80 };
6308                alpha.extend_from_slice(&[0x44, 0x88, 0xcc, a]);
6309            }
6310        }
6311        let file_alpha = encode_webp_lossless(&alpha, 8, 8).unwrap();
6312        eprintln!("round-149 8×8 alpha: {}", file_alpha.len());
6313        assert!(
6314            file_alpha.len() <= 110,
6315            "round-149 8×8 alpha should land below the round-148 178-byte size; got {}",
6316            file_alpha.len()
6317        );
6318
6319        // Every chosen stream still decodes byte-exact.
6320        let decoded_solid = crate::decode_webp(&file_solid).unwrap();
6321        assert_eq!(decoded_solid.frames[0].rgba, solid);
6322        let decoded_alpha = crate::decode_webp(&file_alpha).unwrap();
6323        assert_eq!(decoded_alpha.frames[0].rgba, alpha);
6324    }
6325
6326    /// Two-symbol simple-form path: when the alphabet has exactly two
6327    /// length-1 symbols, the chooser may pick simple (≤19 bits) or
6328    /// normal (≥18 bits) — whichever is cheaper. The chooser picks the
6329    /// minimum, and the chosen stream still decodes.
6330    #[test]
6331    fn round_149_two_symbol_simple_form_round_trips() {
6332        // Manually drive the chooser with a 2-symbol length-1 code.
6333        let mut freq = vec![0u32; 16];
6334        freq[2] = 5;
6335        freq[11] = 5;
6336        let code = WriteCode::from_freqs(&freq);
6337        assert_eq!(code.as_simple_form(), Some(vec![2, 11]));
6338
6339        // Confirm bit-costs are within ±1 bit of each other (the
6340        // chooser's interesting regime). Either choice round-trips.
6341        let normal_bits = normal_form_bits(&code.lengths);
6342        let simple_bits = simple_form_bits(&[2, 11]);
6343        eprintln!(
6344            "2-symbol code: simple={} bits, normal={} bits",
6345            simple_bits, normal_bits
6346        );
6347
6348        // Drive write_code_lengths through the chooser + decode.
6349        let mut w = BitWriter::new();
6350        code.write_code_lengths(&mut w);
6351        // Emit a few symbols to confirm the round-trip works.
6352        for _ in 0..3 {
6353            code.write_symbol(&mut w, 2);
6354            code.write_symbol(&mut w, 11);
6355        }
6356        let bytes = w.into_bytes();
6357        let mut r = BitReader::new(&bytes);
6358        let decoded = PrefixCode::read(&mut r, 16).expect("decode chooser output");
6359        for _ in 0..3 {
6360            assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, 2);
6361            assert_eq!(decoded.read_symbol(&mut r).unwrap() as usize, 11);
6362        }
6363    }
6364
6365    // ---- image-header ----
6366
6367    #[test]
6368    fn image_header_round_trips_through_chunk_peek() {
6369        use crate::vp8l_chunk::WebpLosslessChunk;
6370        let header = build_image_header(7, 5, true);
6371        // Append a dummy byte so the payload is long enough to peek.
6372        let mut payload = header.to_vec();
6373        payload.push(0);
6374        let h = WebpLosslessChunk::from_payload(&payload).unwrap();
6375        assert_eq!(h.width(), 7);
6376        assert_eq!(h.height(), 5);
6377        assert!(h.alpha_is_used());
6378        assert_eq!(h.version(), 0);
6379    }
6380
6381    // ---- end-to-end round trips ----
6382
6383    #[test]
6384    fn round_trip_1x1_opaque() {
6385        let rgba = [0x12, 0x34, 0x56, 0xff];
6386        let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6387        let decoded = crate::decode_webp(&file).unwrap();
6388        assert_eq!(decoded.frames[0].rgba, rgba);
6389    }
6390
6391    #[test]
6392    fn round_trip_1x1_with_alpha() {
6393        let rgba = [0xaa, 0xbb, 0xcc, 0x40];
6394        let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6395        let img = crate::decode_webp_image(&file).unwrap();
6396        assert_eq!(img.width, 1);
6397        assert_eq!(img.height, 1);
6398        assert_eq!(img.rgba, rgba);
6399    }
6400
6401    #[test]
6402    fn round_trip_small_gradient() {
6403        // 4x3 image with a spread of colors.
6404        let w = 4u32;
6405        let h = 3u32;
6406        let mut rgba = Vec::new();
6407        for y in 0..h {
6408            for x in 0..w {
6409                rgba.push((x * 60) as u8);
6410                rgba.push((y * 80) as u8);
6411                rgba.push(((x + y) * 30) as u8);
6412                rgba.push(0xff);
6413            }
6414        }
6415        let file = encode_webp_lossless(&rgba, w, h).unwrap();
6416        let decoded = crate::decode_webp(&file).unwrap();
6417        assert_eq!(decoded.frames[0].rgba, rgba);
6418    }
6419
6420    #[test]
6421    fn round_trip_solid_color_uses_single_leaf_codes() {
6422        // A solid color makes every channel a single-symbol code. The
6423        // round trip must still be exact.
6424        let w = 8u32;
6425        let h = 8u32;
6426        let mut rgba = Vec::new();
6427        for _ in 0..(w * h) {
6428            rgba.extend_from_slice(&[0x20, 0x40, 0x60, 0xff]);
6429        }
6430        let file = encode_webp_lossless(&rgba, w, h).unwrap();
6431        let decoded = crate::decode_webp(&file).unwrap();
6432        assert_eq!(decoded.frames[0].rgba, rgba);
6433    }
6434
6435    #[test]
6436    fn round_trip_larger_random_like() {
6437        // A deterministic pseudo-random pattern over a 16x16 RGBA image,
6438        // exercising all four channel codes with many distinct symbols.
6439        let w = 16u32;
6440        let h = 16u32;
6441        let mut rgba = Vec::new();
6442        let mut state = 0x1234_5678u32;
6443        for _ in 0..(w * h) {
6444            for _ in 0..4 {
6445                // xorshift
6446                state ^= state << 13;
6447                state ^= state >> 17;
6448                state ^= state << 5;
6449                rgba.push((state & 0xff) as u8);
6450            }
6451        }
6452        let file = encode_webp_lossless(&rgba, w, h).unwrap();
6453        let decoded = crate::decode_webp(&file).unwrap();
6454        assert_eq!(decoded.frames[0].rgba, rgba);
6455    }
6456
6457    #[test]
6458    fn encoded_file_walks_as_simple_lossless_container() {
6459        let rgba = [0x12, 0x34, 0x56, 0xff];
6460        let file = encode_webp_lossless(&rgba, 1, 1).unwrap();
6461        let c = crate::parse_container(&file).unwrap();
6462        assert!(c
6463            .first_chunk_with_fourcc(crate::container::fourcc::VP8L)
6464            .is_some());
6465    }
6466
6467    #[test]
6468    fn rejects_dimension_mismatch() {
6469        let rgba = [0u8; 4]; // 1 pixel
6470        match encode_webp_lossless(&rgba, 2, 2) {
6471            Err(EncodeError::PixelBufferMismatch { got, expected }) => {
6472                assert_eq!(got, 4);
6473                assert_eq!(expected, 16);
6474            }
6475            other => panic!("expected PixelBufferMismatch, got {other:?}"),
6476        }
6477    }
6478
6479    #[test]
6480    fn rejects_zero_dimensions() {
6481        match encode_webp_lossless(&[], 0, 0) {
6482            Err(EncodeError::InvalidDimensions { width, height }) => {
6483                assert_eq!(width, 0);
6484                assert_eq!(height, 0);
6485            }
6486            other => panic!("expected InvalidDimensions, got {other:?}"),
6487        }
6488    }
6489
6490    // ---- bare VP8L bitstream (encode_vp8l_argb / _with) ----
6491
6492    /// The bare bitstream wrapped in §2.6 framing equals the file
6493    /// [`encode_webp_lossless`] produces for the same pixels.
6494    #[test]
6495    fn bare_bitstream_wrapped_equals_framed_file() {
6496        // 3x2 ARGB image with a spread of colors and one non-opaque pixel.
6497        let pixels: [u32; 6] = [
6498            0xff10_2030,
6499            0xff40_5060,
6500            0x8070_8090,
6501            0xffa0_b0c0,
6502            0xffd0_e0f0,
6503            0xff00_1122,
6504        ];
6505        let bare = encode_vp8l_argb(&pixels, 3, 2).unwrap();
6506        let framed = build::build_webp_file(&bare, ImageKind::Lossless, 3, 2).unwrap();
6507
6508        // Re-derive the same file via the RGBA entry point.
6509        let mut rgba = Vec::new();
6510        for &p in &pixels {
6511            rgba.push((p >> 16) as u8);
6512            rgba.push((p >> 8) as u8);
6513            rgba.push(p as u8);
6514            rgba.push((p >> 24) as u8);
6515        }
6516        let via_rgba = encode_webp_lossless(&rgba, 3, 2).unwrap();
6517        assert_eq!(framed, via_rgba);
6518    }
6519
6520    /// A bare bitstream has no `RIFF` header — it begins with the §3.4
6521    /// `0x2F` VP8L signature byte.
6522    #[test]
6523    fn bare_bitstream_has_no_riff_wrapper() {
6524        let pixels = [0xff12_3456u32];
6525        let bare = encode_vp8l_argb(&pixels, 1, 1).unwrap();
6526        assert_ne!(&bare[0..4], b"RIFF");
6527        assert_eq!(bare[0], crate::vp8l_chunk::VP8L_SIGNATURE);
6528    }
6529
6530    /// `encode_vp8l_argb` auto-detects the §3.4 `alpha_is_used` bit.
6531    #[test]
6532    fn bare_bitstream_auto_detects_alpha() {
6533        let opaque = [0xff11_2233u32, 0xff44_5566];
6534        let bare = encode_vp8l_argb(&opaque, 2, 1).unwrap();
6535        let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6536        assert!(!h.alpha_is_used());
6537
6538        let translucent = [0x8011_2233u32, 0xff44_5566];
6539        let bare = encode_vp8l_argb(&translucent, 2, 1).unwrap();
6540        let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6541        assert!(h.alpha_is_used());
6542    }
6543
6544    /// `encode_vp8l_argb_with` forces the header bit regardless of pixels.
6545    #[test]
6546    fn bare_bitstream_with_forces_alpha_bit() {
6547        let opaque = [0xff11_2233u32];
6548        let bare = encode_vp8l_argb_with(&opaque, 1, 1, true).unwrap();
6549        let h = crate::vp8l_chunk::WebpLosslessChunk::from_payload(&bare).unwrap();
6550        assert!(h.alpha_is_used());
6551    }
6552
6553    /// The bare bitstream round-trips back to the exact pixels through the
6554    /// full decode chain once framed.
6555    #[test]
6556    fn bare_bitstream_round_trips() {
6557        let pixels: [u32; 4] = [0x80aa_bbcc, 0xff00_1122, 0xc033_4455, 0xff66_7788];
6558        let bare = encode_vp8l_argb(&pixels, 2, 2).unwrap();
6559        let framed = build::build_webp_file(&bare, ImageKind::Lossless, 2, 2).unwrap();
6560        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6561        assert_eq!(img.pixels(), &pixels);
6562    }
6563
6564    #[test]
6565    fn bare_bitstream_rejects_dimension_mismatch() {
6566        let pixels = [0xff00_0000u32]; // 1 pixel
6567        match encode_vp8l_argb(&pixels, 2, 2) {
6568            Err(EncodeError::PixelBufferMismatch { got, expected }) => {
6569                assert_eq!(got, 4);
6570                assert_eq!(expected, 16);
6571            }
6572            other => panic!("expected PixelBufferMismatch, got {other:?}"),
6573        }
6574    }
6575
6576    // ---- §5.2.2 LZ77 prefix-value inverse ----
6577
6578    /// Every value `1..=4` maps to prefix code `value - 1` with no extra
6579    /// bits, matching the `< 4` decoder branch.
6580    #[test]
6581    fn value_to_prefix_small_values_have_no_extra_bits() {
6582        for v in 1u32..=4 {
6583            let (p, e, x) = value_to_prefix(v);
6584            assert_eq!(p, v - 1);
6585            assert_eq!(e, 0);
6586            assert_eq!(x, 0);
6587        }
6588    }
6589
6590    /// Round-trip every length value `1..=MAX_MATCH` through
6591    /// [`value_to_prefix`] back into the §5.2.2 decoder formula.
6592    #[test]
6593    fn value_to_prefix_round_trips_length_range() {
6594        for v in 1u32..=MAX_MATCH as u32 {
6595            let (p, e, x) = value_to_prefix(v);
6596            // Re-apply the §5.2.2 decoder formula.
6597            let recovered = if p < 4 {
6598                p + 1
6599            } else {
6600                let extra_bits = (p - 2) >> 1;
6601                let offset = (2 + (p & 1)) << extra_bits;
6602                assert_eq!(extra_bits, e);
6603                offset + x + 1
6604            };
6605            assert_eq!(recovered, v, "value_to_prefix lost value {v}");
6606        }
6607    }
6608
6609    /// Round-trip via the live decoder helper [`crate::vp8l_decode::read_lz77_value`]
6610    /// to confirm the encoder's split is bit-compatible with what the
6611    /// decoder actually executes.
6612    #[test]
6613    fn value_to_prefix_round_trips_through_decoder() {
6614        use crate::vp8l_decode::read_lz77_value;
6615        use crate::vp8l_stream::BitReader;
6616        // A spread of values across every prefix-code band.
6617        let samples = [
6618            1u32, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 16, 17, 24, 25, 32, 100, 1000, 4096,
6619        ];
6620        for &v in &samples {
6621            let (p, e, x) = value_to_prefix(v);
6622            let mut w = BitWriter::new();
6623            if e > 0 {
6624                w.write_bits(x, e as usize);
6625            }
6626            let data = w.into_bytes();
6627            let mut r = BitReader::new(&data);
6628            let got = read_lz77_value(&mut r, p).unwrap();
6629            assert_eq!(
6630                got, v,
6631                "value {v} → prefix {p}, extra ({e}b: {x:b}) decoded as {got}"
6632            );
6633        }
6634    }
6635
6636    // ---- §5.2.2 LZ77 matcher / encoder round-trips ----
6637
6638    /// A solid-color image's pixels are a single literal followed by one
6639    /// long copy that covers the rest. Round trip must be exact.
6640    #[test]
6641    fn round_trip_solid_color_uses_lz77_copy() {
6642        let w = 32u32;
6643        let h = 32u32;
6644        let pixels = vec![0xff20_4060u32; (w * h) as usize];
6645        let tokens = tokenize_lz77(&pixels);
6646        // 1 literal + ceil((1024 - 1) / 4096) copies; for 1024 pixels: 1 + 1.
6647        let copies = tokens
6648            .iter()
6649            .filter(|t| matches!(t, Token::Copy { .. }))
6650            .count();
6651        assert!(
6652            copies >= 1,
6653            "solid-color image should emit at least one copy"
6654        );
6655        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6656        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6657        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6658        assert_eq!(img.pixels(), pixels.as_slice());
6659    }
6660
6661    /// A repeated 4-pixel pattern (cycle length 4) compresses to a long
6662    /// copy with `distance = 4`, which the §5.2.2 overlap rule
6663    /// (`distance < length`) self-replicates correctly.
6664    #[test]
6665    fn round_trip_periodic_pattern_uses_overlapping_copy() {
6666        let pattern = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
6667        let w = 16u32;
6668        let h = 4u32;
6669        let mut pixels = Vec::with_capacity((w * h) as usize);
6670        for i in 0..(w * h) {
6671            pixels.push(pattern[(i % 4) as usize]);
6672        }
6673        let tokens = tokenize_lz77(&pixels);
6674        let copies: Vec<_> = tokens
6675            .iter()
6676            .filter_map(|t| match t {
6677                Token::Copy { length, distance } => Some((*length, *distance)),
6678                _ => None,
6679            })
6680            .collect();
6681        assert!(!copies.is_empty(), "periodic pattern should emit a copy");
6682        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6683        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6684        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6685        assert_eq!(img.pixels(), pixels.as_slice());
6686    }
6687
6688    /// The §5.2.2 LZ77 path produces a strictly smaller chunk than the
6689    /// literal-only baseline on a compressible (repetitive) image. This is
6690    /// the round-119 headline measurement.
6691    #[test]
6692    fn lz77_beats_literal_only_on_repetitive_image() {
6693        // 64x64 image whose first scan-line is a small palette of distinct
6694        // colors and the remaining 63 lines copy the first line verbatim.
6695        let w = 64u32;
6696        let h = 64u32;
6697        let mut pixels = Vec::with_capacity((w * h) as usize);
6698        let palette = [
6699            0xff10_2030u32,
6700            0xff40_5060,
6701            0xff70_8090,
6702            0xffa0_b0c0,
6703            0xffd0_e0f0,
6704            0xff00_1122,
6705            0xff33_4455,
6706            0xff66_7788,
6707        ];
6708        for x in 0..w {
6709            pixels.push(palette[(x as usize) % palette.len()]);
6710        }
6711        for _ in 1..h {
6712            for x in 0..w {
6713                pixels.push(palette[(x as usize) % palette.len()]);
6714            }
6715        }
6716        let lz77 = encode_argb_literals(&pixels);
6717        let lit_only = encode_argb_literals_only(&pixels);
6718        assert!(
6719            lz77.len() < lit_only.len(),
6720            "LZ77 stream ({} B) not smaller than literal-only ({} B)",
6721            lz77.len(),
6722            lit_only.len(),
6723        );
6724        // And, more strongly, at least a 50% reduction on this case.
6725        assert!(
6726            lz77.len() * 2 < lit_only.len(),
6727            "LZ77 stream ({} B) failed to halve literal-only ({} B)",
6728            lz77.len(),
6729            lit_only.len(),
6730        );
6731
6732        // Round trip is exact.
6733        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6734        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6735        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6736        assert_eq!(img.pixels(), pixels.as_slice());
6737    }
6738
6739    /// A pixel buffer with no exploitable repetition (deterministic
6740    /// xorshift) still round-trips through the LZ77 encoder — even when
6741    /// the matcher emits no copies and the distance code stays empty.
6742    #[test]
6743    fn lz77_round_trips_incompressible_pixels() {
6744        let w = 17u32;
6745        let h = 19u32;
6746        let mut pixels = Vec::with_capacity((w * h) as usize);
6747        let mut state = 0xdead_beefu32;
6748        for _ in 0..(w * h) {
6749            state ^= state << 13;
6750            state ^= state >> 17;
6751            state ^= state << 5;
6752            pixels.push(state);
6753        }
6754        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6755        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6756        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6757        assert_eq!(img.pixels(), pixels.as_slice());
6758    }
6759
6760    // ---- §3.5.3 / §3.8.2 subtract-green forward transform ----
6761
6762    /// `apply_subtract_green` is the per-pixel inverse of
6763    /// [`crate::vp8l_transform::inverse_subtract_green`]: subtracting
6764    /// then re-adding green restores the originals, even across the
6765    /// `& 0xff` wrap.
6766    #[test]
6767    fn apply_subtract_green_is_inverse_of_inverse_subtract_green() {
6768        let mut pixels = [
6769            0xff00_0000u32, // black
6770            0xff7f_ff00,    // greenish
6771            0xffff_ffff,    // white
6772            0x8012_3456,    // mid alpha
6773            0x0001_0203,    // wrapping case: r=01, g=02, b=03
6774        ];
6775        let original = pixels;
6776        apply_subtract_green(&mut pixels);
6777        // Run the decoder's inverse and confirm we're back at the start.
6778        crate::vp8l_transform::inverse_subtract_green(&mut pixels);
6779        assert_eq!(pixels, original);
6780    }
6781
6782    /// `apply_subtract_green` preserves the green and alpha channels and
6783    /// only mutates red/blue per the §3.5.3 spec.
6784    #[test]
6785    fn apply_subtract_green_only_touches_red_and_blue() {
6786        let mut pixels = [0x80_70_60_50u32]; // a=80 r=70 g=60 b=50
6787        apply_subtract_green(&mut pixels);
6788        // a, g unchanged; r := (0x70 - 0x60) & 0xff = 0x10; b := 0xf0.
6789        assert_eq!((pixels[0] >> 24) & 0xff, 0x80);
6790        assert_eq!((pixels[0] >> 16) & 0xff, 0x10);
6791        assert_eq!((pixels[0] >> 8) & 0xff, 0x60);
6792        assert_eq!(pixels[0] & 0xff, 0xf0); // 0x50 - 0x60 = -0x10 → 0xf0
6793    }
6794
6795    /// On a synthetic natural-image-like fixture (a gradient where red and
6796    /// blue track green), the subtract-green path is strictly smaller than
6797    /// the no-transform path. This is the round-120 headline measurement.
6798    #[test]
6799    fn subtract_green_beats_no_transform_on_green_correlated_image() {
6800        // 32x32 image whose r and b channels each closely track g, so
6801        // (r - g) and (b - g) cluster tightly around 0 — exactly the
6802        // distribution §3.5.3 is designed to exploit.
6803        let w = 32u32;
6804        let h = 32u32;
6805        let mut pixels = Vec::with_capacity((w * h) as usize);
6806        let mut state = 0xC0FFEE12u32;
6807        for _ in 0..(w * h) {
6808            // xorshift-driven green; r/b are green plus small noise.
6809            state ^= state << 13;
6810            state ^= state >> 17;
6811            state ^= state << 5;
6812            let g = state & 0xff;
6813            let r = g.wrapping_add(((state >> 8) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6814            let b = g.wrapping_add(((state >> 16) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6815            pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
6816        }
6817        let no_tx = {
6818            let tokens = tokenize_lz77(&pixels);
6819            // Width-less baseline (matches `encode_argb_literals_subtract_green`
6820            // below, which also uses width=1) so the comparison isolates
6821            // the subtract-green transform from the round-130 distance-map
6822            // chooser.
6823            encode_tokens(&tokens, false, None, 1)
6824        };
6825        let sg = encode_argb_literals_subtract_green(&pixels);
6826        eprintln!(
6827            "[round-120] 32x32 green-correlated: no-tx={} B, subtract-green={} B ({:.1}% reduction)",
6828            no_tx.len(),
6829            sg.len(),
6830            100.0 * (no_tx.len() as f64 - sg.len() as f64) / no_tx.len() as f64,
6831        );
6832        assert!(
6833            sg.len() < no_tx.len(),
6834            "subtract-green ({} B) did not beat no-transform ({} B)",
6835            sg.len(),
6836            no_tx.len(),
6837        );
6838
6839        // Round trip through the full decode chain stays pixel-exact.
6840        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6841        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6842        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6843        assert_eq!(img.pixels(), pixels.as_slice());
6844    }
6845
6846    /// `encode_argb_literals` picks the smallest of the four
6847    /// `(no-tx | sg) × (no-cache | cache)` paths it evaluates, so on
6848    /// any image its output equals the minimum of all four candidate
6849    /// streams.
6850    #[test]
6851    fn encode_argb_literals_chooses_smaller_path() {
6852        let w = 32u32;
6853        let h = 32u32;
6854        let mut pixels = Vec::with_capacity((w * h) as usize);
6855        // A solid green tint with slight per-pixel red/blue noise — the
6856        // subtract-green path concentrates r and b near zero.
6857        let mut state = 0x12345678u32;
6858        for _ in 0..(w * h) {
6859            state ^= state << 13;
6860            state ^= state >> 17;
6861            state ^= state << 5;
6862            let g = 0x80u32;
6863            let r = g.wrapping_add((state & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6864            let b = g.wrapping_add(((state >> 4) & 0x0f).wrapping_sub(7) & 0xff) & 0xff;
6865            pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
6866        }
6867        let chosen = encode_argb_literals(&pixels);
6868        // `encode_argb_literals` defaults to width=1 (no distance-map
6869        // optimisation); match it for the per-option comparison.
6870        let no_tx = encode_literals_with_options(&pixels, false, None, 1);
6871        let sg = encode_literals_with_options(&pixels, true, None, 1);
6872        let cc = encode_literals_with_options(&pixels, false, Some(DEFAULT_COLOR_CACHE_BITS), 1);
6873        let sg_cc = encode_literals_with_options(&pixels, true, Some(DEFAULT_COLOR_CACHE_BITS), 1);
6874        let best = no_tx.len().min(sg.len()).min(cc.len()).min(sg_cc.len());
6875        assert_eq!(chosen.len(), best);
6876    }
6877
6878    /// A subtract-green-encoded image survives a full encode → decode
6879    /// round trip via the public entry points: the encoder writes the
6880    /// §3.8.2 transform header, the decoder reads it back and applies the
6881    /// §4.3 inverse, restoring the originals.
6882    #[test]
6883    fn subtract_green_path_round_trips_via_public_entry_points() {
6884        let w = 8u32;
6885        let h = 8u32;
6886        let pixels: Vec<u32> = (0..(w * h))
6887            .map(|i| {
6888                let g = (i * 4) & 0xff;
6889                let r = g.wrapping_add(3) & 0xff;
6890                let b = g.wrapping_sub(2) & 0xff;
6891                0xff00_0000 | (r << 16) | (g << 8) | b
6892            })
6893            .collect();
6894        // Force the subtract-green path via the test-only entry.
6895        let stream = encode_argb_literals_subtract_green(&pixels);
6896        let header = build_image_header(w, h, false);
6897        let mut payload = header.to_vec();
6898        payload.extend_from_slice(&stream);
6899        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
6900        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6901        assert_eq!(img.pixels(), pixels.as_slice());
6902    }
6903
6904    /// On a pure-noise image (no green correlation) the chooser falls
6905    /// back to the no-transform path — `encode_argb_literals` should
6906    /// never produce a stream larger than the literal-only baseline by
6907    /// applying a transform that doesn't help.
6908    #[test]
6909    fn encode_argb_literals_does_not_regress_on_uncorrelated_noise() {
6910        let w = 16u32;
6911        let h = 16u32;
6912        let mut pixels = Vec::with_capacity((w * h) as usize);
6913        let mut state = 0xDEAD_BEEFu32;
6914        for _ in 0..(w * h) {
6915            state ^= state << 13;
6916            state ^= state >> 17;
6917            state ^= state << 5;
6918            pixels.push(state | 0xff00_0000);
6919        }
6920        let chosen = encode_argb_literals(&pixels);
6921        let no_tx = {
6922            let tokens = tokenize_lz77(&pixels);
6923            // Match `encode_argb_literals`'s width-less form (width=1) so
6924            // the chooser comparison stays apples-to-apples regardless of
6925            // the round-130 distance-map optimisation.
6926            encode_tokens(&tokens, false, None, 1)
6927        };
6928        assert!(
6929            chosen.len() <= no_tx.len(),
6930            "chooser regressed: {} B with chooser vs {} B no-transform",
6931            chosen.len(),
6932            no_tx.len(),
6933        );
6934    }
6935
6936    /// A maximum-length copy (>= MAX_MATCH pixels of identical color) is
6937    /// split into consecutive §5.2.2 copies, each bounded by `MAX_MATCH`.
6938    #[test]
6939    fn round_trip_splits_match_at_max_length() {
6940        // A solid-color image with `> MAX_MATCH` pixels: the first row
6941        // is the literal source, subsequent rows are copies.
6942        let total = MAX_MATCH + 100;
6943        let pixels = vec![0xff80_8080u32; total];
6944        let tokens = tokenize_lz77(&pixels);
6945        for tok in &tokens {
6946            if let Token::Copy { length, .. } = tok {
6947                assert!(
6948                    *length <= MAX_MATCH,
6949                    "copy length {length} exceeded MAX_MATCH"
6950                );
6951            }
6952        }
6953        // Round trip via the full encoder/decoder chain (1-row image of
6954        // `total` pixels).
6955        let w = total as u32;
6956        let h = 1u32;
6957        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
6958        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
6959        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
6960        assert_eq!(img.pixels(), pixels.as_slice());
6961    }
6962
6963    // ---- §5.2.1 / §5.2.3 color cache (round 121) ----
6964
6965    /// The encoder's `EncoderColorCache` uses the spec's §5.2.3 hash
6966    /// formula and matches the decoder's
6967    /// [`crate::vp8l_decode::ColorCache::hash`] bit-for-bit at every
6968    /// allowed `code_bits`.
6969    #[test]
6970    fn encoder_color_cache_hash_matches_decoder_hash() {
6971        use crate::vp8l_decode::ColorCache;
6972        for bits in COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX {
6973            let enc = EncoderColorCache::new(bits);
6974            let dec = ColorCache::new(bits);
6975            // A spread of synthetic ARGB pixels: black, white, the
6976            // wrap-around 0x01020304, a saturated red, a mid-alpha
6977            // greenish, plus a zero (which all caches start with).
6978            for argb in [
6979                0x0000_0000u32,
6980                0xffff_ffff,
6981                0x0102_0304,
6982                0xffff_0000,
6983                0x8000_ff80,
6984                0x1234_5678,
6985            ] {
6986                assert_eq!(
6987                    enc.hash(argb),
6988                    dec.hash(argb),
6989                    "hash mismatch at code_bits={bits} for argb=0x{argb:08x}"
6990                );
6991            }
6992            assert_eq!(enc.size(), 1 << bits);
6993        }
6994    }
6995
6996    /// A fresh cache holds zeros, so `contains(0)` succeeds *before*
6997    /// any insertion — exactly the §5.2.3 "all entries set to zero"
6998    /// invariant the decoder relies on.
6999    #[test]
7000    fn encoder_color_cache_starts_zero_initialized() {
7001        let cache = EncoderColorCache::new(4);
7002        // Index 0's slot starts at the all-zero pixel.
7003        let zero_idx = cache.hash(0);
7004        assert_eq!(cache.entries[zero_idx], 0);
7005        assert_eq!(cache.contains(0), Some(zero_idx));
7006    }
7007
7008    /// Inserting a pixel makes a subsequent `contains` for that same
7009    /// pixel resolve to the matching slot; an unrelated pixel does
7010    /// not collide (with overwhelming probability at 8 cache bits).
7011    #[test]
7012    fn encoder_color_cache_insert_then_contains_round_trips() {
7013        let mut cache = EncoderColorCache::new(8);
7014        let argb = 0xff12_3456u32;
7015        assert!(cache.contains(argb).is_none() || cache.entries[cache.hash(argb)] != argb);
7016        cache.insert(argb);
7017        assert_eq!(cache.contains(argb), Some(cache.hash(argb)));
7018    }
7019
7020    /// `cacheify_tokens` converts a literal back-to-back repeat into
7021    /// a `CacheRef` token whose `index` matches the cache slot, while
7022    /// leaving the first (unique) literal as a literal.
7023    #[test]
7024    fn cacheify_tokens_collapses_repeat_literal_into_cache_ref() {
7025        let argb = 0xff20_4060u32;
7026        let pixels = vec![argb, argb];
7027        let raw = vec![Token::Literal(argb), Token::Literal(argb)];
7028        let out = cacheify_tokens(&raw, &pixels, 8);
7029        assert!(matches!(out[0], Token::Literal(p) if p == argb));
7030        let cache = EncoderColorCache::new(8);
7031        let idx = cache.hash(argb) as u32;
7032        assert_eq!(out[1], Token::CacheRef { index: idx });
7033    }
7034
7035    /// A backward-reference `Copy` token inserts each copied pixel
7036    /// into the cache, so a subsequent literal that hashes to the
7037    /// same slot is collapsed to a `CacheRef`.
7038    #[test]
7039    fn cacheify_tokens_copy_updates_cache_for_subsequent_literal() {
7040        let argb = 0xff80_4010u32;
7041        // pixels: [argb, argb, argb, argb] — represented as a literal
7042        // followed by a Copy {length: 3, distance: 1}, then later
7043        // (at position 4) we add the same argb as a literal again.
7044        let pixels = vec![argb, argb, argb, argb, argb];
7045        let raw = vec![
7046            Token::Literal(argb),
7047            Token::Copy {
7048                length: 3,
7049                distance: 1,
7050            },
7051            Token::Literal(argb),
7052        ];
7053        let out = cacheify_tokens(&raw, &pixels, 8);
7054        // The first literal is still a literal; the copy passes
7055        // through; the trailing literal is now a CacheRef.
7056        assert!(matches!(out[0], Token::Literal(p) if p == argb));
7057        assert!(matches!(
7058            out[1],
7059            Token::Copy {
7060                length: 3,
7061                distance: 1,
7062            }
7063        ));
7064        let cache = EncoderColorCache::new(8);
7065        let idx = cache.hash(argb) as u32;
7066        assert_eq!(out[2], Token::CacheRef { index: idx });
7067    }
7068
7069    /// Forcing the color-cache path on a repetitive 16-color palette
7070    /// fixture round-trips bit-exactly through the decoder. This is
7071    /// the headline round-121 sanity test: the encoder emits §5.2.3
7072    /// cache codes; the decoder reads them back via its own
7073    /// [`crate::vp8l_decode::ColorCache`] and reconstructs the same
7074    /// pixels.
7075    #[test]
7076    fn color_cache_path_round_trips_via_public_entry_points() {
7077        let w = 8u32;
7078        let h = 8u32;
7079        // 16 distinct ARGB colors cycling per scan-line; every color
7080        // appears multiple times so the cache gets exercised.
7081        let palette: [u32; 16] = [
7082            0xff00_0000,
7083            0xff00_00ff,
7084            0xff00_ff00,
7085            0xff00_ffff,
7086            0xffff_0000,
7087            0xffff_00ff,
7088            0xffff_ff00,
7089            0xffff_ffff,
7090            0xff80_8080,
7091            0xff20_4060,
7092            0xff60_4020,
7093            0xff10_2030,
7094            0xff30_2010,
7095            0xffa0_b0c0,
7096            0xffc0_b0a0,
7097            0xff55_aa55,
7098        ];
7099        let pixels: Vec<u32> = (0..(w * h))
7100            .map(|i| palette[(i as usize) % palette.len()])
7101            .collect();
7102        // Force the color-cache path via the test-only entry.
7103        let stream = encode_argb_literals_color_cache(&pixels, DEFAULT_COLOR_CACHE_BITS);
7104        let header = build_image_header(w, h, false);
7105        let mut payload = header.to_vec();
7106        payload.extend_from_slice(&stream);
7107        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
7108        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7109        assert_eq!(img.pixels(), pixels.as_slice());
7110    }
7111
7112    /// On a small palette of repeated colors (a synthetic but
7113    /// realistic case for palette-heavy artwork), the §5.2.3
7114    /// color-cache path produces a smaller stream than the
7115    /// no-cache LZ77 path. This is the round-121 headline
7116    /// measurement.
7117    #[test]
7118    fn color_cache_beats_no_cache_on_small_palette_image() {
7119        // 32x32 image where every pixel is drawn from an 8-color
7120        // palette, in a pseudo-random pattern (so the LZ77 matcher
7121        // can't collapse them all into long copies and the
7122        // color-cache codes get to do real work).
7123        let w = 32u32;
7124        let h = 32u32;
7125        let palette: [u32; 8] = [
7126            0xff10_2030,
7127            0xff40_5060,
7128            0xff70_8090,
7129            0xffa0_b0c0,
7130            0xffd0_e0f0,
7131            0xff00_1122,
7132            0xff33_4455,
7133            0xff66_7788,
7134        ];
7135        let mut pixels = Vec::with_capacity((w * h) as usize);
7136        let mut state = 0x1357_9bdfu32;
7137        for _ in 0..(w * h) {
7138            state ^= state << 13;
7139            state ^= state >> 17;
7140            state ^= state << 5;
7141            pixels.push(palette[(state as usize) % palette.len()]);
7142        }
7143        // Width-less form (matches `encode_argb_literals_color_cache`,
7144        // which also uses width=1) so the comparison isolates the
7145        // color-cache effect from the round-130 distance-map chooser.
7146        let no_cache = encode_literals_with_options(&pixels, false, None, 1);
7147        let cache = encode_literals_with_options(&pixels, false, Some(DEFAULT_COLOR_CACHE_BITS), 1);
7148        eprintln!(
7149            "[round-121] 32x32 small-palette pseudo-random: no-cache={} B, color-cache={} B ({:.1}% reduction)",
7150            no_cache.len(),
7151            cache.len(),
7152            100.0 * (no_cache.len() as f64 - cache.len() as f64) / no_cache.len() as f64,
7153        );
7154        assert!(
7155            cache.len() < no_cache.len(),
7156            "color-cache stream ({} B) did not beat no-cache LZ77 ({} B)",
7157            cache.len(),
7158            no_cache.len(),
7159        );
7160
7161        // Round trip through the full encoder/decoder chain is exact.
7162        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7163        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7164        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7165        assert_eq!(img.pixels(), pixels.as_slice());
7166    }
7167
7168    /// On a noisy image with effectively-zero color repetition the
7169    /// chooser never selects the cache path (it would just inflate
7170    /// the GREEN alphabet for no compression gain), so
7171    /// `encode_argb_literals` never produces a stream larger than the
7172    /// no-cache baseline on uncorrelated noise.
7173    #[test]
7174    fn color_cache_chooser_does_not_regress_on_uncorrelated_noise() {
7175        let w = 16u32;
7176        let h = 16u32;
7177        let mut pixels = Vec::with_capacity((w * h) as usize);
7178        let mut state = 0xfeed_b00bu32;
7179        for _ in 0..(w * h) {
7180            state ^= state << 13;
7181            state ^= state >> 17;
7182            state ^= state << 5;
7183            pixels.push(state | 0xff00_0000);
7184        }
7185        let chosen = encode_argb_literals(&pixels);
7186        // Match `encode_argb_literals`'s width=1 form so the comparison
7187        // is apples-to-apples.
7188        let no_cache_no_tx = encode_literals_with_options(&pixels, false, None, 1);
7189        assert!(
7190            chosen.len() <= no_cache_no_tx.len(),
7191            "chooser regressed on noise: {} B chosen vs {} B no-cache no-tx",
7192            chosen.len(),
7193            no_cache_no_tx.len(),
7194        );
7195    }
7196
7197    /// The §5.2.3 `color-cache-info` header field encodes the
7198    /// chosen `code_bits` value: when the cache is enabled, the
7199    /// decoder reads `%b1` followed by `ReadBits(4) = code_bits`,
7200    /// and the `ColorCacheInfo::is_enabled()` flag flips on. This
7201    /// test routes the encoded stream through the live decoder's
7202    /// `MetaPrefixHeader::read` and confirms it sees the cache.
7203    #[test]
7204    fn color_cache_header_round_trips_through_meta_prefix_reader() {
7205        use crate::meta_prefix::{ImageRole, MetaPrefixHeader};
7206        use crate::vp8l_stream::BitReader;
7207        let w = 4u32;
7208        let h = 4u32;
7209        let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
7210        let pixels: Vec<u32> = (0..(w * h))
7211            .map(|i| palette[(i as usize) % palette.len()])
7212            .collect();
7213        let stream = encode_argb_literals_color_cache(&pixels, DEFAULT_COLOR_CACHE_BITS);
7214        // Read straight off the image-stream — no §3.8.2 transform
7215        // header is present (we forced the no-tx path), so the
7216        // very first bit is the transform-list terminator `%b0`,
7217        // followed by the §3.8.3 `color-cache-info`.
7218        let mut r = BitReader::new(&stream);
7219        // Skip the transform-list terminator.
7220        assert!(!r.read_bit().unwrap());
7221        let header = MetaPrefixHeader::read(&mut r, ImageRole::Argb, w, h).unwrap();
7222        assert!(header.color_cache.is_enabled());
7223        assert_eq!(header.color_cache.code_bits, DEFAULT_COLOR_CACHE_BITS);
7224        assert_eq!(header.color_cache.size(), 1 << DEFAULT_COLOR_CACHE_BITS);
7225    }
7226
7227    // ---- round 130: §5.2.2 distance-map chooser ----
7228
7229    /// `pixel_distance_to_distance_code` reconstructs the spec's
7230    /// `xi + yi * W` for the chosen code, identical to the decoder.
7231    /// Across every distance-map entry at a fixed width, the chooser
7232    /// must pick a code that round-trips through
7233    /// `distance_code_to_pixel_distance` to the original distance.
7234    #[test]
7235    fn distance_chooser_reconstructs_each_distance_map_entry() {
7236        use crate::vp8l_decode::{distance_code_to_pixel_distance, DISTANCE_MAP};
7237        let width = 256u32;
7238        for &(xi, yi) in DISTANCE_MAP.iter() {
7239            let raw = xi + yi * width as i32;
7240            let d = if raw < 1 { 1 } else { raw as usize };
7241            let code = pixel_distance_to_distance_code(d, width);
7242            assert_eq!(
7243                distance_code_to_pixel_distance(code, width),
7244                d,
7245                "chooser code {code} for d={d} (xi={xi},yi={yi}) does not round-trip",
7246            );
7247        }
7248    }
7249
7250    /// The smallest-code early-out must produce byte-for-byte the same
7251    /// code as a full no-early-out linear scan that tracks the minimum
7252    /// matching code. The reference below re-implements the round-119
7253    /// full-scan-with-tie-break (start at the scan-line code, visit every
7254    /// one of the 120 entries, keep the smallest matching code); the
7255    /// production [`pixel_distance_to_distance_code`] returns on the first
7256    /// match. Across a representative distance range and several widths,
7257    /// both must agree on every input.
7258    #[test]
7259    fn distance_chooser_early_out_matches_full_scan() {
7260        use crate::vp8l_decode::{DISTANCE_MAP, NUM_DISTANCE_MAP_CODES};
7261
7262        // Full no-early-out linear scan with smallest-code tie-break —
7263        // the behaviour the early-out replaces. Bit-exactness against the
7264        // production chooser is what this test pins.
7265        fn full_scan(distance: usize, image_width: u32) -> u32 {
7266            let scan_line_code = distance as u32 + NUM_DISTANCE_MAP_CODES as u32;
7267            let mut best = scan_line_code;
7268            let width_i32 = image_width as i32;
7269            for (idx, &(xi, yi)) in DISTANCE_MAP.iter().enumerate() {
7270                let raw = xi + yi * width_i32;
7271                let mapped = if raw < 1 { 1 } else { raw as usize };
7272                if mapped == distance {
7273                    let candidate = (idx + 1) as u32;
7274                    if candidate < best {
7275                        best = candidate;
7276                    }
7277                }
7278            }
7279            best
7280        }
7281
7282        // Widths spanning width-1 (no spatial structure), narrow, typical
7283        // tile, and a wide row so the clamp-to-1 and large-distance
7284        // regimes are all exercised.
7285        for &width in &[1u32, 2, 16, 128, 256, 1024] {
7286            // Distance 1..=400 covers every clamp-to-1 hit, every
7287            // single-row / multi-row map distance for these widths, and
7288            // a long tail that has no map representation (scan-line
7289            // fallback). Plus a few large distances past any map reach.
7290            for distance in (1usize..=400).chain([1000, 4096, 70_000]) {
7291                assert_eq!(
7292                    pixel_distance_to_distance_code(distance, width),
7293                    full_scan(distance, width),
7294                    "early-out diverged from full scan at distance={distance} width={width}",
7295                );
7296            }
7297        }
7298    }
7299
7300    /// For a 256-wide image, pixel distance 256 (one row above) must be
7301    /// represented by distance-map code 1 ((0, 1)), not the scan-line
7302    /// code 376 (`256 + 120`). This is the headline round-130 win on
7303    /// natural images.
7304    #[test]
7305    fn distance_chooser_picks_map_code_for_row_distance() {
7306        let width = 256u32;
7307        let code = pixel_distance_to_distance_code(width as usize, width);
7308        assert_eq!(code, 1, "row distance must collapse to map code 1");
7309        // And legacy scan-line code is the bigger alternative.
7310        assert_eq!(distance_to_code(width as usize), width + 120);
7311    }
7312
7313    /// A distance with no §5.2.2 map representation at the chosen width
7314    /// falls back to the scan-line code `D + 120`. At width 256, a
7315    /// distance of 1000 has no `(xi, yi)` entry that reconstructs it, so
7316    /// the chooser emits `1000 + 120 = 1120`.
7317    #[test]
7318    fn distance_chooser_falls_back_to_scan_line_when_no_map_match() {
7319        let width = 256u32;
7320        let code = pixel_distance_to_distance_code(1000, width);
7321        assert_eq!(code, 1000 + 120);
7322    }
7323
7324    /// Width-1 (the no-spatial-structure form) admits no distance-map
7325    /// entry whose `xi + yi*1` exceeds 8+7 = 15, so any distance >= 16
7326    /// must use the scan-line form. The chooser must agree.
7327    #[test]
7328    fn distance_chooser_width_one_uses_scan_line_for_large_distances() {
7329        for d in [16usize, 32, 64, 100, 500] {
7330            assert_eq!(
7331                pixel_distance_to_distance_code(d, 1),
7332                (d as u32) + 120,
7333                "width=1 distance {d} should not collapse",
7334            );
7335        }
7336    }
7337
7338    /// On a row-correlated image (every scan-line copies the row above
7339    /// verbatim), the round-130 width-aware encoder must produce a
7340    /// strictly smaller stream than the round-119 scan-line-only form.
7341    /// This is the headline round-130 size-reduction measurement.
7342    #[test]
7343    fn width_aware_distance_beats_scan_line_only_on_row_correlated_image() {
7344        // 128x128 image whose every row is a fresh pseudo-random
7345        // 128-pixel pattern repeated for the next scan-line. The LZ77
7346        // matcher emits a single `Copy { length: ~MAX_MATCH, distance:
7347        // 128 }` per row (and chains thereafter). At width 128, distance
7348        // 128 = `(0, 1)` = distance-map code 1, far smaller than the
7349        // scan-line code 248.
7350        let w = 128u32;
7351        let h = 128u32;
7352        let mut pixels = Vec::with_capacity((w * h) as usize);
7353        let mut state = 0xC0DE_FACEu32;
7354        for _ in 0..w {
7355            state ^= state << 13;
7356            state ^= state >> 17;
7357            state ^= state << 5;
7358            pixels.push((state & 0x00ff_ffff) | 0xff00_0000);
7359        }
7360        for y in 1..h {
7361            for x in 0..w {
7362                pixels.push(pixels[(x + (y - 1) * w) as usize]);
7363            }
7364        }
7365
7366        let width_aware = encode_argb_literals_with_width(&pixels, w);
7367        let scan_line_only = encode_argb_literals(&pixels); // width=1
7368
7369        eprintln!(
7370            "[round-130] 128x128 row-correlated: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7371            scan_line_only.len(),
7372            width_aware.len(),
7373            100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7374                / scan_line_only.len() as f64,
7375        );
7376        assert!(
7377            width_aware.len() < scan_line_only.len(),
7378            "width-aware stream ({} B) not smaller than scan-line-only ({} B)",
7379            width_aware.len(),
7380            scan_line_only.len(),
7381        );
7382
7383        // Round trip is exact via the public entry point.
7384        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7385        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7386        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7387        assert_eq!(img.pixels(), pixels.as_slice());
7388    }
7389
7390    /// A photo-like fixture (smooth luma gradient + per-pixel small
7391    /// noise to fill the LZ77 hash chains) gets the round-130 chooser
7392    /// to find numerous small `(xi, yi)` matches in the §5.2.2
7393    /// distance-map neighbourhood. Compared to the width=1 scan-line
7394    /// baseline, the width-aware path is strictly smaller.
7395    #[test]
7396    fn width_aware_distance_beats_scan_line_only_on_photo_like_image() {
7397        let w = 64u32;
7398        let h = 64u32;
7399        let mut pixels = Vec::with_capacity((w * h) as usize);
7400        // Each row is a low-amplitude noise pattern around a luma ramp;
7401        // adjacent rows share the same noise seed but with a tiny offset,
7402        // so 2-D neighbour matches are abundant.
7403        let mut state = 0x1234_5678u32;
7404        for y in 0..h {
7405            let luma = (y * 4) as u8;
7406            for _x in 0..w {
7407                state ^= state << 13;
7408                state ^= state >> 17;
7409                state ^= state << 5;
7410                let n = (state & 0x07) as i32 - 3; // [-3, 4)
7411                let g = (luma as i32 + n).clamp(0, 255) as u32;
7412                let r = g;
7413                let b = g;
7414                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
7415            }
7416        }
7417        let width_aware = encode_argb_literals_with_width(&pixels, w);
7418        let scan_line_only = encode_argb_literals(&pixels);
7419        eprintln!(
7420            "[round-130] 64x64 photo-like: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7421            scan_line_only.len(),
7422            width_aware.len(),
7423            100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7424                / scan_line_only.len() as f64,
7425        );
7426        assert!(
7427            width_aware.len() <= scan_line_only.len(),
7428            "width-aware regressed: {} B vs scan-line-only {} B",
7429            width_aware.len(),
7430            scan_line_only.len(),
7431        );
7432
7433        // Round trip stays exact.
7434        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7435        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7436        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7437        assert_eq!(img.pixels(), pixels.as_slice());
7438    }
7439
7440    /// Round trip is exact across a spread of image widths. The chooser
7441    /// must never emit a distance code that reconstructs to a different
7442    /// pixel distance on the decode side.
7443    #[test]
7444    fn width_aware_round_trip_across_assorted_widths() {
7445        for &(w, h) in &[
7446            (1u32, 16u32),
7447            (3u32, 16u32),
7448            (16u32, 16u32),
7449            (97u32, 13u32),
7450            (200u32, 3u32),
7451            (256u32, 8u32),
7452        ] {
7453            let mut pixels = Vec::with_capacity((w * h) as usize);
7454            // A row-repeating pattern so the LZ77 matcher emits copies
7455            // at row-multiple distances, exercising the chooser.
7456            for y in 0..h {
7457                for x in 0..w {
7458                    let v = (x.wrapping_mul(31).wrapping_add(y)) & 0xff;
7459                    pixels.push(0xff00_0000 | (v << 16) | (v << 8) | v);
7460                }
7461            }
7462            let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7463            let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7464            let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7465            assert_eq!(
7466                img.pixels(),
7467                pixels.as_slice(),
7468                "round trip mismatch at {w}x{h}",
7469            );
7470        }
7471    }
7472
7473    /// A 64x64 image whose every row is row 0 shifted by `(y % 4) - 1`
7474    /// pixels — the resulting per-row matches are short (3-pixel-aligned
7475    /// hashes mostly), at distances clustered near `width = 64`. The
7476    /// matcher emits many small Copy tokens whose distances are 60–65
7477    /// (= 64-4..64+1), all of which the round-130 chooser collapses to
7478    /// distance-map codes 1, 3, 4 (prefix 0–2). With dozens of emissions
7479    /// the chooser's per-token saving compounds against the scan-line
7480    /// baseline (which would assign each to prefix-14 buckets).
7481    #[test]
7482    fn width_aware_distance_compounds_on_many_short_row_offset_matches() {
7483        let w = 64u32;
7484        let h = 64u32;
7485        let mut row0 = Vec::with_capacity(w as usize);
7486        let mut state = 0x1357_2468u32;
7487        for _ in 0..w {
7488            state ^= state << 13;
7489            state ^= state >> 17;
7490            state ^= state << 5;
7491            row0.push((state & 0x00ff_ffff) | 0xff00_0000);
7492        }
7493        let mut pixels = Vec::with_capacity((w * h) as usize);
7494        pixels.extend_from_slice(&row0);
7495        for y in 1..h {
7496            // Per-row 0..3 horizontal shift, ringing back into row0.
7497            let shift = (y as usize) & 0x3;
7498            for x in 0..(w as usize) {
7499                pixels.push(row0[(x + shift) % (w as usize)]);
7500            }
7501        }
7502        let width_aware = encode_argb_literals_with_width(&pixels, w);
7503        let scan_line_only = encode_argb_literals(&pixels);
7504        eprintln!(
7505            "[round-130] 64x64 row-shifted: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7506            scan_line_only.len(),
7507            width_aware.len(),
7508            100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7509                / scan_line_only.len() as f64,
7510        );
7511        assert!(
7512            width_aware.len() < scan_line_only.len(),
7513            "width-aware ({} B) not smaller than scan-line-only ({} B)",
7514            width_aware.len(),
7515            scan_line_only.len(),
7516        );
7517
7518        // Round trip stays exact via the production path.
7519        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7520        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7521        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7522        assert_eq!(img.pixels(), pixels.as_slice());
7523    }
7524
7525    /// A 256x256 row-repeating image (every scan-line a copy of row 1)
7526    /// drives the round-130 chooser to swap the scan-line code `256+120
7527    /// = 376` (prefix 16, 7 extra bits) for the map code 1 (prefix 0,
7528    /// 0 extra bits) — the largest single-emission saving the chooser
7529    /// can produce. The aggregate stream-size delta is the round-130
7530    /// headline measurement on row-correlated content.
7531    #[test]
7532    fn width_aware_distance_headline_256x256_row_repeating() {
7533        let w = 256u32;
7534        let h = 256u32;
7535        let mut pixels = Vec::with_capacity((w * h) as usize);
7536        let mut state = 0xABCD_1234u32;
7537        for _ in 0..w {
7538            state ^= state << 13;
7539            state ^= state >> 17;
7540            state ^= state << 5;
7541            pixels.push((state & 0x00ff_ffff) | 0xff00_0000);
7542        }
7543        for y in 1..h {
7544            for x in 0..w {
7545                pixels.push(pixels[(x + (y - 1) * w) as usize]);
7546            }
7547        }
7548
7549        let width_aware = encode_argb_literals_with_width(&pixels, w);
7550        let scan_line_only = encode_argb_literals(&pixels);
7551        eprintln!(
7552            "[round-130] 256x256 row-repeating: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7553            scan_line_only.len(),
7554            width_aware.len(),
7555            100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7556                / scan_line_only.len() as f64,
7557        );
7558        assert!(
7559            width_aware.len() < scan_line_only.len(),
7560            "width-aware stream ({} B) not smaller than scan-line-only ({} B)",
7561            width_aware.len(),
7562            scan_line_only.len(),
7563        );
7564
7565        // Round trip stays exact via the production path.
7566        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7567        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7568        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7569        assert_eq!(img.pixels(), pixels.as_slice());
7570    }
7571
7572    /// Re-encode an existing lossless fixture (decoded to ARGB) through
7573    /// both the width=1 scan-line-only form and the round-130 width-aware
7574    /// form, and confirm the width-aware variant is strictly smaller and
7575    /// round-trips bit-exactly. This exercises the chooser on
7576    /// non-synthetic distance distributions (the fixture's encoder
7577    /// produced whatever natural-image-style matches it found).
7578    #[test]
7579    fn width_aware_re_encode_of_real_fixture_is_smaller() {
7580        // 32x32 RGBA fixture committed in-tree (no external decode).
7581        let bytes: &[u8] = include_bytes!("../tests/data/lossless-32x32-rgba.webp");
7582        let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
7583        let w = decoded.width();
7584        let h = decoded.height();
7585        let pixels = decoded.pixels().to_vec();
7586
7587        let width_aware = encode_argb_literals_with_width(&pixels, w);
7588        let scan_line_only = encode_argb_literals(&pixels);
7589        eprintln!(
7590            "[round-130] {}x{} re-encoded fixture: scan-line-only={} B, width-aware={} B ({:.1}% reduction)",
7591            w,
7592            h,
7593            scan_line_only.len(),
7594            width_aware.len(),
7595            100.0 * (scan_line_only.len() as f64 - width_aware.len() as f64)
7596                / scan_line_only.len() as f64,
7597        );
7598        assert!(
7599            width_aware.len() <= scan_line_only.len(),
7600            "width-aware regressed: {} B vs scan-line-only {} B",
7601            width_aware.len(),
7602            scan_line_only.len(),
7603        );
7604
7605        // Round trip through the encoder + decoder is exact.
7606        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
7607        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
7608        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
7609        assert_eq!(img.pixels(), pixels.as_slice());
7610    }
7611
7612    /// The chooser must never inflate a distance: the chosen code's
7613    /// prefix code is always less than or equal to the scan-line
7614    /// alternative's prefix code, since the chooser picks the smaller
7615    /// raw code and `value_to_prefix` is monotonic in the value.
7616    #[test]
7617    fn chooser_never_picks_larger_prefix_than_scan_line() {
7618        let width = 320u32;
7619        for d in 1..=(width as usize * 4) {
7620            let chooser_code = pixel_distance_to_distance_code(d, width);
7621            let scan_code = distance_to_code(d);
7622            let (chooser_prefix, _, _) = value_to_prefix(chooser_code);
7623            let (scan_prefix, _, _) = value_to_prefix(scan_code);
7624            assert!(
7625                chooser_prefix <= scan_prefix,
7626                "d={d}: chooser code {chooser_code} (prefix {chooser_prefix}) > scan-line {scan_code} (prefix {scan_prefix})",
7627            );
7628        }
7629    }
7630
7631    // ---- round 146: §4.1 spatial-predictor forward transform ----
7632
7633    /// Round-224 cross-check (kept after the SWAR-form regression
7634    /// finding documented on the function itself): the public
7635    /// `predictor_subtract` body must remain bit-identical to the
7636    /// per-channel `wrapping_sub` semantics. Sweep 1 024 deterministic
7637    /// LCG `(original, pred)` pairs plus six hand-picked boundary
7638    /// pairs (every-channel underflow, every-channel positive,
7639    /// all-zero, all-0xff, mixed) against a verbatim copy of the
7640    /// closure-of-four reference. Acts as a regression guard so any
7641    /// future re-attempt at a SWAR / `std::simd` rewrite of this
7642    /// function can re-use this test to pin the new body against the
7643    /// reference semantics.
7644    #[test]
7645    fn predictor_subtract_matches_per_byte_reference_random() {
7646        // Verbatim copy of the closure-of-four reference body. The
7647        // published function must be bit-identical to this for every
7648        // input — this is a cross-check, not a baseline measurement.
7649        fn reference(original: u32, pred: u32) -> u32 {
7650            let a = ((original >> 24) & 0xff).wrapping_sub((pred >> 24) & 0xff) & 0xff;
7651            let r = ((original >> 16) & 0xff).wrapping_sub((pred >> 16) & 0xff) & 0xff;
7652            let g = ((original >> 8) & 0xff).wrapping_sub((pred >> 8) & 0xff) & 0xff;
7653            let b = (original & 0xff).wrapping_sub(pred & 0xff) & 0xff;
7654            (a << 24) | (r << 16) | (g << 8) | b
7655        }
7656        // Boundary cases: every-channel underflow, no-underflow, mixed.
7657        for &(orig, pred) in &[
7658            (0x0000_0000u32, 0x0000_0000u32),
7659            (0xffff_ffffu32, 0xffff_ffffu32),
7660            (0x0000_0000u32, 0xffff_ffffu32), // every channel underflows
7661            (0xffff_ffffu32, 0x0000_0000u32), // every channel saturates positive
7662            (0x10_20_30_40u32, 0x05_30_20_50u32), // mixed: r,b underflow; a,g positive
7663            (0x80_80_80_80u32, 0x80_80_80_80u32), // zero residual
7664        ] {
7665            assert_eq!(
7666                predictor_subtract(orig, pred),
7667                reference(orig, pred),
7668                "predictor_subtract diverges from per-byte reference at \
7669                 orig=0x{orig:08x} pred=0x{pred:08x}"
7670            );
7671        }
7672        let mut seed: u32 = 0xcafe_d00d;
7673        let mut rng = || {
7674            seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
7675            seed
7676        };
7677        for _ in 0..1_024 {
7678            let orig = rng();
7679            let pred = rng();
7680            assert_eq!(
7681                predictor_subtract(orig, pred),
7682                reference(orig, pred),
7683                "predictor_subtract diverges from per-byte reference at \
7684                 orig=0x{orig:08x} pred=0x{pred:08x}"
7685            );
7686        }
7687    }
7688
7689    /// Round-280 cross-check for the mode-specialised block-residual
7690    /// walker: `block_mode_cost`, `block_mode_entropy_cost`, and the
7691    /// capped walks driving `pick_block_mode_with_hint` /
7692    /// `pick_block_mode_with_hint_slack` must stay bit-identical to a
7693    /// verbatim copy of the pre-round-280 per-pixel `predictor_at`
7694    /// loops. Sweeps deterministic-LCG images over shapes covering
7695    /// every walker boundary regime — 1×N / N×1 (border-only rows and
7696    /// columns), 2×2, blocks overlapping the right and bottom edges,
7697    /// blocks larger than the image, interior blocks not touching any
7698    /// border — for every mode `0..=13` plus an out-of-range mode,
7699    /// and pins the hinted pickers (whose row-granular prune must be
7700    /// pick-identical to the reference per-pixel early-out) for every
7701    /// `prefer_mode` and a slack sweep.
7702    #[test]
7703    fn block_walker_matches_predictor_at_reference_random() {
7704        // Verbatim pre-round-280 `block_mode_cost` body.
7705        #[allow(clippy::too_many_arguments)]
7706        fn ref_cost(
7707            pixels: &[u32],
7708            width: usize,
7709            height: usize,
7710            x0: usize,
7711            y0: usize,
7712            bw: usize,
7713            bh: usize,
7714            mode: u8,
7715        ) -> u64 {
7716            let mut cost: u64 = 0;
7717            for dy in 0..bh {
7718                let y = y0 + dy;
7719                if y >= height {
7720                    break;
7721                }
7722                for dx in 0..bw {
7723                    let x = x0 + dx;
7724                    if x >= width {
7725                        break;
7726                    }
7727                    let pred = predictor_at(pixels, width, x, y, mode);
7728                    let original = pixels[y * width + x];
7729                    let residual = predictor_subtract(original, pred);
7730                    cost += residual_magnitude(residual) as u64;
7731                }
7732            }
7733            cost
7734        }
7735        // Verbatim pre-round-280 `block_mode_entropy_cost` histogram
7736        // fill (the Shannon sum over it is unchanged, so comparing
7737        // the histograms pins the whole function).
7738        #[allow(clippy::too_many_arguments)]
7739        fn ref_hist(
7740            pixels: &[u32],
7741            width: usize,
7742            height: usize,
7743            x0: usize,
7744            y0: usize,
7745            bw: usize,
7746            bh: usize,
7747            mode: u8,
7748        ) -> ([[u32; 256]; 4], u32) {
7749            let mut hist: [[u32; 256]; 4] = [[0u32; 256]; 4];
7750            let mut n: u32 = 0;
7751            for dy in 0..bh {
7752                let y = y0 + dy;
7753                if y >= height {
7754                    break;
7755                }
7756                for dx in 0..bw {
7757                    let x = x0 + dx;
7758                    if x >= width {
7759                        break;
7760                    }
7761                    let pred = predictor_at(pixels, width, x, y, mode);
7762                    let original = pixels[y * width + x];
7763                    let residual = predictor_subtract(original, pred);
7764                    hist[0][((residual >> 24) & 0xff) as usize] += 1;
7765                    hist[1][((residual >> 16) & 0xff) as usize] += 1;
7766                    hist[2][((residual >> 8) & 0xff) as usize] += 1;
7767                    hist[3][(residual & 0xff) as usize] += 1;
7768                    n += 1;
7769                }
7770            }
7771            (hist, n)
7772        }
7773        let mut seed: u32 = 0x2b80_c0de;
7774        let mut rng = move || {
7775            seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
7776            seed
7777        };
7778        // (width, height, x0, y0, bw, bh) — every walker regime.
7779        let shapes: &[(usize, usize, usize, usize, usize, usize)] = &[
7780            (1, 1, 0, 0, 4, 4),   // single pixel, block larger than image
7781            (1, 9, 0, 0, 4, 4),   // single column (left-column rule only)
7782            (1, 9, 0, 8, 4, 4),   // single column, partial bottom block
7783            (9, 1, 0, 0, 4, 4),   // single row (top-row rule only)
7784            (9, 1, 4, 0, 4, 4),   // single row, interior-start block
7785            (2, 2, 0, 0, 2, 2),   // smallest full-rules image
7786            (8, 8, 0, 0, 8, 8),   // block == image (all four borders)
7787            (8, 8, 4, 4, 4, 4),   // bottom-right block (TR wraparound)
7788            (8, 8, 4, 0, 4, 4),   // top-right block (top row + wraparound)
7789            (8, 8, 0, 4, 4, 4),   // bottom-left block (left column)
7790            (11, 7, 8, 4, 4, 4),  // overlaps right and bottom edges
7791            (16, 16, 4, 4, 4, 4), // pure interior block (no borders)
7792            (5, 5, 0, 0, 16, 16), // block much larger than image
7793        ];
7794        for &(width, height, x0, y0, bw, bh) in shapes {
7795            let pixels: Vec<u32> = (0..width * height).map(|_| rng()).collect();
7796            // Cost + histogram equivalence for every mode, including
7797            // one §4.1-undefined mode (predicts solid black).
7798            for mode in 0u8..=14 {
7799                assert_eq!(
7800                    block_mode_cost(&pixels, width, height, x0, y0, bw, bh, mode),
7801                    ref_cost(&pixels, width, height, x0, y0, bw, bh, mode),
7802                    "block_mode_cost diverges at {width}x{height} block \
7803                     ({x0},{y0},{bw},{bh}) mode {mode}"
7804                );
7805                let mut sink = ResidualHistogramSink {
7806                    hist: [[0u32; 256]; 4],
7807                    n: 0,
7808                };
7809                for_each_block_residual(&pixels, width, height, x0, y0, bw, bh, mode, &mut sink);
7810                let (hist, n) = ref_hist(&pixels, width, height, x0, y0, bw, bh, mode);
7811                assert_eq!(
7812                    (sink.hist, sink.n),
7813                    (hist, n),
7814                    "residual histogram diverges at {width}x{height} block \
7815                     ({x0},{y0},{bw},{bh}) mode {mode}"
7816                );
7817            }
7818            // Pick equivalence: the row-granular capped walk must
7819            // select the same mode as a reference full-cost argmin
7820            // (lowest mode wins ties) for every hint, and the slack
7821            // variant for a slack sweep.
7822            let mut ref_best_mode: u8 = 0;
7823            let mut ref_best_cost = u64::MAX;
7824            for mode in 0u8..=13 {
7825                let cost = ref_cost(&pixels, width, height, x0, y0, bw, bh, mode);
7826                if cost < ref_best_cost {
7827                    ref_best_cost = cost;
7828                    ref_best_mode = mode;
7829                }
7830            }
7831            for hint in std::iter::once(None).chain((0u8..=13).map(Some)) {
7832                let mut want = ref_best_mode;
7833                if let Some(m) = hint {
7834                    if m != want
7835                        && ref_cost(&pixels, width, height, x0, y0, bw, bh, m) == ref_best_cost
7836                    {
7837                        want = m;
7838                    }
7839                }
7840                assert_eq!(
7841                    pick_block_mode_with_hint(&pixels, width, height, x0, y0, bw, bh, hint),
7842                    want,
7843                    "hinted pick diverges at {width}x{height} block \
7844                     ({x0},{y0},{bw},{bh}) hint {hint:?}"
7845                );
7846                for slack in [0u64, 1, 7, 64] {
7847                    let mut want_slack = ref_best_mode;
7848                    if let Some(m) = hint {
7849                        if m != want_slack
7850                            && ref_cost(&pixels, width, height, x0, y0, bw, bh, m)
7851                                <= ref_best_cost.saturating_add(slack)
7852                        {
7853                            want_slack = m;
7854                        }
7855                    }
7856                    assert_eq!(
7857                        pick_block_mode_with_hint_slack(
7858                            &pixels, width, height, x0, y0, bw, bh, hint, slack
7859                        ),
7860                        want_slack,
7861                        "slack pick diverges at {width}x{height} block \
7862                         ({x0},{y0},{bw},{bh}) hint {hint:?} slack {slack}"
7863                    );
7864                }
7865            }
7866        }
7867    }
7868
7869    /// `predictor_subtract` is the per-channel mod-256 inverse of the
7870    /// decoder's `add_pred`: re-adding the same prediction recovers
7871    /// the original, regardless of which channels wrap.
7872    #[test]
7873    fn predictor_subtract_is_inverse_of_add() {
7874        let cases = [
7875            (0xff00_0000u32, 0xff00_0000u32),
7876            (0x1234_5678u32, 0x0000_0000u32),
7877            (0xff80_4020u32, 0x8040_2010u32),
7878            (0x0000_ff00u32, 0xff00_ff00u32),
7879        ];
7880        for (orig, pred) in cases {
7881            let residual = predictor_subtract(orig, pred);
7882            // Reconstruct via add_pred semantics: per-channel
7883            // wrapping_add must restore the original.
7884            let a = ((residual >> 24) & 0xff).wrapping_add((pred >> 24) & 0xff) & 0xff;
7885            let r = ((residual >> 16) & 0xff).wrapping_add((pred >> 16) & 0xff) & 0xff;
7886            let g = ((residual >> 8) & 0xff).wrapping_add((pred >> 8) & 0xff) & 0xff;
7887            let b = (residual & 0xff).wrapping_add(pred & 0xff) & 0xff;
7888            let rebuilt = (a << 24) | (r << 16) | (g << 8) | b;
7889            assert_eq!(
7890                rebuilt, orig,
7891                "subtract+add did not round-trip for orig=0x{orig:08x} pred=0x{pred:08x}"
7892            );
7893        }
7894    }
7895
7896    /// On a solid block, mode 1 (L) and mode 2 (T) both predict the
7897    /// neighbour exactly → zero residual on every channel for every
7898    /// interior pixel. `pick_block_mode` returns the lowest such
7899    /// mode by tie-breaking convention; either 0 (border-only block)
7900    /// or 1 is acceptable for the top-left block of a solid image.
7901    #[test]
7902    fn pick_block_mode_zero_cost_on_solid_block() {
7903        let w = 8usize;
7904        let h = 8usize;
7905        let pixels = vec![0xff50_6070u32; w * h];
7906        // Block covering rows 1..8, cols 1..8 — all interior except
7907        // the strip at x=0 / y=0, but those are clamped out by the
7908        // edge rules in `predictor_at`.
7909        let mode = pick_block_mode(&pixels, w, h, 0, 0, w, h);
7910        // Any mode that uses an immediate neighbour (1=L, 2=T, etc.)
7911        // produces zero residual on a constant image, so the cost
7912        // is zero; with the tie-breaker, the lowest mode wins. Mode
7913        // 0 (solid black) only matches when the image *is* solid
7914        // black — here the constant is grey, so mode 0 costs more
7915        // than 1/2/.../13, and one of those wins.
7916        assert!(mode <= 13, "mode out of range: {mode}");
7917        // Sanity: residual under the picked mode must indeed be
7918        // zero everywhere (the top-left predicts 0xff000000 → cost
7919        // 0x60 + 0x70 + 0x50 = 0xe0 fold worth, but interior pixels
7920        // dominate — total cost ≪ what mode 0 produces).
7921        let mode_cost = |m: u8| -> u64 {
7922            let mut c = 0u64;
7923            for y in 0..h {
7924                for x in 0..w {
7925                    let pred = predictor_at(&pixels, w, x, y, m);
7926                    let r = predictor_subtract(pixels[y * w + x], pred);
7927                    c += residual_magnitude(r) as u64;
7928                }
7929            }
7930            c
7931        };
7932        let picked_cost = mode_cost(mode);
7933        let mode0_cost = mode_cost(0);
7934        assert!(
7935            picked_cost < mode0_cost,
7936            "expected picked-mode cost ({picked_cost}) < mode-0 cost ({mode0_cost})"
7937        );
7938    }
7939
7940    /// Forward + inverse predictor round-trips bit-exact: applying
7941    /// the encoder's forward transform then the decoder's inverse
7942    /// transform recovers the original pixels.
7943    #[test]
7944    fn forward_predictor_round_trips_through_decoder_inverse() {
7945        use crate::vp8l_transform::inverse_predictor;
7946        let w = 16u32;
7947        let h = 16u32;
7948        // Smooth gradient — mode 7 (Average2(L, T)) should predict
7949        // most pixels well.
7950        let mut pixels = Vec::with_capacity((w * h) as usize);
7951        for y in 0..h {
7952            for x in 0..w {
7953                let r = x * 16;
7954                let g = y * 16;
7955                let b = (x + y) * 8;
7956                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
7957            }
7958        }
7959        let size_bits = 4u8; // 16x16 blocks → tw=th=1.
7960        let (pred_img, tw, _th) = build_predictor_image(&pixels, w, h, size_bits);
7961        let mut residuals = vec![0u32; pixels.len()];
7962        apply_forward_predictor(&pixels, &mut residuals, w, h, &pred_img, tw, size_bits);
7963        // Apply the decoder's inverse pass and confirm we recover
7964        // the originals.
7965        inverse_predictor(&mut residuals, w, h, &pred_img, tw, size_bits);
7966        assert_eq!(residuals, pixels);
7967    }
7968
7969    /// End-to-end: encode + decode via the public `encode_webp_lossless`
7970    /// path round-trips a smooth-gradient image bit-exactly. The
7971    /// chooser is free to pick the predictor candidate or not; the
7972    /// round-trip property must hold for *whatever* path it picks.
7973    #[test]
7974    fn round_trip_smooth_gradient_with_predictor_candidate() {
7975        let w = 32u32;
7976        let h = 32u32;
7977        let mut rgba = Vec::with_capacity((w * h * 4) as usize);
7978        for y in 0..h {
7979            for x in 0..w {
7980                rgba.push((x * 8) as u8); // r
7981                rgba.push((y * 8) as u8); // g
7982                rgba.push(((x + y) * 4) as u8); // b
7983                rgba.push(0xff); // a
7984            }
7985        }
7986        let file = encode_webp_lossless(&rgba, w, h).unwrap();
7987        let decoded = crate::decode_webp(&file).unwrap();
7988        assert_eq!(decoded.frames[0].rgba, rgba);
7989    }
7990
7991    /// On a smooth gradient the §4.1 predictor candidate should
7992    /// produce a smaller stream than the no-transform / subtract-
7993    /// green baseline: per-pixel residuals concentrate near zero,
7994    /// shrinking the green/red/blue Huffman codes. The chooser
7995    /// must select the predictor (or another equally-good
7996    /// candidate), so the final stream size is at most the
7997    /// no-tx baseline.
7998    #[test]
7999    fn predictor_path_shrinks_smooth_gradient() {
8000        let w = 64u32;
8001        let h = 64u32;
8002        let mut pixels = Vec::with_capacity((w * h) as usize);
8003        for y in 0..h {
8004            for x in 0..w {
8005                let r = (x * 4) & 0xff;
8006                let g = (y * 4) & 0xff;
8007                let b = ((x + y) * 2) & 0xff;
8008                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8009            }
8010        }
8011        // No-tx + no-cache baseline (the round-119 path).
8012        let baseline = encode_literals_with_options(&pixels, false, None, w);
8013        // The full chooser (which now includes the predictor path).
8014        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8015        eprintln!(
8016            "[round-146] {}x{} smooth gradient: no-tx baseline={} B, chooser={} B ({:.1}% reduction)",
8017            w,
8018            h,
8019            baseline.len(),
8020            chosen.len(),
8021            100.0 * (baseline.len() as f64 - chosen.len() as f64) / baseline.len() as f64,
8022        );
8023        assert!(
8024            chosen.len() <= baseline.len(),
8025            "chooser regressed on smooth gradient: {} B vs no-tx baseline {} B",
8026            chosen.len(),
8027            baseline.len(),
8028        );
8029
8030        // Round trip through the full encoder/decoder is exact.
8031        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8032        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8033        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8034        assert_eq!(img.pixels(), pixels.as_slice());
8035    }
8036
8037    /// On uncorrelated random noise the predictor never helps (no
8038    /// neighbour predicts the next pixel any better than random),
8039    /// so the chooser stays on the no-tx no-cache path (or
8040    /// subtract-green if that happens to win). The final stream
8041    /// must not regress vs the no-predictor chooser.
8042    #[test]
8043    fn predictor_chooser_does_not_regress_on_noise() {
8044        let w = 32u32;
8045        let h = 32u32;
8046        let mut pixels = Vec::with_capacity((w * h) as usize);
8047        let mut state = 0xc0ff_eeeeu32;
8048        for _ in 0..(w * h) {
8049            state ^= state << 13;
8050            state ^= state >> 17;
8051            state ^= state << 5;
8052            pixels.push(state | 0xff00_0000);
8053        }
8054        let no_predictor = encode_argb_literals_with_width(&pixels, w);
8055        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8056        assert!(
8057            chosen.len() <= no_predictor.len(),
8058            "predictor chooser regressed on noise: {} B vs {} B",
8059            chosen.len(),
8060            no_predictor.len(),
8061        );
8062    }
8063
8064    /// Round-trip the published `lossless-128x128-natural` fixture:
8065    /// decode it, re-encode via the full predictor-aware chooser,
8066    /// decode again. The decoded pixels must match the originals
8067    /// bit-exactly, and the re-encoded stream size should
8068    /// demonstrate the predictor path is being exercised on a
8069    /// natural image (we don't assert a specific size, only
8070    /// log it).
8071    #[test]
8072    fn natural_fixture_round_trips_through_predictor_aware_encoder() {
8073        let bytes: &[u8] = include_bytes!("../tests/data/lossless-128x128-natural.webp");
8074        let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
8075        let w = decoded.width();
8076        let h = decoded.height();
8077        let pixels = decoded.pixels().to_vec();
8078
8079        let pre_predictor = encode_argb_literals_with_width(&pixels, w);
8080        let with_predictor = encode_argb_with_predictor_chooser(&pixels, w, h);
8081        eprintln!(
8082            "[round-146] {}x{} natural fixture re-encoded: pre-predictor chooser={} B, predictor chooser={} B ({:.1}% reduction)",
8083            w,
8084            h,
8085            pre_predictor.len(),
8086            with_predictor.len(),
8087            100.0 * (pre_predictor.len() as f64 - with_predictor.len() as f64)
8088                / pre_predictor.len() as f64,
8089        );
8090        assert!(
8091            with_predictor.len() <= pre_predictor.len(),
8092            "predictor chooser regressed on natural fixture: {} B vs {} B",
8093            with_predictor.len(),
8094            pre_predictor.len(),
8095        );
8096
8097        // End-to-end round trip is bit-exact through the public API.
8098        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8099        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8100        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8101        assert_eq!(img.pixels(), pixels.as_slice());
8102    }
8103
8104    // ---- round 147: §3.5.2 / §4.2 color-transform forward pass ----
8105
8106    /// `color_xfrm_delta` matches the §3.5.2 formula
8107    /// `(int8(t) * int8(c)) >> 5` for both signed inputs.
8108    #[test]
8109    fn color_xfrm_delta_matches_spec_examples() {
8110        // t = -1, c = 64 → (-1 * 64) >> 5 = -2.
8111        assert_eq!(color_xfrm_delta(0xff, 0x40), -2);
8112        // t = 2, c = 64 → (2 * 64) >> 5 = 4.
8113        assert_eq!(color_xfrm_delta(2, 0x40), 4);
8114        // t = 0, c = anything → 0.
8115        assert_eq!(color_xfrm_delta(0, 0x7f), 0);
8116        // Identity case: t = 0 (no slope) ⇒ no contribution.
8117        assert_eq!(color_xfrm_delta(0, 0xff), 0);
8118    }
8119
8120    /// Forward + inverse §3.5.2 color transform round-trips per-pixel
8121    /// for arbitrary CTE values. Validates [`forward_color_pixel`]
8122    /// against the decoder's [`crate::vp8l_transform::inverse_color`]
8123    /// math.
8124    #[test]
8125    fn forward_color_pixel_round_trips_through_decoder_inverse() {
8126        use crate::vp8l_transform;
8127        let cases: &[(u8, u8, u8, u8, u8, u8)] = &[
8128            // (r, g, b, gtr, gtb, rtb)
8129            (120, 80, 200, 0x12, 0xf0, 0x05),
8130            (255, 0, 0, 0x20, 0x00, 0x00),
8131            (0, 255, 0, 0x00, 0x20, 0x00),
8132            (0, 0, 255, 0x00, 0x00, 0x20),
8133            (200, 100, 50, 0xe0, 0xd0, 0x10),
8134        ];
8135        for &(r, g, b, gtr, gtb, rtb) in cases {
8136            let (enc_r, enc_b) = forward_color_pixel(r, g, b, gtr, gtb, rtb);
8137            // Drive the decoder's helper through a 1×1 sub-image so
8138            // we exercise the actual published inverse path.
8139            let mut argb = vec![
8140                ((0xffu32) << 24) | ((enc_r as u32) << 16) | ((g as u32) << 8) | (enc_b as u32),
8141            ];
8142            // Build the §3.5.2 CTE pixel: red=rtb, green=gtb, blue=gtr.
8143            let cte = ((0xffu32) << 24) | ((rtb as u32) << 16) | ((gtb as u32) << 8) | (gtr as u32);
8144            let color_img = vec![cte];
8145            // size_bits = 9 → block 512, single block covers a 1×1 image.
8146            vp8l_transform::inverse_color(&mut argb, 1, 1, &color_img, 1, 9);
8147            assert_eq!(
8148                (argb[0] >> 16) & 0xff,
8149                r as u32,
8150                "red mismatch for r={r} g={g} b={b} gtr=0x{gtr:02x} gtb=0x{gtb:02x} rtb=0x{rtb:02x}",
8151            );
8152            assert_eq!(argb[0] & 0xff, b as u32, "blue mismatch");
8153            assert_eq!((argb[0] >> 8) & 0xff, g as u32, "green altered");
8154        }
8155    }
8156
8157    /// On a solid-color block the per-axis sweep is free to pick any
8158    /// CTE — but whichever CTE it picks must minimise the per-pixel
8159    /// folded-magnitude proxy that drove the choice. Verifying the
8160    /// picker against the all-zero baseline (which leaves residuals at
8161    /// the source's pixel values) confirms the chooser is not
8162    /// inflating cost: a constant image's red channel can still be
8163    /// "decorrelated" against the constant green if some `gtr` value
8164    /// brings `red - delta(gtr, green)` closer to zero (mod 256) than
8165    /// the raw `red`.
8166    #[test]
8167    fn pick_block_cte_is_minimum_on_solid_block() {
8168        let w = 8usize;
8169        let h = 8usize;
8170        let pixels = vec![0xff50_6070u32; w * h];
8171
8172        // Per-pixel folded-magnitude cost summed across the block, for
8173        // an arbitrary CTE.
8174        let block_cost = |gtr: u8, gtb: u8, rtb: u8| -> u64 {
8175            let mut c = 0u64;
8176            for &px in &pixels {
8177                let r = ((px >> 16) & 0xff) as u8;
8178                let g = ((px >> 8) & 0xff) as u8;
8179                let b = (px & 0xff) as u8;
8180                // Decompose like pick_block_cte does (additive across
8181                // channels): red proxy + blue proxy.
8182                let red_residual = (r as i32 - color_xfrm_delta(gtr, g)) as u32;
8183                let inter_blue = b as i32 - color_xfrm_delta(gtb, g);
8184                let blue_residual = (inter_blue - color_xfrm_delta(rtb, r)) as u32;
8185                c += channel_magnitude(red_residual) as u64;
8186                c += channel_magnitude(blue_residual) as u64;
8187            }
8188            c
8189        };
8190
8191        let (gtr, gtb, rtb) = pick_block_cte(&pixels, w, h, 0, 0, w, h);
8192        let picked_cost = block_cost(gtr, gtb, rtb);
8193        let zero_cost = block_cost(0, 0, 0);
8194        assert!(
8195            picked_cost <= zero_cost,
8196            "picked CTE (0x{gtr:02x}, 0x{gtb:02x}, 0x{rtb:02x}) cost {picked_cost} > all-zero cost {zero_cost}",
8197        );
8198    }
8199
8200    /// On a strongly green-correlated image (`red ≈ green / 2`), the
8201    /// per-axis sweep must pick a non-zero `green_to_red` to cancel
8202    /// the slope. A slope of 1/2 corresponds to a fixed-point value
8203    /// of 16 (since `>> 5` divides by 32: 16/32 = 0.5).
8204    #[test]
8205    fn pick_block_cte_recovers_known_slope() {
8206        let w = 16usize;
8207        let h = 16usize;
8208        let mut pixels = Vec::with_capacity(w * h);
8209        for y in 0..h {
8210            for x in 0..w {
8211                let g = ((x + y) * 4) as u32 & 0xff;
8212                // red = green / 2 (deterministic linear correlation):
8213                let r = (g / 2) & 0xff;
8214                // blue uncorrelated → keep at a constant so gtb/rtb
8215                // don't have a clear winner.
8216                let b = 0x80u32;
8217                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8218            }
8219        }
8220        let (gtr, _gtb, _rtb) = pick_block_cte(&pixels, w, h, 0, 0, w, h);
8221        // gtr should land on or near 16 (slope 0.5). Allow ±16 wiggle
8222        // because the grid is coarser than the optimum and the
8223        // residual-magnitude proxy is not strictly convex.
8224        let gtr_signed = gtr as i8 as i32;
8225        assert!(
8226            (0..=32).contains(&gtr_signed),
8227            "expected gtr ≈ +16 for red≈green/2 correlation, got {gtr_signed} (raw 0x{gtr:02x})",
8228        );
8229    }
8230
8231    /// Forward + inverse over a multi-block image round-trips bit-
8232    /// exactly: encoder builds the per-block color image, forward-
8233    /// transforms the pixels, decoder applies its inverse pass and
8234    /// recovers the originals.
8235    #[test]
8236    fn forward_color_round_trips_through_decoder_inverse() {
8237        use crate::vp8l_transform::inverse_color;
8238        let w = 32u32;
8239        let h = 32u32;
8240        let mut pixels = Vec::with_capacity((w * h) as usize);
8241        for y in 0..h {
8242            for x in 0..w {
8243                // Some correlation between channels (so the picker
8244                // chooses non-trivial CTEs in at least some blocks).
8245                let r = (x * 7) & 0xff;
8246                let g = (y * 5) & 0xff;
8247                let b = ((x + y) * 3) & 0xff;
8248                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8249            }
8250        }
8251        let size_bits = 4u8;
8252        let (color_img, tw, _th) =
8253            build_color_image(&pixels, w, h, size_bits, ColorTransformStrategy::L1);
8254        let mut residuals = vec![0u32; pixels.len()];
8255        apply_forward_color(&pixels, &mut residuals, w, h, &color_img, tw, size_bits);
8256        inverse_color(&mut residuals, w, h, &color_img, tw, size_bits);
8257        assert_eq!(residuals, pixels);
8258    }
8259
8260    /// End-to-end: encode + decode via the public `encode_webp_lossless`
8261    /// path round-trips a chroma-correlated image bit-exactly. The
8262    /// chooser is free to pick the color-transform candidate or not;
8263    /// the round-trip property must hold for *whatever* path it picks.
8264    #[test]
8265    fn round_trip_chroma_correlated_image_with_color_transform_candidate() {
8266        let w = 32u32;
8267        let h = 32u32;
8268        let mut rgba = Vec::with_capacity((w * h * 4) as usize);
8269        for y in 0..h {
8270            for x in 0..w {
8271                let g = ((x + y) * 4) as u8;
8272                let r = g.wrapping_div(2);
8273                let b = g.wrapping_div(3);
8274                rgba.push(r);
8275                rgba.push(g);
8276                rgba.push(b);
8277                rgba.push(0xff);
8278            }
8279        }
8280        let file = encode_webp_lossless(&rgba, w, h).unwrap();
8281        let decoded = crate::decode_webp(&file).unwrap();
8282        assert_eq!(decoded.frames[0].rgba, rgba);
8283    }
8284
8285    /// On a chroma-correlated synthetic image the §4.2 color-transform
8286    /// candidate should at worst tie the existing pre-color-transform
8287    /// chooser: even if the predictor path already wins, the chooser
8288    /// must never inflate the stream by adding the color transform as
8289    /// a new option.
8290    #[test]
8291    fn color_transform_chooser_never_regresses() {
8292        let w = 64u32;
8293        let h = 64u32;
8294        let mut pixels = Vec::with_capacity((w * h) as usize);
8295        for y in 0..h {
8296            for x in 0..w {
8297                let g = ((x + y) * 4) & 0xff;
8298                let r = (g / 2) & 0xff;
8299                let b = (g / 3) & 0xff;
8300                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
8301            }
8302        }
8303        let pre_color = pre_round_147_chooser(&pixels, w, h);
8304        let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8305        eprintln!(
8306            "[round-147] {}x{} chroma-correlated synth: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8307            w,
8308            h,
8309            pre_color.len(),
8310            with_color.len(),
8311            100.0 * (pre_color.len() as f64 - with_color.len() as f64) / pre_color.len() as f64,
8312        );
8313        assert!(
8314            with_color.len() <= pre_color.len(),
8315            "color-transform chooser regressed: {} B vs pre-color {} B",
8316            with_color.len(),
8317            pre_color.len(),
8318        );
8319
8320        // Round trip through the full encoder/decoder is exact.
8321        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8322        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8323        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8324        assert_eq!(img.pixels(), pixels.as_slice());
8325    }
8326
8327    /// Build a 128×128 channel-correlated noise fixture with
8328    /// *spatially varying* correlation slopes — each 16×16 block has a
8329    /// different `(green_to_red, green_to_blue)` correlation drawn
8330    /// from a small palette, giving the §3.5.2 per-block color
8331    /// transform a clear advantage over §3.5.3 subtract-green (which
8332    /// applies the same all-channels-equal correction everywhere).
8333    /// Within a block: spatially random green (LCG-driven), red and
8334    /// blue are `(slope × green + jitter) mod 256` in signed-mod-256
8335    /// arithmetic, with 6-bit jitter (high unique-pixel count keeps
8336    /// the §5.2.3 cache from dominating).
8337    fn make_channel_correlated_noise(w: u32, h: u32) -> Vec<u32> {
8338        let mut pixels = vec![0u32; (w * h) as usize];
8339        // Per-block (gtr, gtb) palette: four slopes giving distinct
8340        // per-block correlations so a single subtract-green delta
8341        // can't simultaneously cancel them all.
8342        let slopes: [(u32, u32); 4] = [(1, 1), (2, 2), (1, 2), (2, 1)];
8343        let block = 16u32;
8344        let bw = w.div_ceil(block);
8345        let mut state = 0x1234_5678u32;
8346        for by in 0..h.div_ceil(block) {
8347            for bx in 0..bw {
8348                let (sr, sb) = slopes[((by * bw + bx) % 4) as usize];
8349                for dy in 0..block {
8350                    let y = by * block + dy;
8351                    if y >= h {
8352                        break;
8353                    }
8354                    for dx in 0..block {
8355                        let x = bx * block + dx;
8356                        if x >= w {
8357                            break;
8358                        }
8359                        state = state.wrapping_mul(1664525).wrapping_add(1013904223);
8360                        let g = (state >> 8) & 0xff;
8361                        let jitter_r = state & 0x3f;
8362                        let jitter_b = (state >> 16) & 0x3f;
8363                        let r = (g.wrapping_mul(sr)).wrapping_add(jitter_r) & 0xff;
8364                        let b = (g.wrapping_mul(sb)).wrapping_add(jitter_b) & 0xff;
8365                        pixels[(y * w + x) as usize] = 0xff00_0000 | (r << 16) | (g << 8) | b;
8366                    }
8367                }
8368            }
8369        }
8370        pixels
8371    }
8372
8373    /// Spatially-noisy + channel-correlated synthetic fixture: full-
8374    /// entropy noise across all three channels (no spatial structure
8375    /// → predictor can't help; high unique-pixel count → §5.2.3
8376    /// color cache can't slot every pixel), but `red ≈ green / 2`
8377    /// and `blue ≈ green / 4` with a few bits of jitter (strong
8378    /// linear channel correlation → color transform should help).
8379    /// On this construction the color-transform candidate must
8380    /// *strictly* beat the round-146 chooser, exercising the new
8381    /// path end-to-end.
8382    #[test]
8383    fn color_transform_path_beats_predictor_on_channel_correlated_noise() {
8384        let w = 128u32;
8385        let h = 128u32;
8386        let pixels = make_channel_correlated_noise(w, h);
8387        let pre_color = pre_round_147_chooser(&pixels, w, h);
8388        let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8389        eprintln!(
8390            "[round-147] {}x{} channel-correlated noise: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8391            w,
8392            h,
8393            pre_color.len(),
8394            with_color.len(),
8395            100.0 * (pre_color.len() as f64 - with_color.len() as f64) / pre_color.len() as f64,
8396        );
8397        // Strict inequality: the color-transform candidate must be
8398        // chosen because the channel correlation is the only available
8399        // redundancy this fixture admits.
8400        assert!(
8401            with_color.len() < pre_color.len(),
8402            "color-transform path failed to beat the round-146 chooser on a channel-correlated-noise fixture: {} B vs {} B",
8403            with_color.len(),
8404            pre_color.len(),
8405        );
8406
8407        // Round trip through the full encoder/decoder is exact.
8408        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8409        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8410        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8411        assert_eq!(img.pixels(), pixels.as_slice());
8412    }
8413
8414    /// On uncorrelated random pixels the color transform has nothing
8415    /// to decorrelate, so the chooser must keep one of the no-transform
8416    /// / subtract-green / predictor candidates and never regress.
8417    #[test]
8418    fn color_transform_chooser_does_not_regress_on_noise() {
8419        let w = 32u32;
8420        let h = 32u32;
8421        let mut pixels = Vec::with_capacity((w * h) as usize);
8422        let mut state = 0xbadd_caf3u32;
8423        for _ in 0..(w * h) {
8424            state ^= state << 13;
8425            state ^= state >> 17;
8426            state ^= state << 5;
8427            pixels.push(state | 0xff00_0000);
8428        }
8429        let pre_color = pre_round_147_chooser(&pixels, w, h);
8430        let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8431        assert!(
8432            with_color.len() <= pre_color.len(),
8433            "color-transform chooser regressed on noise: {} B vs {} B",
8434            with_color.len(),
8435            pre_color.len(),
8436        );
8437    }
8438
8439    /// Round 308: the §4.2 entropy-cost per-block CTE chooser builds a
8440    /// color sub-image whose forward transform inverts exactly — the
8441    /// cost model only changes *which* CTE is recorded, never the
8442    /// round-trip contract. Asserted across a channel-correlated noise
8443    /// fixture (spatially varying per-block slopes) at the per-region
8444    /// `size_bits`.
8445    #[test]
8446    fn pick_block_cte_entropy_color_image_round_trips() {
8447        let w = 64u32;
8448        let h = 64u32;
8449        use crate::vp8l_transform::inverse_color;
8450        let pixels = make_channel_correlated_noise(w, h);
8451        let size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
8452        let (color_img, tw, _th) =
8453            build_color_image(&pixels, w, h, size_bits, ColorTransformStrategy::Entropy);
8454        let mut residuals = vec![0u32; pixels.len()];
8455        apply_forward_color(&pixels, &mut residuals, w, h, &color_img, tw, size_bits);
8456        inverse_color(&mut residuals, w, h, &color_img, tw, size_bits);
8457        assert_eq!(residuals, pixels);
8458    }
8459
8460    /// Round 308: on a channel-correlated noise fixture the §4.2
8461    /// entropy-cost CTE candidate must never produce a *longer* stream
8462    /// than the L1-magnitude CTE candidate at the same `size_bits` and
8463    /// cache sweep — the entropy metric scores the same candidate grid
8464    /// by the bit cost the §5.x prefix codes actually minimise, so it
8465    /// at worst ties. Both round-trip bit-exact through the decoder.
8466    #[test]
8467    fn color_transform_entropy_candidate_does_not_regress_vs_l1() {
8468        let w = 128u32;
8469        let h = 128u32;
8470        let pixels = make_channel_correlated_noise(w, h);
8471        let size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
8472
8473        let l1 = select_best_cache_bits(|cache_bits| {
8474            encode_with_color_transform_strategy(
8475                &pixels,
8476                w,
8477                h,
8478                size_bits,
8479                cache_bits,
8480                w,
8481                ColorTransformStrategy::L1,
8482            )
8483        });
8484        let entropy = select_best_cache_bits(|cache_bits| {
8485            encode_with_color_transform_strategy(
8486                &pixels,
8487                w,
8488                h,
8489                size_bits,
8490                cache_bits,
8491                w,
8492                ColorTransformStrategy::Entropy,
8493            )
8494        });
8495        eprintln!(
8496            "[round-308] {}x{} channel-correlated noise §4.2 CTE chooser: L1={} B, entropy={} B",
8497            w,
8498            h,
8499            l1.len(),
8500            entropy.len(),
8501        );
8502
8503        // Both image streams decode to the original pixels once the
8504        // §3.4 5-byte VP8L header is prepended (the candidate writers
8505        // emit the post-header image stream, exactly as the chooser's
8506        // `best` is assembled in `encode_vp8l_payload`).
8507        let header = build_image_header(w, h, false);
8508        for stream in [&l1, &entropy] {
8509            let mut bare = Vec::with_capacity(header.len() + stream.len());
8510            bare.extend_from_slice(&header);
8511            bare.extend_from_slice(stream);
8512            let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8513            let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8514            assert_eq!(img.pixels(), pixels.as_slice());
8515        }
8516
8517        // The whole-image super-chooser keeps the byte-shortest of all
8518        // candidates (L1 + entropy + every other transform path), so it
8519        // can never be longer than the L1 color-transform candidate
8520        // alone.
8521        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
8522        assert!(
8523            chosen.len() <= l1.len(),
8524            "super-chooser regressed against the L1 color-transform candidate: {} B vs {} B",
8525            chosen.len(),
8526            l1.len(),
8527        );
8528    }
8529
8530    /// Round-trip the published `lossless-128x128-natural` fixture
8531    /// through the round-147 super-chooser. The size must be at most
8532    /// the round-146 chooser's output; on a natural image the §3.5.2
8533    /// color-transform candidate's correlation cancellation usually
8534    /// shrinks the chosen stream further. Pixels round-trip bit-exact.
8535    #[test]
8536    fn natural_fixture_round_trips_through_color_transform_aware_encoder() {
8537        let bytes: &[u8] = include_bytes!("../tests/data/lossless-128x128-natural.webp");
8538        let decoded = crate::decode_lossless_image(bytes).unwrap().unwrap();
8539        let w = decoded.width();
8540        let h = decoded.height();
8541        let pixels = decoded.pixels().to_vec();
8542
8543        let pre_color = pre_round_147_chooser(&pixels, w, h);
8544        let with_color = encode_argb_with_predictor_chooser(&pixels, w, h);
8545        eprintln!(
8546            "[round-147] {}x{} natural fixture re-encoded: pre-color chooser={} B, color chooser={} B ({:.1}% reduction)",
8547            w,
8548            h,
8549            pre_color.len(),
8550            with_color.len(),
8551            100.0 * (pre_color.len() as f64 - with_color.len() as f64)
8552                / pre_color.len() as f64,
8553        );
8554        assert!(
8555            with_color.len() <= pre_color.len(),
8556            "color-transform chooser regressed on natural fixture: {} B vs {} B",
8557            with_color.len(),
8558            pre_color.len(),
8559        );
8560
8561        // End-to-end round trip is bit-exact through the public API.
8562        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8563        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8564        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8565        assert_eq!(img.pixels(), pixels.as_slice());
8566    }
8567
8568    /// Local copy of the round-146 chooser (no §4.2 color transform):
8569    /// evaluates the four
8570    /// `(no-tx | subtract-green) × (no-cache | cache)` candidates plus
8571    /// the two §4.1 predictor candidates, picking the smallest. Used
8572    /// as the regression baseline for the round-147 non-regression
8573    /// tests so they exercise *only* the color-transform delta the
8574    /// chooser added.
8575    fn pre_round_147_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
8576        let mut best = encode_argb_literals_with_width(pixels, width);
8577        let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
8578        let block = 1u32 << size_bits;
8579        if width >= block && height >= block {
8580            let candidates = [
8581                encode_with_predictor(pixels, width, height, size_bits, None, width),
8582                encode_with_predictor(
8583                    pixels,
8584                    width,
8585                    height,
8586                    size_bits,
8587                    Some(DEFAULT_COLOR_CACHE_BITS),
8588                    width,
8589                ),
8590            ];
8591            for cand in candidates {
8592                if cand.len() < best.len() {
8593                    best = cand;
8594                }
8595            }
8596        }
8597        best
8598    }
8599
8600    // ---- round 148: §5.2.3 color-cache code-bits sweep ----
8601
8602    /// Local copy of the pre-round-148 chooser for
8603    /// [`encode_argb_literals_with_width`]: hardcoded to the round-121
8604    /// `DEFAULT_COLOR_CACHE_BITS = 8` cache size for the two
8605    /// `(no-tx | subtract-green) × cache` candidates. Used by the
8606    /// round-148 regression tests to confirm that sweeping the full
8607    /// §5.2.3 `[1..11]` `cache_code_bits` range never produces a
8608    /// larger stream than the hardcoded-8 chooser.
8609    fn pre_round_148_literals_chooser(pixels: &[u32], image_width: u32) -> Vec<u8> {
8610        debug_assert!(image_width >= 1);
8611        let mut best = encode_literals_with_options(pixels, false, None, image_width);
8612        let candidates = [
8613            encode_literals_with_options(pixels, true, None, image_width),
8614            encode_literals_with_options(
8615                pixels,
8616                false,
8617                Some(DEFAULT_COLOR_CACHE_BITS),
8618                image_width,
8619            ),
8620            encode_literals_with_options(pixels, true, Some(DEFAULT_COLOR_CACHE_BITS), image_width),
8621        ];
8622        for cand in candidates {
8623            if cand.len() < best.len() {
8624                best = cand;
8625            }
8626        }
8627        best
8628    }
8629
8630    /// `select_best_cache_bits` evaluates the disabled-cache baseline
8631    /// plus all eleven §5.2.3 sizes (`code_bits ∈ [1..11]`), i.e. it
8632    /// calls the closure exactly twelve times and returns whichever
8633    /// stream is the shortest.
8634    #[test]
8635    fn select_best_cache_bits_explores_full_spec_range() {
8636        let mut calls: Vec<Option<u32>> = Vec::new();
8637        let _ = select_best_cache_bits(|bits| {
8638            calls.push(bits);
8639            // Return a stream whose length encodes the cache-bits
8640            // choice so we can verify the chooser inspects every
8641            // candidate (smallest is `Some(7)` here).
8642            let len = match bits {
8643                None => 100,
8644                Some(b) => 200 - (b as usize) * 10 + (7 - b as i32).unsigned_abs() as usize,
8645            };
8646            vec![0u8; len]
8647        });
8648        // 12 calls: None + 11 cache sizes.
8649        assert_eq!(calls.len(), 12, "expected 12 candidates");
8650        assert_eq!(calls[0], None);
8651        for (i, bits) in (COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX).enumerate() {
8652            assert_eq!(calls[i + 1], Some(bits));
8653        }
8654    }
8655
8656    /// `select_best_cache_bits` returns the smallest stream produced.
8657    #[test]
8658    fn select_best_cache_bits_returns_minimum() {
8659        // Crafted: cache_code_bits = 5 produces a 50-byte stream; all
8660        // others are larger. The sweep must return the 50-byte stream.
8661        let chosen = select_best_cache_bits(|bits| match bits {
8662            None => vec![0u8; 200],
8663            Some(5) => vec![0u8; 50],
8664            Some(b) => vec![0u8; 200 - (b as usize)],
8665        });
8666        assert_eq!(chosen.len(), 50);
8667    }
8668
8669    /// On every payload, the round-148 chooser produces a stream at
8670    /// most as large as the round-121-style hardcoded-8 chooser: the
8671    /// `cache_code_bits = 8` candidate is always among the sweep's
8672    /// twelve candidates, so the sweep can only improve.
8673    #[test]
8674    fn round_148_sweep_never_regresses_versus_hardcoded_8() {
8675        // Three contrasting payloads:
8676        // (a) small palette favouring narrow caches;
8677        // (b) wide palette favouring wide caches;
8678        // (c) random noise favouring disabled cache.
8679        let palette4: Vec<u32> = {
8680            let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
8681            let mut state = 0x1357_9bdfu32;
8682            (0..(8 * 8))
8683                .map(|_| {
8684                    state ^= state << 13;
8685                    state ^= state >> 17;
8686                    state ^= state << 5;
8687                    palette[(state as usize) % palette.len()]
8688                })
8689                .collect()
8690        };
8691        let mut wide_palette: Vec<u32> = Vec::with_capacity(32 * 32);
8692        let mut wstate = 0xabad_1deau32;
8693        for _ in 0..(32 * 32) {
8694            wstate ^= wstate << 13;
8695            wstate ^= wstate >> 17;
8696            wstate ^= wstate << 5;
8697            // 1024-color palette (10-bit truncation), opaque alpha.
8698            wide_palette.push(0xff00_0000 | (wstate & 0x3fff_3fff));
8699        }
8700        let noise: Vec<u32> = {
8701            let mut state = 0xc0de_d00du32;
8702            (0..(16 * 16))
8703                .map(|_| {
8704                    state ^= state << 13;
8705                    state ^= state >> 17;
8706                    state ^= state << 5;
8707                    state | 0xff00_0000
8708                })
8709                .collect()
8710        };
8711
8712        for (label, pixels, width) in [
8713            ("small-palette 8x8", palette4, 8u32),
8714            ("wide-palette 32x32", wide_palette, 32u32),
8715            ("noise 16x16", noise, 16u32),
8716        ] {
8717            let pre = pre_round_148_literals_chooser(&pixels, width);
8718            let post = encode_argb_literals_with_width(&pixels, width);
8719            eprintln!(
8720                "[round-148] {label}: pre={} B, post-sweep={} B",
8721                pre.len(),
8722                post.len(),
8723            );
8724            assert!(
8725                post.len() <= pre.len(),
8726                "round-148 sweep regressed on {label}: post {} B vs pre {} B",
8727                post.len(),
8728                pre.len(),
8729            );
8730        }
8731    }
8732
8733    /// On a 32×32 image whose pixels are drawn from a 16-color
8734    /// palette in a pseudo-random pattern, the round-148 sweep picks
8735    /// a `cache_code_bits` value that produces a *strictly smaller*
8736    /// stream than the hardcoded `DEFAULT_COLOR_CACHE_BITS = 8`
8737    /// choice — the four-bit difference in alphabet width pays for
8738    /// itself when the effective palette is only 16 colors.
8739    #[test]
8740    fn round_148_sweep_beats_hardcoded_8_on_small_palette() {
8741        let w = 32u32;
8742        let h = 32u32;
8743        let palette: Vec<u32> = (0..16u32)
8744            .map(|i| 0xff00_0000 | (i * 0x0011_2233))
8745            .collect();
8746        let mut pixels = Vec::with_capacity((w * h) as usize);
8747        let mut state = 0xfeed_face_u32;
8748        for _ in 0..(w * h) {
8749            state ^= state << 13;
8750            state ^= state >> 17;
8751            state ^= state << 5;
8752            pixels.push(palette[(state as usize) % palette.len()]);
8753        }
8754        let pre = pre_round_148_literals_chooser(&pixels, w);
8755        let post = encode_argb_literals_with_width(&pixels, w);
8756        eprintln!(
8757            "[round-148] small-palette 32x32: hardcoded-8={} B, sweep={} B ({:.1}% reduction)",
8758            pre.len(),
8759            post.len(),
8760            100.0 * (pre.len() as f64 - post.len() as f64) / pre.len() as f64,
8761        );
8762        assert!(
8763            post.len() < pre.len(),
8764            "expected sweep to beat hardcoded-8 on 16-color palette: post {} B vs pre {} B",
8765            post.len(),
8766            pre.len(),
8767        );
8768
8769        // Round trip through the full encoder/decoder chain is exact.
8770        let bare = encode_vp8l_argb(&pixels, w, h).unwrap();
8771        let framed = build::build_webp_file(&bare, ImageKind::Lossless, w, h).unwrap();
8772        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
8773        assert_eq!(img.pixels(), pixels.as_slice());
8774    }
8775
8776    /// Verify the round-148 sweep can pick a non-default
8777    /// `cache_code_bits` value: on at least one of several
8778    /// payloads, the sweep chooses a `code_bits` value that differs
8779    /// from the round-121 hardcoded default of `8` — proving the
8780    /// chooser is exercising the full §5.2.3 `[1..11]` range rather
8781    /// than locking to the historical fixed value.
8782    ///
8783    /// The sweep is allowed to disable the cache or pick `8` on any
8784    /// individual payload (the chooser only commits to the smallest
8785    /// stream); the assertion is that at least one of the surveyed
8786    /// payloads landed on a non-default enabled cache.
8787    #[test]
8788    fn round_148_sweep_picks_non_default_cache_bits_on_some_payload() {
8789        use crate::meta_prefix::{ImageRole, MetaPrefixHeader};
8790        use crate::vp8l_stream::BitReader;
8791
8792        // Three payloads with varying palette / size / repetition
8793        // structure. Each is run through `encode_literals_with_options`
8794        // via the round-148 sweep (no §3.8.2 transform header in front,
8795        // so the chosen stream's first bit is the optional-transform
8796        // terminator `%b0` followed directly by the §3.8.3
8797        // `color-cache-info`).
8798        let mut payloads: Vec<(u32, u32, Vec<u32>)> = Vec::new();
8799
8800        // 32x32 4-color pseudo-random palette.
8801        {
8802            let w = 32u32;
8803            let h = 32u32;
8804            let palette = [0xff10_2030u32, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0];
8805            let mut pixels = Vec::with_capacity((w * h) as usize);
8806            let mut state = 0x1357_9bdfu32;
8807            for _ in 0..(w * h) {
8808                state ^= state << 13;
8809                state ^= state >> 17;
8810                state ^= state << 5;
8811                pixels.push(palette[(state as usize) % palette.len()]);
8812            }
8813            payloads.push((w, h, pixels));
8814        }
8815
8816        // 64x64 32-color pseudo-random palette.
8817        {
8818            let w = 64u32;
8819            let h = 64u32;
8820            let palette: Vec<u32> = (0..32u32)
8821                .map(|i| 0xff00_0000 | (i * 0x0008_4210))
8822                .collect();
8823            let mut pixels = Vec::with_capacity((w * h) as usize);
8824            let mut state = 0xdead_beefu32;
8825            for _ in 0..(w * h) {
8826                state ^= state << 13;
8827                state ^= state >> 17;
8828                state ^= state << 5;
8829                pixels.push(palette[(state as usize) % palette.len()]);
8830            }
8831            payloads.push((w, h, pixels));
8832        }
8833
8834        // 64x64 256-color pseudo-random palette.
8835        {
8836            let w = 64u32;
8837            let h = 64u32;
8838            let palette: Vec<u32> = (0..256u32)
8839                .map(|i| 0xff00_0000 | (i * 0x0001_0101))
8840                .collect();
8841            let mut pixels = Vec::with_capacity((w * h) as usize);
8842            let mut state = 0xc0ff_eeefu32;
8843            for _ in 0..(w * h) {
8844                state ^= state << 13;
8845                state ^= state >> 17;
8846                state ^= state << 5;
8847                pixels.push(palette[(state as usize) % palette.len()]);
8848            }
8849            payloads.push((w, h, pixels));
8850        }
8851
8852        let mut saw_non_default_enabled = false;
8853        for (w, h, pixels) in &payloads {
8854            let chosen = select_best_cache_bits(|cache_bits| {
8855                encode_literals_with_options(pixels, false, cache_bits, *w)
8856            });
8857            let mut r = BitReader::new(&chosen);
8858            assert!(!r.read_bit().unwrap());
8859            let header = MetaPrefixHeader::read(&mut r, ImageRole::Argb, *w, *h).unwrap();
8860            if header.color_cache.is_enabled() {
8861                assert!(
8862                    (COLOR_CACHE_BITS_MIN..=COLOR_CACHE_BITS_MAX)
8863                        .contains(&header.color_cache.code_bits),
8864                    "chosen code_bits {} outside §5.2.3 [{COLOR_CACHE_BITS_MIN}..{COLOR_CACHE_BITS_MAX}]",
8865                    header.color_cache.code_bits,
8866                );
8867                eprintln!(
8868                    "[round-148] {}x{} palette payload: sweep enabled cache with code_bits={}",
8869                    w, h, header.color_cache.code_bits
8870                );
8871                if header.color_cache.code_bits != DEFAULT_COLOR_CACHE_BITS {
8872                    saw_non_default_enabled = true;
8873                }
8874            } else {
8875                eprintln!(
8876                    "[round-148] {}x{} palette payload: sweep disabled cache",
8877                    w, h
8878                );
8879            }
8880        }
8881        assert!(
8882            saw_non_default_enabled,
8883            "expected the round-148 sweep to pick a non-default code_bits on at least one payload"
8884        );
8885    }
8886
8887    // ---- round 150: §4.4 color-indexing transform encoder ----
8888
8889    /// The §4.4 color-indexing encoder derives its bundling from the
8890    /// shared threshold table: at each boundary palette size of the
8891    /// spec's "Color Table Size to Bundled Pixel Bit Width Mapping",
8892    /// the emitted bitstream's transform header parses back (via the
8893    /// §4 transform-list reader) to the expected on-wire
8894    /// `color_table_size` and the shared accessor's `width_bits`.
8895    #[test]
8896    fn encoder_color_indexing_header_matches_shared_width_bits_table() {
8897        for (n_colors, expected_bits) in [
8898            (1usize, 3u8),
8899            (2, 3),
8900            (3, 2),
8901            (4, 2),
8902            (5, 1),
8903            (16, 1),
8904            (17, 0),
8905            (256, 0),
8906        ] {
8907            // `n_colors` distinct grays, one pixel per color.
8908            let pixels: Vec<u32> = (0..n_colors as u32)
8909                .map(|i| 0xff00_0000 | (i << 16) | (i << 8) | i)
8910                .collect();
8911            let bytes = encode_with_color_indexing(&pixels, n_colors as u32, 1, None)
8912                .expect("palette path applies to <= 256 unique colors");
8913            let mut r = crate::vp8l_stream::BitReader::new(&bytes);
8914            let list = crate::vp8l_stream::TransformList::read(&mut r).unwrap();
8915            assert_eq!(
8916                list.transforms(),
8917                &[crate::vp8l_stream::Transform::ColorIndexing {
8918                    color_table_size: n_colors as u16,
8919                    width_bits: expected_bits,
8920                }],
8921                "{n_colors} colors"
8922            );
8923            assert!(list.stopped_at_entropy_body());
8924            assert_eq!(
8925                expected_bits,
8926                crate::vp8l_transform::color_indexing_width_bits(n_colors),
8927                "boundary expectation drifted from the shared accessor"
8928            );
8929        }
8930    }
8931
8932    /// `forward_color_table` is the bit-exact inverse of the decoder's
8933    /// `inverse_color_table`: applying one after the other recovers
8934    /// the original palette per-channel mod 256.
8935    #[test]
8936    fn forward_color_table_round_trips_with_decoder_inverse() {
8937        let original: Vec<u32> = vec![
8938            0xff00_0000,
8939            0xff01_0203,
8940            0xff80_4020,
8941            0x7f12_3456,
8942            0x0000_00ff,
8943        ];
8944        let mut encoded = original.clone();
8945        forward_color_table(&mut encoded);
8946        crate::vp8l_transform::inverse_color_table(&mut encoded);
8947        assert_eq!(encoded, original);
8948    }
8949
8950    /// `collect_palette` returns `None` for an image with > 256 unique
8951    /// ARGB values, and `Some((palette, map))` otherwise. The palette
8952    /// is sorted, no duplicates, and every pixel maps back via `map`.
8953    #[test]
8954    fn collect_palette_early_exits_above_256_unique_colors() {
8955        // Easy under-threshold case: 4 unique colors.
8956        let small = vec![0xff10_2030, 0xff40_5060, 0xff10_2030, 0xff70_8090];
8957        let (p, m) = collect_palette(&small).expect("4-color palette fits");
8958        assert_eq!(p.len(), 3); // 0xff10_2030 appears twice, so 3 uniques.
8959                                // Sorted.
8960        assert!(p.windows(2).all(|w| w[0] < w[1]));
8961        // Round-trip every pixel through the map.
8962        for px in &small {
8963            let idx = m[px] as usize;
8964            assert_eq!(p[idx], *px);
8965        }
8966
8967        // Over-threshold: 257 distinct colors → None.
8968        let big: Vec<u32> = (0..257u32).map(|i| 0xff00_0000 | i).collect();
8969        assert!(collect_palette(&big).is_none());
8970    }
8971
8972    /// End-to-end §4.4 color-indexing round trip through the decoder
8973    /// across the four `width_bits` regimes: a 2-color image
8974    /// (width_bits=3, 8-per-byte bundling), a 4-color image
8975    /// (width_bits=2, 4-per-byte), a 16-color image (width_bits=1,
8976    /// 2-per-byte), and a 64-color image (width_bits=0, 1-per-byte).
8977    /// Each round trip must reproduce the exact input ARGB pixels.
8978    #[test]
8979    fn color_indexing_round_trip_across_all_width_bits_regimes() {
8980        // Pseudo-random index pattern that visits every palette
8981        // entry at least once over each test image.
8982        let palette_64: Vec<u32> = (0..64u32)
8983            .map(|i| 0xff00_0000 | (i << 18) | (i << 10) | (i << 2))
8984            .collect();
8985        let scenarios: [(u32, u32, &[u32]); 4] = [
8986            // 2-color: width_bits = 3.
8987            (32, 4, &[0xff00_0000, 0xffff_ffff]),
8988            // 4-color: width_bits = 2.
8989            (16, 4, &[0xff10_2030, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0]),
8990            // 16-color: width_bits = 1. Pick non-zero palettes that
8991            // exercise the subtraction coding (varied deltas).
8992            (
8993                16,
8994                4,
8995                &[
8996                    0xff00_0000,
8997                    0xff10_2030,
8998                    0xff20_4060,
8999                    0xff30_6090,
9000                    0xff40_80c0,
9001                    0xff50_a0e0,
9002                    0xff60_c0ff,
9003                    0xff70_ff00,
9004                    0xff80_8080,
9005                    0xff90_9090,
9006                    0xffa0_a0a0,
9007                    0xffb0_b0b0,
9008                    0xffc0_c0c0,
9009                    0xffd0_d0d0,
9010                    0xffe0_e0e0,
9011                    0xfff0_f0f0,
9012                ],
9013            ),
9014            // 64-color: width_bits = 0 (no bundling).
9015            (16, 4, palette_64.as_slice()),
9016        ];
9017        for (w, h, palette) in scenarios {
9018            let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9019            let mut state: u32 = 0xC0FF_EE12;
9020            for _ in 0..(w * h) {
9021                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9022                pixels.push(palette[(state as usize) % palette.len()]);
9023            }
9024            let stream = encode_with_color_indexing(&pixels, w, h, None)
9025                .expect("palette fits below 256 unique");
9026            // Build a complete VP8L chunk payload (5-byte header + stream)
9027            // and decode it back through the decoder.
9028            let header = build_image_header(w, h, false);
9029            let mut payload = header.to_vec();
9030            payload.extend_from_slice(&stream);
9031            let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9032                .expect("decode color-indexing round trip");
9033            assert_eq!(
9034                decoded.pixels(),
9035                pixels.as_slice(),
9036                "round-trip mismatch on {}-color palette ({}x{} image)",
9037                palette.len(),
9038                w,
9039                h
9040            );
9041        }
9042    }
9043
9044    /// Round 302: the stacked §4.4 color-indexing + §4.1 predictor
9045    /// candidate must round-trip bit-exactly through the decoder. The
9046    /// decoder reads color-indexing first (subsampling the width it
9047    /// threads into the predictor body and main image), then applies
9048    /// the inverses last-first — inverse-predictor over the bundled
9049    /// indices, then inverse-color-indexing — recovering the original
9050    /// pixels. Exercise the bundling regimes that admit a predictor
9051    /// block at the packed width (width_bits 3 / 2 / 1 / 0) so the
9052    /// `packed_width >= block` self-skip never trips for the chosen
9053    /// dimensions.
9054    #[test]
9055    fn round_302_color_indexing_predictor_round_trips_through_decoder() {
9056        let palette_64: Vec<u32> = (0..64u32)
9057            .map(|i| 0xff00_0000 | (i << 18) | (i << 10) | (i << 2))
9058            .collect();
9059        // Dimensions are chosen so `packed_width >= 16` (the default
9060        // predictor block side) and `height >= 16`, i.e. the chained
9061        // candidate produces a non-`None` stream.
9062        let scenarios: [(u32, u32, &[u32]); 4] = [
9063            // 2-color: width_bits = 3 → packed_width = ceil(W/8).
9064            (256, 32, &[0xff00_0000, 0xffff_ffff]),
9065            // 4-color: width_bits = 2 → packed_width = ceil(W/4).
9066            (
9067                128,
9068                32,
9069                &[0xff10_2030, 0xff40_5060, 0xff70_8090, 0xffa0_b0c0],
9070            ),
9071            // 16-color: width_bits = 1 → packed_width = ceil(W/2).
9072            (
9073                64,
9074                32,
9075                &[
9076                    0xff00_0000,
9077                    0xff10_2030,
9078                    0xff20_4060,
9079                    0xff30_6090,
9080                    0xff40_80c0,
9081                    0xff50_a0e0,
9082                    0xff60_c0ff,
9083                    0xff70_ff00,
9084                    0xff80_8080,
9085                    0xff90_9090,
9086                    0xffa0_a0a0,
9087                    0xffb0_b0b0,
9088                    0xffc0_c0c0,
9089                    0xffd0_d0d0,
9090                    0xffe0_e0e0,
9091                    0xfff0_f0f0,
9092                ],
9093            ),
9094            // 64-color: width_bits = 0 (no bundling) → packed_width = W.
9095            (32, 32, palette_64.as_slice()),
9096        ];
9097        for (w, h, palette) in scenarios {
9098            let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9099            // Row-coherent fill: each row is a smooth run over palette
9100            // indices so the predictor over the bundled bytes has real
9101            // spatial structure to model.
9102            for y in 0..h {
9103                for x in 0..w {
9104                    let idx = ((x / 3 + y) as usize) % palette.len();
9105                    pixels.push(palette[idx]);
9106                }
9107            }
9108            let stream = encode_with_color_indexing_predictor(
9109                &pixels,
9110                w,
9111                h,
9112                DEFAULT_PREDICTOR_SIZE_BITS,
9113                None,
9114                PredictorSubImageStrategy::L1,
9115            )
9116            .expect("palette feasible and packed image admits a predictor block");
9117            let header = build_image_header(w, h, false);
9118            let mut payload = header.to_vec();
9119            payload.extend_from_slice(&stream);
9120            let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9121                .expect("decode color-indexing+predictor round trip");
9122            assert_eq!(
9123                decoded.pixels(),
9124                pixels.as_slice(),
9125                "round-trip mismatch on {}-color palette ({}x{} image)",
9126                palette.len(),
9127                w,
9128                h
9129            );
9130        }
9131    }
9132
9133    /// Round 302: the stacked candidate must also round-trip with a
9134    /// §5.2.3 color cache enabled over the residual stream, and at the
9135    /// single-block predictor `size_bits` the chooser also sweeps.
9136    #[test]
9137    fn round_302_color_indexing_predictor_round_trips_with_cache_and_single_block() {
9138        let palette = [0xff00_0000u32, 0xff20_4060, 0xff60_c0ff, 0xffff_ffff];
9139        let (w, h) = (96u32, 48u32);
9140        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9141        for y in 0..h {
9142            for x in 0..w {
9143                let idx = ((x / 5 + y / 2) as usize) % palette.len();
9144                pixels.push(palette[idx]);
9145            }
9146        }
9147        // Single-block size_bits large enough to collapse the packed
9148        // image into one predictor block.
9149        // Round 305: also sweep the predictor-sub-image strategy so the
9150        // entropy / sub-image-aware builders are exercised on the
9151        // packed-index residual round-trip.
9152        for size_bits in [DEFAULT_PREDICTOR_SIZE_BITS, 7u8] {
9153            for cache in [None, Some(4u32), Some(8u32)] {
9154                for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9155                    if let Some(stream) = encode_with_color_indexing_predictor(
9156                        &pixels,
9157                        w,
9158                        h,
9159                        size_bits,
9160                        cache,
9161                        pred_strategy,
9162                    ) {
9163                        let header = build_image_header(w, h, false);
9164                        let mut payload = header.to_vec();
9165                        payload.extend_from_slice(&stream);
9166                        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9167                            .expect("decode chained round trip");
9168                        assert_eq!(
9169                            decoded.pixels(),
9170                            pixels.as_slice(),
9171                            "mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9172                        );
9173                    }
9174                }
9175            }
9176        }
9177    }
9178
9179    /// Round 302: the chained candidate self-skips (returns `None`)
9180    /// when the packed image is smaller than one predictor block at the
9181    /// requested `size_bits`, so the chooser never emits a degenerate
9182    /// stream. A 4-pixel-wide, 2-color image packs to a 1-byte-wide
9183    /// bundled image (width_bits = 3), which is below the default
9184    /// 16-pixel predictor block.
9185    #[test]
9186    fn round_302_color_indexing_predictor_skips_subblock_packed_image() {
9187        let pixels = [
9188            0xff00_0000u32,
9189            0xffff_ffff,
9190            0xff00_0000,
9191            0xffff_ffff,
9192            0xffff_ffff,
9193            0xff00_0000,
9194            0xffff_ffff,
9195            0xff00_0000,
9196        ];
9197        // 4x2 image: width_bits = 3 → packed_width = 1 < block (16).
9198        assert!(
9199            encode_with_color_indexing_predictor(
9200                &pixels,
9201                4,
9202                2,
9203                DEFAULT_PREDICTOR_SIZE_BITS,
9204                None,
9205                PredictorSubImageStrategy::L1
9206            )
9207            .is_none(),
9208            "sub-block packed image must self-skip the predictor chain"
9209        );
9210    }
9211
9212    /// Round 302: the full super-chooser stays non-regressing with the
9213    /// new stacked candidate in the mix — the chosen stream is never
9214    /// larger than the best of the pre-302 single-transform candidates,
9215    /// and it still decodes back to the source pixels. Exercised on a
9216    /// row-coherent palette image where the chained transform has a
9217    /// real opportunity to win.
9218    #[test]
9219    fn round_302_chooser_never_regresses_and_round_trips() {
9220        let palette = [
9221            0xff00_0000u32,
9222            0xff20_4060,
9223            0xff40_80c0,
9224            0xff60_c0ff,
9225            0xff80_8080,
9226            0xffff_ffff,
9227        ];
9228        let (w, h) = (128u32, 64u32);
9229        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9230        for y in 0..h {
9231            for x in 0..w {
9232                // Smooth diagonal ramp through the palette → strong
9233                // spatial coherence in the bundled-index image.
9234                let idx = (((x + y) / 4) as usize) % palette.len();
9235                pixels.push(palette[idx]);
9236            }
9237        }
9238
9239        // Pre-302 best single-transform candidate: the single color-
9240        // indexing path (with the cache sweep) is the strongest
9241        // single-transform option on this palette image.
9242        let single_ci = select_best_cache_bits(|cache_bits| {
9243            encode_with_color_indexing(&pixels, w, h, cache_bits).expect("palette feasible")
9244        });
9245
9246        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9247        assert!(
9248            chosen.len() <= single_ci.len(),
9249            "chooser regressed: chosen {} > single color-indexing {}",
9250            chosen.len(),
9251            single_ci.len()
9252        );
9253
9254        // The chosen stream must decode back to the source pixels.
9255        let header = build_image_header(w, h, false);
9256        let mut payload = header.to_vec();
9257        payload.extend_from_slice(&chosen);
9258        let decoded =
9259            crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9260        assert_eq!(decoded.pixels(), pixels.as_slice());
9261    }
9262
9263    /// Round 303: the stacked §4.2 color-transform + §4.1 predictor
9264    /// candidate must round-trip bit-exactly through the decoder. The
9265    /// decoder reads color-transform first, predictor second (neither
9266    /// subsamples the width), then applies the inverses last-first —
9267    /// inverse-predictor over the color-transformed image, then
9268    /// inverse-color — recovering the original pixels. Exercise photo-
9269    /// like content across a default and single-block `size_bits`, with
9270    /// and without a residual-stream color cache.
9271    #[test]
9272    fn round_303_color_transform_predictor_round_trips_through_decoder() {
9273        // Synthetic photo-like content: smooth channel gradients plus a
9274        // deterministic noise term, so red / blue carry real correlation
9275        // against green (the §4.2 transform has something to model) and
9276        // the gradients give the §4.1 predictor real spatial structure.
9277        let (w, h) = (128u32, 96u32);
9278        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9279        let mut state: u32 = 0x1234_5678;
9280        for y in 0..h {
9281            for x in 0..w {
9282                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9283                let n = (state >> 27) as i32 - 16;
9284                let g = ((x + y) % 256) as i32;
9285                let r = (g + 24 + n).clamp(0, 255) as u32;
9286                let b = (g - 18 - n).clamp(0, 255) as u32;
9287                pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9288            }
9289        }
9290        // Round 305: sweep the predictor-sub-image strategy too.
9291        for size_bits in [DEFAULT_COLOR_TRANSFORM_SIZE_BITS, 7u8] {
9292            for cache in [None, Some(4u32), Some(9u32)] {
9293                for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9294                    let stream = encode_with_color_transform_predictor(
9295                        &pixels,
9296                        w,
9297                        h,
9298                        size_bits,
9299                        cache,
9300                        pred_strategy,
9301                    );
9302                    let header = build_image_header(w, h, false);
9303                    let mut payload = header.to_vec();
9304                    payload.extend_from_slice(&stream);
9305                    let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9306                        .expect("decode color-transform+predictor round trip");
9307                    assert_eq!(
9308                        decoded.pixels(),
9309                        pixels.as_slice(),
9310                        "round-trip mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9311                    );
9312                }
9313            }
9314        }
9315    }
9316
9317    /// Round 303: a single-row / single-column-degenerate image still
9318    /// round-trips. With `width < block` the chooser never calls the
9319    /// chained path, but the encoder itself must still produce a valid
9320    /// stream when handed a one-block image (the smallest admissible
9321    /// input: exactly `block × block`).
9322    #[test]
9323    fn round_303_color_transform_predictor_single_block_round_trips() {
9324        let block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9325        let (w, h) = (block, block);
9326        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9327        for y in 0..h {
9328            for x in 0..w {
9329                let g = (x * 7 + y * 5) % 256;
9330                let r = (g + 13) % 256;
9331                let b = (g + 200) % 256;
9332                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9333            }
9334        }
9335        let stream = encode_with_color_transform_predictor(
9336            &pixels,
9337            w,
9338            h,
9339            DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9340            None,
9341            PredictorSubImageStrategy::L1,
9342        );
9343        let header = build_image_header(w, h, false);
9344        let mut payload = header.to_vec();
9345        payload.extend_from_slice(&stream);
9346        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9347            .expect("decode single-block chain");
9348        assert_eq!(decoded.pixels(), pixels.as_slice());
9349    }
9350
9351    /// Round 303: the full super-chooser stays non-regressing with the
9352    /// new color-transform + predictor candidate in the mix — the chosen
9353    /// stream is never larger than the best of the pre-303 candidates,
9354    /// and it still decodes back to the source pixels. Exercised on a
9355    /// photo-like image where the chained transform has a real chance to
9356    /// win against the single color-transform and single predictor paths.
9357    #[test]
9358    fn round_303_chooser_never_regresses_and_round_trips() {
9359        let (w, h) = (160u32, 120u32);
9360        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9361        let mut state: u32 = 0xfeed_face;
9362        for y in 0..h {
9363            for x in 0..w {
9364                state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9365                let n = (state >> 28) as i32 - 8;
9366                let g = ((x as i32) - (y as i32)).rem_euclid(256);
9367                let r = (g + 40 + n).clamp(0, 255) as u32;
9368                let b = (g - 30 + n).clamp(0, 255) as u32;
9369                pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9370            }
9371        }
9372
9373        // Pre-303 best of the two single block-transform paths on this
9374        // photo-like image (predictor + color transform, each with the
9375        // cache sweep).
9376        let single_pred = select_best_cache_bits(|cache_bits| {
9377            encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
9378        });
9379        let single_color = select_best_cache_bits(|cache_bits| {
9380            encode_with_color_transform(
9381                &pixels,
9382                w,
9383                h,
9384                DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9385                cache_bits,
9386                w,
9387            )
9388        });
9389        let pre303 = single_pred.len().min(single_color.len());
9390
9391        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9392        assert!(
9393            chosen.len() <= pre303,
9394            "chooser regressed: chosen {} > pre-303 best {}",
9395            chosen.len(),
9396            pre303
9397        );
9398
9399        let header = build_image_header(w, h, false);
9400        let mut payload = header.to_vec();
9401        payload.extend_from_slice(&chosen);
9402        let decoded =
9403            crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9404        assert_eq!(decoded.pixels(), pixels.as_slice());
9405    }
9406
9407    /// Round 304: the three-transform §4.2 color → §4.3 subtract-green →
9408    /// §4.1 predictor stack must round-trip bit-exactly through the
9409    /// decoder. The decoder reads color first, subtract-green second,
9410    /// predictor third (none subsample the width), then applies the
9411    /// inverses last-first — inverse-predictor, inverse-subtract-green,
9412    /// inverse-color — recovering the original pixels. Exercise photo-like
9413    /// content across a default and single-block `size_bits`, with and
9414    /// without a residual-stream color cache.
9415    #[test]
9416    fn round_304_color_subtract_green_predictor_round_trips_through_decoder() {
9417        let (w, h) = (128u32, 96u32);
9418        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9419        let mut state: u32 = 0x0bad_c0de;
9420        for y in 0..h {
9421            for x in 0..w {
9422                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9423                let n = (state >> 27) as i32 - 16;
9424                let g = ((x + y) % 256) as i32;
9425                let r = (g + 31 + n).clamp(0, 255) as u32;
9426                let b = (g - 22 - n).clamp(0, 255) as u32;
9427                pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9428            }
9429        }
9430        // Round 305: sweep the predictor-sub-image strategy too.
9431        for size_bits in [DEFAULT_COLOR_TRANSFORM_SIZE_BITS, 7u8] {
9432            for cache in [None, Some(4u32), Some(9u32)] {
9433                for &pred_strategy in &STACKED_PREDICTOR_STRATEGIES {
9434                    let stream = encode_with_color_transform_subtract_green_predictor(
9435                        &pixels,
9436                        w,
9437                        h,
9438                        size_bits,
9439                        cache,
9440                        pred_strategy,
9441                    );
9442                    let header = build_image_header(w, h, false);
9443                    let mut payload = header.to_vec();
9444                    payload.extend_from_slice(&stream);
9445                    let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9446                        .expect("decode color+subtract-green+predictor round trip");
9447                    assert_eq!(
9448                        decoded.pixels(),
9449                        pixels.as_slice(),
9450                        "round-trip mismatch size_bits={size_bits} cache={cache:?} strategy={pred_strategy:?}"
9451                    );
9452                }
9453            }
9454        }
9455    }
9456
9457    /// Round 304: the smallest admissible input (exactly `block × block`)
9458    /// still round-trips through the three-transform stack.
9459    #[test]
9460    fn round_304_color_subtract_green_predictor_single_block_round_trips() {
9461        let block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9462        let (w, h) = (block, block);
9463        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9464        for y in 0..h {
9465            for x in 0..w {
9466                let g = (x * 5 + y * 3) % 256;
9467                let r = (g + 27) % 256;
9468                let b = (g + 180) % 256;
9469                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9470            }
9471        }
9472        let stream = encode_with_color_transform_subtract_green_predictor(
9473            &pixels,
9474            w,
9475            h,
9476            DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9477            None,
9478            PredictorSubImageStrategy::L1,
9479        );
9480        let header = build_image_header(w, h, false);
9481        let mut payload = header.to_vec();
9482        payload.extend_from_slice(&stream);
9483        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9484            .expect("decode single-block 3-stack");
9485        assert_eq!(decoded.pixels(), pixels.as_slice());
9486    }
9487
9488    /// Round 304: the full super-chooser stays non-regressing with the new
9489    /// three-transform color → subtract-green → predictor candidate in the
9490    /// mix — the chosen stream is never larger than the best of the
9491    /// pre-304 candidates (the round-303 color + predictor 2-stack plus the
9492    /// single color / predictor / subtract-green paths) and still decodes
9493    /// back to the source pixels.
9494    #[test]
9495    fn round_304_chooser_never_regresses_and_round_trips() {
9496        let (w, h) = (160u32, 120u32);
9497        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9498        let mut state: u32 = 0x1357_9bdf;
9499        for y in 0..h {
9500            for x in 0..w {
9501                state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9502                let n = (state >> 28) as i32 - 8;
9503                let g = ((x as i32) - (y as i32)).rem_euclid(256);
9504                let r = (g + 44 + n).clamp(0, 255) as u32;
9505                let b = (g - 33 + n).clamp(0, 255) as u32;
9506                pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9507            }
9508        }
9509
9510        // Pre-304 best across the single block transforms plus the
9511        // round-303 color + predictor 2-stack.
9512        let single_pred = select_best_cache_bits(|cache_bits| {
9513            encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
9514        });
9515        let single_color = select_best_cache_bits(|cache_bits| {
9516            encode_with_color_transform(
9517                &pixels,
9518                w,
9519                h,
9520                DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9521                cache_bits,
9522                w,
9523            )
9524        });
9525        let color_pred = select_best_cache_bits(|cache_bits| {
9526            encode_with_color_transform_predictor(
9527                &pixels,
9528                w,
9529                h,
9530                DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
9531                cache_bits,
9532                PredictorSubImageStrategy::L1,
9533            )
9534        });
9535        let pre304 = single_pred
9536            .len()
9537            .min(single_color.len())
9538            .min(color_pred.len());
9539
9540        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9541        assert!(
9542            chosen.len() <= pre304,
9543            "chooser regressed: chosen {} > pre-304 best {}",
9544            chosen.len(),
9545            pre304
9546        );
9547
9548        let header = build_image_header(w, h, false);
9549        let mut payload = header.to_vec();
9550        payload.extend_from_slice(&chosen);
9551        let decoded =
9552            crate::vp8l_transform::decode_lossless(&payload, w, h).expect("decode chosen stream");
9553        assert_eq!(decoded.pixels(), pixels.as_slice());
9554    }
9555
9556    /// Round 305: every predictor-sub-image strategy threaded through
9557    /// the stacked §3.5 chains must round-trip bit-exactly. Each
9558    /// strategy only changes which §4.1 mode is recorded per block in
9559    /// the sub-image; the forward transform recomputes residuals against
9560    /// the chosen modes and the decoder reads them back, so the
9561    /// reconstruction is strategy-independent. Exercise all three
9562    /// stacked chains (color + predictor, color + subtract-green +
9563    /// predictor, color-indexing + predictor) across the full strategy
9564    /// set on content where each chain is admissible.
9565    #[test]
9566    fn round_305_stacked_predictor_strategies_round_trip() {
9567        // Photo-like content for the two color-transform chains.
9568        let (w, h) = (128u32, 96u32);
9569        let mut photo: Vec<u32> = Vec::with_capacity((w * h) as usize);
9570        let mut state: u32 = 0x9e37_79b9;
9571        for y in 0..h {
9572            for x in 0..w {
9573                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9574                let n = (state >> 27) as i32 - 16;
9575                let g = ((x + y) % 256) as i32;
9576                let r = (g + 28 + n).clamp(0, 255) as u32;
9577                let b = (g - 19 - n).clamp(0, 255) as u32;
9578                photo.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9579            }
9580        }
9581        for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9582            let sb = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9583            let s1 = encode_with_color_transform_predictor(&photo, w, h, sb, Some(8), strategy);
9584            let s2 = encode_with_color_transform_subtract_green_predictor(
9585                &photo,
9586                w,
9587                h,
9588                sb,
9589                Some(8),
9590                strategy,
9591            );
9592            for stream in [&s1, &s2] {
9593                let header = build_image_header(w, h, false);
9594                let mut payload = header.to_vec();
9595                payload.extend_from_slice(stream);
9596                let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9597                    .unwrap_or_else(|_| panic!("decode failed for strategy {strategy:?}"));
9598                assert_eq!(
9599                    decoded.pixels(),
9600                    photo.as_slice(),
9601                    "color-chain round-trip mismatch strategy={strategy:?}"
9602                );
9603            }
9604        }
9605
9606        // Round 306: the stacked-chain sub-image-aware lambda set must
9607        // match the single-transform predictor path's lambda sweep
9608        // (encode_argb_with_predictor_chooser, [4_000, 16_000, 64_000,
9609        // 256_000]) so the two paths land on the same residual-vs-
9610        // sub-image cost crossover. If the single-transform sweep ever
9611        // changes, this asserts the stacked sweep is updated in lockstep.
9612        let stacked_lambdas: Vec<u64> = STACKED_PREDICTOR_STRATEGIES
9613            .iter()
9614            .filter_map(|s| match s {
9615                PredictorSubImageStrategy::EntropySubaware { lambda_milli } => Some(*lambda_milli),
9616                _ => None,
9617            })
9618            .collect();
9619        assert_eq!(
9620            stacked_lambdas,
9621            vec![4_000u64, 16_000, 64_000, 256_000],
9622            "stacked sub-image-aware lambda sweep must mirror the single-transform path"
9623        );
9624
9625        // Palette content for the color-indexing chain.
9626        let palette = [0xff00_0000u32, 0xff20_4060, 0xff60_c0ff, 0xffff_ffff];
9627        let (pw, ph) = (96u32, 48u32);
9628        let mut pal: Vec<u32> = Vec::with_capacity((pw * ph) as usize);
9629        for y in 0..ph {
9630            for x in 0..pw {
9631                let idx = ((x / 3 + y) as usize) % palette.len();
9632                pal.push(palette[idx]);
9633            }
9634        }
9635        for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9636            let stream = encode_with_color_indexing_predictor(
9637                &pal,
9638                pw,
9639                ph,
9640                DEFAULT_PREDICTOR_SIZE_BITS,
9641                Some(4),
9642                strategy,
9643            )
9644            .expect("palette feasible, packed image admits a predictor block");
9645            let header = build_image_header(pw, ph, false);
9646            let mut payload = header.to_vec();
9647            payload.extend_from_slice(&stream);
9648            let decoded = crate::vp8l_transform::decode_lossless(&payload, pw, ph)
9649                .unwrap_or_else(|_| panic!("decode failed for strategy {strategy:?}"));
9650            assert_eq!(
9651                decoded.pixels(),
9652                pal.as_slice(),
9653                "color-indexing-chain round-trip mismatch strategy={strategy:?}"
9654            );
9655        }
9656    }
9657
9658    /// Round 305: the entropy-aware strategies must *actually win* on
9659    /// the stacked color-transform + predictor chain for at least one
9660    /// real input — guarding the feature from becoming dead code. On
9661    /// smooth, mildly-noisy photo-like content the color transform
9662    /// decorrelates the channels, leaving a residual the §4.1 predictor
9663    /// sub-image models far better under a true Huffman bit-cost than
9664    /// under the L1 magnitude proxy: the per-block mode histogram
9665    /// concentrates, shrinking both the §7.2 sub-image and the residual
9666    /// stream. The non-L1 best here is materially smaller than L1.
9667    #[test]
9668    fn round_305_entropy_strategy_beats_l1_on_photo_chain() {
9669        let (w, h) = (96u32, 64u32);
9670        let mut px: Vec<u32> = Vec::with_capacity((w * h) as usize);
9671        let mut state: u32 = 0x1000_0000;
9672        for y in 0..h {
9673            for x in 0..w {
9674                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
9675                let n = (state >> 29) as i32 - 2;
9676                let g = ((x + y) % 256) as i32;
9677                let r = (g + 20 + n).clamp(0, 255) as u32;
9678                let b = (g - 15 - n).clamp(0, 255) as u32;
9679                px.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9680            }
9681        }
9682        let sb = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9683        let l1 = encode_with_color_transform_predictor(
9684            &px,
9685            w,
9686            h,
9687            sb,
9688            Some(8),
9689            PredictorSubImageStrategy::L1,
9690        )
9691        .len();
9692        let entropy = encode_with_color_transform_predictor(
9693            &px,
9694            w,
9695            h,
9696            sb,
9697            Some(8),
9698            PredictorSubImageStrategy::Entropy,
9699        )
9700        .len();
9701        let subaware = encode_with_color_transform_predictor(
9702            &px,
9703            w,
9704            h,
9705            sb,
9706            Some(8),
9707            PredictorSubImageStrategy::EntropySubaware {
9708                lambda_milli: 16_000,
9709            },
9710        )
9711        .len();
9712        let best_non_l1 = entropy.min(subaware);
9713        assert!(
9714            best_non_l1 < l1,
9715            "expected an entropy-aware strategy to beat L1: L1={l1} entropy={entropy} subaware={subaware}"
9716        );
9717    }
9718
9719    /// Round 305: the strategy sweep is non-regressing — the
9720    /// super-chooser's chosen stream is never larger than the round-304
9721    /// baseline (which built the stacked chains with only the L1
9722    /// predictor strategy). Since the L1 strategy remains in
9723    /// [`STACKED_PREDICTOR_STRATEGIES`], adding the entropy variants can
9724    /// only ever keep a smaller stream, never a larger one.
9725    #[test]
9726    fn round_305_strategy_sweep_never_regresses() {
9727        let (w, h) = (160u32, 120u32);
9728        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9729        let mut state: u32 = 0x2545_f491;
9730        for y in 0..h {
9731            for x in 0..w {
9732                state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
9733                let n = (state >> 29) as i32 - 2;
9734                let g = ((x as i32) - (y as i32)).rem_euclid(256);
9735                let r = (g + 36 + n).clamp(0, 255) as u32;
9736                let b = (g - 25 + n).clamp(0, 255) as u32;
9737                pixels.push(0xff00_0000 | (r << 16) | ((g as u32) << 8) | b);
9738            }
9739        }
9740
9741        // Baseline: the two color-transform stacked chains built with
9742        // only the L1 strategy (the round-304 behaviour), best across the
9743        // cache sweep and the per-region / single-block size_bits.
9744        let mut sb_sweep = vec![DEFAULT_COLOR_TRANSFORM_SIZE_BITS];
9745        let mut single = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
9746        while single < 9 && ((1u32 << single) < w || (1u32 << single) < h) {
9747            single += 1;
9748        }
9749        if single != DEFAULT_COLOR_TRANSFORM_SIZE_BITS {
9750            sb_sweep.push(single);
9751        }
9752        let mut l1_best = usize::MAX;
9753        for &sb in &sb_sweep {
9754            let a = select_best_cache_bits(|cb| {
9755                encode_with_color_transform_predictor(
9756                    &pixels,
9757                    w,
9758                    h,
9759                    sb,
9760                    cb,
9761                    PredictorSubImageStrategy::L1,
9762                )
9763            });
9764            let b = select_best_cache_bits(|cb| {
9765                encode_with_color_transform_subtract_green_predictor(
9766                    &pixels,
9767                    w,
9768                    h,
9769                    sb,
9770                    cb,
9771                    PredictorSubImageStrategy::L1,
9772                )
9773            });
9774            l1_best = l1_best.min(a.len()).min(b.len());
9775        }
9776
9777        // With the full strategy sweep, the best stacked candidate is
9778        // never larger than the L1-only baseline.
9779        let mut swept_best = usize::MAX;
9780        for &sb in &sb_sweep {
9781            for &strategy in &STACKED_PREDICTOR_STRATEGIES {
9782                let a = select_best_cache_bits(|cb| {
9783                    encode_with_color_transform_predictor(&pixels, w, h, sb, cb, strategy)
9784                });
9785                let b = select_best_cache_bits(|cb| {
9786                    encode_with_color_transform_subtract_green_predictor(
9787                        &pixels, w, h, sb, cb, strategy,
9788                    )
9789                });
9790                swept_best = swept_best.min(a.len()).min(b.len());
9791            }
9792        }
9793        assert!(
9794            swept_best <= l1_best,
9795            "strategy sweep regressed: swept {swept_best} > L1-only {l1_best}"
9796        );
9797    }
9798
9799    /// Probe across palette-shaped synthetic payloads to find at
9800    /// least one for which the round-150 super-chooser picks the
9801    /// §4.4 color-indexing path and the chosen stream is materially
9802    /// smaller than the round-149 baseline (no-tx / subtract-green /
9803    /// predictor / color-transform).
9804    ///
9805    /// The §4.4 path doesn't dominate every palette image — the
9806    /// §5.2.3 color cache + LZ77 already crunch a binary scan-line
9807    /// random image to ~1 bit/pixel, which §4.4 bundling cannot beat
9808    /// without spatial coherence to amortise the palette-table
9809    /// header. The strong §4.4 case is a *binary* image whose packed
9810    /// rows are exact LZ77 copies of preceding rows: at width_bits=3
9811    /// (8 pixels per byte), an N-pixel-wide row collapses to N/8
9812    /// bytes; row-to-row LZ77 matches in the bundled stream cover
9813    /// the row's full N/8 bytes in one Copy token, vs N/3-ish
9814    /// literal pixel tokens without bundling.
9815    #[test]
9816    fn round_150_color_indexing_beats_other_candidates_on_palette_image() {
9817        // 64x32 binary image with row repetition: each row's binary
9818        // pattern is the previous row XOR a fixed-period mask. The
9819        // §4.4 bundled stream (width_bits=3 → 8 bytes wide) has 8
9820        // packed bytes per row of distinct patterns the matcher
9821        // chains; pixel-level LZ77 has 64 literal tokens per row to
9822        // chain. The bundled path's Huffman code over the 8 packed
9823        // bytes is tighter and the row-to-row Copy tokens have a
9824        // smaller distance (8 vs 64), so the entropy stage shrinks
9825        // them further.
9826        let palette: [u32; 2] = [0xff00_0000, 0xffff_ffff];
9827        let w = 64u32;
9828        let h = 32u32;
9829        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9830        let mut row_pattern: u64 = 0xa5a5_a5a5_a5a5_a5a5;
9831        for _y in 0..h {
9832            for x in 0..w {
9833                let bit = (row_pattern >> (x % 64)) & 1;
9834                pixels.push(palette[bit as usize]);
9835            }
9836            // Rotate the row pattern by one bit each row so rows are
9837            // similar (LZ77 finds long matches in the bundled
9838            // stream) but not identical.
9839            row_pattern = row_pattern.rotate_left(1);
9840        }
9841        // The chosen stream is what the chooser actually emits.
9842        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
9843        // Force the no-color-indexing baseline by sampling the chooser's
9844        // pre-CI candidates. The §4.4 candidate must beat the baseline
9845        // measurably (palette-coded images get 2..8× index bundling on
9846        // top of the subtraction-coded palette).
9847        let no_tx_baseline =
9848            select_best_cache_bits(|bits| encode_literals_with_options(&pixels, false, bits, w));
9849        let sg_baseline =
9850            select_best_cache_bits(|bits| encode_literals_with_options(&pixels, true, bits, w));
9851        let pred_baseline = select_best_cache_bits(|bits| {
9852            encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, bits, w)
9853        });
9854        let ctx_baseline = select_best_cache_bits(|bits| {
9855            encode_with_color_transform(&pixels, w, h, DEFAULT_COLOR_TRANSFORM_SIZE_BITS, bits, w)
9856        });
9857        let baseline = no_tx_baseline
9858            .len()
9859            .min(sg_baseline.len())
9860            .min(pred_baseline.len())
9861            .min(ctx_baseline.len());
9862        let ci_only = select_best_cache_bits(|bits| {
9863            encode_with_color_indexing(&pixels, w, h, bits).expect("palette fits")
9864        });
9865        eprintln!(
9866            "[round-150] 64x32 binary row-rotation: chosen={} B, baseline (no §4.4)={} B, ci_only={} B ({:.1}% reduction vs baseline)",
9867            chosen.len(),
9868            baseline,
9869            ci_only.len(),
9870            (1.0 - chosen.len() as f64 / baseline as f64) * 100.0
9871        );
9872        assert!(
9873            chosen.len() < baseline,
9874            "round-150 color-indexing must beat the round-149 baseline on a palette image: \
9875             chosen={} B vs baseline={} B (ci_only={} B)",
9876            chosen.len(),
9877            baseline,
9878            ci_only.len(),
9879        );
9880
9881        // And the chosen stream must still round-trip through the
9882        // top-level decoder when wrapped in a complete RIFF/WEBP file.
9883        let rgba: Vec<u8> = pixels
9884            .iter()
9885            .flat_map(|&p| {
9886                let a = ((p >> 24) & 0xff) as u8;
9887                let r = ((p >> 16) & 0xff) as u8;
9888                let g = ((p >> 8) & 0xff) as u8;
9889                let b = (p & 0xff) as u8;
9890                [r, g, b, a]
9891            })
9892            .collect();
9893        let webp_bytes = encode_webp_lossless(&rgba, w, h).expect("encode round-150 webp");
9894        let decoded = crate::decode_webp(&webp_bytes).expect("decode round-150 webp");
9895        assert_eq!(decoded.frames.len(), 1);
9896        assert_eq!(decoded.frames[0].rgba.as_slice(), rgba.as_slice());
9897    }
9898
9899    /// On photo-like noise (>256 unique colors), the §4.4 candidate
9900    /// is unreachable (the O(N) palette probe returns `None`) and the
9901    /// chooser silently keeps the best of the round-149 candidates.
9902    /// This guarantees the round-150 path never regresses on
9903    /// non-palette content.
9904    #[test]
9905    fn color_indexing_chooser_skips_photo_like_content() {
9906        let w = 64u32;
9907        let h = 64u32;
9908        let mut pixels: Vec<u32> = Vec::with_capacity((w * h) as usize);
9909        // 64x64 = 4096 unique values, well above the §4.4 256-entry
9910        // threshold.
9911        let mut state: u32 = 0xFEED_FACE;
9912        for _ in 0..(w * h) {
9913            state = state.wrapping_mul(1_103_515_245).wrapping_add(12345);
9914            pixels.push(0xff00_0000 | (state & 0x00ff_ffff));
9915        }
9916        assert!(collect_palette(&pixels).is_none());
9917        // The chooser must still return a valid stream that decodes
9918        // exactly — the §4.4 path is just silently skipped.
9919        let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
9920        let header = build_image_header(w, h, false);
9921        let mut payload = header.to_vec();
9922        payload.extend_from_slice(&stream);
9923        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
9924            .expect("decode photo-like content");
9925        assert_eq!(decoded.pixels(), pixels.as_slice());
9926    }
9927
9928    // ---- Round 151: §6.2.2 multi-meta-prefix (entropy image) ----
9929
9930    /// Build a synthetic two-region image: the top half draws from a
9931    /// smooth low-green gradient, the bottom half from a smooth
9932    /// high-green gradient. The per-region green statistics diverge
9933    /// sharply, so the encoder's mean-green clusterer should split the
9934    /// image cleanly along the horizontal midpoint and the per-region
9935    /// Huffman codes get tighter than a single shared code over both
9936    /// regions' bimodal histogram.
9937    fn two_region_bimodal_image(width: u32, height: u32) -> Vec<u32> {
9938        let w = width as usize;
9939        let h = height as usize;
9940        let mut pixels = Vec::with_capacity(w * h);
9941        for y in 0..h {
9942            for x in 0..w {
9943                let (r, g, b) = if y < h / 2 {
9944                    // Top: low green, varying red.
9945                    let g = 32u32.wrapping_add(((x as u32) & 0x1f) * 2);
9946                    let r = 64u32.wrapping_add((y as u32) & 0x0f);
9947                    (r, g, 16u32)
9948                } else {
9949                    // Bottom: high green, varying blue.
9950                    let g = 200u32.wrapping_add((x as u32) & 0x1f);
9951                    let b = 96u32.wrapping_add((y as u32) & 0x0f);
9952                    (16u32, g, b)
9953                };
9954                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
9955            }
9956        }
9957        pixels
9958    }
9959
9960    /// Build a noisy two-region image whose unique-color count blows
9961    /// the §4.4 palette path (forcing the chooser onto the LZ77 /
9962    /// predictor / color-transform candidates). The top half draws
9963    /// red/green/blue from one PRNG state, the bottom half from a
9964    /// disjoint PRNG state biased to different per-channel means; the
9965    /// per-region histograms diverge enough that per-region Huffman
9966    /// codes beat a single shared code.
9967    fn two_region_noisy_image(width: u32, height: u32) -> Vec<u32> {
9968        let w = width as usize;
9969        let h = height as usize;
9970        let mut pixels = Vec::with_capacity(w * h);
9971        let mut s_top: u32 = 0xC0FF_EE00;
9972        let mut s_bot: u32 = 0xBADC_AFE5;
9973        for y in 0..h {
9974            for x in 0..w {
9975                let argb = if y < h / 2 {
9976                    s_top = s_top.wrapping_mul(1_103_515_245).wrapping_add(12345);
9977                    let r = s_top & 0x3f; // 0..63
9978                    let g = ((s_top >> 8) & 0x3f).wrapping_add(192); // 192..255
9979                    let b = (s_top >> 16) & 0x1f; // 0..31
9980                    (0xffu32 << 24) | (r << 16) | (g << 8) | b
9981                } else {
9982                    s_bot = s_bot.wrapping_mul(1_103_515_245).wrapping_add(12345);
9983                    let r = ((s_bot >> 8) & 0x3f).wrapping_add(192); // 192..255
9984                    let g = s_bot & 0x3f; // 0..63
9985                    let b = ((s_bot >> 16) & 0x1f).wrapping_add(192); // 192..223
9986                    (0xffu32 << 24) | (r << 16) | (g << 8) | b
9987                };
9988                // `x` is intentionally unused: we want per-pixel hashes
9989                // to diverge from the PRNG state alone so per-region
9990                // histograms remain stable across columns.
9991                let _ = x;
9992                pixels.push(argb);
9993            }
9994        }
9995        pixels
9996    }
9997
9998    /// The histogram-distance clusterer must produce a non-degenerate
9999    /// (≥ 2-group) split on the headline two-region bimodal fixture
10000    /// (top and bottom halves use disjoint per-channel ranges), and
10001    /// the resulting meta-codes must reflect the top-vs-bottom split.
10002    #[test]
10003    fn meta_prefix_clusterer_splits_two_region_bimodal_fixture() {
10004        let w = 64u32;
10005        let h = 64u32;
10006        let pixels = two_region_bimodal_image(w, h);
10007        // prefix_bits = 4 → 16-pixel blocks → 4x4 entropy image; the
10008        // horizontal midpoint sits on the block-row-2/3 boundary, so
10009        // clustering should put rows 0..2 in one group and rows 2..4 in
10010        // the other.
10011        let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 2);
10012        assert_eq!(codes.len(), 16);
10013        // Top two block-rows should agree; bottom two should agree;
10014        // the two halves must differ from each other.
10015        let top = codes[0];
10016        let bot = codes[12];
10017        assert_ne!(
10018            top, bot,
10019            "top half group must differ from bottom half group"
10020        );
10021        for c in &codes[0..8] {
10022            assert_eq!(*c, top, "top-half blocks must share a group");
10023        }
10024        for c in &codes[8..16] {
10025            assert_eq!(*c, bot, "bottom-half blocks must share a group");
10026        }
10027    }
10028
10029    /// The histogram-distance clusterer must separate two regions
10030    /// whose per-block *mean green* coincides but whose per-block
10031    /// green *distribution* diverges — the failure mode of the
10032    /// round-151 mean-statistic bucketiser. Top half: bimodal green
10033    /// alternating 16/240 (mean ≈ 128). Bottom half: flat green at
10034    /// 128 (also mean ≈ 128).
10035    #[test]
10036    fn histogram_clusterer_separates_blocks_sharing_a_mean() {
10037        let w = 32u32;
10038        let h = 32u32;
10039        let w_us = w as usize;
10040        let h_us = h as usize;
10041        let mut pixels: Vec<u32> = Vec::with_capacity(w_us * h_us);
10042        for y in 0..h_us {
10043            for x in 0..w_us {
10044                let g = if y < h_us / 2 {
10045                    if (x ^ y) & 1 == 0 {
10046                        16u32
10047                    } else {
10048                        240u32
10049                    }
10050                } else {
10051                    128u32
10052                };
10053                pixels.push(0xff00_0000 | (g << 8));
10054            }
10055        }
10056        // prefix_bits = 4 → 16-pixel blocks → 2x2 entropy image. The
10057        // top row of two blocks should differ from the bottom row.
10058        let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 2);
10059        assert_eq!(codes.len(), 4);
10060        let top_left = codes[0];
10061        let bot_left = codes[2];
10062        assert_ne!(
10063            top_left, bot_left,
10064            "bimodal-vs-flat green regions must split into distinct groups",
10065        );
10066    }
10067
10068    /// Clustering must be a pure function of its inputs: two calls
10069    /// with the same arguments produce the same `Vec<u16>`. Encoder
10070    /// reproducibility depends on this.
10071    #[test]
10072    fn histogram_clusterer_is_deterministic() {
10073        let w = 64u32;
10074        let h = 64u32;
10075        let pixels = two_region_noisy_image(w, h);
10076        let first = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 3);
10077        let second = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 3);
10078        assert_eq!(first, second);
10079    }
10080
10081    /// A uniform image (every pixel the same value) has no per-block
10082    /// histogram divergence, so the clusterer must collapse to a
10083    /// single group. The encoder relies on this `actual_groups < 2`
10084    /// signal to skip the multi-group path cleanly.
10085    #[test]
10086    fn histogram_clusterer_collapses_on_uniform_image() {
10087        let w = 64u32;
10088        let h = 64u32;
10089        let pixels = vec![0xff80_8080u32; (w * h) as usize];
10090        let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 4);
10091        assert_eq!(codes.len(), 16);
10092        for c in &codes {
10093            assert_eq!(*c, 0, "uniform image must collapse to one group");
10094        }
10095    }
10096
10097    /// `num_groups = 1` must short-circuit straight to an all-zeros
10098    /// map (the caller asked for one group; running Lloyd's iteration
10099    /// would only waste cycles confirming the trivial answer).
10100    #[test]
10101    fn histogram_clusterer_num_groups_one_returns_all_zeros() {
10102        let w = 32u32;
10103        let h = 32u32;
10104        let pixels = two_region_noisy_image(w, h);
10105        let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 1);
10106        assert!(codes.iter().all(|&c| c == 0));
10107    }
10108
10109    /// The returned meta-codes must form the *compact* contiguous
10110    /// range `0..max + 1` with no gaps. Per RFC 9649 §3.7.2.2.2,
10111    /// `num_prefix_groups = max(entropy image) + 1`, so an unused
10112    /// group sitting between used ones would inflate the encoder's
10113    /// per-group prefix-code-table cost without ever being read.
10114    #[test]
10115    fn histogram_clusterer_returns_compact_group_ids() {
10116        let w = 64u32;
10117        let h = 64u32;
10118        let pixels = two_region_noisy_image(w, h);
10119        let codes = cluster_blocks_by_histogram_distance(&pixels, w, h, 4, 4);
10120        let max_code = codes.iter().copied().max().unwrap_or(0) as usize;
10121        let mut seen = vec![false; max_code + 1];
10122        for &c in &codes {
10123            seen[c as usize] = true;
10124        }
10125        for (i, &s) in seen.iter().enumerate() {
10126            assert!(s, "gap at group id {i} — compaction failed");
10127        }
10128    }
10129
10130    /// `encode_with_meta_prefix` produces a stream the decoder reads
10131    /// back to the exact input pixels — the end-to-end round trip on
10132    /// a non-trivial multi-group image.
10133    #[test]
10134    fn meta_prefix_two_group_round_trips_through_decoder() {
10135        let w = 64u32;
10136        let h = 64u32;
10137        let pixels = two_region_bimodal_image(w, h);
10138        let stream = encode_with_meta_prefix(&pixels, w, h, 4, 2, None, w)
10139            .expect("two-region image admits a 2-group split");
10140        let header = build_image_header(w, h, false);
10141        let mut payload = header.to_vec();
10142        payload.extend_from_slice(&stream);
10143        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10144            .expect("decode meta-prefix stream");
10145        assert_eq!(decoded.pixels(), pixels.as_slice());
10146    }
10147
10148    /// Same round-trip as above but with the §5.2.3 color cache
10149    /// enabled at the median cache size (`code_bits = 8` → 256-entry
10150    /// cache). Verifies the cache + multi-group composition.
10151    #[test]
10152    fn meta_prefix_two_group_with_cache_round_trips_through_decoder() {
10153        let w = 32u32;
10154        let h = 32u32;
10155        let pixels = two_region_bimodal_image(w, h);
10156        let stream = encode_with_meta_prefix(&pixels, w, h, 4, 2, Some(8), w)
10157            .expect("two-region image admits a 2-group split with cache");
10158        let header = build_image_header(w, h, false);
10159        let mut payload = header.to_vec();
10160        payload.extend_from_slice(&stream);
10161        let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10162            .expect("decode meta-prefix-with-cache stream");
10163        assert_eq!(decoded.pixels(), pixels.as_slice());
10164    }
10165
10166    /// Cross-check round-trip with 3 and 4 groups on a noisy
10167    /// multi-region image. Verifies the encoder's per-group code
10168    /// emission is correct for `num_prefix_groups > 2`.
10169    #[test]
10170    fn meta_prefix_three_and_four_groups_round_trip_through_decoder() {
10171        let w = 64u32;
10172        let h = 64u32;
10173        let pixels = two_region_noisy_image(w, h);
10174        for num_groups in [3u32, 4u32] {
10175            let stream = encode_with_meta_prefix(&pixels, w, h, 4, num_groups, None, w)
10176                .unwrap_or_else(|| panic!("noisy image admits {num_groups} groups"));
10177            let header = build_image_header(w, h, false);
10178            let mut payload = header.to_vec();
10179            payload.extend_from_slice(&stream);
10180            let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10181                .unwrap_or_else(|e| panic!("decode {num_groups}-group stream: {e}"));
10182            assert_eq!(
10183                decoded.pixels(),
10184                pixels.as_slice(),
10185                "round-trip failed for num_groups={num_groups}"
10186            );
10187        }
10188    }
10189
10190    /// Cross-check round-trip across every `prefix_bits` value the
10191    /// chooser sweeps. Verifies the per-block size dispatch (and
10192    /// therefore the on-wire `prefix_bits - 2` field) for the full
10193    /// `META_PREFIX_BITS_SWEEP`. Image is 256x256 so the largest
10194    /// sweep value (`prefix_bits = 7` → 128-pixel blocks) still
10195    /// admits a 2×2 entropy image; smaller values produce
10196    /// proportionally larger entropy images.
10197    #[test]
10198    fn meta_prefix_all_sweep_prefix_bits_round_trip_through_decoder() {
10199        let w = 256u32;
10200        let h = 256u32;
10201        let pixels = two_region_noisy_image(w, h);
10202        for &pb in META_PREFIX_BITS_SWEEP.iter() {
10203            let stream =
10204                encode_with_meta_prefix(&pixels, w, h, pb, 2, None, w).unwrap_or_else(|| {
10205                    panic!("256x256 noisy image admits 2-group at prefix_bits={pb}")
10206                });
10207            let header = build_image_header(w, h, false);
10208            let mut payload = header.to_vec();
10209            payload.extend_from_slice(&stream);
10210            let decoded = crate::vp8l_transform::decode_lossless(&payload, w, h)
10211                .unwrap_or_else(|e| panic!("decode prefix_bits={pb} stream: {e}"));
10212            assert_eq!(
10213                decoded.pixels(),
10214                pixels.as_slice(),
10215                "round-trip failed for prefix_bits={pb}"
10216            );
10217        }
10218    }
10219
10220    /// Degenerate cases (image too small for any multi-block split,
10221    /// uniform image whose clustering collapses to one group) must
10222    /// surface as `None` so the chooser can skip the candidate
10223    /// cleanly.
10224    #[test]
10225    fn meta_prefix_returns_none_when_too_small_for_a_split() {
10226        // 1x1 image — no `prefix_bits ∈ [4..7]` admits two blocks.
10227        let pixels = vec![0xff10_2030u32];
10228        for &pb in META_PREFIX_BITS_SWEEP.iter() {
10229            for num_groups in 2..=MAX_META_GROUPS {
10230                assert!(
10231                    encode_with_meta_prefix(&pixels, 1, 1, pb, num_groups, None, 1).is_none(),
10232                    "1x1 image must not produce a multi-group stream (prefix_bits={pb}, num_groups={num_groups})"
10233                );
10234            }
10235        }
10236    }
10237
10238    #[test]
10239    fn meta_prefix_returns_none_on_uniform_image() {
10240        let w = 64u32;
10241        let h = 64u32;
10242        let pixels = vec![0xff80_8080u32; (w * h) as usize];
10243        // All blocks have identical mean green → clustering collapses.
10244        assert!(encode_with_meta_prefix(&pixels, w, h, 4, 2, None, w).is_none());
10245    }
10246
10247    /// The full chooser must still produce a decodable stream when the
10248    /// multi-meta-prefix candidate sometimes wins. End-to-end via the
10249    /// top-level `decode_webp`.
10250    #[test]
10251    fn round_151_chooser_round_trips_on_two_region_image() {
10252        let w = 64u32;
10253        let h = 64u32;
10254        let pixels = two_region_bimodal_image(w, h);
10255        let rgba: Vec<u8> = pixels
10256            .iter()
10257            .flat_map(|&p| {
10258                let a = ((p >> 24) & 0xff) as u8;
10259                let r = ((p >> 16) & 0xff) as u8;
10260                let g = ((p >> 8) & 0xff) as u8;
10261                let b = (p & 0xff) as u8;
10262                [r, g, b, a]
10263            })
10264            .collect();
10265        let webp_bytes = encode_webp_lossless(&rgba, w, h).expect("encode round-151 webp");
10266        let decoded = crate::decode_webp(&webp_bytes).expect("decode round-151 webp");
10267        assert_eq!(decoded.frames.len(), 1);
10268        assert_eq!(decoded.frames[0].rgba.as_slice(), rgba.as_slice());
10269    }
10270
10271    /// Diagnostic-only sweep: prints baseline vs multi-meta-prefix
10272    /// candidate sizes across a handful of image shapes / sizes. Used
10273    /// to inform the chooser's `META_PREFIX_BITS_SWEEP` choice and to
10274    /// quantify whether the candidate ever shrinks the chosen stream
10275    /// on the round-150 super-chooser's hardest cases. Test is
10276    /// observational — no assertion beyond the round-trip — so a
10277    /// future round can re-tune the sweep without changing the
10278    /// invariant set.
10279    #[test]
10280    fn round_151_diagnostic_sweep_records_per_shape_costs() {
10281        let shapes = [
10282            (
10283                "64x64 noisy 2-region",
10284                two_region_noisy_image(64, 64),
10285                64u32,
10286                64u32,
10287            ),
10288            (
10289                "128x128 noisy 2-region",
10290                two_region_noisy_image(128, 128),
10291                128u32,
10292                128u32,
10293            ),
10294            (
10295                "64x128 noisy 2-region",
10296                two_region_noisy_image(64, 128),
10297                64u32,
10298                128u32,
10299            ),
10300            (
10301                "256x256 noisy 2-region",
10302                two_region_noisy_image(256, 256),
10303                256u32,
10304                256u32,
10305            ),
10306        ];
10307        for (name, pixels, w, h) in &shapes {
10308            let baseline = encode_argb_with_predictor_chooser(pixels, *w, *h);
10309            let mp_opt = sweep_meta_prefix_candidate(pixels, *w, *h);
10310            let mp_len = mp_opt.as_ref().map(|v| v.len()).unwrap_or(usize::MAX);
10311            eprintln!(
10312                "[round-151 diag] {name}: baseline={} B, mp_only={} B, mp_wins={}",
10313                baseline.len(),
10314                mp_len,
10315                mp_len < baseline.len()
10316            );
10317        }
10318    }
10319
10320    /// Headline regression: on a large two-region noisy image whose
10321    /// per-region channel histograms diverge sharply (and the §4.4
10322    /// palette path is unreachable because of unique-color count),
10323    /// the round-151 multi-meta-prefix path's per-region Huffman codes
10324    /// shrink the chosen stream below the round-150 super-chooser's
10325    /// best pre-round-151 candidate. Prints the delta so the round
10326    /// report can quote a measured percentage.
10327    #[test]
10328    fn round_151_multi_meta_prefix_beats_single_group_on_noisy_image() {
10329        let w = 128u32;
10330        let h = 128u32;
10331        let pixels = two_region_noisy_image(w, h);
10332
10333        // Round-150 baseline: the chooser without the round-151
10334        // multi-meta-prefix candidate.
10335        let mut baseline = encode_argb_literals_with_width(&pixels, w);
10336        let pred_block = 1u32 << DEFAULT_PREDICTOR_SIZE_BITS;
10337        let ctx_block = 1u32 << DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
10338        if w >= pred_block && h >= pred_block {
10339            let pred = select_best_cache_bits(|cache_bits| {
10340                encode_with_predictor(&pixels, w, h, DEFAULT_PREDICTOR_SIZE_BITS, cache_bits, w)
10341            });
10342            if pred.len() < baseline.len() {
10343                baseline = pred;
10344            }
10345        }
10346        if w >= ctx_block && h >= ctx_block {
10347            let ctx = select_best_cache_bits(|cache_bits| {
10348                encode_with_color_transform(
10349                    &pixels,
10350                    w,
10351                    h,
10352                    DEFAULT_COLOR_TRANSFORM_SIZE_BITS,
10353                    cache_bits,
10354                    w,
10355                )
10356            });
10357            if ctx.len() < baseline.len() {
10358                baseline = ctx;
10359            }
10360        }
10361        if collect_palette(&pixels).is_some() {
10362            let ci = select_best_cache_bits(|cache_bits| {
10363                encode_with_color_indexing(&pixels, w, h, cache_bits).expect("palette fits")
10364            });
10365            if ci.len() < baseline.len() {
10366                baseline = ci;
10367            }
10368        }
10369
10370        // Round-151 multi-meta-prefix candidate (the smallest
10371        // (prefix_bits, num_groups, cache_bits) it admits).
10372        let mp = sweep_meta_prefix_candidate(&pixels, w, h)
10373            .expect("two-region 128x128 image admits a multi-group split");
10374
10375        // And the full chooser including round 151.
10376        let chosen = encode_argb_with_predictor_chooser(&pixels, w, h);
10377        eprintln!(
10378            "[round-151] 128x128 two-region noisy: chosen={} B, baseline (no §6.2.2)={} B, mp_only={} B ({:.1}% reduction vs baseline)",
10379            chosen.len(),
10380            baseline.len(),
10381            mp.len(),
10382            (1.0 - chosen.len() as f64 / baseline.len() as f64) * 100.0
10383        );
10384        assert!(
10385            chosen.len() <= baseline.len(),
10386            "round-151 chooser must never regress on the round-150 baseline: \
10387             chosen={} B vs baseline={} B (mp_only={} B)",
10388            chosen.len(),
10389            baseline.len(),
10390            mp.len(),
10391        );
10392    }
10393
10394    // ---- Round-152 measurement harness -----------------------------
10395    //
10396    // Reproduces the round-151 mean-green clusterer locally so the test
10397    // can measure the multi-meta-prefix candidate's byte cost with both
10398    // partitioners and confirm the histogram path is strictly smaller
10399    // on the diagnostic two-region noisy fixture. The mean-green
10400    // implementation here is a verbatim copy of the round-151 helper
10401    // that lived in this file before this round; it's `#[cfg(test)]`-
10402    // only and never reachable from the encoder.
10403    fn cluster_blocks_by_mean_green_for_bench(
10404        pixels: &[u32],
10405        width: u32,
10406        height: u32,
10407        prefix_bits: u8,
10408        num_groups: u32,
10409    ) -> Vec<u16> {
10410        let block_side = 1u32 << prefix_bits;
10411        let pw = width.div_ceil(block_side);
10412        let ph = height.div_ceil(block_side);
10413        let num_blocks = (pw * ph) as usize;
10414        let mut block_mean: Vec<f64> = vec![0.0; num_blocks];
10415        let mut block_count: Vec<u32> = vec![0; num_blocks];
10416        let row = width as usize;
10417        let pw_u = pw as usize;
10418        for y in 0..height as usize {
10419            let by = y / block_side as usize;
10420            for x in 0..width as usize {
10421                let bx = x / block_side as usize;
10422                let b = by * pw_u + bx;
10423                let g = ((pixels[y * row + x] >> 8) & 0xff) as f64;
10424                block_mean[b] += g;
10425                block_count[b] += 1;
10426            }
10427        }
10428        for b in 0..num_blocks {
10429            if block_count[b] > 0 {
10430                block_mean[b] /= block_count[b] as f64;
10431            }
10432        }
10433        if num_groups == 1 {
10434            return vec![0u16; num_blocks];
10435        }
10436        let mut lo = f64::INFINITY;
10437        let mut hi = f64::NEG_INFINITY;
10438        for &m in &block_mean {
10439            if m < lo {
10440                lo = m;
10441            }
10442            if m > hi {
10443                hi = m;
10444            }
10445        }
10446        if hi <= lo {
10447            return vec![0u16; num_blocks];
10448        }
10449        let span = hi - lo;
10450        let step = span / num_groups as f64;
10451        let mut codes = Vec::with_capacity(num_blocks);
10452        for &m in &block_mean {
10453            let bucket = (((m - lo) / step).floor() as i64).clamp(0, num_groups as i64 - 1);
10454            codes.push(bucket as u16);
10455        }
10456        codes
10457    }
10458
10459    /// Body-shared bencher: encode `pixels` via the multi-meta-prefix
10460    /// candidate using either the mean-green or histogram-distance
10461    /// clusterer, returning the encoded byte count. Drives
10462    /// `encode_with_meta_prefix` directly by overriding the cluster
10463    /// step's output through a tiny shim.
10464    fn measure_mp_bytes_at(
10465        pixels: &[u32],
10466        w: u32,
10467        h: u32,
10468        prefix_bits: u8,
10469        num_groups: u32,
10470        use_histogram: bool,
10471    ) -> Option<usize> {
10472        let block_side = 1u32 << prefix_bits;
10473        let pw = w.div_ceil(block_side);
10474        let ph = h.div_ceil(block_side);
10475        if (pw * ph) < num_groups {
10476            return None;
10477        }
10478        let codes = if use_histogram {
10479            cluster_blocks_by_histogram_distance(pixels, w, h, prefix_bits, num_groups)
10480        } else {
10481            cluster_blocks_by_mean_green_for_bench(pixels, w, h, prefix_bits, num_groups)
10482        };
10483        // Reach into encode_with_meta_prefix's internals by reusing
10484        // its emitter parts: build the EncoderMetaIndex from `codes`
10485        // and run the same writer path. Easier: call the encoder
10486        // directly when `use_histogram` is true (it uses the new
10487        // clusterer); the mean-green branch needs a manual emit.
10488        // Since the two paths share every step except the codes
10489        // vector, the round-trip is much cleaner if we just call
10490        // `encode_with_meta_prefix` for the histogram branch and a
10491        // tiny re-emit for the mean-green branch that mirrors the
10492        // same writer steps.
10493        //
10494        // For a measurement test it's enough to compare the two byte
10495        // counts at the same `(prefix_bits, num_groups)`, which is
10496        // exactly what the chooser ablation needs. We achieve that by
10497        // letting `encode_with_meta_prefix` drive the histogram path
10498        // and replaying the same steps inline for the mean-green
10499        // path.
10500        if use_histogram {
10501            return encode_with_meta_prefix(pixels, w, h, prefix_bits, num_groups, None, w)
10502                .map(|v| v.len());
10503        }
10504        // Mean-green inline emission (same shape as
10505        // encode_with_meta_prefix).
10506        let index = EncoderMetaIndex {
10507            prefix_bits,
10508            block_width: pw,
10509            codes,
10510        };
10511        let actual_groups = index.num_groups();
10512        if actual_groups < 2 {
10513            return None;
10514        }
10515        let tokens = tokenize_lz77(pixels);
10516        let buckets = split_tokens_by_group(&tokens, &index, w, actual_groups);
10517        let group_codes = build_group_codes(&buckets, 0, w);
10518        let mut bw = BitWriter::new();
10519        bw.write_bit(false);
10520        bw.write_bit(false);
10521        bw.write_bit(true);
10522        bw.write_bits((prefix_bits - 2) as u32, 3);
10523        let entropy_image = index.entropy_image_argb();
10524        write_entropy_coded_image_literals(&mut bw, &entropy_image);
10525        for group in &group_codes {
10526            for code in group.iter() {
10527                code.write_code_lengths(&mut bw);
10528            }
10529        }
10530        let mut pos = 0usize;
10531        let w_pixels = w as usize;
10532        for &tok in &tokens {
10533            let x = (pos % w_pixels) as u32;
10534            let y = (pos / w_pixels) as u32;
10535            let g = index.group_for(x, y) as usize;
10536            let codes = &group_codes[g];
10537            let green_code = &codes[0];
10538            let red_code = &codes[1];
10539            let blue_code = &codes[2];
10540            let alpha_code = &codes[3];
10541            let dist_code = &codes[4];
10542            match tok {
10543                Token::Literal(p) => {
10544                    let a = ((p >> 24) & 0xff) as usize;
10545                    let r = ((p >> 16) & 0xff) as usize;
10546                    let g_ch = ((p >> 8) & 0xff) as usize;
10547                    let b = (p & 0xff) as usize;
10548                    green_code.write_symbol(&mut bw, g_ch);
10549                    red_code.write_symbol(&mut bw, r);
10550                    blue_code.write_symbol(&mut bw, b);
10551                    alpha_code.write_symbol(&mut bw, a);
10552                    pos += 1;
10553                }
10554                Token::CacheRef { .. } => unreachable!("no cache in measurement"),
10555                Token::Copy { length, distance } => {
10556                    write_lz77_value(&mut bw, green_code, 256, length as u32);
10557                    let raw_code = pixel_distance_to_distance_code(distance, w);
10558                    write_lz77_value(&mut bw, dist_code, 0, raw_code);
10559                    pos += length;
10560                }
10561            }
10562        }
10563        Some(bw.into_bytes().len())
10564    }
10565
10566    /// A four-region fixture where the top-left quadrant has the same
10567    /// per-channel mean as the bottom-right but a very different
10568    /// per-channel distribution, and the top-right has the same mean
10569    /// as the bottom-left also with a divergent distribution. The
10570    /// mean-green clusterer at `num_groups = 2` can only find one
10571    /// axis of separation and folds two distinct distributions onto
10572    /// the same group; the histogram clusterer separates by full
10573    /// distribution and finds the right partition.
10574    fn four_region_mean_collision_image(width: u32, height: u32) -> Vec<u32> {
10575        let w = width as usize;
10576        let h = height as usize;
10577        let mut pixels = Vec::with_capacity(w * h);
10578        let mut s: u32 = 0x12345678;
10579        for y in 0..h {
10580            for x in 0..w {
10581                s = s.wrapping_mul(1_103_515_245).wrapping_add(12345);
10582                let top = y < h / 2;
10583                let left = x < w / 2;
10584                // Pick (g, r) pairs whose means match across the
10585                // top-left vs bottom-right and top-right vs bottom-left
10586                // diagonals but whose distributions are very different.
10587                let (g, r, b) = match (top, left) {
10588                    (true, true) => {
10589                        // top-left: g bimodal {16, 240} mean ≈ 128
10590                        let gv = if (s & 1) == 0 { 16 } else { 240 };
10591                        let rv = (s >> 8) & 0x3f;
10592                        let bv = (s >> 16) & 0x3f;
10593                        (gv, rv, bv)
10594                    }
10595                    (true, false) => {
10596                        // top-right: g flat 128
10597                        let gv = 128u32;
10598                        let rv = ((s >> 8) & 0x3f).wrapping_add(192);
10599                        let bv = (s >> 16) & 0x3f;
10600                        (gv, rv, bv)
10601                    }
10602                    (false, true) => {
10603                        // bottom-left: g bimodal but {64, 192} mean ≈ 128
10604                        let gv = if (s & 1) == 0 { 64 } else { 192 };
10605                        let rv = (s >> 8) & 0x3f;
10606                        let bv = ((s >> 16) & 0x3f).wrapping_add(192);
10607                        (gv, rv, bv)
10608                    }
10609                    (false, false) => {
10610                        // bottom-right: g flat 128 too
10611                        let gv = 128u32;
10612                        let rv = ((s >> 8) & 0x3f).wrapping_add(192);
10613                        let bv = ((s >> 16) & 0x3f).wrapping_add(192);
10614                        (gv, rv, bv)
10615                    }
10616                };
10617                pixels.push(0xff00_0000 | (r << 16) | (g << 8) | b);
10618            }
10619        }
10620        pixels
10621    }
10622
10623    /// For a given fixture, sweep every `(prefix_bits, num_groups)`
10624    /// the round-151 chooser searches and return the smallest
10625    /// non-degenerate multi-meta-prefix byte cost under the named
10626    /// clusterer. Returns `None` if every combination collapsed.
10627    fn best_mp_bytes_over_sweep(
10628        pixels: &[u32],
10629        w: u32,
10630        h: u32,
10631        use_histogram: bool,
10632    ) -> Option<usize> {
10633        let mut best: Option<usize> = None;
10634        for &prefix_bits in META_PREFIX_BITS_SWEEP.iter() {
10635            for num_groups in 2u32..=MAX_META_GROUPS {
10636                if let Some(bytes) =
10637                    measure_mp_bytes_at(pixels, w, h, prefix_bits, num_groups, use_histogram)
10638                {
10639                    best = Some(match best {
10640                        Some(b) => b.min(bytes),
10641                        None => bytes,
10642                    });
10643                }
10644            }
10645        }
10646        best
10647    }
10648
10649    /// Confirm the round-152 histogram-distance clusterer beats (or at
10650    /// worst ties) the round-151 mean-green bucketiser on the
10651    /// diagnostic two-region noisy sweep. Prints byte counts (run with
10652    /// `--nocapture`).
10653    #[test]
10654    fn histogram_clusterer_reduces_mp_bytes_on_two_region_sweep() {
10655        let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (64, 128), (256, 256)];
10656        for &(w, h) in shapes {
10657            let pixels = two_region_noisy_image(w, h);
10658            let mg = best_mp_bytes_over_sweep(&pixels, w, h, false)
10659                .expect("mean-green path must produce a candidate");
10660            let hi = best_mp_bytes_over_sweep(&pixels, w, h, true)
10661                .expect("histogram path must produce a candidate");
10662            assert!(
10663                hi <= mg,
10664                "{w}x{h}: histogram path produced {hi} B, mean-green produced {mg} B \
10665                 — histogram path must not regress on the two-region sweep",
10666            );
10667            println!(
10668                "r152 measurement {w}x{h}: mean-green={mg} B histogram={hi} B \
10669                 delta={} B ({:.2}%)",
10670                mg as i64 - hi as i64,
10671                100.0 * (mg as f64 - hi as f64) / mg as f64,
10672            );
10673        }
10674    }
10675
10676    /// Confirm the histogram clusterer is *strictly* better than
10677    /// mean-green on the four-region mean-collision fixture, where
10678    /// blocks sharing a green mean diverge in distribution. Prints
10679    /// byte counts (run with `--nocapture`).
10680    #[test]
10681    fn histogram_clusterer_reduces_mp_bytes_on_mean_collision_sweep() {
10682        let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (64, 128), (256, 256)];
10683        for &(w, h) in shapes {
10684            let pixels = four_region_mean_collision_image(w, h);
10685            let mg_opt = best_mp_bytes_over_sweep(&pixels, w, h, false);
10686            let hi = best_mp_bytes_over_sweep(&pixels, w, h, true)
10687                .expect("histogram path must produce a candidate");
10688            match mg_opt {
10689                Some(mg) => {
10690                    assert!(
10691                        hi < mg,
10692                        "{w}x{h}: histogram path produced {hi} B, mean-green produced {mg} B \
10693                         — histogram path must strictly improve on mean-collision fixture",
10694                    );
10695                    println!(
10696                        "r152 mean-collision {w}x{h}: mean-green={mg} B histogram={hi} B \
10697                         delta={} B ({:.2}%)",
10698                        mg as i64 - hi as i64,
10699                        100.0 * (mg as f64 - hi as f64) / mg as f64,
10700                    );
10701                }
10702                None => {
10703                    println!(
10704                        "r152 mean-collision {w}x{h}: mean-green collapsed (no candidate); \
10705                         histogram={hi} B",
10706                    );
10707                }
10708            }
10709        }
10710    }
10711
10712    // ---- round 155: §4.1 predictor size_bits two-value sweep ----------
10713    //
10714    // The round-155 step extends the predictor candidate from a single
10715    // `DEFAULT_PREDICTOR_SIZE_BITS = 4` block-grid to a two-value sweep
10716    // mirroring the round-147 §4.2 color-transform shape: per-region
10717    // (`size_bits = 4` → 16×16 pixel blocks) plus a maximal single-block
10718    // candidate (`size_bits` promoted up to 9 so the sub-image is 1×1).
10719    // Each value composes with the round-148 `cache_code_bits ∈ [1..11]`
10720    // + disabled-cache baseline.
10721    //
10722    // The tests below establish three contracts:
10723    //
10724    // 1) Non-regression — the round-155 chooser never produces a stream
10725    //    longer than the pre-round-155 chooser (which only evaluated the
10726    //    default `size_bits = 4` predictor).
10727    // 2) Strict-beat on a synthetic fixture where the maximal-single-
10728    //    block predictor wins (a small image whose `size_bits = 4`
10729    //    per-region path emits a costly 1×1 sub-image equal to the
10730    //    single-block one but where the per-region wraps in the same
10731    //    16×16 mode, leaving the two effectively identical except for
10732    //    sub-image layout — and small enough that the single-block path
10733    //    wins on noise).
10734    // 3) Round-trip — every emitted stream still round-trips through
10735    //    `decode_lossless_image`, so the size_bits promotion did not
10736    //    break the §4.1 header.
10737
10738    /// Local copy of the pre-round-155 chooser: identical to
10739    /// [`encode_argb_with_predictor_chooser`] but evaluates only the
10740    /// default-size predictor candidate (no maximal single-block sweep).
10741    /// Used as the regression baseline for the round-155 non-regression
10742    /// tests so they exercise *only* the size_bits-sweep delta the
10743    /// chooser added.
10744    fn pre_round_155_predictor_chooser(pixels: &[u32], width: u32, height: u32) -> Vec<u8> {
10745        let mut best = encode_argb_literals_with_width(pixels, width);
10746
10747        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
10748        let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
10749        let pred_block = 1u32 << pred_size_bits;
10750        let ctx_block = 1u32 << ctx_size_bits;
10751
10752        if width >= pred_block && height >= pred_block {
10753            // Pre-round-155: single `size_bits = 4` predictor only.
10754            let pred_best = select_best_cache_bits(|cache_bits| {
10755                encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
10756            });
10757            if pred_best.len() < best.len() {
10758                best = pred_best;
10759            }
10760        }
10761
10762        // §4.2 color transform unchanged (round-147 two-value sweep).
10763        if width >= ctx_block && height >= ctx_block {
10764            let mut single_block_size_bits: u8 = ctx_size_bits;
10765            while single_block_size_bits < 9
10766                && ((1u32 << single_block_size_bits) < width
10767                    || (1u32 << single_block_size_bits) < height)
10768            {
10769                single_block_size_bits += 1;
10770            }
10771            let try_single_block = single_block_size_bits != ctx_size_bits;
10772            let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
10773                encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
10774            })];
10775            if try_single_block {
10776                candidates.push(select_best_cache_bits(|cache_bits| {
10777                    encode_with_color_transform(
10778                        pixels,
10779                        width,
10780                        height,
10781                        single_block_size_bits,
10782                        cache_bits,
10783                        width,
10784                    )
10785                }));
10786            }
10787            for cand in candidates {
10788                if cand.len() < best.len() {
10789                    best = cand;
10790                }
10791            }
10792        }
10793
10794        if collect_palette(pixels).is_some() {
10795            let ci_best = select_best_cache_bits(|cache_bits| {
10796                encode_with_color_indexing(pixels, width, height, cache_bits)
10797                    .expect("palette feasibility already confirmed")
10798            });
10799            if ci_best.len() < best.len() {
10800                best = ci_best;
10801            }
10802        }
10803
10804        if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
10805            if mp_best.len() < best.len() {
10806                best = mp_best;
10807            }
10808        }
10809
10810        best
10811    }
10812
10813    /// Round 155 non-regression: across a fixture matrix spanning
10814    /// gradient / noise / palette-ish images and several shapes, the
10815    /// round-155 chooser must never produce a stream longer than the
10816    /// pre-round-155 chooser (which had only the default-size predictor
10817    /// candidate). The round-155 chooser is a strict superset of the
10818    /// pre-round-155 candidate set, so this is a structural guarantee.
10819    #[test]
10820    fn round_155_predictor_size_bits_sweep_never_regresses() {
10821        let shapes: &[(u32, u32)] = &[
10822            (16, 16),
10823            (20, 20),
10824            (24, 24),
10825            (32, 32),
10826            (48, 48),
10827            (16, 32),
10828            (64, 16),
10829            (40, 24),
10830        ];
10831        for &(w, h) in shapes {
10832            // Three fixture families: smooth gradient, dense noise,
10833            // small-palette stripes.
10834            let gradient: Vec<u32> = (0..(w * h) as usize)
10835                .map(|i| {
10836                    let x = (i as u32) % w;
10837                    let y = (i as u32) / w;
10838                    let g = (x + y) & 0xFF;
10839                    0xFF00_0000 | (g << 16) | (g << 8) | g
10840                })
10841                .collect();
10842            let mut seed = 0xC0FFEE_u32;
10843            let noise: Vec<u32> = (0..(w * h) as usize)
10844                .map(|_| {
10845                    seed ^= seed << 13;
10846                    seed ^= seed >> 17;
10847                    seed ^= seed << 5;
10848                    0xFF00_0000 | (seed & 0x00FF_FFFF)
10849                })
10850                .collect();
10851            let stripes: Vec<u32> = (0..(w * h) as usize)
10852                .map(|i| {
10853                    let x = (i as u32) % w;
10854                    match x % 4 {
10855                        0 => 0xFFAA_5500,
10856                        1 => 0xFF55_AA00,
10857                        2 => 0xFF00_55AA,
10858                        _ => 0xFF55_00AA,
10859                    }
10860                })
10861                .collect();
10862
10863            for (name, pixels) in [
10864                ("gradient", &gradient),
10865                ("noise", &noise),
10866                ("stripes", &stripes),
10867            ] {
10868                let pre = pre_round_155_predictor_chooser(pixels, w, h);
10869                let post = encode_argb_with_predictor_chooser(pixels, w, h);
10870                assert!(
10871                    post.len() <= pre.len(),
10872                    "round-155 chooser regression on {name} {w}x{h}: pre={} B post={} B",
10873                    pre.len(),
10874                    post.len(),
10875                );
10876            }
10877        }
10878    }
10879
10880    /// Round 155 strict-beat: on a fixture small enough that the
10881    /// default-size predictor block-image has no useful resolution
10882    /// (a 20×20 image gives one 16×16 in-bounds block plus border
10883    /// padding that still pays a 1-pixel sub-image), the maximal
10884    /// single-block predictor strictly shrinks the chosen stream
10885    /// because both candidates share the same block-image cost while
10886    /// the single-block path picks a globally-optimal predictor mode
10887    /// over the noise pattern. The test prints the byte-saved delta so
10888    /// the round report can quote a measured number.
10889    #[test]
10890    fn round_155_predictor_size_bits_sweep_strictly_beats_default_on_some_fixture() {
10891        // 20×20 dense-residual fixture: per-pixel green channel changes
10892        // every pixel so the per-region 16×16 block path can't dominate
10893        // and the chooser's two candidates differ only in sub-image
10894        // shape + global predictor pick.
10895        let w = 20u32;
10896        let h = 20u32;
10897        let mut seed = 0xDEADBEEF_u32;
10898        let pixels: Vec<u32> = (0..(w * h) as usize)
10899            .map(|_| {
10900                seed ^= seed << 13;
10901                seed ^= seed >> 17;
10902                seed ^= seed << 5;
10903                0xFF00_0000 | (seed & 0x00FF_FFFF)
10904            })
10905            .collect();
10906
10907        let pre = pre_round_155_predictor_chooser(&pixels, w, h);
10908        let post = encode_argb_with_predictor_chooser(&pixels, w, h);
10909
10910        eprintln!(
10911            "[round-155] {w}x{h} dense-residual: pre={} B post={} B delta={} B ({:.2}%)",
10912            pre.len(),
10913            post.len(),
10914            pre.len() as i64 - post.len() as i64,
10915            (pre.len() as f64 - post.len() as f64) / pre.len() as f64 * 100.0,
10916        );
10917        assert!(
10918            post.len() < pre.len(),
10919            "round-155 maximal-single-block predictor must strictly shrink the chosen \
10920             stream on the 20x20 dense-residual fixture: pre={} B post={} B",
10921            pre.len(),
10922            post.len(),
10923        );
10924    }
10925
10926    /// Round 155 round-trip: the maximal-single-block predictor
10927    /// candidate (size_bits promoted up to 9) must still emit a valid
10928    /// §4.1 transform header that the decoder accepts; the resulting
10929    /// stream must round-trip back to the exact input pixels via
10930    /// [`crate::decode_lossless_image`]. The test directly invokes
10931    /// `encode_with_predictor` at the largest size_bits the sweep can
10932    /// pick (matching the chooser's promotion loop) and frames it with
10933    /// `build_image_header` for the round-trip path.
10934    #[test]
10935    fn round_155_predictor_single_block_round_trips_through_decoder() {
10936        let w = 64u32;
10937        let h = 16u32;
10938        let mut seed = 0xA5A5_F00D_u32;
10939        let pixels: Vec<u32> = (0..(w * h) as usize)
10940            .map(|_| {
10941                seed ^= seed << 13;
10942                seed ^= seed >> 17;
10943                seed ^= seed << 5;
10944                0xFF00_0000 | (seed & 0x00FF_FFFF)
10945            })
10946            .collect();
10947
10948        // 1) The chooser's chosen stream must round-trip end-to-end
10949        //    through `build::build_webp_file` + `decode_lossless_image`.
10950        let stream_chooser = encode_argb_with_predictor_chooser(&pixels, w, h);
10951        let header_chooser = build_image_header(w, h, true);
10952        let mut payload_chooser = header_chooser.to_vec();
10953        payload_chooser.extend_from_slice(&stream_chooser);
10954        let framed_chooser =
10955            build::build_webp_file(&payload_chooser, ImageKind::Lossless, w, h).unwrap();
10956        let img = crate::decode_lossless_image(&framed_chooser)
10957            .unwrap()
10958            .unwrap();
10959        assert_eq!(img.pixels(), pixels.as_slice());
10960
10961        // 2) The single-block predictor path directly: pick the
10962        //    smallest size_bits such that `1 << size_bits ≥ max(w, h)`,
10963        //    matching the chooser's promotion loop.
10964        let mut single_block_size_bits: u8 = DEFAULT_PREDICTOR_SIZE_BITS;
10965        while single_block_size_bits < 9
10966            && ((1u32 << single_block_size_bits) < w || (1u32 << single_block_size_bits) < h)
10967        {
10968            single_block_size_bits += 1;
10969        }
10970        // 64×16 promotes to size_bits = 6 (block 64).
10971        assert_eq!(single_block_size_bits, 6);
10972        let stream = encode_with_predictor(&pixels, w, h, single_block_size_bits, None, w);
10973        let header = build_image_header(w, h, true);
10974        let mut payload = header.to_vec();
10975        payload.extend_from_slice(&stream);
10976        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
10977        let img2 = crate::decode_lossless_image(&framed).unwrap().unwrap();
10978        assert_eq!(img2.pixels(), pixels.as_slice());
10979    }
10980
10981    // ---- round 156: §5.2.2 single-position lazy LZ77 matching --------
10982    //
10983    // The round-156 step adds a single-position look-ahead to the §5.2.2
10984    // hash-chain matcher in `tokenize_lz77`: when a match `(L_a, _)` is
10985    // found at `pos`, the encoder also probes `pos + 1` and, if the
10986    // look-ahead yields a strictly longer match, emits `pixels[pos]` as
10987    // a literal and uses the longer match from `pos + 1` instead. The
10988    // decoder output is bit-identical for any input — only the token
10989    // partition shifts — so the property under test is *byte-count*,
10990    // not pixel correctness (which the existing round-trip tests cover).
10991    //
10992    // The internal `tokenize_lz77_inner` exposes a `lazy_depth: u32`
10993    // toggle so a test can build the strict-greedy r155 baseline token
10994    // stream (`lazy_depth = 0`) alongside the round-156 depth-1 stream
10995    // (`lazy_depth = 1`) and the round-157 depth-2 stream
10996    // (`lazy_depth = 2`) on the same fixture, then compare token counts.
10997    // Three contracts:
10998    //
10999    // 1) Round-trip — every lazy-matched stream still round-trips
11000    //    end-to-end through `decode_lossless_image`.
11001    // 2) Strict-beat — on a hand-crafted fixture where the strict-
11002    //    greedy matcher gets trapped in a short match, the lazy matcher
11003    //    emits strictly fewer tokens (and the test asserts the headline
11004    //    drop, printing the per-fixture numbers).
11005    // 3) Non-regression — on a broader fixture matrix the lazy token
11006    //    count is `<=` the strict-greedy token count everywhere (the
11007    //    look-ahead only ever swaps when the longer match strictly
11008    //    wins, so this is a structural guarantee — the test ensures
11009    //    no off-by-one in the insert-bookkeeping reintroduces a
11010    //    regression on future refactors).
11011
11012    /// Round 156 round-trip: a noisy 64×16 fixture encoded with the
11013    /// round-156 lazy matcher must still decode bit-exactly back to the
11014    /// original ARGB pixels. The fixture is large enough that the
11015    /// matcher produces many `Copy` tokens, so the lazy branch is
11016    /// exercised throughout the run (and not just at the tail).
11017    #[test]
11018    fn round_156_lazy_match_round_trips_through_decoder() {
11019        let w = 64u32;
11020        let h = 16u32;
11021        let mut seed = 0xF00D_BABE_u32;
11022        let pixels: Vec<u32> = (0..(w * h) as usize)
11023            .map(|_| {
11024                seed ^= seed << 13;
11025                seed ^= seed >> 17;
11026                seed ^= seed << 5;
11027                0xFF00_0000 | (seed & 0x00FF_FFFF)
11028            })
11029            .collect();
11030
11031        // The full chooser includes the lazy matcher via
11032        // `tokenize_lz77`; the round-trip through the framed file must
11033        // recover the exact input.
11034        let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11035        let header = build_image_header(w, h, true);
11036        let mut payload = header.to_vec();
11037        payload.extend_from_slice(&stream);
11038        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11039        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11040        assert_eq!(img.pixels(), pixels.as_slice());
11041
11042        // The direct lazy-only token stream against
11043        // `encode_argb_literals_with_width` must also round-trip — this
11044        // catches the case where lazy on the no-transform path
11045        // mis-tracks the hash-chain insert bookkeeping.
11046        let stream_direct = encode_argb_literals_with_width(&pixels, w);
11047        let header_direct = build_image_header(w, h, true);
11048        let mut payload_direct = header_direct.to_vec();
11049        payload_direct.extend_from_slice(&stream_direct);
11050        let framed_direct =
11051            build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11052        let img_direct = crate::decode_lossless_image(&framed_direct)
11053            .unwrap()
11054            .unwrap();
11055        assert_eq!(img_direct.pixels(), pixels.as_slice());
11056    }
11057
11058    /// Round 156 strict-beat: a hand-crafted look-ahead-trap fixture
11059    /// where the strict-greedy matcher accepts a short match at
11060    /// position `p` that prevents a strictly longer match at `p + 1`.
11061    /// The fixture engineers two 4-pixel-hash chains so the strict
11062    /// matcher finds a length-4 match at `p` while `p + 1` finds a
11063    /// length-6 match; lazy resolves to the longer partition.
11064    ///
11065    /// Layout (each pixel is a unique ARGB constant):
11066    ///
11067    /// ```text
11068    ///   pos  0..7    [A B C D E F G H]   — primary prefix, gives the
11069    ///                                       [A,B,C,D] chain entry
11070    ///                                       at pos 0 and the
11071    ///                                       [B,C,D,E] entry at pos 1.
11072    ///   pos  8       Z                    — separator
11073    ///   pos  9..15   [A B C D E F G]      — `find(p=10)` matches the
11074    ///                                       primary prefix [A,B,C,D,E,F,G]
11075    ///                                       at pos 0 — length 7. Lazy
11076    ///                                       irrelevant here (no longer
11077    ///                                       match exists past length 7
11078    ///                                       at pos 11).
11079    /// ```
11080    ///
11081    /// That doesn't trap. A real trap requires the `p` match to be
11082    /// strictly shorter than the `p + 1` match. The construction below
11083    /// achieves this by deliberately mismatching the 4th byte at pos
11084    /// `p`'s candidate so the strict match stops at length 4, while
11085    /// pos `p + 1` walks a second pre-seeded chain with a 6+ pixel run.
11086    /// Specifically:
11087    ///
11088    /// ```text
11089    ///   pos  0..3    [A B C D]            — first chain (pos 0).
11090    ///   pos  4..6    [Z Z Z]               — separator.
11091    ///   pos  7..13   [B C D E F G H]       — second chain (pos 7's
11092    ///                                       window is [B,C,D,E]).
11093    ///   pos 14..16   [Z Z Z]
11094    ///   pos 17       A   ← trap start.    `find(17)`'s window is
11095    ///                                     [A,B,C,D] → matches pos 0,
11096    ///                                     extension stops at length 4
11097    ///                                     because pos 4 = Z ≠ pos 21.
11098    ///   pos 18..23   [B C D E F G]        — `find(18)`'s window is
11099    ///                                     [B,C,D,E] → matches pos 7,
11100    ///                                     extension goes 7 long (B-H)
11101    ///                                     against the second chain.
11102    /// ```
11103    ///
11104    /// Greedy: emits `Copy{len=4, dist=17}` at pos 17, then has to
11105    /// emit `[E,F,G]` as literals (pos 21,22,23) because the chain at
11106    /// pos 21's window is gone.
11107    ///
11108    /// Lazy: emits `Literal(A)` at pos 17, then `Copy{len=7, dist=11}`
11109    /// at pos 18, covering `[B,C,D,E,F,G,H]` from pos 7. Net: -2 tokens.
11110    #[test]
11111    fn round_156_lazy_match_strictly_beats_greedy_on_trap_fixture() {
11112        let a = 0xFF11_2233_u32;
11113        let b = 0xFF22_3344_u32;
11114        let c = 0xFF33_4455_u32;
11115        let d = 0xFF44_5566_u32;
11116        let e = 0xFF55_6677_u32;
11117        let f = 0xFF66_7788_u32;
11118        let g = 0xFF77_8899_u32;
11119        let h = 0xFF88_99AA_u32;
11120        let z = 0xFF00_0000_u32;
11121
11122        // The buffer layout (per the doc comment above). Indices are
11123        // explicit so the trap is unambiguous.
11124        let mut pixels: Vec<u32> = vec![
11125            a, b, c, d, // 0..4    primary chain anchor [A,B,C,D]
11126            z, z, z, // 4..7    separator
11127            b, c, d, e, f, g, h, // 7..14   secondary chain anchor [B,C,D,E,...]
11128            z, z, z, // 14..17  separator
11129            a, // 17       trap-start: find(17)→pos0, length 4
11130            b, c, d, e, f, g, h, // 18..25  decoy: find(18)→pos7, length 7
11131        ];
11132        // Pad to 64 pixels so the framing call has a non-degenerate
11133        // image; tail content is uniform Z so it does not interact
11134        // with the trap region.
11135        while pixels.len() < 64 {
11136            pixels.push(z);
11137        }
11138
11139        let greedy = tokenize_lz77_inner(&pixels, 0);
11140        let lazy = tokenize_lz77_inner(&pixels, 1);
11141
11142        let greedy_copies = greedy
11143            .iter()
11144            .filter(|t| matches!(t, Token::Copy { .. }))
11145            .count();
11146        let lazy_copies = lazy
11147            .iter()
11148            .filter(|t| matches!(t, Token::Copy { .. }))
11149            .count();
11150        // Sum of pixels covered by each partition: must equal the
11151        // input length for both partitions (sanity).
11152        let coverage = |toks: &[Token]| -> usize {
11153            toks.iter()
11154                .map(|t| match *t {
11155                    Token::Literal(_) => 1,
11156                    Token::CacheRef { .. } => 1,
11157                    Token::Copy { length, .. } => length,
11158                })
11159                .sum()
11160        };
11161        assert_eq!(coverage(&greedy), pixels.len());
11162        assert_eq!(coverage(&lazy), pixels.len());
11163
11164        eprintln!(
11165            "[round-156] trap fixture: greedy tokens={} (copies={}), \
11166             lazy tokens={} (copies={}), copy delta={}",
11167            greedy.len(),
11168            greedy_copies,
11169            lazy.len(),
11170            lazy_copies,
11171            greedy_copies as i64 - lazy_copies as i64,
11172        );
11173
11174        // The trap region has greedy emit
11175        //   [Copy{4, 17}, Copy{7, 11}, Copy{36, 1}]   = 3 copies
11176        // while lazy emits
11177        //   [Literal(A), Copy{10, 11}, Copy{36, 1}]   = 2 copies
11178        // covering the same 11-pixel trap span. The lazy partition
11179        // collapses two separate copies into one longer copy, which is
11180        // the round-156 structural win. (The literal-symbol count rises
11181        // by one to compensate; total tokens may match but the *copy
11182        // count* — and the prefix-code statistics — diverge.)
11183        assert!(
11184            lazy_copies < greedy_copies,
11185            "round-156 lazy matcher must emit strictly fewer Copy tokens on the trap \
11186             fixture: greedy copies={} lazy copies={}\ngreedy partition: {:?}\n\
11187             lazy partition:   {:?}",
11188            greedy_copies,
11189            lazy_copies,
11190            greedy,
11191            lazy,
11192        );
11193
11194        // Round-trip the bytes through the no-transform encoder for
11195        // good measure: the lazy path must still decode back exactly.
11196        let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11197        let w = pixels.len() as u32;
11198        let h = 1u32;
11199        let header = build_image_header(w, h, true);
11200        let mut payload = header.to_vec();
11201        payload.extend_from_slice(&stream);
11202        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11203        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11204        assert_eq!(img.pixels(), pixels.as_slice());
11205    }
11206
11207    /// Round 156 non-regression: across a broad fixture matrix
11208    /// (gradient / noise / stripes shapes), the lazy matcher's token
11209    /// count is `<=` the strict-greedy matcher's everywhere. Structural
11210    /// because the look-ahead only swaps when the alternate match is
11211    /// strictly longer, so the lazy partition uses at most as many
11212    /// tokens as the greedy partition. The test guards against
11213    /// off-by-one bugs in the hash-chain insert bookkeeping (the
11214    /// insert-of-`pos`-for-lookahead path) that future refactors might
11215    /// introduce.
11216    #[test]
11217    fn round_156_lazy_never_increases_token_count() {
11218        let shapes: &[(u32, u32)] = &[
11219            (16, 16),
11220            (20, 20),
11221            (24, 24),
11222            (32, 32),
11223            (48, 48),
11224            (16, 32),
11225            (64, 16),
11226            (40, 24),
11227        ];
11228        for &(w, h) in shapes {
11229            let gradient: Vec<u32> = (0..(w * h) as usize)
11230                .map(|i| {
11231                    let x = (i as u32) % w;
11232                    let y = (i as u32) / w;
11233                    let g = (x + y) & 0xFF;
11234                    0xFF00_0000 | (g << 16) | (g << 8) | g
11235                })
11236                .collect();
11237            let mut seed = 0xC0FFEE_u32;
11238            let noise: Vec<u32> = (0..(w * h) as usize)
11239                .map(|_| {
11240                    seed ^= seed << 13;
11241                    seed ^= seed >> 17;
11242                    seed ^= seed << 5;
11243                    0xFF00_0000 | (seed & 0x00FF_FFFF)
11244                })
11245                .collect();
11246            let stripes: Vec<u32> = (0..(w * h) as usize)
11247                .map(|i| {
11248                    let x = (i as u32) % w;
11249                    match x % 4 {
11250                        0 => 0xFFAA_5500,
11251                        1 => 0xFF55_AA00,
11252                        2 => 0xFF00_55AA,
11253                        _ => 0xFF55_00AA,
11254                    }
11255                })
11256                .collect();
11257
11258            for (name, pixels) in [
11259                ("gradient", &gradient),
11260                ("noise", &noise),
11261                ("stripes", &stripes),
11262            ] {
11263                let greedy = tokenize_lz77_inner(pixels, 0);
11264                let lazy = tokenize_lz77_inner(pixels, 1);
11265                assert!(
11266                    lazy.len() <= greedy.len(),
11267                    "round-156 lazy regression on {name} {w}x{h}: greedy={} tokens, \
11268                     lazy={} tokens",
11269                    greedy.len(),
11270                    lazy.len(),
11271                );
11272            }
11273        }
11274    }
11275
11276    // ---- round 157: §5.2.2 two-position lazy LZ77 matching -----------
11277    //
11278    // The round-157 step extends the round-156 single-position lazy
11279    // matcher with a second look-ahead position. After finding a match
11280    // `(L_a, _)` at `pos` and (depth-1) probing `pos + 1` for a strictly
11281    // longer `L_b`, the matcher also (depth-2) probes `pos + 2` for an
11282    // `L_c > max(L_a, L_b)`. When the depth-2 probe wins, the encoder
11283    // emits two literals (`pixels[pos]` and `pixels[pos + 1]`) and takes
11284    // the longer match from `pos + 2`. This recovers a *second-order*
11285    // strict-greedy trap that the round-156 depth-1 matcher could not
11286    // escape — a short match at `pos` AND a short match at `pos + 1`
11287    // together blocking a strictly longer match at `pos + 2`. The
11288    // decoder output is bit-identical for any input — only the token
11289    // *partition* shifts by up to two pixels — so round-trips remain
11290    // bit-exact under any input.
11291    //
11292    // Three contracts (mirroring the round-156 layout):
11293    //
11294    // 1) Round-trip — every depth-2 lazy-matched stream still
11295    //    round-trips end-to-end through `decode_lossless_image`.
11296    // 2) Strict-beat — on a hand-crafted depth-2-trap fixture, the
11297    //    depth-2 matcher emits strictly fewer Copy tokens than both
11298    //    the strict-greedy matcher and the depth-1 lazy matcher.
11299    // 3) Non-regression — on a broader fixture matrix the depth-2
11300    //    token count is `<=` the depth-1 token count everywhere.
11301
11302    /// Round 157 round-trip: a noisy 80×16 fixture encoded with the
11303    /// round-157 depth-2 lazy matcher (now the production
11304    /// `tokenize_lz77` default) must still decode bit-exactly back to
11305    /// the original ARGB pixels. Uses an independent xorshift seed
11306    /// from the round-156 test so both fixtures exercise the matcher
11307    /// over distinct entropy.
11308    #[test]
11309    fn round_157_depth2_lazy_match_round_trips_through_decoder() {
11310        let w = 80u32;
11311        let h = 16u32;
11312        let mut seed = 0xCAFE_F00D_u32;
11313        let pixels: Vec<u32> = (0..(w * h) as usize)
11314            .map(|_| {
11315                seed ^= seed << 13;
11316                seed ^= seed >> 17;
11317                seed ^= seed << 5;
11318                0xFF00_0000 | (seed & 0x00FF_FFFF)
11319            })
11320            .collect();
11321
11322        // The full chooser delegates to `tokenize_lz77` (depth-2 as of
11323        // round 157); end-to-end round-trip through the framed file
11324        // must recover the exact input.
11325        let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11326        let header = build_image_header(w, h, true);
11327        let mut payload = header.to_vec();
11328        payload.extend_from_slice(&stream);
11329        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11330        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11331        assert_eq!(img.pixels(), pixels.as_slice());
11332
11333        // The direct depth-2 token stream against the no-transform
11334        // encoder must also round-trip — guards against bookkeeping
11335        // bugs in the new depth-2 insert/skip dedup path.
11336        let stream_direct = encode_argb_literals_with_width(&pixels, w);
11337        let header_direct = build_image_header(w, h, true);
11338        let mut payload_direct = header_direct.to_vec();
11339        payload_direct.extend_from_slice(&stream_direct);
11340        let framed_direct =
11341            build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11342        let img_direct = crate::decode_lossless_image(&framed_direct)
11343            .unwrap()
11344            .unwrap();
11345        assert_eq!(img_direct.pixels(), pixels.as_slice());
11346    }
11347
11348    /// Round 157 strict-beat: a hand-crafted depth-2-trap fixture where
11349    /// the strict-greedy matcher AND the round-156 depth-1 lazy matcher
11350    /// both accept a short match at `pos` that prevents a strictly
11351    /// longer match at `pos + 2`. The depth-2 lazy matcher emits two
11352    /// literals and takes the longer match.
11353    ///
11354    /// Layout (each capital letter is a unique ARGB constant; the
11355    /// `Z*` family are unique separator pixels that share no 4-pixel
11356    /// window with the anchors):
11357    ///
11358    /// ```text
11359    ///   pos  0..3    [P Q R S]                — anchor A (4 px)
11360    ///   pos  4..6    [Z1 Z2 Z3]               — separator
11361    ///   pos  7..10   [Q R S T]                — anchor B (4 px)
11362    ///   pos 11..13   [Z4 Z5 Z6]               — separator
11363    ///   pos 14..21   [R S T U V W X Y]        — anchor C (8 px)
11364    ///   pos 22..24   [Z7 Z8 Z9]               — separator
11365    ///   pos 25       P                         — trap start
11366    ///   pos 26       Q
11367    ///   pos 27..33   [R S T U V W X]          — depth-2 chain region
11368    ///   pos 34..     fill with a fresh Zfill color (no 4-window match)
11369    /// ```
11370    ///
11371    /// At pos 25:
11372    ///
11373    /// * `find(25)` window `[P,Q,R,S]` → matches anchor A (pos 0),
11374    ///   extension stops at length 4 because pos 4 (Z1) ≠ pos 29 (T).
11375    /// * `find(26)` window `[Q,R,S,T]` → matches anchor B (pos 7),
11376    ///   extension stops at length 4 because pos 11 (Z4) ≠ pos 30 (U).
11377    ///   `L_b = 4 = L_a`, **not strictly greater**, so the depth-1
11378    ///   lazy matcher does NOT swap.
11379    /// * `find(27)` window `[R,S,T,U]` → matches anchor C (pos 14),
11380    ///   extension goes `[R,S,T,U,V,W,X]` (length 7) before pos 21
11381    ///   (Y) ≠ pos 34 (Zfill). `L_c = 7 > 4`, so the depth-2 lazy
11382    ///   matcher swaps to two literals + the length-7 match.
11383    ///
11384    /// Strict-greedy AND depth-1 partition at the trap:
11385    /// `[Copy{4, dist=25}, ...]`. Depth-2 partition: `[Lit(P),
11386    /// Lit(Q), Copy{7, dist=13}, ...]`. Net: depth-2 collapses a
11387    /// short-then-short pair into one longer copy — strictly fewer
11388    /// Copy tokens, at the cost of one extra literal (mirroring the
11389    /// round-156 pattern).
11390    #[test]
11391    fn round_157_depth2_lazy_match_strictly_beats_depth1_on_trap_fixture() {
11392        // Distinct ARGB constants. Anchor letters P..Y carry the
11393        // structural matches; Z1..Z9 + Zfill are deliberately unique
11394        // so they cannot seed a parasitic chain.
11395        let p_ = 0xFF11_2200_u32;
11396        let q_ = 0xFF22_3300_u32;
11397        let r_ = 0xFF33_4400_u32;
11398        let s_ = 0xFF44_5500_u32;
11399        let t_ = 0xFF55_6600_u32;
11400        let u_ = 0xFF66_7700_u32;
11401        let v_ = 0xFF77_8800_u32;
11402        let w_ = 0xFF88_9900_u32;
11403        let x_ = 0xFF99_AA00_u32;
11404        let y_ = 0xFFAA_BB00_u32;
11405        let z1 = 0xFFCC_DD01_u32;
11406        let z2 = 0xFFCC_DD02_u32;
11407        let z3 = 0xFFCC_DD03_u32;
11408        let z4 = 0xFFCC_DD04_u32;
11409        let z5 = 0xFFCC_DD05_u32;
11410        let z6 = 0xFFCC_DD06_u32;
11411        let z7 = 0xFFCC_DD07_u32;
11412        let z8 = 0xFFCC_DD08_u32;
11413        let z9 = 0xFFCC_DD09_u32;
11414
11415        let mut pixels: Vec<u32> = vec![
11416            p_, q_, r_, s_, // 0..4    anchor A
11417            z1, z2, z3, // 4..7    separator
11418            q_, r_, s_, t_, // 7..11   anchor B
11419            z4, z5, z6, // 11..14  separator
11420            r_, s_, t_, u_, v_, w_, x_, y_, // 14..22  anchor C
11421            z7, z8, z9, // 22..25  separator
11422            p_, q_, // 25..27  trap start (depth-1 cannot escape)
11423            r_, s_, t_, u_, v_, w_, x_, // 27..34  depth-2 chain region
11424        ];
11425        // Pad the tail with unique colors so the depth-2 swap's
11426        // post-match region cannot trigger another long match that
11427        // might mask the trap's copy-count delta.
11428        let mut filler = 0xFFE0_0000_u32;
11429        while pixels.len() < 80 {
11430            filler = filler.wrapping_add(1);
11431            pixels.push(filler);
11432        }
11433
11434        let greedy = tokenize_lz77_inner(&pixels, 0);
11435        let lazy1 = tokenize_lz77_inner(&pixels, 1);
11436        let lazy2 = tokenize_lz77_inner(&pixels, 2);
11437
11438        let copies = |toks: &[Token]| -> usize {
11439            toks.iter()
11440                .filter(|t| matches!(t, Token::Copy { .. }))
11441                .count()
11442        };
11443        let coverage = |toks: &[Token]| -> usize {
11444            toks.iter()
11445                .map(|t| match *t {
11446                    Token::Literal(_) => 1,
11447                    Token::CacheRef { .. } => 1,
11448                    Token::Copy { length, .. } => length,
11449                })
11450                .sum()
11451        };
11452        // Sanity: all three partitions cover the exact image.
11453        assert_eq!(coverage(&greedy), pixels.len());
11454        assert_eq!(coverage(&lazy1), pixels.len());
11455        assert_eq!(coverage(&lazy2), pixels.len());
11456
11457        let g_c = copies(&greedy);
11458        let l1_c = copies(&lazy1);
11459        let l2_c = copies(&lazy2);
11460        eprintln!(
11461            "[round-157] depth-2 trap fixture: greedy tokens={} (copies={}), \
11462             depth-1 tokens={} (copies={}), depth-2 tokens={} (copies={}), \
11463             copy delta vs depth-1={}",
11464            greedy.len(),
11465            g_c,
11466            lazy1.len(),
11467            l1_c,
11468            lazy2.len(),
11469            l2_c,
11470            l1_c as i64 - l2_c as i64,
11471        );
11472
11473        // The trap forces depth-2 to collapse a length-4 copy into a
11474        // 2-literals + length-7 copy that subsumes 7 pixels of what
11475        // greedy / depth-1 would have to cover with multiple matches.
11476        // The structural win is on Copy count: depth-2 must emit
11477        // strictly fewer Copy tokens than BOTH baselines.
11478        assert_eq!(
11479            g_c, l1_c,
11480            "round-157 fixture: depth-1 must agree with greedy here \
11481             (no depth-1 swap fires) — greedy={g_c}, depth-1={l1_c}"
11482        );
11483        assert!(
11484            l2_c < l1_c,
11485            "round-157 depth-2 matcher must emit strictly fewer Copy \
11486             tokens than the depth-1 matcher on the depth-2 trap \
11487             fixture: depth-1 copies={l1_c} depth-2 copies={l2_c}\n\
11488             depth-1 partition: {lazy1:?}\n\
11489             depth-2 partition: {lazy2:?}"
11490        );
11491
11492        // Round-trip the bytes through the no-transform encoder for
11493        // good measure: the depth-2 path must decode back exactly.
11494        let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11495        let w = pixels.len() as u32;
11496        let h = 1u32;
11497        let header = build_image_header(w, h, true);
11498        let mut payload = header.to_vec();
11499        payload.extend_from_slice(&stream);
11500        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11501        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11502        assert_eq!(img.pixels(), pixels.as_slice());
11503    }
11504
11505    /// Round 157 non-regression: across a broad fixture matrix the
11506    /// depth-2 lazy token count is `<=` the depth-1 lazy token count
11507    /// everywhere. Structural because the depth-2 probe only swaps
11508    /// when the alternate match is strictly longer than the depth-1
11509    /// best, so the depth-2 partition uses at most as many tokens as
11510    /// the depth-1 partition. The test guards against off-by-one in
11511    /// the new depth-2 insert/skip dedup (where `pos` and `pos + 1`
11512    /// can both be pre-inserted before the chosen match starts at
11513    /// `pos`, `pos + 1`, or `pos + 2`).
11514    #[test]
11515    fn round_157_depth2_never_increases_token_count_over_depth1() {
11516        let shapes: &[(u32, u32)] = &[
11517            (16, 16),
11518            (20, 20),
11519            (24, 24),
11520            (32, 32),
11521            (48, 48),
11522            (16, 32),
11523            (64, 16),
11524            (40, 24),
11525        ];
11526        for &(w, h) in shapes {
11527            let gradient: Vec<u32> = (0..(w * h) as usize)
11528                .map(|i| {
11529                    let x = (i as u32) % w;
11530                    let y = (i as u32) / w;
11531                    let g = (x + y) & 0xFF;
11532                    0xFF00_0000 | (g << 16) | (g << 8) | g
11533                })
11534                .collect();
11535            let mut seed = 0xC0FFEE_u32;
11536            let noise: Vec<u32> = (0..(w * h) as usize)
11537                .map(|_| {
11538                    seed ^= seed << 13;
11539                    seed ^= seed >> 17;
11540                    seed ^= seed << 5;
11541                    0xFF00_0000 | (seed & 0x00FF_FFFF)
11542                })
11543                .collect();
11544            let stripes: Vec<u32> = (0..(w * h) as usize)
11545                .map(|i| {
11546                    let x = (i as u32) % w;
11547                    match x % 4 {
11548                        0 => 0xFFAA_5500,
11549                        1 => 0xFF55_AA00,
11550                        2 => 0xFF00_55AA,
11551                        _ => 0xFF55_00AA,
11552                    }
11553                })
11554                .collect();
11555
11556            for (name, pixels) in [
11557                ("gradient", &gradient),
11558                ("noise", &noise),
11559                ("stripes", &stripes),
11560            ] {
11561                let lazy1 = tokenize_lz77_inner(pixels, 1);
11562                let lazy2 = tokenize_lz77_inner(pixels, 2);
11563                assert!(
11564                    lazy2.len() <= lazy1.len(),
11565                    "round-157 depth-2 regression on {name} {w}x{h}: \
11566                     depth-1={} tokens, depth-2={} tokens",
11567                    lazy1.len(),
11568                    lazy2.len(),
11569                );
11570                // Round-trip the depth-2 stream as a defensive check
11571                // for hash-chain insert bookkeeping.
11572                let stream = encode_argb_literals_with_width(pixels, w);
11573                let header = build_image_header(w, h, true);
11574                let mut payload = header.to_vec();
11575                payload.extend_from_slice(&stream);
11576                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11577                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11578                assert_eq!(
11579                    img.pixels(),
11580                    pixels.as_slice(),
11581                    "round-157 depth-2 round-trip mismatch on {name} {w}x{h}"
11582                );
11583            }
11584        }
11585    }
11586
11587    // ---- round 158: §5.2.2 three-position lazy LZ77 matching ---------
11588    //
11589    // The round-158 step extends the round-157 two-position lazy
11590    // matcher with a third look-ahead position. After finding a match
11591    // `(L_a, _)` at `pos` and (depth-1) probing `pos + 1` for a strictly
11592    // longer `L_b`, and (depth-2) probing `pos + 2` for a strictly
11593    // longer `L_c`, the matcher also (depth-3) probes `pos + 3` for an
11594    // `L_d > max(L_a, L_b, L_c)`. When the depth-3 probe wins, the
11595    // encoder emits three literals (`pixels[pos]`, `pixels[pos + 1]`,
11596    // and `pixels[pos + 2]`) and takes the longer match from `pos + 3`.
11597    // This recovers a *third-order* strict-greedy trap that the
11598    // round-157 depth-2 matcher could not escape — three consecutive
11599    // short matches at `pos`, `pos + 1`, `pos + 2` together blocking a
11600    // strictly longer match at `pos + 3`. The decoder output is
11601    // bit-identical for any input — only the token *partition* shifts
11602    // by up to three pixels — so round-trips remain bit-exact under
11603    // any input.
11604    //
11605    // Three contracts (mirroring the round-156 / round-157 layout):
11606    //
11607    // 1) Round-trip — every depth-3 lazy-matched stream still
11608    //    round-trips end-to-end through `decode_lossless_image`.
11609    // 2) Strict-beat — on a hand-crafted depth-3-trap fixture, the
11610    //    depth-3 matcher emits strictly fewer Copy tokens than the
11611    //    strict-greedy, depth-1, and depth-2 matchers.
11612    // 3) Non-regression — on a broader fixture matrix the depth-3
11613    //    token count is `<=` the depth-2 token count everywhere.
11614
11615    /// Round 158 round-trip: a noisy 96×16 fixture encoded with the
11616    /// round-158 depth-3 lazy matcher (now the production
11617    /// `tokenize_lz77` default) must still decode bit-exactly back to
11618    /// the original ARGB pixels. Uses an independent xorshift seed
11619    /// from the round-156 / round-157 tests so all three fixtures
11620    /// exercise the matcher over distinct entropy.
11621    #[test]
11622    fn round_158_depth3_lazy_match_round_trips_through_decoder() {
11623        let w = 96u32;
11624        let h = 16u32;
11625        let mut seed = 0xDEAD_BEEF_u32;
11626        let pixels: Vec<u32> = (0..(w * h) as usize)
11627            .map(|_| {
11628                seed ^= seed << 13;
11629                seed ^= seed >> 17;
11630                seed ^= seed << 5;
11631                0xFF00_0000 | (seed & 0x00FF_FFFF)
11632            })
11633            .collect();
11634
11635        // The full chooser delegates to `tokenize_lz77` (depth-3 as of
11636        // round 158); end-to-end round-trip through the framed file
11637        // must recover the exact input.
11638        let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11639        let header = build_image_header(w, h, true);
11640        let mut payload = header.to_vec();
11641        payload.extend_from_slice(&stream);
11642        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11643        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11644        assert_eq!(img.pixels(), pixels.as_slice());
11645
11646        // The direct depth-3 token stream against the no-transform
11647        // encoder must also round-trip — guards against bookkeeping
11648        // bugs in the new depth-3 insert/skip dedup path (where `pos`,
11649        // `pos + 1`, and `pos + 2` can all be pre-inserted before the
11650        // chosen match starts at `pos`, `pos + 1`, `pos + 2`, or
11651        // `pos + 3`).
11652        let stream_direct = encode_argb_literals_with_width(&pixels, w);
11653        let header_direct = build_image_header(w, h, true);
11654        let mut payload_direct = header_direct.to_vec();
11655        payload_direct.extend_from_slice(&stream_direct);
11656        let framed_direct =
11657            build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11658        let img_direct = crate::decode_lossless_image(&framed_direct)
11659            .unwrap()
11660            .unwrap();
11661        assert_eq!(img_direct.pixels(), pixels.as_slice());
11662    }
11663
11664    /// Round 158 strict-beat: a hand-crafted depth-3-trap fixture
11665    /// where the strict-greedy matcher, the round-156 depth-1 lazy
11666    /// matcher, AND the round-157 depth-2 lazy matcher all accept a
11667    /// short match at `pos` that prevents a strictly longer match at
11668    /// `pos + 3`. The depth-3 lazy matcher emits three literals and
11669    /// takes the longer match.
11670    ///
11671    /// Layout (each capital letter is a unique ARGB constant; the
11672    /// `Z*` family are unique separator pixels that share no 4-pixel
11673    /// window with the anchors):
11674    ///
11675    /// ```text
11676    ///   pos  0..4    [P Q R S]                — anchor A (4 px)
11677    ///   pos  4..7    [Z1 Z2 Z3]               — separator
11678    ///   pos  7..11   [Q R S T]                — anchor B (4 px)
11679    ///   pos 11..14   [Z4 Z5 Z6]               — separator
11680    ///   pos 14..18   [R S T U]                — anchor C (4 px)
11681    ///   pos 18..21   [Z7 Z8 Z9]               — separator
11682    ///   pos 21..30   [S T U V W X Y A B]      — anchor D (9 px)
11683    ///   pos 30..33   [Z10 Z11 Z12]            — separator
11684    ///   pos 33       P                         — trap start
11685    ///   pos 34       Q
11686    ///   pos 35       R
11687    ///   pos 36..45   [S T U V W X Y A B]      — depth-3 chain region
11688    ///   pos 45..     fill with unique Zfill colors (no 4-window match)
11689    /// ```
11690    ///
11691    /// At pos 33:
11692    ///
11693    /// * `find(33)` window `[P,Q,R,S]` → matches anchor A (pos 0),
11694    ///   extension stops at length 4 because pos 4 (Z1) ≠ pos 37 (T).
11695    /// * `find(34)` window `[Q,R,S,T]` → matches anchor B (pos 7),
11696    ///   extension stops at length 4 because pos 11 (Z4) ≠ pos 38 (U).
11697    ///   `L_b = 4 = L_a`, **not strictly greater**, so the depth-1
11698    ///   lazy matcher does NOT swap.
11699    /// * `find(35)` window `[R,S,T,U]` → matches anchor C (pos 14),
11700    ///   extension stops at length 4 because pos 18 (Z7) ≠ pos 39 (V).
11701    ///   `L_c = 4 = L_a`, **not strictly greater**, so the depth-2
11702    ///   lazy matcher does NOT swap.
11703    /// * `find(36)` window `[S,T,U,V]` → matches anchor D (pos 21),
11704    ///   extension goes the full `[S,T,U,V,W,X,Y,A,B]` (length 9)
11705    ///   before pos 30 (Z10) ≠ pos 45 (Zfill). `L_d = 9 > 4`, so the
11706    ///   depth-3 lazy matcher swaps to three literals + the length-9
11707    ///   match.
11708    ///
11709    /// Strict-greedy, depth-1, AND depth-2 partition at the trap:
11710    /// `[Copy{4, dist=33}, ...]`. Depth-3 partition: `[Lit(P), Lit(Q),
11711    /// Lit(R), Copy{9, dist=15}, ...]`. Net: depth-3 collapses a
11712    /// short-then-short-then-short triple into one longer copy.
11713    #[test]
11714    fn round_158_depth3_lazy_match_strictly_beats_depth2_on_trap_fixture() {
11715        // Distinct ARGB constants. Anchor letters P..Y + A..B carry
11716        // the structural matches; Z1..Z12 + Zfill are deliberately
11717        // unique so they cannot seed a parasitic chain.
11718        let p_ = 0xFF11_2200_u32;
11719        let q_ = 0xFF22_3300_u32;
11720        let r_ = 0xFF33_4400_u32;
11721        let s_ = 0xFF44_5500_u32;
11722        let t_ = 0xFF55_6600_u32;
11723        let u_ = 0xFF66_7700_u32;
11724        let v_ = 0xFF77_8800_u32;
11725        let w_ = 0xFF88_9900_u32;
11726        let x_ = 0xFF99_AA00_u32;
11727        let y_ = 0xFFAA_BB00_u32;
11728        let a_ = 0xFFBB_CC00_u32;
11729        let b_ = 0xFFCC_DD00_u32;
11730        let z01 = 0xFFEE_0001_u32;
11731        let z02 = 0xFFEE_0002_u32;
11732        let z03 = 0xFFEE_0003_u32;
11733        let z04 = 0xFFEE_0004_u32;
11734        let z05 = 0xFFEE_0005_u32;
11735        let z06 = 0xFFEE_0006_u32;
11736        let z07 = 0xFFEE_0007_u32;
11737        let z08 = 0xFFEE_0008_u32;
11738        let z09 = 0xFFEE_0009_u32;
11739        let z10 = 0xFFEE_000A_u32;
11740        let z11 = 0xFFEE_000B_u32;
11741        let z12 = 0xFFEE_000C_u32;
11742
11743        let mut pixels: Vec<u32> = vec![
11744            p_, q_, r_, s_, // 0..4    anchor A
11745            z01, z02, z03, // 4..7    separator
11746            q_, r_, s_, t_, // 7..11   anchor B
11747            z04, z05, z06, // 11..14  separator
11748            r_, s_, t_, u_, // 14..18  anchor C
11749            z07, z08, z09, // 18..21  separator
11750            s_, t_, u_, v_, w_, x_, y_, a_, b_, // 21..30  anchor D (9 px)
11751            z10, z11, z12, // 30..33  separator
11752            p_, q_, r_, // 33..36  trap start (depth-1/2 cannot escape)
11753            s_, t_, u_, v_, w_, x_, y_, a_, b_, // 36..45  depth-3 chain region
11754        ];
11755        // Pad the tail with unique colors so the depth-3 swap's
11756        // post-match region cannot trigger another long match that
11757        // might mask the trap's copy-count delta.
11758        let mut filler = 0xFFF0_0000_u32;
11759        while pixels.len() < 96 {
11760            filler = filler.wrapping_add(1);
11761            pixels.push(filler);
11762        }
11763
11764        let greedy = tokenize_lz77_inner(&pixels, 0);
11765        let lazy1 = tokenize_lz77_inner(&pixels, 1);
11766        let lazy2 = tokenize_lz77_inner(&pixels, 2);
11767        let lazy3 = tokenize_lz77_inner(&pixels, 3);
11768
11769        let copies = |toks: &[Token]| -> usize {
11770            toks.iter()
11771                .filter(|t| matches!(t, Token::Copy { .. }))
11772                .count()
11773        };
11774        let coverage = |toks: &[Token]| -> usize {
11775            toks.iter()
11776                .map(|t| match *t {
11777                    Token::Literal(_) => 1,
11778                    Token::CacheRef { .. } => 1,
11779                    Token::Copy { length, .. } => length,
11780                })
11781                .sum()
11782        };
11783        // Sanity: all four partitions cover the exact image.
11784        assert_eq!(coverage(&greedy), pixels.len());
11785        assert_eq!(coverage(&lazy1), pixels.len());
11786        assert_eq!(coverage(&lazy2), pixels.len());
11787        assert_eq!(coverage(&lazy3), pixels.len());
11788
11789        let g_c = copies(&greedy);
11790        let l1_c = copies(&lazy1);
11791        let l2_c = copies(&lazy2);
11792        let l3_c = copies(&lazy3);
11793        eprintln!(
11794            "[round-158] depth-3 trap fixture: greedy tokens={} (copies={}), \
11795             depth-1 tokens={} (copies={}), depth-2 tokens={} (copies={}), \
11796             depth-3 tokens={} (copies={}), copy delta vs depth-2={}",
11797            greedy.len(),
11798            g_c,
11799            lazy1.len(),
11800            l1_c,
11801            lazy2.len(),
11802            l2_c,
11803            lazy3.len(),
11804            l3_c,
11805            l2_c as i64 - l3_c as i64,
11806        );
11807
11808        // The trap forces depth-3 to collapse a length-4 copy + a
11809        // follow-on length-8 copy into a 3-literals + length-9 copy
11810        // that subsumes 12 pixels of what greedy / depth-1 / depth-2
11811        // would have to cover with two matches. The structural win
11812        // is on Copy count: depth-3 must emit strictly fewer Copy
11813        // tokens than all three baselines.
11814        assert_eq!(
11815            g_c, l1_c,
11816            "round-158 fixture: depth-1 must agree with greedy here \
11817             (no depth-1 swap fires) — greedy={g_c}, depth-1={l1_c}"
11818        );
11819        assert_eq!(
11820            g_c, l2_c,
11821            "round-158 fixture: depth-2 must agree with greedy here \
11822             (no depth-2 swap fires) — greedy={g_c}, depth-2={l2_c}"
11823        );
11824        assert!(
11825            l3_c < l2_c,
11826            "round-158 depth-3 matcher must emit strictly fewer Copy \
11827             tokens than the depth-2 matcher on the depth-3 trap \
11828             fixture: depth-2 copies={l2_c} depth-3 copies={l3_c}\n\
11829             depth-2 partition: {lazy2:?}\n\
11830             depth-3 partition: {lazy3:?}"
11831        );
11832
11833        // Round-trip the bytes through the no-transform encoder for
11834        // good measure: the depth-3 path must decode back exactly.
11835        let stream = encode_argb_literals_with_width(&pixels, pixels.len() as u32);
11836        let w = pixels.len() as u32;
11837        let h = 1u32;
11838        let header = build_image_header(w, h, true);
11839        let mut payload = header.to_vec();
11840        payload.extend_from_slice(&stream);
11841        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11842        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11843        assert_eq!(img.pixels(), pixels.as_slice());
11844    }
11845
11846    /// Round 158 non-regression: across a broad fixture matrix the
11847    /// depth-3 lazy token count is `<=` the depth-2 lazy token count
11848    /// everywhere. Structural because the depth-3 probe only swaps
11849    /// when the alternate match is strictly longer than the depth-2
11850    /// best, so the depth-3 partition uses at most as many tokens as
11851    /// the depth-2 partition. The test guards against off-by-one in
11852    /// the new depth-3 insert/skip dedup (where `pos`, `pos + 1`, and
11853    /// `pos + 2` can all be pre-inserted before the chosen match
11854    /// starts at `pos`, `pos + 1`, `pos + 2`, or `pos + 3`).
11855    #[test]
11856    fn round_158_depth3_never_increases_token_count_over_depth2() {
11857        let shapes: &[(u32, u32)] = &[
11858            (16, 16),
11859            (20, 20),
11860            (24, 24),
11861            (32, 32),
11862            (48, 48),
11863            (16, 32),
11864            (64, 16),
11865            (40, 24),
11866        ];
11867        for &(w, h) in shapes {
11868            let gradient: Vec<u32> = (0..(w * h) as usize)
11869                .map(|i| {
11870                    let x = (i as u32) % w;
11871                    let y = (i as u32) / w;
11872                    let g = (x + y) & 0xFF;
11873                    0xFF00_0000 | (g << 16) | (g << 8) | g
11874                })
11875                .collect();
11876            let mut seed = 0xC0FFEE_u32;
11877            let noise: Vec<u32> = (0..(w * h) as usize)
11878                .map(|_| {
11879                    seed ^= seed << 13;
11880                    seed ^= seed >> 17;
11881                    seed ^= seed << 5;
11882                    0xFF00_0000 | (seed & 0x00FF_FFFF)
11883                })
11884                .collect();
11885            let stripes: Vec<u32> = (0..(w * h) as usize)
11886                .map(|i| {
11887                    let x = (i as u32) % w;
11888                    match x % 4 {
11889                        0 => 0xFFAA_5500,
11890                        1 => 0xFF55_AA00,
11891                        2 => 0xFF00_55AA,
11892                        _ => 0xFF55_00AA,
11893                    }
11894                })
11895                .collect();
11896
11897            for (name, pixels) in [
11898                ("gradient", &gradient),
11899                ("noise", &noise),
11900                ("stripes", &stripes),
11901            ] {
11902                let lazy2 = tokenize_lz77_inner(pixels, 2);
11903                let lazy3 = tokenize_lz77_inner(pixels, 3);
11904                assert!(
11905                    lazy3.len() <= lazy2.len(),
11906                    "round-158 depth-3 regression on {name} {w}x{h}: \
11907                     depth-2={} tokens, depth-3={} tokens",
11908                    lazy2.len(),
11909                    lazy3.len(),
11910                );
11911                // Round-trip the depth-3 stream as a defensive check
11912                // for hash-chain insert bookkeeping.
11913                let stream = encode_argb_literals_with_width(pixels, w);
11914                let header = build_image_header(w, h, true);
11915                let mut payload = header.to_vec();
11916                payload.extend_from_slice(&stream);
11917                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11918                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11919                assert_eq!(
11920                    img.pixels(),
11921                    pixels.as_slice(),
11922                    "round-158 depth-3 round-trip mismatch on {name} {w}x{h}"
11923                );
11924            }
11925        }
11926    }
11927
11928    // ---- round 163: §5.2.2 guarded depth-4 lazy LZ77 ----
11929    //
11930    // Three tests, mirroring the round-156 / 157 / 158 contract:
11931    //
11932    // 1) End-to-end round-trip — a noisy 96×16 fixture encoded with
11933    //    the round-163 guarded depth-4 lazy matcher (now the production
11934    //    `tokenize_lz77` default) must still decode bit-exactly back
11935    //    to the original ARGB pixels.
11936    // 2) Diminishing-returns guard — a hand-crafted fixture where the
11937    //    depth-3 best at `pos` is a long run (`>= DEPTH4_GUARD_THRESHOLD`)
11938    //    and a depth-4 swap candidate exists. The guard must suppress
11939    //    the depth-4 work so depth-4 == depth-3 byte-for-byte on that
11940    //    fixture; the unguarded depth-4 (simulated with `DEPTH4_GUARD_THRESHOLD`
11941    //    set to `MAX_MATCH`) would have swapped. We exercise the
11942    //    boundary by toggling the depth around the guard rather than
11943    //    monkey-patching the constant — the two depth values that
11944    //    bracket the guard (`3` vs `4`) produce identical partitions
11945    //    on the long-run fixture, proving the guard suppressed the
11946    //    probe.
11947    // 3) Non-regression — on a broader fixture matrix the depth-4
11948    //    token count is `<=` the depth-3 token count everywhere
11949    //    (structural: the depth-4 probe only swaps to a *strictly*
11950    //    longer match, so it can only remove tokens, never add them).
11951
11952    /// Round 163 round-trip: a noisy 96×16 fixture encoded with the
11953    /// round-163 guarded depth-4 lazy matcher (now the production
11954    /// `tokenize_lz77` default) must still decode bit-exactly back to
11955    /// the original ARGB pixels. Uses an independent xorshift seed
11956    /// from the round-156 / 157 / 158 tests so all four fixtures
11957    /// exercise the matcher over distinct entropy.
11958    #[test]
11959    fn round_163_depth4_lazy_match_round_trips_through_decoder() {
11960        let w = 96u32;
11961        let h = 16u32;
11962        let mut seed = 0xFEED_FACE_u32;
11963        let pixels: Vec<u32> = (0..(w * h) as usize)
11964            .map(|_| {
11965                seed ^= seed << 13;
11966                seed ^= seed >> 17;
11967                seed ^= seed << 5;
11968                0xFF00_0000 | (seed & 0x00FF_FFFF)
11969            })
11970            .collect();
11971
11972        // The full chooser delegates to `tokenize_lz77` (depth-4 as of
11973        // round 163); end-to-end round-trip through the framed file
11974        // must recover the exact input.
11975        let stream = encode_argb_with_predictor_chooser(&pixels, w, h);
11976        let header = build_image_header(w, h, true);
11977        let mut payload = header.to_vec();
11978        payload.extend_from_slice(&stream);
11979        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
11980        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
11981        assert_eq!(img.pixels(), pixels.as_slice());
11982
11983        // The direct depth-4 token stream against the no-transform
11984        // encoder must also round-trip — guards against bookkeeping
11985        // bugs in the new depth-4 insert/skip dedup path (where `pos`,
11986        // `pos + 1`, `pos + 2`, and `pos + 3` can all be pre-inserted
11987        // before the chosen match starts at `pos`, `pos + 1`, `pos + 2`,
11988        // `pos + 3`, or `pos + 4`).
11989        let stream_direct = encode_argb_literals_with_width(&pixels, w);
11990        let header_direct = build_image_header(w, h, true);
11991        let mut payload_direct = header_direct.to_vec();
11992        payload_direct.extend_from_slice(&stream_direct);
11993        let framed_direct =
11994            build::build_webp_file(&payload_direct, ImageKind::Lossless, w, h).unwrap();
11995        let img_direct = crate::decode_lossless_image(&framed_direct)
11996            .unwrap()
11997            .unwrap();
11998        assert_eq!(img_direct.pixels(), pixels.as_slice());
11999    }
12000
12001    /// Round 163 guard contract: on a fixture whose depth-3 best at
12002    /// some position is already a long run (length strictly `>=
12003    /// DEPTH4_GUARD_THRESHOLD`), the depth-4 probe MUST be suppressed
12004    /// by the guard. We construct an input where a long literal run
12005    /// at the start seeds a long match for the second copy. The
12006    /// depth-3 matcher emits a long match at the first probe; the
12007    /// depth-4 probe, if it were unguarded, would attempt a `find` at
12008    /// `pos + 4`. The guard's structural contract is that whenever
12009    /// the depth-3 best already covers `>= DEPTH4_GUARD_THRESHOLD`
12010    /// pixels, depth-4 produces the IDENTICAL token sequence as
12011    /// depth-3 — i.e. the guard fired and the depth-4 work was
12012    /// skipped.
12013    ///
12014    /// The simpler property the test asserts: on a long-run fixture
12015    /// the depth-4 partition (depth = 4) is byte-for-byte equal to
12016    /// the depth-3 partition (depth = 3). If the guard fails to fire,
12017    /// depth-4 would still find some marginal swap somewhere in the
12018    /// fixture and the two partitions would diverge.
12019    #[test]
12020    fn round_163_depth4_guard_suppresses_long_run_swap() {
12021        // A long, smoothly-varying run guarantees that almost every
12022        // match the matcher finds is significantly longer than
12023        // `DEPTH4_GUARD_THRESHOLD == 6` — so the guard should fire at
12024        // every probe site and depth-4 should produce the same token
12025        // partition as depth-3.
12026        //
12027        // We use a 4-pixel repeating motif that the matcher can find
12028        // long copies of after the first cycle: `[A, B, C, D, A, B, C,
12029        // D, …]`. After 12 pixels of warm-up, a `find` will return a
12030        // match length up to MAX_MATCH (well over the guard threshold).
12031        let a_ = 0xFF10_2030_u32;
12032        let b_ = 0xFF40_5060_u32;
12033        let c_ = 0xFF70_8090_u32;
12034        let d_ = 0xFFA0_B0C0_u32;
12035        let motif = [a_, b_, c_, d_];
12036        let mut pixels: Vec<u32> = Vec::with_capacity(512);
12037        for i in 0..512 {
12038            pixels.push(motif[i & 3]);
12039        }
12040
12041        let lazy3 = tokenize_lz77_inner(&pixels, 3);
12042        let lazy4 = tokenize_lz77_inner(&pixels, 4);
12043
12044        // Guard contract: when the depth-3 best is already long, the
12045        // depth-4 probe is suppressed and the two partitions are
12046        // byte-for-byte equal.
12047        assert_eq!(
12048            lazy3,
12049            lazy4,
12050            "round-163 depth-4 guard should suppress the depth-4 probe \
12051             on a long-run fixture (every depth-3 best `>= DEPTH4_GUARD_THRESHOLD == {}`), \
12052             producing the identical depth-3 partition; depth-3={} tokens, \
12053             depth-4={} tokens",
12054            DEPTH4_GUARD_THRESHOLD,
12055            lazy3.len(),
12056            lazy4.len(),
12057        );
12058
12059        // Sanity: both partitions must cover the input exactly.
12060        let coverage = |toks: &[Token]| -> usize {
12061            toks.iter()
12062                .map(|t| match *t {
12063                    Token::Literal(_) => 1,
12064                    Token::CacheRef { .. } => 1,
12065                    Token::Copy { length, .. } => length,
12066                })
12067                .sum()
12068        };
12069        assert_eq!(coverage(&lazy3), pixels.len());
12070        assert_eq!(coverage(&lazy4), pixels.len());
12071
12072        // End-to-end round-trip via the production chooser for good
12073        // measure: the depth-4-default `tokenize_lz77` must still
12074        // decode back exactly on this long-run fixture.
12075        let w = pixels.len() as u32;
12076        let h = 1u32;
12077        let stream = encode_argb_literals_with_width(&pixels, w);
12078        let header = build_image_header(w, h, true);
12079        let mut payload = header.to_vec();
12080        payload.extend_from_slice(&stream);
12081        let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12082        let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12083        assert_eq!(img.pixels(), pixels.as_slice());
12084    }
12085
12086    /// Round 163 non-regression: across a broad fixture matrix the
12087    /// depth-4 lazy token count is `<=` the depth-3 lazy token count
12088    /// everywhere. Structural because the depth-4 probe — when the
12089    /// guard allows it to fire — only swaps when the alternate match
12090    /// is strictly longer than the depth-3 best, so the depth-4
12091    /// partition uses at most as many tokens as the depth-3 partition.
12092    /// When the guard suppresses the probe, depth-4 produces the same
12093    /// tokens as depth-3 directly. The test also guards against
12094    /// off-by-one in the new depth-4 insert/skip dedup (where `pos`,
12095    /// `pos + 1`, `pos + 2`, and `pos + 3` can all be pre-inserted
12096    /// before the chosen match starts at any of those positions or
12097    /// `pos + 4`).
12098    #[test]
12099    fn round_163_depth4_never_increases_token_count_over_depth3() {
12100        let shapes: &[(u32, u32)] = &[
12101            (16, 16),
12102            (20, 20),
12103            (24, 24),
12104            (32, 32),
12105            (48, 48),
12106            (16, 32),
12107            (64, 16),
12108            (40, 24),
12109        ];
12110        for &(w, h) in shapes {
12111            let gradient: Vec<u32> = (0..(w * h) as usize)
12112                .map(|i| {
12113                    let x = (i as u32) % w;
12114                    let y = (i as u32) / w;
12115                    let g = (x + y) & 0xFF;
12116                    0xFF00_0000 | (g << 16) | (g << 8) | g
12117                })
12118                .collect();
12119            let mut seed = 0xBADD_CAFE_u32;
12120            let noise: Vec<u32> = (0..(w * h) as usize)
12121                .map(|_| {
12122                    seed ^= seed << 13;
12123                    seed ^= seed >> 17;
12124                    seed ^= seed << 5;
12125                    0xFF00_0000 | (seed & 0x00FF_FFFF)
12126                })
12127                .collect();
12128            let stripes: Vec<u32> = (0..(w * h) as usize)
12129                .map(|i| {
12130                    let x = (i as u32) % w;
12131                    match x % 4 {
12132                        0 => 0xFFAA_5500,
12133                        1 => 0xFF55_AA00,
12134                        2 => 0xFF00_55AA,
12135                        _ => 0xFF55_00AA,
12136                    }
12137                })
12138                .collect();
12139
12140            for (name, pixels) in [
12141                ("gradient", &gradient),
12142                ("noise", &noise),
12143                ("stripes", &stripes),
12144            ] {
12145                let lazy3 = tokenize_lz77_inner(pixels, 3);
12146                let lazy4 = tokenize_lz77_inner(pixels, 4);
12147                assert!(
12148                    lazy4.len() <= lazy3.len(),
12149                    "round-163 depth-4 regression on {name} {w}x{h}: \
12150                     depth-3={} tokens, depth-4={} tokens",
12151                    lazy3.len(),
12152                    lazy4.len(),
12153                );
12154                // Round-trip the depth-4 stream as a defensive check
12155                // for hash-chain insert bookkeeping.
12156                let stream = encode_argb_literals_with_width(pixels, w);
12157                let header = build_image_header(w, h, true);
12158                let mut payload = header.to_vec();
12159                payload.extend_from_slice(&stream);
12160                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12161                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12162                assert_eq!(
12163                    img.pixels(),
12164                    pixels.as_slice(),
12165                    "round-163 depth-4 round-trip mismatch on {name} {w}x{h}"
12166                );
12167            }
12168        }
12169    }
12170
12171    // ---- round 159: §4.1 entropy-image-aware tie-break ----
12172
12173    /// `pick_block_mode_with_hint` accepts the preferred neighbour
12174    /// mode when it ties with the otherwise-lowest mode at the same
12175    /// minimal residual cost. The block is a solid-colour fill, so
12176    /// modes 1..=13 all predict the left/top neighbour exactly →
12177    /// every interior pixel has zero residual, and ties run across
12178    /// every mode whose residual sum equals the lowest sum found.
12179    /// Without a hint the chooser picks the lowest mode (mode 1 on a
12180    /// non-black solid); with a hint of `Some(7)` it returns mode 7.
12181    #[test]
12182    fn round_159_pick_block_mode_with_hint_swaps_on_tie() {
12183        let w = 8usize;
12184        let h = 8usize;
12185        let pixels = vec![0xff50_6070u32; w * h];
12186
12187        // No hint: the lowest tied mode wins (deterministic baseline).
12188        let baseline = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, None);
12189        // The exact value depends on the border rule for mode 0 vs
12190        // the per-channel residual; what matters here is that the
12191        // hint can swap to a different mode that ties at the same
12192        // cost.
12193        let baseline_cost = block_mode_cost(&pixels, w, h, 0, 0, w, h, baseline);
12194
12195        // Probe every mode 0..=13 to find one that ties baseline but
12196        // is not equal to it.
12197        let mut tied_other: Option<u8> = None;
12198        for m in 0u8..=13 {
12199            if m == baseline {
12200                continue;
12201            }
12202            let c = block_mode_cost(&pixels, w, h, 0, 0, w, h, m);
12203            if c == baseline_cost {
12204                tied_other = Some(m);
12205                break;
12206            }
12207        }
12208        let other = tied_other
12209            .expect("a solid-fill block has at least two modes tied at minimal residual cost");
12210
12211        // With hint == Some(other) and `other` strictly distinct
12212        // from `baseline` but tied at the same cost, the chooser
12213        // must return `other`.
12214        let with_hint = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, Some(other));
12215        assert_eq!(
12216            with_hint, other,
12217            "round-159 tie-break did not adopt the preferred mode: \
12218             baseline={baseline}, other={other}, returned={with_hint}"
12219        );
12220    }
12221
12222    /// `pick_block_mode_with_hint` does NOT swap when the preferred
12223    /// mode is strictly worse than the cost-minimal mode. A diagonal
12224    /// 2-D ramp `pixels[y, x] = (x + 2y) & 0xff` makes the L-based
12225    /// modes pay residual `1` per pixel while the T-based modes pay
12226    /// residual `2` per pixel, so the chooser picks an L-based mode
12227    /// uniquely. Probing every mode confirms which one is strictly
12228    /// worse than the picked baseline; with that mode as the hint
12229    /// the chooser must still return the baseline.
12230    #[test]
12231    fn round_159_pick_block_mode_with_hint_keeps_best_when_hint_worse() {
12232        let w = 16usize;
12233        let h = 16usize;
12234        // 2-D ramp: L-based modes pay 1/pixel; T-based modes pay 2/pixel.
12235        let pixels: Vec<u32> = (0..(w * h))
12236            .map(|i| {
12237                let x = (i % w) as u32;
12238                let y = (i / w) as u32;
12239                let v = (x + 2 * y) & 0xff;
12240                0xff00_0000 | (v << 16) | (v << 8) | v
12241            })
12242            .collect();
12243
12244        let baseline = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, None);
12245        let baseline_cost = block_mode_cost(&pixels, w, h, 0, 0, w, h, baseline);
12246        // Find any mode whose cost is strictly worse than baseline.
12247        let mut worse: Option<u8> = None;
12248        for m in 0u8..=13 {
12249            let c = block_mode_cost(&pixels, w, h, 0, 0, w, h, m);
12250            if c > baseline_cost {
12251                worse = Some(m);
12252                break;
12253            }
12254        }
12255        let worse = worse
12256            .expect("test premise: the 2-D ramp should produce at least one strictly-worse mode");
12257        let with_hint = pick_block_mode_with_hint(&pixels, w, h, 0, 0, w, h, Some(worse));
12258        assert_eq!(
12259            with_hint, baseline,
12260            "round-159 tie-break must not adopt a strictly-worse hint \
12261             (baseline={baseline}, worse-hint={worse})"
12262        );
12263    }
12264
12265    /// Local pre-round-159 copy of `build_predictor_image`. Mirrors
12266    /// the round-158 behaviour exactly: every block calls the
12267    /// hint-aware chooser with `prefer_mode = None`, so ties resolve
12268    /// to the lowest mode regardless of any spatial coherence. Used
12269    /// by the round-159 non-regression and strict-beat tests as the
12270    /// before-after baseline.
12271    fn pre_round_159_build_predictor_image(
12272        pixels: &[u32],
12273        width: u32,
12274        height: u32,
12275        size_bits: u8,
12276    ) -> (Vec<u32>, u32, u32) {
12277        let block = 1u32 << size_bits;
12278        let tw = predictor_div_round_up(width, block);
12279        let th = predictor_div_round_up(height, block);
12280        let mut img = Vec::with_capacity((tw * th) as usize);
12281        let w = width as usize;
12282        let h = height as usize;
12283        let bsz = block as usize;
12284        for by in 0..th as usize {
12285            for bx in 0..tw as usize {
12286                let x0 = bx * bsz;
12287                let y0 = by * bsz;
12288                let mode = pick_block_mode_with_hint(pixels, w, h, x0, y0, bsz, bsz, None);
12289                img.push(0xff00_0000 | ((mode as u32) << 8));
12290            }
12291        }
12292        (img, tw, th)
12293    }
12294
12295    /// Round 159 structural correctness: the entropy-image-aware
12296    /// tie-break is residual-cost-neutral, so for *every* block the
12297    /// post-r159 chosen mode has identical residual cost to the
12298    /// pre-r159 chosen mode (only the mode *value* may differ on
12299    /// ties). The check is per-block: across a fixture matrix the
12300    /// summed per-block residual cost must be exactly equal under
12301    /// the two choosers.
12302    #[test]
12303    fn round_159_predictor_image_tie_break_is_cost_neutral() {
12304        let shapes: &[(u32, u32, u8)] = &[
12305            (32, 32, 4),
12306            (48, 48, 4),
12307            (64, 32, 4),
12308            (32, 64, 4),
12309            (24, 24, 3),
12310        ];
12311        for &(w, h, size_bits) in shapes {
12312            // Two fixtures: smooth gradient (many ties on flat regions
12313            // between modes 1/2/3 etc.) and palette-ish stripes
12314            // (column-aligned ties between L-based modes).
12315            let gradient: Vec<u32> = (0..(w * h) as usize)
12316                .map(|i| {
12317                    let x = (i as u32) % w;
12318                    let y = (i as u32) / w;
12319                    let g = (x + y) & 0x0F;
12320                    0xFF00_0000 | (g << 16) | (g << 8) | g
12321                })
12322                .collect();
12323            let stripes: Vec<u32> = (0..(w * h) as usize)
12324                .map(|i| {
12325                    let x = (i as u32) % w;
12326                    match x % 4 {
12327                        0 => 0xFFAA_5500,
12328                        1 => 0xFF55_AA00,
12329                        2 => 0xFF00_55AA,
12330                        _ => 0xFF55_00AA,
12331                    }
12332                })
12333                .collect();
12334
12335            for (name, pixels) in [("gradient", &gradient), ("stripes", &stripes)] {
12336                let (pre_img, _, _) = pre_round_159_build_predictor_image(pixels, w, h, size_bits);
12337                let (post_img, _, _) = build_predictor_image(pixels, w, h, size_bits);
12338                assert_eq!(
12339                    pre_img.len(),
12340                    post_img.len(),
12341                    "pre/post mode-image length differs on {name} {w}x{h} size_bits={size_bits}"
12342                );
12343                let block = 1u32 << size_bits;
12344                let tw = predictor_div_round_up(w, block) as usize;
12345                let bsz = block as usize;
12346                let wu = w as usize;
12347                let hu = h as usize;
12348                for (idx, (pre_px, post_px)) in pre_img.iter().zip(post_img.iter()).enumerate() {
12349                    let bx = idx % tw;
12350                    let by = idx / tw;
12351                    let x0 = bx * bsz;
12352                    let y0 = by * bsz;
12353                    let pre_mode = ((pre_px >> 8) & 0xff) as u8;
12354                    let post_mode = ((post_px >> 8) & 0xff) as u8;
12355                    let pre_cost = block_mode_cost(pixels, wu, hu, x0, y0, bsz, bsz, pre_mode);
12356                    let post_cost = block_mode_cost(pixels, wu, hu, x0, y0, bsz, bsz, post_mode);
12357                    assert_eq!(
12358                        pre_cost, post_cost,
12359                        "round-159 tie-break changed residual cost on {name} {w}x{h} \
12360                         block=({bx},{by}): pre mode {pre_mode} cost {pre_cost}, \
12361                         post mode {post_mode} cost {post_cost}"
12362                    );
12363                }
12364            }
12365        }
12366    }
12367
12368    /// Round 159 non-regression: across a fixture matrix the
12369    /// post-r159 predictor-chooser stream must never be longer than
12370    /// the pre-r159 stream. Since the tie-break is a strict subset
12371    /// of the pre-r159 candidate space (the chosen mode is always a
12372    /// cost-minimal mode under both choosers), the residual stream
12373    /// is identical and only the predictor sub-image's entropy can
12374    /// differ. The standalone chooser is invoked end-to-end through
12375    /// the lossless decoder to confirm round-trips on every fixture.
12376    #[test]
12377    fn round_159_predictor_chooser_never_regresses() {
12378        let shapes: &[(u32, u32)] = &[(16, 16), (24, 24), (32, 32), (48, 48), (32, 16), (24, 40)];
12379        for &(w, h) in shapes {
12380            let gradient: Vec<u32> = (0..(w * h) as usize)
12381                .map(|i| {
12382                    let x = (i as u32) % w;
12383                    let y = (i as u32) / w;
12384                    let g = (x + y) & 0x0F;
12385                    0xFF00_0000 | (g << 16) | (g << 8) | g
12386                })
12387                .collect();
12388            let stripes: Vec<u32> = (0..(w * h) as usize)
12389                .map(|i| {
12390                    let x = (i as u32) % w;
12391                    match x % 4 {
12392                        0 => 0xFFAA_5500,
12393                        1 => 0xFF55_AA00,
12394                        2 => 0xFF00_55AA,
12395                        _ => 0xFF55_00AA,
12396                    }
12397                })
12398                .collect();
12399            let mut seed = 0xDEAD_BEEFu32;
12400            let noise: Vec<u32> = (0..(w * h) as usize)
12401                .map(|_| {
12402                    seed ^= seed << 13;
12403                    seed ^= seed >> 17;
12404                    seed ^= seed << 5;
12405                    0xFF00_0000 | (seed & 0x000F_0F0F)
12406                })
12407                .collect();
12408
12409            for (name, pixels) in [
12410                ("gradient", &gradient),
12411                ("stripes", &stripes),
12412                ("low-noise", &noise),
12413            ] {
12414                // Encode under the production chooser (with r159 tie-break).
12415                let post = encode_argb_with_predictor_chooser(pixels, w, h);
12416                // Decode round-trip — strict invariant.
12417                let header = build_image_header(w, h, true);
12418                let mut payload = header.to_vec();
12419                payload.extend_from_slice(&post);
12420                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12421                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12422                assert_eq!(
12423                    img.pixels(),
12424                    pixels.as_slice(),
12425                    "round-159 round-trip mismatch on {name} {w}x{h}"
12426                );
12427                // Non-regression: the chooser's output with the
12428                // r159 hint must be no larger than the chooser with
12429                // the hint stubbed out. Since the hint is a strict
12430                // tie-break (same residual cost), the residual
12431                // stream is identical; only the predictor sub-image
12432                // can change, and it changes in the entropy-
12433                // reducing direction (so the writer emits fewer
12434                // bytes for it).
12435                let pre = encode_argb_with_predictor_chooser_no_r159_hint(pixels, w, h);
12436                assert!(
12437                    post.len() <= pre.len(),
12438                    "round-159 chooser regressed on {name} {w}x{h}: \
12439                     pre={} B post={} B",
12440                    pre.len(),
12441                    post.len(),
12442                );
12443            }
12444        }
12445    }
12446
12447    /// Round 159 structural strict-beat: across a sweep of
12448    /// perturbation seeds, at least one fixture must reach a
12449    /// strictly more-uniform predictor sub-image under the r159
12450    /// hint-aware chooser than under the no-hint baseline — i.e.
12451    /// the mode-image's distinct-mode count drops by at least 1.
12452    /// The sweep verifies the entropy-image-aware tie-break
12453    /// actually fires on realistic small fixtures and reports the
12454    /// byte delta in the §4.1 predictor candidate's output for the
12455    /// first such fixture.
12456    ///
12457    /// Operates on `encode_with_predictor` directly (vs the full
12458    /// chooser) so the savings aren't masked by a competing
12459    /// candidate winning the chooser.
12460    #[test]
12461    fn round_159_predictor_candidate_strictly_beats_no_hint_on_some_fixture() {
12462        let w = 48u32;
12463        let h = 48u32;
12464        let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12465        let mut found_strict_image = false;
12466        let mut found_strict_bytes = false;
12467        let mut best_savings: i64 = 0;
12468        let mut seed_winner: u32 = 0;
12469        for seed_init in [
12470            0xCAFE_BABEu32,
12471            0xC0FFEE00,
12472            0xDEAD_BEEF,
12473            0xFACE_F00D,
12474            0xFEED_F00D,
12475            0x1234_5678,
12476            0xABCD_1234,
12477            0x90AB_CDEF,
12478            0x5A5A_5A5A,
12479            0xA5A5_A5A5,
12480            0xBA5E_BA11,
12481            0xB16B_00B5,
12482        ] {
12483            // Solid-fill canvas with a small perturbed region.
12484            // Vary the perturbation extent so different fixtures
12485            // trigger different mode-image patterns.
12486            let solid = 0xff60_8050u32;
12487            let mut pixels = vec![solid; (w * h) as usize];
12488            let mut s = seed_init;
12489            // 8×8 perturbation in the top-left so the right /
12490            // bottom neighbours' left-/top-column reads stay
12491            // mostly on solid pixels.
12492            for y in 0..8u32 {
12493                for x in 0..8u32 {
12494                    s ^= s << 13;
12495                    s ^= s >> 17;
12496                    s ^= s << 5;
12497                    let v = (s & 0x0007_0707) | 0xFF00_0000;
12498                    pixels[(y * w + x) as usize] = v;
12499                }
12500            }
12501            let (pre_img, _, _) = pre_round_159_build_predictor_image(&pixels, w, h, size_bits);
12502            let (post_img, _, _) = build_predictor_image(&pixels, w, h, size_bits);
12503            let pre_modes: Vec<u8> = pre_img.iter().map(|p| ((p >> 8) & 0xff) as u8).collect();
12504            let post_modes: Vec<u8> = post_img.iter().map(|p| ((p >> 8) & 0xff) as u8).collect();
12505            let pre_distinct: std::collections::BTreeSet<u8> = pre_modes.iter().copied().collect();
12506            let post_distinct: std::collections::BTreeSet<u8> =
12507                post_modes.iter().copied().collect();
12508            if post_distinct.len() < pre_distinct.len() {
12509                found_strict_image = true;
12510                // Encode the predictor candidate under both
12511                // variants and check the byte delta.
12512                let post = encode_with_predictor(&pixels, w, h, size_bits, None, w);
12513                let pre = encode_with_predictor_no_r159_hint(&pixels, w, h, size_bits, None, w);
12514                let saved = pre.len() as i64 - post.len() as i64;
12515                if saved > best_savings {
12516                    best_savings = saved;
12517                    seed_winner = seed_init;
12518                }
12519                if post.len() < pre.len() {
12520                    found_strict_bytes = true;
12521                    // Round-trip the post stream end-to-end.
12522                    let header = build_image_header(w, h, true);
12523                    let mut payload = header.to_vec();
12524                    payload.extend_from_slice(&post);
12525                    let framed =
12526                        build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12527                    let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12528                    assert_eq!(
12529                        img.pixels(),
12530                        pixels.as_slice(),
12531                        "round-159 strict-beat predictor candidate round-trip mismatch on \
12532                         seed=0x{seed_init:08x}"
12533                    );
12534                    eprintln!(
12535                        "[round-159] strict-beat predictor candidate: seed=0x{seed_init:08x}, \
12536                         pre modes={pre_modes:?} post modes={post_modes:?} (distinct \
12537                         pre={} post={}), pre={} B post={} B, saved={saved} B",
12538                        pre_distinct.len(),
12539                        post_distinct.len(),
12540                        pre.len(),
12541                        post.len(),
12542                    );
12543                }
12544                // Non-regression always holds (residual cost is the
12545                // same under the tie-break, so the encoded bytes
12546                // can never increase).
12547                assert!(
12548                    post.len() <= pre.len(),
12549                    "round-159 tie-break regressed on seed=0x{seed_init:08x}: \
12550                     pre={} B post={} B",
12551                    pre.len(),
12552                    post.len(),
12553                );
12554            }
12555        }
12556        assert!(
12557            found_strict_image,
12558            "round-159 sweep did not produce a single strictly-more-uniform mode image \
12559             — the hint propagation never fired across the fixture set"
12560        );
12561        assert!(
12562            found_strict_bytes,
12563            "round-159 sweep found a strict mode-image reduction but never a strict byte \
12564             reduction; entropy savings stayed within the LSB packing slack \
12565             (best_savings={best_savings} on seed=0x{seed_winner:08x})"
12566        );
12567    }
12568
12569    /// Local pre-round-159 copy of `encode_argb_with_predictor_chooser`
12570    /// that forces every predictor-image build to use the no-hint
12571    /// chooser. Used by `round_159_predictor_chooser_never_regresses`
12572    /// as the before-after baseline. The chooser's other candidate
12573    /// paths (no-tx, subtract-green, color-transform, color-indexing,
12574    /// meta-prefix) are re-used verbatim — only the predictor
12575    /// candidate is swapped for the no-hint variant.
12576    fn encode_argb_with_predictor_chooser_no_r159_hint(
12577        pixels: &[u32],
12578        width: u32,
12579        height: u32,
12580    ) -> Vec<u8> {
12581        let mut best = encode_argb_literals_with_width(pixels, width);
12582
12583        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12584        let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
12585        let pred_block = 1u32 << pred_size_bits;
12586        let ctx_block = 1u32 << ctx_size_bits;
12587
12588        if width >= pred_block && height >= pred_block {
12589            let mut pred_single_block_size_bits: u8 = pred_size_bits;
12590            while pred_single_block_size_bits < 9
12591                && ((1u32 << pred_single_block_size_bits) < width
12592                    || (1u32 << pred_single_block_size_bits) < height)
12593            {
12594                pred_single_block_size_bits += 1;
12595            }
12596            let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
12597            let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
12598                encode_with_predictor_no_r159_hint(
12599                    pixels,
12600                    width,
12601                    height,
12602                    pred_size_bits,
12603                    cache_bits,
12604                    width,
12605                )
12606            })];
12607            if try_pred_single_block {
12608                pred_candidates.push(select_best_cache_bits(|cache_bits| {
12609                    encode_with_predictor_no_r159_hint(
12610                        pixels,
12611                        width,
12612                        height,
12613                        pred_single_block_size_bits,
12614                        cache_bits,
12615                        width,
12616                    )
12617                }));
12618            }
12619            for cand in pred_candidates {
12620                if cand.len() < best.len() {
12621                    best = cand;
12622                }
12623            }
12624        }
12625
12626        if width >= ctx_block && height >= ctx_block {
12627            let mut single_block_size_bits: u8 = ctx_size_bits;
12628            while single_block_size_bits < 9
12629                && ((1u32 << single_block_size_bits) < width
12630                    || (1u32 << single_block_size_bits) < height)
12631            {
12632                single_block_size_bits += 1;
12633            }
12634            let try_single_block = single_block_size_bits != ctx_size_bits;
12635            let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
12636                encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
12637            })];
12638            if try_single_block {
12639                candidates.push(select_best_cache_bits(|cache_bits| {
12640                    encode_with_color_transform(
12641                        pixels,
12642                        width,
12643                        height,
12644                        single_block_size_bits,
12645                        cache_bits,
12646                        width,
12647                    )
12648                }));
12649            }
12650            for cand in candidates {
12651                if cand.len() < best.len() {
12652                    best = cand;
12653                }
12654            }
12655        }
12656
12657        if collect_palette(pixels).is_some() {
12658            let ci_best = select_best_cache_bits(|cache_bits| {
12659                encode_with_color_indexing(pixels, width, height, cache_bits)
12660                    .expect("palette feasibility already confirmed")
12661            });
12662            if ci_best.len() < best.len() {
12663                best = ci_best;
12664            }
12665        }
12666
12667        if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
12668            if mp_best.len() < best.len() {
12669                best = mp_best;
12670            }
12671        }
12672
12673        best
12674    }
12675
12676    /// Local pre-round-159 copy of `encode_with_predictor` — same
12677    /// shape, but builds the predictor sub-image via the no-hint
12678    /// chooser (`pre_round_159_build_predictor_image`) so the
12679    /// before-after comparison isolates exactly the round-159
12680    /// tie-break change.
12681    fn encode_with_predictor_no_r159_hint(
12682        pixels: &[u32],
12683        width: u32,
12684        height: u32,
12685        size_bits: u8,
12686        cache_code_bits: Option<u32>,
12687        image_width: u32,
12688    ) -> Vec<u8> {
12689        let mut w = BitWriter::new();
12690        w.write_bit(true);
12691        w.write_bits(crate::vp8l_stream::TransformType::Predictor as u32, 2);
12692        debug_assert!((2..=9).contains(&size_bits));
12693        w.write_bits((size_bits - 2) as u32, 3);
12694        let (predictor_image, tw, _th) =
12695            pre_round_159_build_predictor_image(pixels, width, height, size_bits);
12696        write_entropy_coded_image_literals(&mut w, &predictor_image);
12697        w.write_bit(false);
12698        let mut residuals = vec![0u32; pixels.len()];
12699        apply_forward_predictor(
12700            pixels,
12701            &mut residuals,
12702            width,
12703            height,
12704            &predictor_image,
12705            tw,
12706            size_bits,
12707        );
12708        let mut tokens = tokenize_lz77(&residuals);
12709        if let Some(bits) = cache_code_bits {
12710            tokens = cacheify_tokens(&tokens, &residuals, bits);
12711        }
12712        write_spatially_coded_image(&mut w, &tokens, cache_code_bits, image_width);
12713        w.into_bytes()
12714    }
12715
12716    // ---- round-160 §4.1 slack-cost tie-break tests ------------------
12717
12718    /// Round 160 hint-aware chooser contract (slack form): given a
12719    /// preferred mode whose residual cost is **within `slack`** of
12720    /// the otherwise-best cost, the chooser returns the preferred
12721    /// mode rather than the lowest-tied (or lowest-best) mode.
12722    /// Constructs a small 4×4 block with carefully-chosen
12723    /// per-channel values such that the lowest-best mode is 0
12724    /// (Black) but a non-trivial L-based mode has cost only one
12725    /// magnitude unit higher; the slack=1 chooser must select the
12726    /// preferred mode.
12727    #[test]
12728    fn round_160_pick_block_mode_with_hint_slack_swaps_within_budget() {
12729        // Solid-fill 4×4: every mode 1..=13 ties at zero residual
12730        // cost across the block interior; mode 0 (Black) gives a
12731        // strictly larger cost (the solid color is far from black).
12732        // The slack-cost chooser with `prefer = Some(7)` and slack
12733        // >= 0 must select mode 7 (the preferred tied mode), and
12734        // the strict-tie chooser must agree.
12735        let solid = 0xff60_8050u32;
12736        let pixels: Vec<u32> = vec![solid; 16];
12737        let strict = pick_block_mode_with_hint(&pixels, 4, 4, 0, 0, 4, 4, Some(7));
12738        let slack0 = pick_block_mode_with_hint_slack(&pixels, 4, 4, 0, 0, 4, 4, Some(7), 0);
12739        assert_eq!(
12740            strict, slack0,
12741            "slack=0 must be byte-identical to the round-159 strict tie-break"
12742        );
12743        assert_eq!(
12744            slack0, 7,
12745            "preferred tied mode must win on slack=0 when cost is equal"
12746        );
12747
12748        // Now construct a block where mode 0 has cost 0 (strictly
12749        // best) and another mode has small positive cost. The slack
12750        // chooser at sufficiently-large slack must swap to the
12751        // preferred mode; at slack=0 it must keep mode 0.
12752        //
12753        // Choose a 2×2 block of solid black (all zeros). The Black
12754        // predictor returns 0 (matches), and every other mode that
12755        // predicts from a neighbour also returns 0 (neighbours are
12756        // solid black). So *every* mode has cost 0 — not the
12757        // shape we want.
12758        //
12759        // Instead, place the test block inside a larger fixture so
12760        // that the block's *neighbour* pixels (above/left) differ
12761        // and force the L/T/etc. modes to non-zero cost while
12762        // Black mode stays at 0.
12763        //
12764        // 8×8 fixture: top half black, bottom half a non-zero
12765        // colour. Place the test block at (0, 4) — the row of
12766        // pixels above is the boundary between black (y=3) and
12767        // colour (y=4), so the T mode reads the row-3 black pixels
12768        // while the block itself is non-zero → T mode has non-zero
12769        // cost. The Black mode is `pred = 0` everywhere → cost is
12770        // the sum-magnitudes of the block's non-zero pixels.
12771        let mut big = vec![0xff00_0000u32; 64];
12772        for y in 4..8u32 {
12773            for x in 0..8u32 {
12774                big[(y * 8 + x) as usize] = 0xff01_0101;
12775            }
12776        }
12777        let best_default = pick_block_mode_with_hint(&big, 8, 8, 0, 4, 4, 4, None);
12778        let best_cost = block_mode_cost(&big, 8, 8, 0, 4, 4, 4, best_default);
12779
12780        // Pick a non-best mode and find its cost.
12781        let mut preferred: u8 = u8::MAX;
12782        let mut pref_cost: u64 = u64::MAX;
12783        for m in 0u8..=13 {
12784            if m == best_default {
12785                continue;
12786            }
12787            let c = block_mode_cost(&big, 8, 8, 0, 4, 4, 4, m);
12788            if c > best_cost && c < pref_cost {
12789                preferred = m;
12790                pref_cost = c;
12791            }
12792        }
12793        if preferred != u8::MAX {
12794            let extra = pref_cost - best_cost;
12795            // Strict tie-break must keep the best mode (cost
12796            // mismatch).
12797            let strict = pick_block_mode_with_hint(&big, 8, 8, 0, 4, 4, 4, Some(preferred));
12798            assert_eq!(
12799                strict, best_default,
12800                "strict round-159 tie-break must NOT swap when costs differ"
12801            );
12802            // Slack = extra - 1 must also keep the best mode.
12803            if extra > 0 {
12804                let slack_too_small = pick_block_mode_with_hint_slack(
12805                    &big,
12806                    8,
12807                    8,
12808                    0,
12809                    4,
12810                    4,
12811                    4,
12812                    Some(preferred),
12813                    extra - 1,
12814                );
12815                assert_eq!(
12816                    slack_too_small, best_default,
12817                    "slack < (pref_cost - best_cost) must NOT swap"
12818                );
12819            }
12820            // Slack = extra must now allow the swap.
12821            let slack_exact =
12822                pick_block_mode_with_hint_slack(&big, 8, 8, 0, 4, 4, 4, Some(preferred), extra);
12823            assert_eq!(
12824                slack_exact, preferred,
12825                "slack >= (pref_cost - best_cost) must accept the preferred mode swap"
12826            );
12827        }
12828    }
12829
12830    /// Round 160 strict round-159 equivalence: with `slack = 0` the
12831    /// slack-cost chooser must produce byte-identical predictor
12832    /// sub-images and byte-identical encoded streams to the
12833    /// round-159 strict-tie-break baseline, across a fixture
12834    /// matrix.
12835    #[test]
12836    fn round_160_slack_zero_matches_round_159_baseline() {
12837        let shapes: &[(u32, u32, u8)] = &[
12838            (32, 32, 4),
12839            (48, 48, 4),
12840            (64, 32, 4),
12841            (32, 64, 4),
12842            (24, 24, 3),
12843        ];
12844        for &(w, h, size_bits) in shapes {
12845            let gradient: Vec<u32> = (0..(w * h) as usize)
12846                .map(|i| {
12847                    let x = (i as u32) % w;
12848                    let y = (i as u32) / w;
12849                    let g = (x + y) & 0x0F;
12850                    0xFF00_0000 | (g << 16) | (g << 8) | g
12851                })
12852                .collect();
12853            let stripes: Vec<u32> = (0..(w * h) as usize)
12854                .map(|i| {
12855                    let x = (i as u32) % w;
12856                    match x % 4 {
12857                        0 => 0xFFAA_5500,
12858                        1 => 0xFF55_AA00,
12859                        2 => 0xFF00_55AA,
12860                        _ => 0xFF55_00AA,
12861                    }
12862                })
12863                .collect();
12864
12865            for (name, pixels) in [("gradient", &gradient), ("stripes", &stripes)] {
12866                let (r159_img, _, _) = build_predictor_image(pixels, w, h, size_bits);
12867                let (r160_img, _, _) = build_predictor_image_with_slack(pixels, w, h, size_bits, 0);
12868                assert_eq!(
12869                    r159_img, r160_img,
12870                    "slack=0 sub-image must equal r159 baseline on {name} {w}x{h} \
12871                     size_bits={size_bits}"
12872                );
12873                let r159_bytes = encode_with_predictor(pixels, w, h, size_bits, None, w);
12874                let r160_bytes = encode_with_predictor_slack(pixels, w, h, size_bits, None, w, 0);
12875                assert_eq!(
12876                    r159_bytes, r160_bytes,
12877                    "slack=0 encoded bytes must equal r159 baseline on {name} {w}x{h} \
12878                     size_bits={size_bits}"
12879                );
12880            }
12881        }
12882    }
12883
12884    /// Round 160 round-trip correctness: at any slack budget, the
12885    /// slack-cost predictor candidate produces a stream that, when
12886    /// framed and decoded, reproduces the input pixels exactly. The
12887    /// per-block chosen mode changes with slack but the forward
12888    /// transform always derives residuals from the chosen modes and
12889    /// the decoder re-derives the same modes from the sub-image.
12890    #[test]
12891    fn round_160_slack_predictor_round_trips_through_decoder() {
12892        let w = 32u32;
12893        let h = 32u32;
12894        let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
12895        let pixels: Vec<u32> = (0..(w * h) as usize)
12896            .map(|i| {
12897                let x = (i as u32) % w;
12898                let y = (i as u32) / w;
12899                let r = (x * 7) & 0xff;
12900                let g = (y * 11) & 0xff;
12901                let b = ((x ^ y) * 3) & 0xff;
12902                0xFF00_0000 | (r << 16) | (g << 8) | b
12903            })
12904            .collect();
12905        let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
12906        for slack in [0, block_pixels, 2 * block_pixels, 8 * block_pixels] {
12907            let stream = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
12908            let header = build_image_header(w, h, true);
12909            let mut payload = header.to_vec();
12910            payload.extend_from_slice(&stream);
12911            let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12912            let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12913            assert_eq!(
12914                img.pixels(),
12915                pixels.as_slice(),
12916                "round-160 slack={slack} predictor candidate failed end-to-end round-trip"
12917            );
12918        }
12919    }
12920
12921    /// Round 160 non-regression: across a fixture matrix the
12922    /// production `encode_argb_with_predictor_chooser` output is
12923    /// `<=` the chooser's output with slack candidates disabled
12924    /// (i.e. the round-159 chooser). The new slack candidates can
12925    /// only *add* options to the byte-best selection, so they must
12926    /// never increase the chosen output length.
12927    #[test]
12928    fn round_160_chooser_never_regresses_vs_round_159() {
12929        let shapes: &[(u32, u32)] = &[(32, 32), (48, 48), (32, 64), (64, 32), (24, 24)];
12930        for &(w, h) in shapes {
12931            // Three fixtures: smooth gradient, palette stripes, and
12932            // a sparse noise image (low predictor residual mass for
12933            // a few mode-image blocks, high for others — exactly
12934            // the regime where the slack tie-break can pay off).
12935            let gradient: Vec<u32> = (0..(w * h) as usize)
12936                .map(|i| {
12937                    let x = (i as u32) % w;
12938                    let y = (i as u32) / w;
12939                    let g = (x + y) & 0x0F;
12940                    0xFF00_0000 | (g << 16) | (g << 8) | g
12941                })
12942                .collect();
12943            let stripes: Vec<u32> = (0..(w * h) as usize)
12944                .map(|i| {
12945                    let x = (i as u32) % w;
12946                    match x % 4 {
12947                        0 => 0xFFAA_5500,
12948                        1 => 0xFF55_AA00,
12949                        2 => 0xFF00_55AA,
12950                        _ => 0xFF55_00AA,
12951                    }
12952                })
12953                .collect();
12954            let mut s: u32 = 0xCAFE_BABE;
12955            let noise: Vec<u32> = (0..(w * h) as usize)
12956                .map(|_| {
12957                    s ^= s << 13;
12958                    s ^= s >> 17;
12959                    s ^= s << 5;
12960                    0xFF00_0000 | (s & 0x00FF_FFFF)
12961                })
12962                .collect();
12963
12964            for (name, pixels) in [
12965                ("gradient", &gradient),
12966                ("stripes", &stripes),
12967                ("noise", &noise),
12968            ] {
12969                let r159 = encode_argb_with_predictor_chooser_no_r160_slack(pixels, w, h);
12970                let r160 = encode_argb_with_predictor_chooser(pixels, w, h);
12971                assert!(
12972                    r160.len() <= r159.len(),
12973                    "round-160 chooser regressed on {name} {w}x{h}: r159={} B r160={} B",
12974                    r159.len(),
12975                    r160.len()
12976                );
12977                // End-to-end round-trip parity on the r160 stream.
12978                let header = build_image_header(w, h, true);
12979                let mut payload = header.to_vec();
12980                payload.extend_from_slice(&r160);
12981                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
12982                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
12983                assert_eq!(
12984                    img.pixels(),
12985                    pixels.as_slice(),
12986                    "round-160 chooser output failed end-to-end round-trip on \
12987                     {name} {w}x{h}"
12988                );
12989            }
12990        }
12991    }
12992
12993    /// Round 160 headline: the slack-cost **predictor candidate**
12994    /// strictly beats the round-159 strict-tie-break predictor
12995    /// candidate on at least one fixture, with the seed, slack
12996    /// budget, and byte savings printed for the round report.
12997    ///
12998    /// The comparison is between the two predictor candidates in
12999    /// isolation, not between the overall chooser outputs: the
13000    /// production chooser composes the predictor candidate with
13001    /// every other transform path (no-tx, subtract-green, color-
13002    /// transform, color-indexing, multi-meta-prefix) and may pick a
13003    /// non-predictor path as best, so the chooser output won't
13004    /// always reflect the slack savings on the predictor candidate
13005    /// alone. The invariant we *prove* here is: on at least one
13006    /// fixture in the sweep, `encode_with_predictor_slack(..,
13007    /// slack > 0, ..)` produces a strictly shorter byte stream
13008    /// than `encode_with_predictor(.., slack = 0, ..)`, which is
13009    /// the byte-cost win the round-160 slack-cost variant is
13010    /// designed to capture. The full chooser also picks up the
13011    /// win whenever the predictor path ends up the byte-best
13012    /// overall.
13013    ///
13014    /// The fixtures are seeded perturbations of a mostly-uniform
13015    /// canvas: small perturbation patches plus a sparse single-
13016    /// pixel noise sprinkle. These are the layouts where the
13017    /// predictor sub-image carries a small number of "almost
13018    /// uniform" mode-image entries that the slack tie-break can
13019    /// collapse onto a single dominant mode at a small residual
13020    /// cost.
13021    #[test]
13022    fn round_160_slack_candidate_strictly_beats_strict_on_some_fixture() {
13023        let w = 128u32;
13024        let h = 128u32;
13025        let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13026        let mut found = false;
13027        let mut best_savings: i64 = 0;
13028        let mut seed_winner: u32 = 0;
13029        let mut slack_winner: u64 = 0;
13030        // Slack sweep: pick a spread of budgets between 1 residual
13031        // unit and 4× block_pixels. The diagnostic phase of round
13032        // 160 development showed that the productive regime starts
13033        // around slack ≥ block_pixels / 4 (16-pixel blocks → slack
13034        // ≥ 64) on the seeded fixtures used here.
13035        let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
13036        let slack_candidates: &[u64] = &[
13037            1,
13038            4,
13039            16,
13040            64,
13041            block_pixels,
13042            2 * block_pixels,
13043            4 * block_pixels,
13044        ];
13045        for seed_init in [
13046            0xCAFE_BABEu32,
13047            0xC0FFEE00,
13048            0xDEAD_BEEF,
13049            0xFACE_F00D,
13050            0xFEED_F00D,
13051            0x1234_5678,
13052            0xABCD_1234,
13053            0x90AB_CDEF,
13054            0x5A5A_5A5A,
13055            0xA5A5_A5A5,
13056            0xBA5E_BA11,
13057            0xB16B_00B5,
13058            0x00DD_BA11,
13059            0xC1AB_AB00,
13060            0xDEAF_BABE,
13061            0xCABB_A6E0,
13062            0x1337_C0DE,
13063            0xABAD_CAFE,
13064            0xBADF_00D0,
13065            0x8BAD_F00D,
13066        ] {
13067            // Mostly-solid canvas with a 1-bit-per-channel noise
13068            // overlay sprinkled at a sparse stride. The overlay is
13069            // small enough that the residual mass added per block
13070            // is in the order of `block_pixels` (matches our chooser
13071            // slack budget) but large enough to push the best-mode
13072            // choice off the all-zero tie in some blocks.
13073            let solid = 0xff60_8050u32;
13074            let mut pixels = vec![solid; (w * h) as usize];
13075            let mut s = seed_init;
13076            // Two perturbation patches of varying sizes to give the
13077            // chooser something to chew on without dominating the
13078            // whole image (the chooser must still see lots of tied
13079            // blocks for the slack tie-break to pay off).
13080            for y in 0..6u32 {
13081                for x in 0..6u32 {
13082                    s ^= s << 13;
13083                    s ^= s >> 17;
13084                    s ^= s << 5;
13085                    pixels[(y * w + x) as usize] = (s & 0x0003_0303) | 0xFF60_8050;
13086                }
13087            }
13088            for y in 20..30u32 {
13089                for x in 20..30u32 {
13090                    s ^= s << 13;
13091                    s ^= s >> 17;
13092                    s ^= s << 5;
13093                    pixels[(y * w + x) as usize] = (s & 0x0007_0707) | 0xFF60_8050;
13094                }
13095            }
13096            // Sparse single-pixel perturbations scattered across the
13097            // remaining canvas — these are the perturbations that
13098            // tend to push individual blocks just barely off the
13099            // best-mode tie, exposing the slack tie-break opportunity.
13100            for _ in 0..32u32 {
13101                s ^= s << 13;
13102                s ^= s >> 17;
13103                s ^= s << 5;
13104                let px = (s >> 8) % w;
13105                let py = (s >> 16) % h;
13106                pixels[(py * w + px) as usize] = (s & 0x0001_0101) | 0xFF60_8050;
13107            }
13108
13109            // Strict-tie-break baseline (round-159 chooser): the
13110            // slack = 0 predictor candidate at the default
13111            // size_bits. Cache-bits stays at None for a clean
13112            // comparison — the slack candidate is also tested at
13113            // cache_code_bits = None, isolating the effect to the
13114            // §4.1 forward transform.
13115            let strict_bytes = encode_with_predictor(&pixels, w, h, size_bits, None, w);
13116            // Slack sweep: pick the smallest slack-cost predictor
13117            // stream and compare against the strict baseline.
13118            let mut best_slack_bytes = strict_bytes.clone();
13119            let mut best_slack_value: u64 = 0;
13120            for &slack in slack_candidates {
13121                let bytes = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
13122                if bytes.len() < best_slack_bytes.len() {
13123                    best_slack_bytes = bytes;
13124                    best_slack_value = slack;
13125                }
13126            }
13127            if best_slack_bytes.len() < strict_bytes.len() {
13128                let saved = strict_bytes.len() as i64 - best_slack_bytes.len() as i64;
13129                if saved > best_savings {
13130                    best_savings = saved;
13131                    seed_winner = seed_init;
13132                    slack_winner = best_slack_value;
13133                }
13134                if !found {
13135                    found = true;
13136                }
13137                // Round-trip the winning slack stream end-to-end
13138                // through the full framed-WebP path to prove decode
13139                // correctness on the slack-tie-break-modified
13140                // residual stream.
13141                let header = build_image_header(w, h, true);
13142                let mut payload = header.to_vec();
13143                payload.extend_from_slice(&best_slack_bytes);
13144                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13145                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13146                assert_eq!(
13147                    img.pixels(),
13148                    pixels.as_slice(),
13149                    "round-160 strict-beat predictor candidate round-trip mismatch on \
13150                     seed=0x{seed_init:08x} slack={best_slack_value}"
13151                );
13152                eprintln!(
13153                    "[round-160] slack-cost strict-beat: seed=0x{seed_init:08x}, \
13154                     slack={best_slack_value}, strict={} B slack={} B saved={saved} B",
13155                    strict_bytes.len(),
13156                    best_slack_bytes.len(),
13157                );
13158            }
13159            // Production chooser non-regression: r160 chooser
13160            // (which evaluates both strict and slack predictor
13161            // candidates against every other transform path) is
13162            // always ≤ r159 chooser (which evaluates strict only).
13163            let r159 = encode_argb_with_predictor_chooser_no_r160_slack(&pixels, w, h);
13164            let r160 = encode_argb_with_predictor_chooser(&pixels, w, h);
13165            assert!(
13166                r160.len() <= r159.len(),
13167                "round-160 chooser regressed on seed 0x{seed_init:08x}: \
13168                 r159={} B r160={} B",
13169                r159.len(),
13170                r160.len()
13171            );
13172        }
13173        assert!(
13174            found,
13175            "round-160 slack-cost sweep did not produce a single strict byte reduction \
13176             across the seeded fixture set; the new slack candidates never won \
13177             (best_savings={best_savings} on seed=0x{seed_winner:08x} slack={slack_winner})"
13178        );
13179    }
13180
13181    /// Local pre-round-160 copy of `encode_argb_with_predictor_chooser`
13182    /// that omits the round-160 slack-cost predictor candidates. Used
13183    /// by the round-160 non-regression and strict-beat tests as the
13184    /// before-after baseline; the rest of the chooser (no-tx,
13185    /// subtract-green, color-transform, color-indexing, meta-prefix)
13186    /// is re-used verbatim.
13187    fn encode_argb_with_predictor_chooser_no_r160_slack(
13188        pixels: &[u32],
13189        width: u32,
13190        height: u32,
13191    ) -> Vec<u8> {
13192        let mut best = encode_argb_literals_with_width(pixels, width);
13193
13194        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13195        let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13196        let pred_block = 1u32 << pred_size_bits;
13197        let ctx_block = 1u32 << ctx_size_bits;
13198
13199        if width >= pred_block && height >= pred_block {
13200            let mut pred_single_block_size_bits: u8 = pred_size_bits;
13201            while pred_single_block_size_bits < 9
13202                && ((1u32 << pred_single_block_size_bits) < width
13203                    || (1u32 << pred_single_block_size_bits) < height)
13204            {
13205                pred_single_block_size_bits += 1;
13206            }
13207            let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13208            let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13209                encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13210            })];
13211            if try_pred_single_block {
13212                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13213                    encode_with_predictor(
13214                        pixels,
13215                        width,
13216                        height,
13217                        pred_single_block_size_bits,
13218                        cache_bits,
13219                        width,
13220                    )
13221                }));
13222            }
13223            for cand in pred_candidates {
13224                if cand.len() < best.len() {
13225                    best = cand;
13226                }
13227            }
13228        }
13229
13230        if width >= ctx_block && height >= ctx_block {
13231            let mut single_block_size_bits: u8 = ctx_size_bits;
13232            while single_block_size_bits < 9
13233                && ((1u32 << single_block_size_bits) < width
13234                    || (1u32 << single_block_size_bits) < height)
13235            {
13236                single_block_size_bits += 1;
13237            }
13238            let try_single_block = single_block_size_bits != ctx_size_bits;
13239            let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13240                encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
13241            })];
13242            if try_single_block {
13243                candidates.push(select_best_cache_bits(|cache_bits| {
13244                    encode_with_color_transform(
13245                        pixels,
13246                        width,
13247                        height,
13248                        single_block_size_bits,
13249                        cache_bits,
13250                        width,
13251                    )
13252                }));
13253            }
13254            for cand in candidates {
13255                if cand.len() < best.len() {
13256                    best = cand;
13257                }
13258            }
13259        }
13260
13261        if collect_palette(pixels).is_some() {
13262            let ci_best = select_best_cache_bits(|cache_bits| {
13263                encode_with_color_indexing(pixels, width, height, cache_bits)
13264                    .expect("palette feasibility already confirmed")
13265            });
13266            if ci_best.len() < best.len() {
13267                best = ci_best;
13268            }
13269        }
13270
13271        if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
13272            if mp_best.len() < best.len() {
13273                best = mp_best;
13274            }
13275        }
13276
13277        best
13278    }
13279
13280    // ---- Round 161 tests: Shannon-entropy bit-cost predictor variant -------
13281
13282    /// Local pre-round-161 copy of `encode_argb_with_predictor_chooser`
13283    /// that omits the round-161 entropy-cost predictor candidates but
13284    /// **keeps** every round-160 slack-cost candidate. Used by the
13285    /// round-161 non-regression and strict-beat tests as the
13286    /// before-after baseline. Mirrors
13287    /// `encode_argb_with_predictor_chooser_no_r160_slack` in shape.
13288    fn encode_argb_with_predictor_chooser_no_r161_entropy(
13289        pixels: &[u32],
13290        width: u32,
13291        height: u32,
13292    ) -> Vec<u8> {
13293        let mut best = encode_argb_literals_with_width(pixels, width);
13294
13295        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13296        let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13297        let pred_block = 1u32 << pred_size_bits;
13298        let ctx_block = 1u32 << ctx_size_bits;
13299
13300        if width >= pred_block && height >= pred_block {
13301            let mut pred_single_block_size_bits: u8 = pred_size_bits;
13302            while pred_single_block_size_bits < 9
13303                && ((1u32 << pred_single_block_size_bits) < width
13304                    || (1u32 << pred_single_block_size_bits) < height)
13305            {
13306                pred_single_block_size_bits += 1;
13307            }
13308            let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13309            let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13310                encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13311            })];
13312            let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
13313            for slack in [
13314                pred_block_pixels,
13315                2 * pred_block_pixels,
13316                4 * pred_block_pixels,
13317            ] {
13318                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13319                    encode_with_predictor_slack(
13320                        pixels,
13321                        width,
13322                        height,
13323                        pred_size_bits,
13324                        cache_bits,
13325                        width,
13326                        slack,
13327                    )
13328                }));
13329            }
13330            if try_pred_single_block {
13331                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13332                    encode_with_predictor(
13333                        pixels,
13334                        width,
13335                        height,
13336                        pred_single_block_size_bits,
13337                        cache_bits,
13338                        width,
13339                    )
13340                }));
13341                let single_pred_block_pixels: u64 =
13342                    (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
13343                for slack in [
13344                    single_pred_block_pixels,
13345                    2 * single_pred_block_pixels,
13346                    4 * single_pred_block_pixels,
13347                ] {
13348                    pred_candidates.push(select_best_cache_bits(|cache_bits| {
13349                        encode_with_predictor_slack(
13350                            pixels,
13351                            width,
13352                            height,
13353                            pred_single_block_size_bits,
13354                            cache_bits,
13355                            width,
13356                            slack,
13357                        )
13358                    }));
13359                }
13360            }
13361            for cand in pred_candidates {
13362                if cand.len() < best.len() {
13363                    best = cand;
13364                }
13365            }
13366        }
13367
13368        if width >= ctx_block && height >= ctx_block {
13369            let mut single_block_size_bits: u8 = ctx_size_bits;
13370            while single_block_size_bits < 9
13371                && ((1u32 << single_block_size_bits) < width
13372                    || (1u32 << single_block_size_bits) < height)
13373            {
13374                single_block_size_bits += 1;
13375            }
13376            let try_single_block = single_block_size_bits != ctx_size_bits;
13377            let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13378                encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
13379            })];
13380            if try_single_block {
13381                candidates.push(select_best_cache_bits(|cache_bits| {
13382                    encode_with_color_transform(
13383                        pixels,
13384                        width,
13385                        height,
13386                        single_block_size_bits,
13387                        cache_bits,
13388                        width,
13389                    )
13390                }));
13391            }
13392            for cand in candidates {
13393                if cand.len() < best.len() {
13394                    best = cand;
13395                }
13396            }
13397        }
13398
13399        if collect_palette(pixels).is_some() {
13400            let ci_best = select_best_cache_bits(|cache_bits| {
13401                encode_with_color_indexing(pixels, width, height, cache_bits)
13402                    .expect("palette feasibility already confirmed")
13403            });
13404            if ci_best.len() < best.len() {
13405                best = ci_best;
13406            }
13407        }
13408
13409        if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
13410            if mp_best.len() < best.len() {
13411                best = mp_best;
13412            }
13413        }
13414
13415        best
13416    }
13417
13418    /// Round 161 — [`block_mode_entropy_cost`] reports zero milli-bits
13419    /// on a 1×1 block of pixel `0xff_00_00_00` (the top-left border
13420    /// rule sets `pred = 0xff_00_00_00`, so the residual is zero, the
13421    /// histogram has a single occupied bin per channel and the
13422    /// `c · log2(N/c) = N · log2(1) = 0` per-bin contribution sums to
13423    /// zero). Confirms the entropy summation correctly bottoms-out
13424    /// at the no-residual edge case.
13425    #[test]
13426    fn round_161_block_mode_entropy_cost_zero_on_zero_residual_block() {
13427        let pixels = vec![0xff_00_00_00u32; 1];
13428        for mode in 0u8..=13 {
13429            let cost = block_mode_entropy_cost(&pixels, 1, 1, 0, 0, 1, 1, mode);
13430            assert_eq!(
13431                cost, 0,
13432                "1×1 zero-residual block should produce zero-entropy cost under mode {mode}, got {cost}"
13433            );
13434        }
13435    }
13436
13437    /// Round 161 — on an interior solid-fill block, every mode that
13438    /// produces a *constant* residual (whether zero or non-zero) ties
13439    /// at zero Shannon entropy — Shannon entropy measures **variety**
13440    /// in the residual symbol distribution, not magnitude. This is
13441    /// the key structural difference from the L1 magnitude proxy: L1
13442    /// would penalise mode 0 (which emits constant non-zero residual
13443    /// `0x00_60_80_50` per pixel on a `0xff_60_80_50` solid block),
13444    /// while Shannon entropy correctly treats a constant-residual
13445    /// distribution as zero-cost (a Huffman code over a single-symbol
13446    /// alphabet emits one bit per symbol, which is the theoretical
13447    /// floor and matches the §3.7.2.1.1 single-leaf encoding's
13448    /// near-zero overhead).
13449    ///
13450    /// This test pins down that semantic: on the interior solid
13451    /// block, every neighbour-predicting mode AND mode 0 all sit at
13452    /// zero entropy cost; the chooser then falls through to the
13453    /// lowest-index tie-break (mode 0) or the hint when one is
13454    /// supplied.
13455    #[test]
13456    fn round_161_block_mode_entropy_cost_zero_on_constant_residual_block() {
13457        let w = 8usize;
13458        let h = 8usize;
13459        let pixels = vec![0xff_60_80_50u32; w * h];
13460        // Block [4..8) × [4..8) — interior. Every mode produces a
13461        // constant residual across the block (zero for the
13462        // neighbour-predicting modes; `0x00_60_80_50` for mode 0).
13463        // Constant residual = single-symbol histogram per channel
13464        // = zero Shannon entropy.
13465        for mode in 0u8..=13 {
13466            let cost = block_mode_entropy_cost(&pixels, w, h, 4, 4, 4, 4, mode);
13467            assert_eq!(
13468                cost, 0,
13469                "constant-residual mode {mode} on interior solid block should have zero entropy cost, got {cost}"
13470            );
13471        }
13472    }
13473
13474    /// Round 161 — Shannon entropy cost is strictly monotone in
13475    /// residual variety: a block whose residual histogram is
13476    /// peaked at a single value (zero or non-zero) has lower
13477    /// entropy cost than a block whose residuals scatter across
13478    /// multiple distinct values. This is the property a Huffman
13479    /// code over the residuals would actually minimise — and the
13480    /// L1 magnitude proxy does NOT distinguish (a constant non-
13481    /// zero residual block has the same L1 sum as a scattered
13482    /// block of the same mean magnitude). Confirms the entropy
13483    /// cost adds real signal vs the proxy.
13484    #[test]
13485    fn round_161_entropy_cost_distinguishes_concentrated_from_scattered() {
13486        // 16×16 image with two interior blocks. Concentrated block:
13487        // pure solid grey on the [4..8) × [4..8) corner — mode 1 (L
13488        // predictor) reproduces every interior pixel from its left
13489        // neighbour so every residual is zero. Scattered block:
13490        // checkerboard greys on the [8..12) × [8..12) corner — mode
13491        // 1 produces non-zero residuals alternating across
13492        // horizontal steps, populating multiple histogram bins.
13493        let w = 16usize;
13494        let h = 16usize;
13495        let grey = 0xff_60_80_50u32;
13496        let other = 0xff_70_90_60u32;
13497        let mut pixels = vec![grey; w * h];
13498        // Scatter `other` in a horizontal checkerboard across the
13499        // scattered block region. Use an isolated mutated quadrant
13500        // that doesn't reach the concentrated block; keep a buffer
13501        // row/column of solid grey around the scattered block so
13502        // its L neighbours at the block's left edge are still grey
13503        // (giving a deterministic histogram).
13504        for y in 8..12 {
13505            for x in 8..12 {
13506                if x % 2 == 0 {
13507                    pixels[y * w + x] = other;
13508                }
13509            }
13510        }
13511        let concentrated = block_mode_entropy_cost(&pixels, w, h, 4, 4, 4, 4, 1);
13512        let scattered = block_mode_entropy_cost(&pixels, w, h, 8, 8, 4, 4, 1);
13513        assert!(
13514            scattered > concentrated,
13515            "scattered block should have higher entropy cost than concentrated: \
13516             scattered={scattered}, concentrated={concentrated}"
13517        );
13518        assert_eq!(
13519            concentrated, 0,
13520            "concentrated (interior solid) block under mode 1 should have zero-entropy cost, \
13521             got {concentrated}"
13522        );
13523        assert!(
13524            scattered > 0,
13525            "scattered block should have strictly positive entropy cost, got {scattered}"
13526        );
13527    }
13528
13529    /// Round 161 — the entropy chooser's tie-break mechanism mirrors
13530    /// the round-159 strict tie-break: when `prefer_mode`'s entropy
13531    /// cost equals the best, the chooser returns the preferred mode.
13532    /// On an interior solid-fill block, *every* mode produces a
13533    /// constant residual (zero or a fixed colour) and so ties at
13534    /// zero Shannon entropy; the chooser falls back to the lowest-
13535    /// index tie (mode 0) and the hint flips to any preferred mode.
13536    #[test]
13537    fn round_161_pick_block_mode_with_hint_entropy_honours_tie() {
13538        let w = 8usize;
13539        let h = 8usize;
13540        let pixels = vec![0xff_60_80_50u32; w * h];
13541        // Interior [4..8) × [4..8) block — every mode is a constant
13542        // residual (Shannon entropy zero) for the reasons in
13543        // [`round_161_block_mode_entropy_cost_zero_on_constant_residual_block`].
13544        // No hint → lowest mode 0 wins.
13545        let no_hint = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, None);
13546        assert_eq!(no_hint, 0);
13547        // Hint mode 11 → ties at zero → tie-break flips to 11.
13548        let with_hint = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, Some(11));
13549        assert_eq!(with_hint, 11);
13550        // Hint mode 5 → ties at zero → tie-break flips to 5.
13551        let with_hint5 = pick_block_mode_with_hint_entropy(&pixels, w, h, 4, 4, 4, 4, Some(5));
13552        assert_eq!(with_hint5, 5);
13553    }
13554
13555    /// Round 161 — `encode_with_predictor_entropy` round-trips
13556    /// end-to-end through `decode_lossless_image`. Confirms the
13557    /// entropy chooser produces a decodable stream regardless of
13558    /// what cost model picked the modes (the §4.1 forward transform
13559    /// recomputes residuals against whatever mode the sub-image
13560    /// records, and the decoder applies the same inverse against
13561    /// that mode).
13562    #[test]
13563    fn round_161_entropy_predictor_round_trips_through_decoder() {
13564        let w = 32u32;
13565        let h = 32u32;
13566        // Mostly-uniform canvas with two small perturbations + a
13567        // single-pixel sprinkle — same recipe family as the round-
13568        // 160 strict-beat fixture, but smaller for fast test runs.
13569        let mut pixels = vec![0xff_60_80_50u32; (w * h) as usize];
13570        let mut s: u32 = 0xCAFE_BABE;
13571        for y in 2..8u32 {
13572            for x in 4..10u32 {
13573                s ^= s << 13;
13574                s ^= s >> 17;
13575                s ^= s << 5;
13576                pixels[(y * w + x) as usize] = (s & 0x0007_0707) | 0xff60_8050;
13577            }
13578        }
13579        for cache_bits in [None, Some(2u32), Some(8u32)] {
13580            let bytes = encode_with_predictor_entropy(
13581                &pixels,
13582                w,
13583                h,
13584                DEFAULT_PREDICTOR_SIZE_BITS,
13585                cache_bits,
13586                w,
13587            );
13588            let header = build_image_header(w, h, true);
13589            let mut payload = header.to_vec();
13590            payload.extend_from_slice(&bytes);
13591            let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13592            let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13593            assert_eq!(
13594                img.pixels(),
13595                pixels.as_slice(),
13596                "entropy predictor round-trip mismatch at cache_bits={cache_bits:?}"
13597            );
13598        }
13599    }
13600
13601    /// Round 161 — production chooser must never regress relative to
13602    /// the round-160 baseline. The round-161 entropy candidate is an
13603    /// additional path; the chooser keeps the byte-shortest stream,
13604    /// so adding a candidate cannot lengthen the output.
13605    #[test]
13606    fn round_161_chooser_never_regresses_vs_round_160() {
13607        let shapes: &[(u32, u32)] = &[(16, 16), (32, 32), (48, 48), (64, 32), (32, 64)];
13608        for &(w, h) in shapes {
13609            // Fixture A: solid fill.
13610            let solid = vec![0xff_60_80_50u32; (w * h) as usize];
13611            // Fixture B: low-frequency gradient.
13612            let mut gradient = vec![0u32; (w * h) as usize];
13613            for y in 0..h {
13614                for x in 0..w {
13615                    let r = (x * 255 / w.max(1)) as u8;
13616                    let g = (y * 255 / h.max(1)) as u8;
13617                    gradient[(y * w + x) as usize] =
13618                        0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | 0x40;
13619                }
13620            }
13621            // Fixture C: small noise patch on a solid background.
13622            let mut sparse = vec![0xff_70_70_70u32; (w * h) as usize];
13623            let mut s: u32 = 0xDEAD_BEEF ^ (w * h);
13624            for _ in 0..(w * h / 16) {
13625                s ^= s << 13;
13626                s ^= s >> 17;
13627                s ^= s << 5;
13628                let idx = ((s as usize) % sparse.len()) as usize;
13629                sparse[idx] = (s & 0x0003_0303) | 0xff70_7070;
13630            }
13631            for (name, pixels) in &[
13632                ("solid", &solid),
13633                ("gradient", &gradient),
13634                ("sparse", &sparse),
13635            ] {
13636                let r160 = encode_argb_with_predictor_chooser_no_r161_entropy(pixels, w, h);
13637                let r161 = encode_argb_with_predictor_chooser(pixels, w, h);
13638                assert!(
13639                    r161.len() <= r160.len(),
13640                    "round-161 chooser regressed on {name} {w}x{h}: \
13641                     r160={} B r161={} B",
13642                    r160.len(),
13643                    r161.len()
13644                );
13645                // Confirm decode round-trip on whatever the chooser
13646                // emitted — the chooser may have chosen the entropy
13647                // path or any of the L1 paths.
13648                let header = build_image_header(w, h, true);
13649                let mut payload = header.to_vec();
13650                payload.extend_from_slice(&r161);
13651                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13652                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13653                assert_eq!(
13654                    img.pixels(),
13655                    pixels.as_slice(),
13656                    "round-161 chooser output failed decode round-trip on {name} {w}x{h}"
13657                );
13658            }
13659        }
13660    }
13661
13662    /// Round 161 — sweep seeded fixtures to find at least one input
13663    /// where the entropy-cost predictor candidate strictly beats the
13664    /// best L1-proxy predictor candidate on raw bytes. Proves the
13665    /// entropy cost is doing real work — it's not merely a
13666    /// no-op-aliased duplicate of the round-160 path. The sweep
13667    /// also stress-tests round-trip correctness on every fixture
13668    /// where the entropy path wins.
13669    ///
13670    /// Construction: pre-residualised image families where the per-
13671    /// block mode-cost ordering differs between L1 magnitude and
13672    /// Shannon entropy. The most reliable family is one whose
13673    /// "lowest L1 mode" produces a varied residual histogram while
13674    /// some "slightly-higher L1 mode" produces a concentrated
13675    /// residual histogram — Shannon entropy picks the concentrated
13676    /// mode (faithful to what Huffman codes minimise), L1 picks the
13677    /// magnitude-min mode.
13678    #[test]
13679    fn round_161_entropy_candidate_strictly_beats_l1_on_some_fixture() {
13680        let w = 64u32;
13681        let h = 64u32;
13682        let size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13683        let block_pixels: u64 = (1u64 << size_bits) * (1u64 << size_bits);
13684        let mut found = false;
13685        let mut best_savings: i64 = 0;
13686        let mut seed_winner: u32 = 0;
13687        let mut family_winner: &'static str = "";
13688        // Family A: row-translated tile with a hand-chosen base
13689        // colour. The L predictor (mode 1) reproduces each row's
13690        // base colour and has zero residual on interior pixels —
13691        // but the top-row predict-L rule on the first row leaks a
13692        // varied histogram (each first-row pixel's residual is a
13693        // function of its preceding column's source colour). Mode
13694        // 0 (predict 0xff000000) emits a constant residual equal
13695        // to source per pixel — zero entropy when the image is
13696        // solid, non-zero entropy when scattered. On a scattered
13697        // image mode 1 is L1-best but mode 0 is entropy-best.
13698        for seed_init in [
13699            0xCAFE_BABEu32,
13700            0xC0FFEE00,
13701            0xDEAD_BEEF,
13702            0xFACE_F00D,
13703            0xFEED_F00D,
13704            0x1234_5678,
13705            0xABCD_1234,
13706            0x90AB_CDEF,
13707            0x5A5A_5A5A,
13708            0xA5A5_A5A5,
13709            0xBA5E_BA11,
13710            0xB16B_00B5,
13711            0x00DD_BA11,
13712            0xC1AB_AB00,
13713            0xDEAF_BABE,
13714            0xCABB_A6E0,
13715            0x1337_C0DE,
13716            0xABAD_CAFE,
13717            0xBADF_00D0,
13718            0x8BAD_F00D,
13719            0xFEE1_DEAD,
13720            0xDEFE_C8ED,
13721            0xD15E_A5E0,
13722            0x600D_F00D,
13723            0xDEAD_C0DE,
13724            0xBADC_0DED,
13725            0xCAFE_F00D,
13726            0xC0DE_F00D,
13727            0xDEED_BEEF,
13728            0xBEAD_F00D,
13729            0x8008_5318,
13730            0xD0DE_C0DE,
13731        ] {
13732            // Build a fixture whose per-block mode-cost ordering
13733            // disagrees between L1 and Shannon entropy. The family
13734            // below produces blocks of varying L1-vs-entropy
13735            // disagreement intensity:
13736            //
13737            // Quadrant A (top-left): smooth low-frequency pattern
13738            //   where neighbour-predicting modes have low L1 but
13739            //   spread their residuals across multiple histogram
13740            //   bins (residual varies slightly with position).
13741            // Quadrant B (bottom-right): rare "spike" pixels (1 or
13742            //   2 per block) where mode 0's constant residual
13743            //   distribution wins on entropy.
13744            //
13745            // The two quadrants live in separate predictor blocks
13746            // so each contributes independently to whichever mode
13747            // wins on a block-by-block basis.
13748            let mut pixels = vec![0xff_60_80_50u32; (w * h) as usize];
13749            let mut s = seed_init;
13750            // Quadrant A: 32x32 patterned image with column-driven
13751            // gradient and a per-row jitter — produces non-trivial
13752            // residual histograms for every mode, so the L1-vs-
13753            // entropy disagreement frequency goes up.
13754            for y in 0..(h / 2) {
13755                for x in 0..(w / 2) {
13756                    s ^= s << 13;
13757                    s ^= s >> 17;
13758                    s ^= s << 5;
13759                    // Column-correlated colour + per-row jitter.
13760                    let r = 0x40 + (x as u8 & 0x1f);
13761                    let g = 0x60 + ((y as u8) & 0x1f) + ((s & 1) as u8);
13762                    let b = 0x30 + ((x as u8 ^ y as u8) & 0x0f);
13763                    pixels[(y * w + x) as usize] =
13764                        0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | (b as u32);
13765                }
13766            }
13767            // Quadrant B: solid grey with deliberate single-pixel
13768            // spikes at predictable positions. The spikes are
13769            // chosen to land inside a few of the predictor blocks
13770            // so those blocks see a residual distribution with one
13771            // major bin (zero) and one minor bin (the spike). The
13772            // L1 chooser picks the mode that minimises spike
13773            // magnitude; the entropy chooser picks the mode that
13774            // minimises the count of distinct residual bins.
13775            for y in (h / 2)..h {
13776                for x in (w / 2)..w {
13777                    s ^= s << 13;
13778                    s ^= s >> 17;
13779                    s ^= s << 5;
13780                    if (s & 0x1f) == 0 {
13781                        // Spike: random near-grey perturbation.
13782                        let perturb = (s & 0x0f0f_0f0f) | 0xff60_8050;
13783                        pixels[(y * w + x) as usize] = perturb;
13784                    }
13785                }
13786            }
13787            // Best L1-proxy predictor candidate at default
13788            // size_bits: strict round-159 + round-160 slack sweep.
13789            let strict_bytes = encode_with_predictor(&pixels, w, h, size_bits, None, w);
13790            let mut best_l1_bytes = strict_bytes.clone();
13791            for slack in [block_pixels, 2 * block_pixels, 4 * block_pixels] {
13792                let bytes = encode_with_predictor_slack(&pixels, w, h, size_bits, None, w, slack);
13793                if bytes.len() < best_l1_bytes.len() {
13794                    best_l1_bytes = bytes;
13795                }
13796            }
13797            let entropy_bytes = encode_with_predictor_entropy(&pixels, w, h, size_bits, None, w);
13798            if entropy_bytes.len() < best_l1_bytes.len() {
13799                let saved = best_l1_bytes.len() as i64 - entropy_bytes.len() as i64;
13800                if saved > best_savings {
13801                    best_savings = saved;
13802                    seed_winner = seed_init;
13803                    family_winner = "two-quadrant";
13804                }
13805                if !found {
13806                    found = true;
13807                }
13808                // Round-trip the winning entropy stream end-to-end.
13809                let header = build_image_header(w, h, true);
13810                let mut payload = header.to_vec();
13811                payload.extend_from_slice(&entropy_bytes);
13812                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
13813                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13814                assert_eq!(
13815                    img.pixels(),
13816                    pixels.as_slice(),
13817                    "round-161 entropy strict-beat predictor candidate round-trip mismatch on \
13818                     seed=0x{seed_init:08x}"
13819                );
13820                eprintln!(
13821                    "[round-161] entropy strict-beat: seed=0x{seed_init:08x}, \
13822                     best_l1={} B entropy={} B saved={saved} B",
13823                    best_l1_bytes.len(),
13824                    entropy_bytes.len(),
13825                );
13826            }
13827        }
13828        // Family B: hand-crafted "constant non-zero residual"
13829        // fixture — a solid-colour image where mode 0 emits a
13830        // constant residual `source - 0xff000000` per pixel. The
13831        // L1 cost of mode 0 is `Σ |source - black|` per pixel; the
13832        // entropy cost of mode 0 is zero (single-symbol histogram).
13833        // Mode 1 (L predictor) also emits zero residual for
13834        // interior pixels but has non-zero residual at the leftmost
13835        // column. On a small image the per-block winner depends on
13836        // which of these effects dominates.
13837        if !found {
13838            // Build a 16×16 solid image — exactly one predictor
13839            // block at size_bits=4. The L1 cost of mode 0 is huge
13840            // (16² × magnitude); mode 1's cost is small (only the
13841            // leftmost column contributes). L1 picks mode 1.
13842            // Shannon entropy: mode 0 = 0 (constant residual);
13843            // mode 1 = small but non-zero (the leftmost column
13844            // residual). Entropy picks mode 0.
13845            //
13846            // Whether mode 0's predictor stream beats mode 1's
13847            // depends on the §5.x prefix-code overhead vs the
13848            // saved residual mass — not guaranteed, but a
13849            // candidate worth trying.
13850            let w2 = 16u32;
13851            let h2 = 16u32;
13852            let pixels2 = vec![0xff_80_80_80u32; (w2 * h2) as usize];
13853            let l1_bytes = encode_with_predictor(&pixels2, w2, h2, size_bits, None, w2);
13854            let entropy_bytes =
13855                encode_with_predictor_entropy(&pixels2, w2, h2, size_bits, None, w2);
13856            if entropy_bytes.len() < l1_bytes.len() {
13857                let saved = l1_bytes.len() as i64 - entropy_bytes.len() as i64;
13858                best_savings = saved;
13859                family_winner = "solid-grey-16x16";
13860                found = true;
13861                let header = build_image_header(w2, h2, true);
13862                let mut payload = header.to_vec();
13863                payload.extend_from_slice(&entropy_bytes);
13864                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w2, h2).unwrap();
13865                let img = crate::decode_lossless_image(&framed).unwrap().unwrap();
13866                assert_eq!(
13867                    img.pixels(),
13868                    pixels2.as_slice(),
13869                    "round-161 entropy strict-beat solid-grey round-trip mismatch"
13870                );
13871                eprintln!(
13872                    "[round-161] entropy strict-beat (solid-grey 16x16): \
13873                     l1={} B entropy={} B saved={saved} B",
13874                    l1_bytes.len(),
13875                    entropy_bytes.len(),
13876                );
13877            }
13878        }
13879        assert!(
13880            found,
13881            "round-161 entropy candidate did not produce a single strict byte reduction \
13882             across the seeded fixture set; the entropy cost never won \
13883             (best_savings={best_savings} on seed=0x{seed_winner:08x} family={family_winner})"
13884        );
13885    }
13886
13887    // ---- Round 162 tests: sub-image-aware Shannon-entropy chooser ----------
13888
13889    /// Local pre-round-162 copy of `encode_argb_with_predictor_chooser`
13890    /// that omits the round-162 sub-image-aware lambda sweep but
13891    /// keeps every round-161 entropy candidate. Used as the
13892    /// before-after baseline for the round-162 non-regression and
13893    /// strict-beat tests.
13894    fn encode_argb_with_predictor_chooser_no_r162_subaware(
13895        pixels: &[u32],
13896        width: u32,
13897        height: u32,
13898    ) -> Vec<u8> {
13899        let mut best = encode_argb_literals_with_width(pixels, width);
13900
13901        let pred_size_bits = DEFAULT_PREDICTOR_SIZE_BITS;
13902        let ctx_size_bits = DEFAULT_COLOR_TRANSFORM_SIZE_BITS;
13903        let pred_block = 1u32 << pred_size_bits;
13904        let ctx_block = 1u32 << ctx_size_bits;
13905
13906        if width >= pred_block && height >= pred_block {
13907            let mut pred_single_block_size_bits: u8 = pred_size_bits;
13908            while pred_single_block_size_bits < 9
13909                && ((1u32 << pred_single_block_size_bits) < width
13910                    || (1u32 << pred_single_block_size_bits) < height)
13911            {
13912                pred_single_block_size_bits += 1;
13913            }
13914            let try_pred_single_block = pred_single_block_size_bits != pred_size_bits;
13915            let mut pred_candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
13916                encode_with_predictor(pixels, width, height, pred_size_bits, cache_bits, width)
13917            })];
13918            let pred_block_pixels: u64 = (1u64 << pred_size_bits) * (1u64 << pred_size_bits);
13919            for slack in [
13920                pred_block_pixels,
13921                2 * pred_block_pixels,
13922                4 * pred_block_pixels,
13923            ] {
13924                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13925                    encode_with_predictor_slack(
13926                        pixels,
13927                        width,
13928                        height,
13929                        pred_size_bits,
13930                        cache_bits,
13931                        width,
13932                        slack,
13933                    )
13934                }));
13935            }
13936            pred_candidates.push(select_best_cache_bits(|cache_bits| {
13937                encode_with_predictor_entropy(
13938                    pixels,
13939                    width,
13940                    height,
13941                    pred_size_bits,
13942                    cache_bits,
13943                    width,
13944                )
13945            }));
13946            if try_pred_single_block {
13947                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13948                    encode_with_predictor(
13949                        pixels,
13950                        width,
13951                        height,
13952                        pred_single_block_size_bits,
13953                        cache_bits,
13954                        width,
13955                    )
13956                }));
13957                let single_pred_block_pixels: u64 =
13958                    (1u64 << pred_single_block_size_bits) * (1u64 << pred_single_block_size_bits);
13959                for slack in [
13960                    single_pred_block_pixels,
13961                    2 * single_pred_block_pixels,
13962                    4 * single_pred_block_pixels,
13963                ] {
13964                    pred_candidates.push(select_best_cache_bits(|cache_bits| {
13965                        encode_with_predictor_slack(
13966                            pixels,
13967                            width,
13968                            height,
13969                            pred_single_block_size_bits,
13970                            cache_bits,
13971                            width,
13972                            slack,
13973                        )
13974                    }));
13975                }
13976                pred_candidates.push(select_best_cache_bits(|cache_bits| {
13977                    encode_with_predictor_entropy(
13978                        pixels,
13979                        width,
13980                        height,
13981                        pred_single_block_size_bits,
13982                        cache_bits,
13983                        width,
13984                    )
13985                }));
13986            }
13987            for cand in pred_candidates {
13988                if cand.len() < best.len() {
13989                    best = cand;
13990                }
13991            }
13992        }
13993
13994        if width >= ctx_block && height >= ctx_block {
13995            let mut single_block_size_bits: u8 = ctx_size_bits;
13996            while single_block_size_bits < 9
13997                && ((1u32 << single_block_size_bits) < width
13998                    || (1u32 << single_block_size_bits) < height)
13999            {
14000                single_block_size_bits += 1;
14001            }
14002            let try_single_block = single_block_size_bits != ctx_size_bits;
14003            let mut candidates: Vec<Vec<u8>> = vec![select_best_cache_bits(|cache_bits| {
14004                encode_with_color_transform(pixels, width, height, ctx_size_bits, cache_bits, width)
14005            })];
14006            if try_single_block {
14007                candidates.push(select_best_cache_bits(|cache_bits| {
14008                    encode_with_color_transform(
14009                        pixels,
14010                        width,
14011                        height,
14012                        single_block_size_bits,
14013                        cache_bits,
14014                        width,
14015                    )
14016                }));
14017            }
14018            for cand in candidates {
14019                if cand.len() < best.len() {
14020                    best = cand;
14021                }
14022            }
14023        }
14024
14025        if collect_palette(pixels).is_some() {
14026            let ci_best = select_best_cache_bits(|cache_bits| {
14027                encode_with_color_indexing(pixels, width, height, cache_bits)
14028                    .expect("palette feasibility already confirmed")
14029            });
14030            if ci_best.len() < best.len() {
14031                best = ci_best;
14032            }
14033        }
14034
14035        if let Some(mp_best) = sweep_meta_prefix_candidate(pixels, width, height) {
14036            if mp_best.len() < best.len() {
14037                best = mp_best;
14038            }
14039        }
14040
14041        best
14042    }
14043
14044    /// Round 162 — `sub_image_mode_cost_delta_milli` returns zero when
14045    /// the first symbol is added to an empty histogram: the post-add
14046    /// state is a single-symbol histogram with `H = 0`, so the
14047    /// Shannon mass goes from 0 (degenerate) to 0 (single bin with
14048    /// `c·log2(N/c) = N·log2(1) = 0`).
14049    #[test]
14050    fn round_162_sub_image_mode_cost_delta_zero_on_first_add() {
14051        let hist = [0u32; 14];
14052        for mode in 0u8..=13 {
14053            let delta = sub_image_mode_cost_delta_milli(&hist, 0, mode);
14054            assert_eq!(
14055                delta, 0,
14056                "first symbol add must produce zero Shannon delta; mode={mode} delta={delta}"
14057            );
14058        }
14059    }
14060
14061    /// Round 162 — `sub_image_mode_cost_delta_milli` returns zero when
14062    /// the added symbol equals the only mode already present (still a
14063    /// single-symbol histogram post-add), and a strictly positive
14064    /// delta when the added symbol is *different* from the only mode
14065    /// already present (the histogram grows from one to two bins, so
14066    /// `N·H` grows from `0` to `2·log2(2) - 2·1·log2(1) = 2` bits).
14067    #[test]
14068    fn round_162_sub_image_mode_cost_delta_grows_on_new_symbol() {
14069        // Start with five occurrences of mode 3 already in the
14070        // histogram (single-symbol state, N·H = 0).
14071        let mut hist = [0u32; 14];
14072        hist[3] = 5;
14073        let total = 5u32;
14074
14075        let same = sub_image_mode_cost_delta_milli(&hist, total, 3);
14076        assert_eq!(
14077            same, 0,
14078            "adding same symbol to a single-mode histogram must not grow Shannon mass"
14079        );
14080
14081        let different = sub_image_mode_cost_delta_milli(&hist, total, 7);
14082        assert!(
14083            different > 0,
14084            "adding a new symbol to a single-mode histogram must grow Shannon mass; got 0"
14085        );
14086        // Sanity: the post-add N·H is 6·log2(6) − 5·log2(5) − 1·log2(1)
14087        //       ≈ 15.5097 − 11.6096 − 0 ≈ 3.9 bits ≈ 3900 milli-bits.
14088        // Pre-add was 0, so the delta should be roughly 3900 ±1.
14089        assert!(
14090            (3500..=4300).contains(&different),
14091            "expected delta near 3900 milli-bits; got {different}"
14092        );
14093    }
14094
14095    /// Round 162 — `lambda_milli == 0` makes the sub-image-aware
14096    /// chooser byte-identical to the round-161 entropy chooser: every
14097    /// candidate's joint cost equals its residual-only cost (the
14098    /// sub-image term contributes zero), and the tie-break rules
14099    /// match exactly.
14100    #[test]
14101    fn round_162_lambda_zero_byte_identical_to_round_161() {
14102        // Use a 32×32 fixture exercising the per-region path with at
14103        // least four 16×16 blocks worth of sub-image entries.
14104        let w = 32u32;
14105        let h = 32u32;
14106        let mut pixels = vec![0u32; (w * h) as usize];
14107        for y in 0..h as usize {
14108            for x in 0..w as usize {
14109                let r = (x as u8).wrapping_mul(7);
14110                let g = (y as u8).wrapping_mul(11);
14111                let b = ((x + y) as u8).wrapping_mul(13);
14112                pixels[y * w as usize + x] =
14113                    0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | (b as u32);
14114            }
14115        }
14116
14117        let r161 = encode_with_predictor_entropy(&pixels, w, h, 4, None, w);
14118        let r162_lambda0 = encode_with_predictor_entropy_subaware(&pixels, w, h, 4, None, w, 0);
14119        assert_eq!(
14120            r161, r162_lambda0,
14121            "lambda_milli == 0 must produce a byte-identical stream to round-161 entropy"
14122        );
14123
14124        // Also covers Some(cache_bits) — the cache path shouldn't
14125        // alter the equivalence.
14126        let r161_cached = encode_with_predictor_entropy(&pixels, w, h, 4, Some(6), w);
14127        let r162_cached_lambda0 =
14128            encode_with_predictor_entropy_subaware(&pixels, w, h, 4, Some(6), w, 0);
14129        assert_eq!(
14130            r161_cached, r162_cached_lambda0,
14131            "lambda_milli == 0 must be byte-identical with cache_bits = Some(6)"
14132        );
14133    }
14134
14135    /// Round 162 — `pick_block_mode_with_hint_entropy_subaware` honours
14136    /// the strict tie-break: when the preferred mode's joint cost
14137    /// equals the best, the chooser returns the preferred mode (so
14138    /// the sub-image keeps the longer mode-run). Mirrors the round-
14139    /// 159 / round-161 tie-break test.
14140    #[test]
14141    fn round_162_pick_block_mode_subaware_honours_tie() {
14142        // Tiny 1×1 block — every mode reduces to the top-left border
14143        // (`pred = 0xff_00_00_00`), so all modes yield zero residual
14144        // entropy and tie at zero. The hint should flip the result.
14145        let pixels = vec![0xff_00_00_00u32; 1];
14146        let hist = [0u32; 14];
14147        let chosen_no_hint = pick_block_mode_with_hint_entropy_subaware(
14148            &pixels, 1, 1, 0, 0, 1, 1, None, &hist, 0, 4_000,
14149        );
14150        assert_eq!(
14151            chosen_no_hint, 0,
14152            "no-hint pick should fall back to lowest-tied mode (= 0)"
14153        );
14154
14155        for hint in 0u8..=13 {
14156            let chosen = pick_block_mode_with_hint_entropy_subaware(
14157                &pixels,
14158                1,
14159                1,
14160                0,
14161                0,
14162                1,
14163                1,
14164                Some(hint),
14165                &hist,
14166                0,
14167                4_000,
14168            );
14169            assert_eq!(
14170                chosen, hint,
14171                "hint {hint} should win on a fully-tied block; got {chosen}"
14172            );
14173        }
14174    }
14175
14176    /// Round 162 — end-to-end round-trip: the sub-image-aware encoder
14177    /// produces a stream the §5.x decoder reconstructs to the
14178    /// original pixels at three lambda settings and two cache-bits
14179    /// settings, across a small fixture with mixed local statistics.
14180    #[test]
14181    fn round_162_subaware_round_trips_through_decoder() {
14182        let w = 32u32;
14183        let h = 32u32;
14184        let mut pixels = vec![0u32; (w * h) as usize];
14185        // Top-left 16×16: gradient. Top-right: noise. Bottom-left:
14186        // solid. Bottom-right: vertical bars. Drives different
14187        // per-block best modes across the four sub-image entries.
14188        for y in 0..h as usize {
14189            for x in 0..w as usize {
14190                let v = match (x < 16, y < 16) {
14191                    (true, true) => 0xff_00_00_00 | (((x + y) as u32 * 8) << 8),
14192                    (false, true) => {
14193                        let seed = (x.wrapping_mul(97) ^ y.wrapping_mul(53)) as u32;
14194                        0xff_00_00_00 | ((seed & 0xff) << 16) | (seed & 0xff00)
14195                    }
14196                    (true, false) => 0xff_80_80_80,
14197                    (false, false) => {
14198                        if x % 2 == 0 {
14199                            0xff_ff_ff_ff
14200                        } else {
14201                            0xff_00_00_00
14202                        }
14203                    }
14204                };
14205                pixels[y * w as usize + x] = v;
14206            }
14207        }
14208
14209        for lambda_milli in [1_000u64, 4_000u64, 16_000u64] {
14210            for cache_bits in [None, Some(4u32), Some(8u32)] {
14211                let payload = encode_with_predictor_entropy_subaware(
14212                    &pixels,
14213                    w,
14214                    h,
14215                    4,
14216                    cache_bits,
14217                    w,
14218                    lambda_milli,
14219                );
14220                let header = build_image_header(w, h, true);
14221                let mut bytes = header.to_vec();
14222                bytes.extend_from_slice(&payload);
14223                let framed = build::build_webp_file(&bytes, ImageKind::Lossless, w, h).unwrap();
14224                let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14225                assert_eq!(
14226                    decoded.pixels(),
14227                    pixels.as_slice(),
14228                    "round-trip mismatch lambda_milli={lambda_milli} cache_bits={cache_bits:?}"
14229                );
14230            }
14231        }
14232    }
14233
14234    /// Round 162 — the production chooser never regresses against the
14235    /// round-161 baseline: across 5 image shapes × 3 fixture
14236    /// generators, the round-162 chooser output is byte-`<=` the
14237    /// chooser-without-round-162-candidates output, AND every
14238    /// chosen stream round-trips through the decoder bit-exactly.
14239    #[test]
14240    fn round_162_chooser_never_regresses_vs_round_161() {
14241        let shapes: &[(u32, u32)] = &[(16, 16), (24, 32), (32, 24), (48, 48), (64, 32)];
14242        for &(w, h) in shapes {
14243            for fixture_kind in 0..3u32 {
14244                let mut pixels = vec![0u32; (w * h) as usize];
14245                for y in 0..h as usize {
14246                    for x in 0..w as usize {
14247                        let v = match fixture_kind {
14248                            0 => 0xff_00_00_00 | (((x ^ y) as u32 * 3) & 0xff),
14249                            1 => {
14250                                let seed =
14251                                    (x.wrapping_mul(2654435761).wrapping_add(y) & 0xff) as u32;
14252                                0xff_00_00_00 | (seed << 16) | seed
14253                            }
14254                            _ => {
14255                                if (x + y) % 5 < 2 {
14256                                    0xff_a0_a0_a0
14257                                } else {
14258                                    0xff_60_60_60
14259                                }
14260                            }
14261                        };
14262                        pixels[y * w as usize + x] = v;
14263                    }
14264                }
14265
14266                let baseline = encode_argb_with_predictor_chooser_no_r162_subaware(&pixels, w, h);
14267                let r162 = encode_argb_with_predictor_chooser(&pixels, w, h);
14268                assert!(
14269                    r162.len() <= baseline.len(),
14270                    "round-162 chooser regressed at shape={w}×{h} fixture={fixture_kind}: \
14271                     baseline={} B r162={} B",
14272                    baseline.len(),
14273                    r162.len()
14274                );
14275
14276                // Decode round-trip on the round-162 stream. The
14277                // chooser emits a bare VP8L payload; wrap with the
14278                // image header before framing.
14279                let header = build_image_header(w, h, true);
14280                let mut payload = header.to_vec();
14281                payload.extend_from_slice(&r162);
14282                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
14283                let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14284                assert_eq!(
14285                    decoded.pixels(),
14286                    pixels.as_slice(),
14287                    "round-trip mismatch at shape={w}×{h} fixture={fixture_kind}"
14288                );
14289            }
14290        }
14291    }
14292
14293    /// Round 162 — the *isolated* sub-image-aware predictor candidate
14294    /// (`encode_with_predictor_entropy_subaware`) strictly beats the
14295    /// round-161 isolated entropy candidate
14296    /// (`encode_with_predictor_entropy`) on every smooth-gradient
14297    /// fixture in the sweep. This is the headline empirical result
14298    /// for the round-162 cost model: smooth gradients are the
14299    /// canonical case where many §4.1 sub-image entries can converge
14300    /// onto a small mode set (the gradient predictors all yield
14301    /// near-zero residuals so the sub-image's prefix-code mass
14302    /// dominates total cost). The crossover at the swept lambda
14303    /// values (`64_000` per-sub-image-bit milli-units) is where the
14304    /// sub-image weighting takes off — below that, residual cost
14305    /// dominates and the round-161 chooser already wins.
14306    ///
14307    /// This compares the round-162 and round-161 predictor
14308    /// candidates **in isolation** (same `size_bits = 4`, both
14309    /// running through `apply_forward_predictor` + LZ77 + prefix
14310    /// coding) so the win is attributable to the chooser, not to
14311    /// other paths in the full chooser sweep (subtract-green,
14312    /// single-block predictor, etc.) which may produce an equally-
14313    /// tight stream by a different mechanism. The production chooser
14314    /// adds the round-162 candidate to its sweep and keeps byte-
14315    /// shortest, so even when other paths tie, the round-162 path
14316    /// strictly extends the encoder's option set.
14317    ///
14318    /// Round-trips through the decoder bit-exactly on every winning
14319    /// fixture.
14320    #[test]
14321    fn round_162_subaware_isolated_strictly_beats_round_161_on_some_fixture() {
14322        let shapes: &[(u32, u32)] = &[(64, 64), (128, 128), (256, 128), (96, 96), (160, 80)];
14323        let lambda_to_test: u64 = 64_000;
14324        let mut wins = 0u32;
14325        let mut max_savings: i64 = 0;
14326        let mut max_savings_shape: (u32, u32) = (0, 0);
14327        for &(w, h) in shapes {
14328            let mut pixels = vec![0u32; (w * h) as usize];
14329            for y in 0..h {
14330                for x in 0..w {
14331                    let r = (x * 255 / w.max(1)) as u8;
14332                    let g = (y * 255 / h.max(1)) as u8;
14333                    pixels[(y * w + x) as usize] =
14334                        0xff00_0000 | ((r as u32) << 16) | ((g as u32) << 8) | 0x40;
14335                }
14336            }
14337            let r161 = encode_with_predictor_entropy(&pixels, w, h, 4, None, w);
14338            let r162 =
14339                encode_with_predictor_entropy_subaware(&pixels, w, h, 4, None, w, lambda_to_test);
14340            // r162 may tie r161 on some shapes (the chosen mode set
14341            // already coincides), but it must never regress — the
14342            // sub-image-aware cost is a strict generalisation of the
14343            // round-161 cost.
14344            assert!(
14345                r162.len() <= r161.len(),
14346                "round-162 isolated candidate REGRESSED on gradient {w}x{h}: \
14347                 r161={} B r162={} B",
14348                r161.len(),
14349                r162.len()
14350            );
14351            let saved = r161.len() as i64 - r162.len() as i64;
14352            if r162.len() < r161.len() {
14353                wins += 1;
14354                if saved > max_savings {
14355                    max_savings = saved;
14356                    max_savings_shape = (w, h);
14357                }
14358                // Verify round-trip on the winning stream.
14359                let header = build_image_header(w, h, true);
14360                let mut payload = header.to_vec();
14361                payload.extend_from_slice(&r162);
14362                let framed = build::build_webp_file(&payload, ImageKind::Lossless, w, h).unwrap();
14363                let decoded = crate::decode_lossless_image(&framed).unwrap().unwrap();
14364                assert_eq!(
14365                    decoded.pixels(),
14366                    pixels.as_slice(),
14367                    "round-trip mismatch on gradient strict-beat {w}x{h}"
14368                );
14369                eprintln!(
14370                    "[round-162] isolated strict-beat (gradient {w}x{h}, lambda={lambda_to_test}): \
14371                     r161={} B r162={} B saved={saved} B ({:.1}% reduction)",
14372                    r161.len(),
14373                    r162.len(),
14374                    100.0 * saved as f64 / r161.len() as f64
14375                );
14376            } else {
14377                eprintln!(
14378                    "[round-162] tie (gradient {w}x{h}, lambda={lambda_to_test}): \
14379                     r161={} B r162={} B (no regression)",
14380                    r161.len(),
14381                    r162.len()
14382                );
14383            }
14384        }
14385        // Require strict wins on a majority of the gradient sweep —
14386        // proves the round-162 cost model is doing real work, not
14387        // just degenerating to the round-161 chooser everywhere.
14388        assert!(
14389            wins >= 3,
14390            "round-162 isolated candidate strictly beat round-161 on only {wins}/{} gradient \
14391             fixtures; expected at least 3 strict wins to demonstrate the sub-image cost is \
14392             doing real work",
14393            shapes.len()
14394        );
14395        eprintln!(
14396            "[round-162] isolated sub-image-aware: {wins}/{} gradient fixtures strict-won; \
14397             headline savings = {max_savings} B on {}x{}",
14398            shapes.len(),
14399            max_savings_shape.0,
14400            max_savings_shape.1
14401        );
14402    }
14403}