pixo 0.4.1

A minimal-dependency, high-performance image compression library
# The Evolution of Compression: How Small Gains Compound

This document explores the history and philosophy behind advanced compression techniques. Rather than dwell on implementation details, it focuses on the _ideas_ that drive compression improvements and the patterns they share.

## The Fundamental Challenge

All compression faces the same core tension:

```text
┌─────────────────────────────────────────────────────────────────┐
│                    The Compression Tradeoff                      │
│                                                                  │
│    Information Theory Limit                                      │
│    ════════════════════════                                      │
│    Claude Shannon proved (1948) that every data source has       │
│    a minimum number of bits required to represent it.            │
│    You cannot do better than this limit.                         │
│                                                                  │
│    The Game                                                      │
│    ════════                                                      │
│    Get as close to the theoretical limit as possible,            │
│    while balancing:                                              │
│    • Encoding speed                                              │
│    • Decoding speed                                              │
│    • Memory usage                                                │
│    • Implementation complexity                                   │
└─────────────────────────────────────────────────────────────────┘
```

Modern codecs like mozjpeg don't use one clever trick. They use _dozens_ of small optimizations that each shave off 1-5%. These compound:

```text
0.95 × 0.97 × 0.95 × 0.98 × 0.96 = 0.82
```

Five optimizations of 2-5% each combine to reduce file size by 18%.
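
The same arithmetic as code, folding the illustrative ratios above into a single compounded figure:

```rust
fn main() {
    // Each optimization keeps this fraction of the previous file size.
    let ratios = [0.95, 0.97, 0.95, 0.98, 0.96];

    // Multiplying the per-step ratios gives the overall size ratio.
    let combined: f64 = ratios.iter().product();

    println!("combined size ratio: {:.2}", combined);              // ~0.82
    println!("total reduction: {:.0}%", (1.0 - combined) * 100.0); // ~18%
}
```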

---

## The Three Pillars of Compression Improvement

Every major compression advancement falls into one of three categories:

### 1. Better Representation

**Core idea**: Transform data into a form that's easier to compress.

The classic example is the **Discrete Cosine Transform (DCT)** used in JPEG. Natural images have a property: nearby pixels are usually similar. The DCT exploits this by converting spatial data (pixel values) into frequency data (how fast values change).

```text
Spatial Domain          Frequency Domain
┌───────────────┐      ┌───────────────┐
│ 52 55 61 66   │      │ 420  -8  -2   │
│ 70 61 64 73   │ DCT  │  -3   5  -1   │
│ 63 59 55 90   │ ───► │   2  -4   1   │
│ 67 61 68 104  │      │   1   0  -1   │
└───────────────┘      └───────────────┘

Most energy concentrates in the top-left corner.
The rest becomes small numbers, often zero.
```

**Why it works**: Images don't change randomly. Smooth gradients become a single number (the DC coefficient). Only edges and textures need the high-frequency coefficients.
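
To make the transform concrete, here is a minimal, unoptimized sketch of the 8×8 forward DCT-II that JPEG uses, evaluated directly from its definition (production encoders use fast factorizations such as AAN instead):

```rust
use std::f64::consts::PI;

/// Naive 8x8 forward DCT-II, straight from the textbook definition.
fn dct_8x8(block: &[[f64; 8]; 8]) -> [[f64; 8]; 8] {
    let mut out = [[0.0f64; 8]; 8];
    for u in 0..8 {
        for v in 0..8 {
            let cu = if u == 0 { 1.0 / 2f64.sqrt() } else { 1.0 };
            let cv = if v == 0 { 1.0 / 2f64.sqrt() } else { 1.0 };
            let mut sum = 0.0;
            for x in 0..8 {
                for y in 0..8 {
                    sum += block[x][y]
                        * ((2 * x + 1) as f64 * u as f64 * PI / 16.0).cos()
                        * ((2 * y + 1) as f64 * v as f64 * PI / 16.0).cos();
                }
            }
            // 1/4 * C(u) * C(v) normalization, as in the JPEG spec.
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
    out
}

fn main() {
    // A smooth block: values rise gently from left to right,
    // level-shifted by -128 as JPEG does before the transform.
    let mut block = [[0.0f64; 8]; 8];
    for x in 0..8 {
        for y in 0..8 {
            block[x][y] = 52.0 + 2.0 * y as f64 - 128.0;
        }
    }
    let coeffs = dct_8x8(&block);
    // Most of the energy lands in the top-left (low-frequency) corner.
    println!(
        "DC = {:.1}, AC(0,1) = {:.1}, AC(7,7) = {:.1}",
        coeffs[0][0], coeffs[0][1], coeffs[7][7]
    );
}
```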

**Historical evolution**:

- 1974: DCT proposed by Ahmed, Natarajan, and Rao
- 1988: The AAN fast algorithm cuts an 8-point 1-D DCT from 64 multiplications to 5
- 1990s: Integer DCT produces reproducible results across platforms

### 2. Smarter Encoding

**Core idea**: Use fewer bits for common patterns, more bits for rare patterns.

**Huffman coding** (1952) is the foundation. It assigns shorter codes to frequent symbols:

```text
Frequency-Based Codes
═════════════════════
Letter   Frequency   Naive Code   Huffman Code
  E        12.7%      00101         10
  T         9.1%      10100         110
  A         8.2%      00001         1110
  Z         0.07%     11010         1111111110

Common letters get 2-4 bits instead of 5.
Rare letters get more bits, but they're rare.
```

But Huffman has limits. It must assign whole bits, so a symbol with 90% probability still gets at least 1 bit when theory says 0.15 bits would suffice.
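
The gap is easy to quantify: the ideal code length for a symbol of probability p is -log2(p) bits, while a prefix code such as Huffman cannot spend less than one whole bit per symbol. A small sketch:

```rust
fn main() {
    // A very skewed two-symbol source: 90% "A", 10% "B".
    let probs = [("A", 0.90_f64), ("B", 0.10)];

    // Shannon's ideal cost per symbol: H = -sum(p * log2(p)).
    let entropy: f64 = probs.iter().map(|&(_, p)| -p * p.log2()).sum();

    for &(sym, p) in &probs {
        println!("{sym}: ideal {:.2} bits, Huffman >= 1 bit", -p.log2());
    }
    // Huffman must spend at least one bit per symbol here, so it averages
    // 1.0 bits/symbol against an entropy of roughly 0.47 bits/symbol.
    println!("entropy: {:.2} bits/symbol", entropy);
}
```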

**Arithmetic coding** (1970s) solves this by treating the entire message as a single number between 0 and 1. In practice, it achieves 5-10% better compression than Huffman for many data types.

### 3. Rate-Distortion Optimization

**Core idea**: Accept some quality loss to save bits, but choose _which_ losses give the best tradeoff.

This is the most sophisticated approach. Instead of asking "how do I encode this data?", we ask "what data should I encode?"

```text
The R-D Framework
═════════════════
Every encoding decision has two costs:

  Rate (R):       How many bits does this choice cost?
  Distortion (D): How much quality does this choice lose?

The goal: Minimize D + λR

Where λ is a parameter that controls the tradeoff.
Higher λ → smaller files, more quality loss
Lower λ → larger files, less quality loss
```

This framework unifies many optimizations. Quantization, progressive encoding, and trellis optimization are all R-D problems in disguise.
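
As a minimal sketch of the decision rule, here is one coefficient quantized three different ways and scored with an invented rate model (a real encoder derives rates from its actual entropy coder):

```rust
/// Pick the quantized value that minimizes distortion + lambda * rate.
/// The rate model below is invented for illustration: zeros are cheap,
/// larger magnitudes cost roughly their bit length plus some overhead.
fn best_candidate(coeff: f64, q: f64, lambda: f64) -> i32 {
    let ideal = coeff / q;
    // Candidates: round to nearest, round toward zero, and zero itself.
    let candidates = [ideal.round() as i32, ideal.trunc() as i32, 0];

    let rate = |v: i32| -> f64 {
        if v == 0 { 0.5 } else { 2.0 + (v.abs() as f64).log2().ceil() }
    };
    let distortion = |v: i32| -> f64 {
        let err = coeff - v as f64 * q;
        err * err
    };

    candidates
        .iter()
        .copied()
        .min_by(|&a, &b| {
            let cost = |v| distortion(v) + lambda * rate(v);
            cost(a).partial_cmp(&cost(b)).unwrap()
        })
        .unwrap()
}

fn main() {
    // With a small lambda, quality dominates and we round to nearest;
    // with a large lambda, bits dominate and a coarser value can win.
    println!("lambda = 1  -> {}", best_candidate(47.3, 10.0, 1.0));  // 5
    println!("lambda = 60 -> {}", best_candidate(47.3, 10.0, 60.0)); // 4
}
```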

---

## Case Study: The Journey from JPEG to mozjpeg

The original JPEG standard (1992) made practical choices for 1990s hardware. Over 30 years, each limitation became an optimization opportunity.

### Stage 1: Baseline JPEG (1992)

The original JPEG encoder:

- **Floating-point DCT**: Fast on FPUs, but results vary by platform
- **Simple quantization**: Round to nearest integer
- **Fixed Huffman tables**: Pre-defined, not optimized for the image
- **Sequential encoding**: Top-to-bottom, one pass

This was revolutionary for 1992. But each choice left performance on the table.

### Stage 2: Optimized Huffman Tables (1990s)

**The insight**: Pre-defined Huffman tables are generic. Tables tuned to the actual image content compress better.

**The technique**: Two-pass encoding

1. First pass: Count symbol frequencies
2. Build optimal Huffman tables from frequencies
3. Second pass: Encode with optimal tables

**Typical gain**: 2-5% smaller files

**Why it works**: Different images have different coefficient distributions. A sunset photo (smooth gradients) has very different statistics than a text document (sharp edges).
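
A sketch of the two-pass idea, using a generic Huffman length construction; a real JPEG encoder additionally enforces the format's 16-bit code-length limit and other details:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Pass 1: count how often each symbol occurs.
fn count_frequencies(symbols: &[u8]) -> HashMap<u8, u64> {
    let mut freq = HashMap::new();
    for &s in symbols {
        *freq.entry(s).or_insert(0) += 1;
    }
    freq
}

/// Build per-symbol code lengths with the classic Huffman merge.
/// Assumes at least two distinct symbols; ignores JPEG's length limits.
fn huffman_code_lengths(freq: &HashMap<u8, u64>) -> HashMap<u8, u32> {
    // Each heap entry: (total weight, symbols in this subtree).
    let mut heap: BinaryHeap<Reverse<(u64, Vec<u8>)>> =
        freq.iter().map(|(&s, &f)| Reverse((f, vec![s]))).collect();
    let mut lengths: HashMap<u8, u32> = freq.keys().map(|&s| (s, 0)).collect();

    while heap.len() > 1 {
        let Reverse((fa, a)) = heap.pop().unwrap();
        let Reverse((fb, b)) = heap.pop().unwrap();
        // Every symbol in the merged subtree gets one bit deeper.
        for &s in a.iter().chain(b.iter()) {
            *lengths.get_mut(&s).unwrap() += 1;
        }
        let mut merged = a;
        merged.extend(b);
        heap.push(Reverse((fa + fb, merged)));
    }
    lengths
}

fn main() {
    // Pass 1 on the data we are about to encode...
    let data = b"aaaaaaabbbccd";
    let freq = count_frequencies(data);

    // ...then pass 2 would encode with tables built from these lengths.
    let mut lengths: Vec<_> = huffman_code_lengths(&freq).into_iter().collect();
    lengths.sort();
    for (sym, len) in lengths {
        println!("{}: {} bits", sym as char, len);
    }
}
```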

### Stage 3: Integer DCT (1990s-2000s)

**The problem**: Floating-point arithmetic varies by platform. The same image encoded on different machines produces slightly different files.

**The insight**: Fixed-point integer math is:

- Reproducible across platforms
- Often faster (no FPU needed)
- Can be tuned for optimal coefficient distribution

**The technique**: Scale all DCT constants by 2^13, use integer arithmetic, round carefully at each stage.

```text
Floating-point:  cos(π/8) = 0.9238795...
Fixed-point:     (0.9238795 × 8192) ≈ 7568

Multiplication becomes:
  result = (a × 7568) >> 13  // Shift replaces division
```
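
The same idea as a small Rust sketch, with the rounding step made explicit. The constant and the 13-bit scale mirror the example above; a real integer DCT applies many such multiplies inside a fast factorization:

```rust
/// cos(pi/8) scaled by 2^13, as in the example above.
const FIX_COS_PI_8: i32 = 7568;
const CONST_BITS: i32 = 13;

/// Fixed-point multiply: (a * c) / 2^13, rounded to nearest.
fn fix_mul(a: i32, c: i32) -> i32 {
    // Adding half of the divisor before shifting rounds instead of truncating.
    (a * c + (1 << (CONST_BITS - 1))) >> CONST_BITS
}

fn main() {
    let a = 100;
    println!("float : {:.3}", 100.0 * 0.923_879_5);     // 92.388
    println!("fixed : {}", fix_mul(a, FIX_COS_PI_8));   // 92, integers only
}
```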

**Typical gain**: 1-3% from better coefficient distribution

**Why it works**: Careful rounding in integer math can actually produce _better_ coefficient distributions than floating-point, because the small rounding errors happen in predictable, controllable ways.

### Stage 4: Progressive Encoding (1990s-2000s)

**The insight**: Not all parts of an image are equally important. The overall brightness matters more than fine texture details.

**The technique**: Instead of encoding each 8×8 block completely before moving to the next, split the encoding into multiple "scans":

```text
Progressive Scan Structure
══════════════════════════

Scan 1: DC coefficients only (overall brightness)
        ┌─────────────────────────────────────┐
        │ ██                                  │  Just the average
        │                                     │  brightness of each
        │                                     │  8×8 block
        └─────────────────────────────────────┘

Scan 2: Low-frequency AC (basic shapes)
        ┌─────────────────────────────────────┐
        │ ██ ░░                               │  Basic gradients
        │ ░░                                  │  and large shapes
        │                                     │
        └─────────────────────────────────────┘

Scan 3-N: Higher frequencies (details, texture)
        ┌─────────────────────────────────────┐
        │ ██ ░░ ░░ ▒▒ ▒▒ ▒▒ ░░ ░░            │  Fine details
        │ ░░ ▒▒ ▒▒ ▒▒ ░░ ░░ ░░ ░░            │  and textures
        │ ░░ ▒▒ ░░ ░░ ░░ ░░ ░░               │
        └─────────────────────────────────────┘
```

**Typical gain**: 5-15% smaller files

**Why it works**:

1. **Better Huffman statistics**: Each scan contains similar data, improving compression
2. **Spectral correlation**: DC coefficients across blocks are similar, as are low-frequency AC coefficients
3. **User experience**: Image appears quickly (blurry), then sharpens progressively
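
A minimal sketch of spectral selection, the simplest form of scan splitting: each scan carries one band of zig-zag positions from every block. The scan script here is illustrative, not an optimized one:

```rust
/// One progressive scan covers a band of zig-zag positions [ss, se].
struct Scan {
    ss: usize, // spectral selection start
    se: usize, // spectral selection end
}

/// Split each block's 64 zig-zag-ordered coefficients into per-scan slices.
/// Scan 0 carries only DC; later scans carry increasingly high frequencies.
fn split_into_scans(blocks: &[[i16; 64]], scans: &[Scan]) -> Vec<Vec<i16>> {
    scans
        .iter()
        .map(|scan| {
            blocks
                .iter()
                .flat_map(|block| block[scan.ss..=scan.se].iter().copied())
                .collect()
        })
        .collect()
}

fn main() {
    // Two fake 8x8 blocks, already in zig-zag order.
    let blocks = [[3i16; 64], [-2i16; 64]];
    // DC only, then low AC, then the rest.
    let scans = [
        Scan { ss: 0, se: 0 },
        Scan { ss: 1, se: 5 },
        Scan { ss: 6, se: 63 },
    ];

    let per_scan = split_into_scans(&blocks, &scans);
    for (i, data) in per_scan.iter().enumerate() {
        println!("scan {}: {} coefficients", i, data.len());
    }
}
```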

### Stage 5: Trellis Quantization (2000s-2010s)

**The insight**: Simple rounding isn't optimal. Sometimes rounding _away_ from the nearest integer produces a smaller file with negligible quality loss.

**The problem in detail**:

```text
Quantization Decision
═════════════════════
DCT coefficient: 47.3
Quantization divisor: 10
Simple round: 47.3/10 = 4.73 → 5

But wait—what does "5" cost to encode?
- If the previous coefficient was nonzero, encoding "5" might cost 3 bits
- If it interrupts a long run of zeros, encoding "5" might cost 6 bits
- And quantizing to "0" instead extends the zero run, which is nearly free
  to code, at the price of a larger reconstruction error...

The optimal choice depends on context!
```

**The technique**: Model this as a graph problem

```text
Trellis Graph
═════════════
Each coefficient position has multiple candidates.
Edges connect candidates with their encoding costs.
Find the path that minimizes total (Distortion + λ × Rate).

Position 1        Position 2        Position 3
    [4]───────────────[2]───────────────[0]
     │╲               │╲               │
     │ ╲              │ ╲              │
    [5]──╲───────────[3]──╲───────────[1]
          ╲               ╲              ╲
          [6]─────────────[4]─────────────[2]

Use Viterbi algorithm to find optimal path.
```

**Typical gain**: 3-8% smaller files

**Why it works**: Huffman coding costs depend on context (run-length of zeros, magnitude of values). By considering these costs during quantization, we make globally better decisions.
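
A toy version of the idea: a dynamic-programming pass over one row of coefficients, with an invented rate model and only a two-state context (was the previous value zero or not) instead of full run-length tracking:

```rust
/// Toy trellis quantization of a row of coefficients.
/// Rates are made up (zeros are cheap, cheaper still when they extend a run);
/// a real encoder derives them from its Huffman or arithmetic tables.
fn trellis_quantize(coeffs: &[f64], q: f64, lambda: f64) -> Vec<i32> {
    // Candidate quantized values for one coefficient.
    let candidates = |c: f64| {
        let r = (c / q).round() as i32;
        let mut v = vec![r, 0];
        if r > 0 { v.push(r - 1); }
        if r < 0 { v.push(r + 1); }
        v.dedup();
        v
    };
    // Toy rate model, in bits, depending on whether we extend a zero run.
    let rate = |v: i32, prev_was_zero: bool| -> f64 {
        match (v, prev_was_zero) {
            (0, true) => 0.3,  // extending a run is cheap
            (0, false) => 1.5, // starting a run costs more
            (v, _) => 3.0 + (v.abs() as f64 + 1.0).log2(),
        }
    };
    let distortion = |c: f64, v: i32| {
        let e = c - v as f64 * q;
        e * e
    };

    // Two states: previous value was nonzero (index 0) or zero (index 1).
    // best[state] = (cheapest cost so far, chosen values along that path).
    // Start in the "nonzero" state so the first zero pays the run-start cost.
    let mut best: [(f64, Vec<i32>); 2] = [(0.0, vec![]), (f64::INFINITY, vec![])];
    for &c in coeffs {
        let mut next: [(f64, Vec<i32>); 2] =
            [(f64::INFINITY, vec![]), (f64::INFINITY, vec![])];
        for (state, (cost, path)) in best.iter().enumerate() {
            if !cost.is_finite() { continue; }
            for v in candidates(c) {
                let new_cost = cost + distortion(c, v) + lambda * rate(v, state == 1);
                let new_state = if v == 0 { 1 } else { 0 };
                if new_cost < next[new_state].0 {
                    let mut p = path.clone();
                    p.push(v);
                    next[new_state] = (new_cost, p);
                }
            }
        }
        best = next;
    }
    let (a, b) = (best[0].clone(), best[1].clone());
    if a.0 <= b.0 { a.1 } else { b.1 }
}

fn main() {
    let coeffs = [47.3, 4.2, -3.8, 1.1, 0.6, 12.0];
    // A small lambda keeps the trailing "1"; a large lambda zeroes it
    // to extend the run: [5, 0, 0, 0, 0, 1] vs [5, 0, 0, 0, 0, 0].
    println!("lambda=1:  {:?}", trellis_quantize(&coeffs, 10.0, 1.0));
    println!("lambda=40: {:?}", trellis_quantize(&coeffs, 10.0, 40.0));
}
```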

### Stage 6: mozjpeg (2014-present)

mozjpeg combines all the above, plus:

- **Scan optimization**: Automatically select the best progressive scan script
- **DC coefficient optimization**: Special handling for DC differences
- **Quantization table tuning**: Tables optimized for human perception, not just PSNR

**Combined result**: 10-30% smaller than baseline JPEG at equivalent quality.

---

## Common Patterns Across Techniques

These compression improvements share several recurring themes:

### Pattern 1: Two-Pass Is Often Better Than One

Many optimizations require knowing something about the data before encoding it:

| Technique            | First Pass                   | Second Pass                    |
| -------------------- | ---------------------------- | ------------------------------ |
| Optimized Huffman    | Count frequencies            | Encode with optimal tables     |
| Trellis Quantization | Compute all DCT coefficients | Find optimal quantization path |
| Progressive          | Store all coefficients       | Encode in optimal scan order   |

**The tradeoff**: 2× encoding time for 5-15% size reduction. Usually worth it for storage/transmission.

### Pattern 2: Context Improves Compression

Information theory tells us: the more you know about what comes next, the fewer bits you need to encode it.

| Technique              | Context Used                            |
| ---------------------- | --------------------------------------- |
| Differential DC coding | Previous block's DC value               |
| Run-length encoding    | How many zeros came before              |
| Progressive scans      | Similar-frequency coefficients together |
| Trellis quantization   | Previous coefficient values             |

**The insight**: Don't just encode values—encode _differences_ from predictions.
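
Differential DC coding is the simplest instance. A sketch (JPEG resets the predictor to zero at the start of each scan, then entropy-codes each difference by its magnitude category):

```rust
/// Encode each block's DC coefficient as a difference from the previous one.
fn dc_differences(dc_values: &[i32]) -> Vec<i32> {
    let mut prev = 0; // predictor starts at zero for each scan
    dc_values
        .iter()
        .map(|&dc| {
            let diff = dc - prev;
            prev = dc;
            diff
        })
        .collect()
}

fn main() {
    // Neighboring blocks in a smooth image have similar DC values...
    let dcs = [512, 515, 518, 516, 520];
    // ...so the differences are small numbers, which entropy-code cheaply.
    println!("{:?}", dc_differences(&dcs)); // [512, 3, 3, -2, 4]
}
```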

### Pattern 3: Human Perception Is Non-Linear

Our eyes and brains don't perceive all information equally:

```text
What Humans Notice vs. What Compression Sees
════════════════════════════════════════════

High sensitivity:           Low sensitivity:
• Edges and contours        • Uniform areas
• Faces                     • High-frequency noise
• Text                      • Blue channel details
• Low-frequency patterns    • Areas with texture masking
```

Smart compression allocates bits where humans will notice, not uniformly.

### Pattern 4: Exact vs. Good Enough

Many optimizations distinguish between:

- **Mathematically optimal**: The provably best answer (often expensive to compute)
- **Good enough**: 99% as good, 10× faster to compute

| Technique         | Optimal Approach  | Practical Approach |
| ----------------- | ----------------- | ------------------ |
| Huffman tables    | Arithmetic coding | Canonical Huffman  |
| Trellis           | Full Viterbi      | Pruned trellis     |
| Progressive scans | Exhaustive search | Heuristic scripts  |

**The wisdom**: Diminishing returns are real. The last 1% improvement often costs 10× the computation.

### Pattern 5: Iterative Refinement

Some optimizations face a chicken-and-egg problem: the optimal choice at step A depends on what happens at step B, but step B depends on step A.

The classic example is **LZ77 + Huffman** (used in Deflate/PNG):

```text
The Chicken-and-Egg Problem
════════════════════════════
LZ77 parsing decides which matches to use.
Huffman coding assigns bit lengths based on symbol frequencies.

But: The best LZ77 parse depends on Huffman bit costs.
And: Huffman costs depend on which symbols LZ77 produces.

Which do we optimize first?
```

**The solution**: Iterate until convergence.

```text
Iterative Refinement (Zopfli-style)
═══════════════════════════════════

1. Initial pass:    Greedy LZ77 → baseline symbol frequencies
2. Calculate costs: Convert frequencies to bit lengths via entropy
3. Re-parse:        Optimal LZ77 using entropy-based costs
4. Iterate:         Repeat steps 2-3 for N iterations
5. Track best:      Keep the result with smallest actual size

Each iteration refines the cost model, enabling better parsing decisions.
Typically converges in 5-15 iterations.
```

| Iteration | What Happens                                              |
| --------- | --------------------------------------------------------- |
| 1         | Greedy parse, rough frequency estimates                   |
| 2-5       | Costs stabilize, parse improves significantly             |
| 5-15      | Diminishing returns, occasional escapes from local minima |

**Why it works**: The cost function and the parsing algorithm co-evolve toward a better solution. Neither alone finds the optimum, but together they converge.
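
The control flow is simple even though the real pieces are not. In this toy sketch, a period-4 repeat stands in for LZ77 matching and per-symbol entropy stands in for the Huffman cost model; on an input this regular the parse stabilizes after the first re-parse, whereas Zopfli's real cost models and parses keep shifting for many rounds:

```rust
use std::collections::HashMap;

// Toy symbol alphabet: 0-255 are literal bytes; MATCH copies the previous
// four bytes whenever the next four repeat them exactly.
const MATCH: u32 = 256;

/// Cost-driven parse: at each position, take the cheaper of one MATCH symbol
/// and four literal symbols (when a match is available at all).
fn parse(data: &[u8], cost: &HashMap<u32, f64>) -> Vec<u32> {
    let bits = |s: u32| *cost.get(&s).unwrap_or(&8.0); // flat 8-bit default
    let mut symbols = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let match_here =
            i >= 4 && i + 4 <= data.len() && data[i..i + 4] == data[i - 4..i];
        if match_here {
            let literal_bits: f64 = data[i..i + 4].iter().map(|&b| bits(b as u32)).sum();
            if bits(MATCH) < literal_bits {
                symbols.push(MATCH);
                i += 4;
                continue;
            }
        }
        symbols.push(data[i] as u32);
        i += 1;
    }
    symbols
}

/// Re-estimate per-symbol bit costs from the frequencies of the last parse.
fn entropy_costs(symbols: &[u32]) -> HashMap<u32, f64> {
    let mut freq: HashMap<u32, f64> = HashMap::new();
    for &s in symbols {
        *freq.entry(s).or_insert(0.0) += 1.0;
    }
    let total = symbols.len() as f64;
    freq.into_iter().map(|(s, f)| (s, (total / f).log2())).collect()
}

/// Estimated size of a symbol stream under a given cost model, in bits.
fn estimated_bits(symbols: &[u32], cost: &HashMap<u32, f64>) -> f64 {
    symbols.iter().map(|&s| cost.get(&s).copied().unwrap_or(8.0)).sum()
}

fn main() {
    let data: Vec<u8> = b"abcdabcdabcdabcdwxyz".repeat(32);

    // Step 1: initial pass with a flat cost model (8 bits per symbol).
    let mut symbols = parse(&data, &HashMap::new());
    let mut best = f64::INFINITY;

    // Steps 2-5: recompute costs, re-parse, and track the best estimate.
    // (A full implementation would also keep the best symbol stream itself.)
    for iteration in 1..=5 {
        let cost = entropy_costs(&symbols);
        symbols = parse(&data, &cost);
        let bits = estimated_bits(&symbols, &entropy_costs(&symbols));
        best = best.min(bits);
        println!("iteration {iteration}: {:.0} bits (best {:.0})", bits, best);
    }
}
```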

**Real-world examples**:

- Zopfli uses 15 iterations by default
- Video codecs iterate between motion estimation and rate control
- Machine learning uses gradient descent (same principle: iterate to refine)

---

## The Compounding Effect

Here's why mature codecs like mozjpeg are hard to beat:

```text
Starting with baseline JPEG (100%)
═══════════════════════════════════

  Optimized Huffman tables:     ×0.97  →  97%
  Integer DCT precision:        ×0.98  →  95%
  Progressive encoding:         ×0.92  →  87%
  Trellis quantization:         ×0.95  →  83%
  Chroma subsampling:           ×0.85  →  71%
  DC coefficient optimization:  ×0.98  →  69%

  Final: ~70% of baseline size at same quality
```

Each technique contributes a small factor. Together, they're transformative.

---

## Applying These Lessons

When implementing compression:

1. **Start with the standard algorithm**. Understand why each step exists before optimizing.

2. **Measure before optimizing**. Profile to find where bits are actually being spent.

3. **Look for context**. Whatever you're encoding, ask: what did we just encode that predicts this?

4. **Consider R-D tradeoffs**. Not all bits are equal—some matter more than others.

5. **Accept diminishing returns**. The first 80% of improvement takes 20% of the effort. Know when to stop.

6. **Learn from existing work**. mozjpeg, zopfli, and other mature codecs encode decades of optimization wisdom.

---

## Further Reading

### Foundational Papers

- Shannon, C. E. (1948). "A Mathematical Theory of Communication"
- Huffman, D. (1952). "A Method for the Construction of Minimum-Redundancy Codes"
- Ahmed, Natarajan, Rao (1974). "Discrete Cosine Transform"
- Arai, Agui, Nakajima (1988). "A Fast DCT-SQ Scheme for Images"

### JPEG Specific

- ITU-T T.81: The original JPEG specification
- Lakhani, G. (2003). "Optimal Huffman Coding of DCT Blocks"
- mozjpeg documentation: <https://github.com/mozilla/mozjpeg>

### Rate-Distortion Theory

- Berger, T. (1971). "Rate Distortion Theory"
- Sullivan & Wiegand (1998). "Rate-Distortion Optimization for Video Compression"

### Deflate/PNG Specific

- RFC 1951: DEFLATE Compressed Data Format Specification
- Zopfli documentation: <https://github.com/google/zopfli>
- Pinho et al. (2004). "A note on Zeng's technique for color reindexing" (palette optimization)

### Implementation References

- libjpeg source code (especially jfdctint.c, jchuff.c)
- Independent JPEG Group documentation
- zopfli source code (especially squeeze.c for iterative refinement)