
# Linux Profiling & Cross-Platform Findings

Test system: 2-core AMD EPYC-Milan @ 2.0GHz, 7.6GB RAM, Ubuntu 24.04, kernel 6.8.0
Test PDF: 150-page lecture document at 300 DPI

## Build Notes

### libjpeg symbol collision (resolved)

MuPDF vendors libjpeg-9 and turbojpeg vendors libjpeg-turbo. Both define identical symbols.
This caused duplicate symbol errors or runtime version mismatches.

**Fix:** `Cargo.toml` enables the `sys-lib-libjpeg` feature on `mupdf`, which tells MuPDF to
use the system libjpeg (libjpeg-turbo) instead of its vendored libjpeg-9. This eliminates
the symbol conflict entirely.

Build dependencies:
```
apt install cmake nasm libclang-dev libfontconfig1-dev libjpeg-turbo8-dev pkg-config
```

## JPEG Encoding — 4:4:4 vs 4:2:0 Subsampling

### Discovery

Initial benchmarks showed JPG encoding was **about 2x slower than PNG** on both platforms.
This was traced to the `turbojpeg` Rust crate defaulting to `Subsamp::None` (4:4:4), i.e.
no chroma subsampling. Other common JPEG tools (pdftoppm, ghostscript, cjpeg) default
to 4:2:0.

4:4:4 performs DCT, quantization, and Huffman coding on all three color channels at full
resolution. 4:2:0 subsamples chrominance by 2x in both dimensions, so each chroma plane
carries a quarter of the samples. In the tjbench measurement below this cuts encode time
by ~40%, with little perceptible quality loss for document and photo content.
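
To make the difference in workload concrete, the sketch below counts how many samples each mode hands to the encoder for a 2550x3300 page. The `Subsampling` enum and `samples_per_page` function are illustrative names, not part of `turbojpeg` or ovid, and the model assumes chroma plane dimensions round up on odd sizes.

```rust
/// Chroma subsampling modes discussed above (illustrative names,
/// not the `turbojpeg` crate's own enum).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Subsampling {
    S444,
    S422,
    S420,
    Gray,
}

/// Samples the encoder must process for one page: one full-resolution
/// luma plane plus two (possibly reduced) chroma planes.
fn samples_per_page(width: u64, height: u64, mode: Subsampling) -> u64 {
    let luma = width * height;
    // Assumption: a halved dimension rounds up when the size is odd.
    let half = |n: u64| (n + 1) / 2;
    let chroma_plane = match mode {
        Subsampling::S444 => width * height,
        Subsampling::S422 => half(width) * height,
        Subsampling::S420 => half(width) * half(height),
        Subsampling::Gray => 0,
    };
    luma + 2 * chroma_plane
}

fn main() {
    let (w, h) = (2550, 3300); // 8.5in x 11in at 300 DPI
    let full = samples_per_page(w, h, Subsampling::S444);
    let modes = [
        Subsampling::S444,
        Subsampling::S422,
        Subsampling::S420,
        Subsampling::Gray,
    ];
    for mode in modes {
        let n = samples_per_page(w, h, mode);
        let pct = 100.0 * n as f64 / full as f64;
        println!("{:?}: {} samples ({:.0}% of 4:4:4)", mode, n, pct);
    }
}
```

On this page size the model gives 4:2:0 half the samples of 4:4:4, while the tjbench numbers in this document show a ~40% time reduction, so not all encoder cost scales with sample count.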

### Measured encoding speed (tjbench, 2550x3300 page, Q75)

| Subsampling | FPS | ms/page | vs 4:4:4 |
|-------------|-----|---------|----------|
| 4:4:4       | 53  | 18.9    | baseline |
| 4:2:2       | 71  | 14.1    | 1.34x    |
| 4:2:0       | 88  | 11.4    | 1.66x    |
| Grayscale   | 118 | 8.5     | 2.22x    |

### Impact of fix (`set_subsamp(Sub2x2)`)

**Linux (150pg, 300 DPI, 2 cores):**

| Config | 4:4:4 (before) | 4:2:0 (after) | Improvement |
|--------|----------------|---------------|-------------|
| j=1    | 3.49s          | 2.29s         | 34%         |
| j=2    | 2.74s          | 1.84s         | 33%         |

**macOS (150pg, 300 DPI, 8 cores, Apple Silicon):**

| Config | 4:4:4 (before) | 4:2:0 (after) | Improvement |
|--------|----------------|---------------|-------------|
| j=1    | 2.91s          | 1.75s         | 40%         |
| j=8    | 0.56s          | 0.40s         | 29%         |
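
The "Improvement" column appears to be the relative reduction in wall time. A minimal sketch of that arithmetic, using the timings from the two tables above (the function names are illustrative):

```rust
/// Relative reduction in wall time, as a percentage.
fn time_reduction_pct(before_s: f64, after_s: f64) -> f64 {
    100.0 * (before_s - after_s) / before_s
}

/// The same change expressed as a throughput ratio (times faster).
fn times_faster(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

fn main() {
    // (label, 4:4:4 seconds, 4:2:0 seconds) copied from the tables above.
    let runs = [
        ("linux j=1", 3.49, 2.29),
        ("linux j=2", 2.74, 1.84),
        ("macos j=1", 2.91, 1.75),
        ("macos j=8", 0.56, 0.40),
    ];
    for (label, before, after) in runs {
        println!(
            "{}: {:.0}% less time, {:.2}x faster",
            label,
            time_reduction_pct(before, after),
            times_faster(before, after)
        );
    }
}
```

Note that a 34% time reduction corresponds to roughly a 1.5x throughput gain, so the percentage and the "x" figures used elsewhere in this document are not directly comparable.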

## Competitive Benchmarks — Linux

### Split: 150-page PDF, 300 DPI

**PNG:**

| Tool     | Wall    | User   | Sys   | RSS   | ovid speedup |
|----------|---------|--------|-------|-------|--------------|
| ovid     | 1.40s   | 2.49s  | 0.26s | 104MB | baseline     |
| mutool   | 15.66s  | 13.67s | 1.98s | 81MB  | 11.2x        |
| gs       | 25.17s  | 24.87s | 0.29s | 37MB  | 18.0x        |
| pdftoppm | 85.79s  | 85.60s | 0.17s | 80MB  | 61.3x        |

**JPG:**

| Tool     | Wall    | User   | Sys   | RSS   | ovid speedup |
|----------|---------|--------|-------|-------|--------------|
| ovid     | 1.84s   | 3.39s  | 0.22s | 101MB | baseline     |
| gs       | 2.89s   | 2.59s  | 0.29s | 37MB  | 1.6x         |
| pdftoppm | 4.12s   | 3.97s  | 0.14s | 80MB  | 2.2x         |

### Split: 50-page PDF, 300 DPI

**JPG:**

| Tool     | Wall    | ovid speedup |
|----------|---------|--------------|
| ovid     | 0.67s   | baseline     |
| gs       | 1.08s   | 1.6x         |
| pdftoppm | 2.37s   | 3.5x         |

### Thread scaling (15pg, 300 DPI, 2 cores)

| Format | j=1   | j=2   | Speedup |
|--------|-------|-------|---------|
| PNG    | 0.79s | 0.60s | 1.32x   |
| JPG    | 0.95s | 0.69s | 1.38x   |

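The parallel speedup is smaller on 2 cores than on 8 (1.3-1.4x here versus 4.4-5.2x in
the macOS runs above). This is expected: the ceiling is 2x, and on a short run
coordination overhead is a larger fraction of total work.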
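
Parallel efficiency is conventionally the measured speedup divided by the thread count. The sketch below applies that definition to timings reported in this document; `parallel_efficiency` is an illustrative helper, not part of ovid.

```rust
/// Parallel efficiency: measured speedup divided by the ideal speedup
/// (the thread count). 1.0 would mean perfect scaling.
fn parallel_efficiency(t1_s: f64, tn_s: f64, threads: u32) -> f64 {
    (t1_s / tn_s) / f64::from(threads)
}

fn main() {
    // (label, single-thread seconds, n-thread seconds, n) from this document.
    let runs = [
        ("linux PNG j=2", 0.79, 0.60, 2),
        ("linux JPG j=2", 0.95, 0.69, 2),
        ("macos JPG 4:2:0 j=8", 1.75, 0.40, 8),
    ];
    for (label, t1, tn, n) in runs {
        println!(
            "{}: {:.2}x speedup, {:.0}% efficiency",
            label,
            t1 / tn,
            100.0 * parallel_efficiency(t1, tn, n)
        );
    }
}
```

The Linux and macOS rows come from different page counts and hardware, so the efficiencies are only roughly comparable.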

### Binary size

| Tool     | Size  | Notes                                 |
|----------|-------|---------------------------------------|
| ovid     | 11MB  | standalone (static MuPDF + turbojpeg) |
| mutool   | ~1MB  | + shared libmupdf                     |
| gs       | ~1MB  | + shared libs                         |
| pdftoppm | ~50KB | + shared libpoppler                   |

The Linux binary is larger than the macOS one (11MB vs 8.7MB), likely because of x86_64
SIMD assembly and different LTO behavior across platforms.

## Time Breakdown — Where Cycles Go

Measured by comparing single-threaded PNG vs JPG and isolating via tjbench:

**150pg, 300 DPI, single-threaded (Linux):**

| Phase                 | PNG path | JPG path (4:2:0) |
|-----------------------|----------|-------------------|
| MuPDF rendering       | ~0.66s   | ~0.66s            |
| Image encoding        | ~1.14s   | ~1.63s            |
| **Total**             | 1.80s    | 2.29s             |

MuPDF rendering at ~4.4ms/page (0.66s / 150 pages) is the floor: no tool can go faster
than its renderer. Encoding is therefore the only lever. On this workload, PNG with
`Compression::Fast` (zlib level 1, Paeth filter) is cheaper than JPEG at 4:2:0, because
deflate handles document-like content (large flat regions, sharp edges) with very
little CPU work.
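
The per-page figure and the encoding share follow directly from the breakdown table. A small sketch of that arithmetic, assuming rendering and encoding simply add up with no overlap in the single-threaded case (`Breakdown` is an illustrative type, not part of ovid):

```rust
/// Single-threaded time split into the two phases from the table above.
/// Assumption: the phases do not overlap, so total = render + encode.
struct Breakdown {
    render_s: f64,
    encode_s: f64,
}

impl Breakdown {
    fn total_s(&self) -> f64 {
        self.render_s + self.encode_s
    }

    /// Fraction of total time spent in image encoding.
    fn encode_share(&self) -> f64 {
        self.encode_s / self.total_s()
    }

    /// Average rendering cost per page, in milliseconds.
    fn render_ms_per_page(&self, pages: u32) -> f64 {
        1000.0 * self.render_s / f64::from(pages)
    }
}

fn main() {
    let pages = 150;
    let paths = [
        ("PNG", Breakdown { render_s: 0.66, encode_s: 1.14 }),
        ("JPG 4:2:0", Breakdown { render_s: 0.66, encode_s: 1.63 }),
    ];
    for (name, b) in &paths {
        println!(
            "{}: total {:.2}s, render {:.1} ms/page, encoding {:.0}% of total",
            name,
            b.total_s(),
            b.render_ms_per_page(pages),
            100.0 * b.encode_share()
        );
    }
}
```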

## Why JPEG Margin Is Lower Than PNG Margin

1. **PNG competitors are single-threaded and slow.** pdftoppm takes 85s for PNG —
   its PNG encoder (libpng) is far slower than ovid's (png crate + zlib-rs). This
   gives ovid a 61x advantage. For JPEG, pdftoppm uses libjpeg-turbo internally
   and takes only 4.1s — the encoding gap is much smaller.

2. **Encoding, not rendering, dominates total time.** For JPG, encoding is ~71% of
   single-threaded time (1.63s of 2.29s). For PNG, it is ~63% (1.14s of 1.80s).
   The rendering floor is the same for both, so encoding is where ovid
   differentiates itself, and the PNG encoding gap vs competitors is wider.

3. **Two cores cap the parallel speedup.** ovid's primary advantage is
   parallel rendering across cores. On 2 cores, the speedup ceiling is 2x. On 8 cores
   (macOS), ovid achieves a 4-5x speedup, which makes the margin much wider.