# Linux Profiling & Cross-Platform Findings
Test system: 2-core AMD EPYC-Milan @ 2.0GHz, 7.6GB RAM, Ubuntu 24.04, kernel 6.8.0
Test PDF: 150-page lecture document at 300 DPI
## Build Notes
### libjpeg symbol collision (resolved)
MuPDF vendors libjpeg-9 and the `turbojpeg` crate vendors libjpeg-turbo. Both export the
same `jpeg_*` symbols, so linking them into one binary caused either duplicate-symbol
errors at link time or version mismatches at runtime.
**Fix:** `Cargo.toml` enables the `sys-lib-libjpeg` feature on `mupdf`, which tells MuPDF to
use the system libjpeg (libjpeg-turbo) instead of its vendored libjpeg-9. This eliminates
the symbol conflict entirely.
Build dependencies:
```
apt install cmake nasm libclang-dev libfontconfig1-dev libjpeg-turbo8-dev pkg-config
```
## JPEG Encoding — 4:4:4 vs 4:2:0 Subsampling
### Discovery
Initial benchmarks showed JPG output was about **2x slower than PNG** on both platforms.
This was traced to the `turbojpeg` Rust crate defaulting to `Subsamp::None` (4:4:4),
i.e. no chroma subsampling. Common JPEG tools (pdftoppm, ghostscript, cjpeg) default
to 4:2:0.
4:4:4 runs DCT, quantization, and Huffman coding on all three channels at full resolution.
4:2:0 subsamples both chroma channels by 2x in each dimension, which halves the sample
count (3 to 1.5 samples per pixel) and cut measured encode time by ~40%, with little
perceptible quality loss for typical document/photo content.
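
The sample-count arithmetic behind that saving can be sketched as follows. This is a minimal illustration, not code from ovid; the function name `samples_per_pixel` is hypothetical.

```rust
/// Samples stored per pixel in Y'CbCr when each of the two chroma planes is
/// subsampled by `h` horizontally and `v` vertically (luma stays full size).
fn samples_per_pixel(h: u32, v: u32) -> f64 {
    1.0 + 2.0 / f64::from(h * v)
}

fn main() {
    // (label, horizontal chroma factor, vertical chroma factor)
    let modes = [("4:4:4", 1, 1), ("4:2:2", 2, 1), ("4:2:0", 2, 2)];
    let full = samples_per_pixel(1, 1);
    for &(name, h, v) in modes.iter() {
        let s = samples_per_pixel(h, v);
        println!("{}: {:.2} samples/pixel, {:.0}% of 4:4:4", name, s, 100.0 * s / full);
    }
}
```

4:2:0 carries half the samples of 4:4:4, yet measured encode time falls by somewhat less than half. A plausible reading is that luma work and per-image overhead do not shrink with the chroma planes.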
### Measured encoding speed (tjbench, 2550x3300 page, Q75)
| Subsampling | Pages/s | ms/page | Speedup |
|---|---|---|---|
| 4:4:4 | 53 | 18.9 | baseline |
| 4:2:2 | 71 | 14.1 | 1.34x |
| 4:2:0 | 88 | 11.4 | 1.66x |
| Grayscale | 118 | 8.5 | 2.22x |
### Impact of fix (`set_subsamp(Sub2x2)`)
**Linux (150pg, 300 DPI, 2 cores):**
| Threads | Before (4:4:4) | After (4:2:0) | Time saved |
|---|---|---|---|
| j=1 | 3.49s | 2.29s | 34% |
| j=2 | 2.74s | 1.84s | 33% |
**macOS (150pg, 300 DPI, 8 cores, Apple Silicon):**
| Threads | Before (4:4:4) | After (4:2:0) | Time saved |
|---|---|---|---|
| j=1 | 2.91s | 1.75s | 40% |
| j=8 | 0.56s | 0.40s | 29% |
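
The percentages in these tables can be re-derived with a few lines of Rust. This sketch reads the two time columns as wall time before and after the subsampling change; the helper name `time_saved` is illustrative only.

```rust
/// Fraction of wall time saved when a run drops from `before_s` to `after_s`.
fn time_saved(before_s: f64, after_s: f64) -> f64 {
    (before_s - after_s) / before_s
}

fn main() {
    // (run, seconds with 4:4:4, seconds with 4:2:0), copied from the tables above.
    let runs = [
        ("Linux j=1", 3.49, 2.29),
        ("Linux j=2", 2.74, 1.84),
        ("macOS j=1", 2.91, 1.75),
        ("macOS j=8", 0.56, 0.40),
    ];
    for &(run, before_s, after_s) in runs.iter() {
        println!(
            "{}: {:.1}% less wall time, {:.2}x faster",
            run,
            100.0 * time_saved(before_s, after_s),
            before_s / after_s
        );
    }
}
```

The gain is smaller at higher thread counts on both platforms, which fits encoding being spread across cores while fixed costs stay constant.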
## Competitive Benchmarks — Linux
### Split: 150-page PDF, 300 DPI
**PNG:**
| Tool | Wall | User | Sys | Peak memory | vs ovid |
|---|---|---|---|---|---|
| ovid | 1.40s | 2.49s | 0.26s | 104MB | baseline |
| mutool | 15.66s | 13.67s | 1.98s | 81MB | 11.2x |
| gs | 25.17s | 24.87s | 0.29s | 37MB | 18.0x |
| pdftoppm | 85.79s | 85.60s | 0.17s | 80MB | 61.3x |
**JPG:**
| Tool | Wall | User | Sys | Peak memory | vs ovid |
|---|---|---|---|---|---|
| ovid | 1.84s | 3.39s | 0.22s | 101MB | baseline |
| gs | 2.89s | 2.59s | 0.29s | 37MB | 1.6x |
| pdftoppm | 4.12s | 3.97s | 0.14s | 80MB | 2.2x |
### Split: 50-page PDF, 300 DPI
**JPG:**
| Tool | Wall | vs ovid |
|---|---|---|
| ovid | 0.67s | baseline |
| gs | 1.08s | 1.6x |
| pdftoppm | 2.37s | 3.5x |
### Thread scaling (15pg, 300 DPI, 2 cores)
| Format | j=1 | j=2 | Speedup |
|---|---|---|---|
| PNG | 0.79s | 0.60s | 1.32x |
| JPG | 0.95s | 0.69s | 1.38x |
Speedup is lower on 2 cores than on 8 (1.3-1.4x here vs 4-5x on macOS). This is expected:
the ceiling is 2x, and coordination overhead is a larger fraction of total work.
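
Speedup and per-thread efficiency for the 2-core run can be computed as below. This is a small sketch that reads the two time columns as the j=1 and j=2 wall times; `speedup` and `efficiency` are illustrative helper names.

```rust
/// Speedup of an n-thread run over the single-threaded run.
fn speedup(t1_s: f64, tn_s: f64) -> f64 {
    t1_s / tn_s
}

/// Parallel efficiency: speedup divided by the number of threads (1.0 is ideal).
fn efficiency(t1_s: f64, tn_s: f64, threads: u32) -> f64 {
    speedup(t1_s, tn_s) / f64::from(threads)
}

fn main() {
    let threads = 2;
    // (format, j=1 seconds, j=2 seconds), copied from the table above.
    let rows = [("PNG", 0.79, 0.60), ("JPG", 0.95, 0.69)];
    for &(format, t1_s, t2_s) in rows.iter() {
        println!(
            "{}: {:.2}x speedup, {:.0}% efficiency on {} threads",
            format,
            speedup(t1_s, t2_s),
            100.0 * efficiency(t1_s, t2_s, threads),
            threads
        );
    }
}
```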
### Binary size
| Tool | Binary size | Notes |
|---|---|---|
| ovid | 11MB | standalone (static MuPDF + turbojpeg) |
| mutool | ~1MB | + shared libmupdf |
| gs | ~1MB | + shared libs |
| pdftoppm | ~50KB | + shared libpoppler |
Linux binary is larger than macOS (8.7MB) due to x86_64 SIMD assembly and
different LTO behavior across platforms.
## Time Breakdown — Where Cycles Go
Estimated by comparing single-threaded PNG and JPG runs and isolating JPEG encode time with tjbench:
**150pg, 300 DPI, single-threaded (Linux):**
| Phase | PNG | JPG |
|---|---|---|
| MuPDF rendering | ~0.66s | ~0.66s |
| Image encoding | ~1.14s | ~1.63s |
| **Total** | 1.80s | 2.29s |
MuPDF rendering at 4.4ms/page is the floor — no tool can go faster than its renderer.
Encoding overhead is the only lever, and PNG with `Compression::Fast` (zlib level 1,
Paeth filter) is cheaper than JPEG at 4:2:0 on this workload because deflate on
document-like content (large flat regions, sharp edges) compresses very efficiently
with minimal CPU work.
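
The per-page figures and encoding shares follow directly from the breakdown table. The sketch below reproduces them; the `Breakdown` struct and `ms_per_page` helper are hypothetical names introduced for illustration.

```rust
/// Single-threaded time split for one output format, in seconds.
struct Breakdown {
    format: &'static str,
    render_s: f64,
    encode_s: f64,
}

impl Breakdown {
    fn total_s(&self) -> f64 {
        self.render_s + self.encode_s
    }

    /// Fraction of total time spent in image encoding.
    fn encode_share(&self) -> f64 {
        self.encode_s / self.total_s()
    }
}

/// Average cost per page in milliseconds.
fn ms_per_page(seconds: f64, pages: u32) -> f64 {
    1000.0 * seconds / f64::from(pages)
}

fn main() {
    let pages = 150;
    // Values copied from the breakdown table above.
    let runs = [
        Breakdown { format: "PNG", render_s: 0.66, encode_s: 1.14 },
        Breakdown { format: "JPG", render_s: 0.66, encode_s: 1.63 },
    ];
    for r in runs.iter() {
        println!(
            "{}: render {:.1} ms/page, encode {:.1} ms/page, encoding is {:.0}% of {:.2}s",
            r.format,
            ms_per_page(r.render_s, pages),
            ms_per_page(r.encode_s, pages),
            100.0 * r.encode_share(),
            r.total_s()
        );
    }
}
```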
## Why JPEG Margin Is Lower Than PNG Margin
1. **PNG competitors are single-threaded and slow.** pdftoppm takes 85s for PNG —
its PNG encoder (libpng) is far slower than ovid's (png crate + zlib-rs). This
gives ovid a 61x advantage. For JPEG, pdftoppm uses libjpeg-turbo internally
and takes only 4.1s — the encoding gap is much smaller.
2. **Encoding, not rendering, dominates total time.** For JPG, encoding is ~71% of
single-threaded time (1.63s of 2.29s). For PNG, encoding is ~63% (1.14s of 1.80s).
The rendering floor (~0.66s) is the same for both formats, so encoding is where ovid
differentiates itself, and its PNG encoding gap over competitors is much wider than
its JPEG gap.
3. **2-core parallelism caps the gain.** ovid's primary advantage is
parallel rendering across cores. On 2 cores, the speedup ceiling is 2x. On 8 cores
(macOS), ovid achieves a 4-5x speedup from j=1 to j=8, making the margin much wider.
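
The margins discussed above are plain wall-time ratios against ovid. As a rough sketch, they can be recomputed from the 150-page Linux tables, reading the first time column as wall time; `slowdown_vs` is an illustrative name.

```rust
/// How many times longer `other_s` is than `ovid_s` (wall-clock seconds).
fn slowdown_vs(ovid_s: f64, other_s: f64) -> f64 {
    other_s / ovid_s
}

fn main() {
    // Wall times copied from the 150-page, 300 DPI Linux tables.
    let (ovid_png_s, ovid_jpg_s) = (1.40, 1.84);
    let png = [("mutool", 15.66), ("gs", 25.17), ("pdftoppm", 85.79)];
    let jpg = [("gs", 2.89), ("pdftoppm", 4.12)];

    for &(tool, secs) in png.iter() {
        println!("PNG {}: {:.1}x slower than ovid", tool, slowdown_vs(ovid_png_s, secs));
    }
    for &(tool, secs) in jpg.iter() {
        println!("JPG {}: {:.1}x slower than ovid", tool, slowdown_vs(ovid_jpg_s, secs));
    }
}
```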