rasterrocket-encode 1.0.0

PPM/PGM/PBM/PNG output for the rasterrocket PDF renderer
Documentation
rasterrocket-encode-1.0.0 has been yanked.

rasterrocket

Pure Rust PDF → pixels pipeline. Zero Poppler, zero subprocesses, zero Leptonica in the render path.

Renders PDF pages to 8-bit grayscale pixel buffers for direct consumption by Tesseract OCR or any other downstream consumer. No intermediate files.

# Cargo.toml
rasterrocket = "1.0"

What's new in v1.0.0

  • Spec-correct simple-font text. The Widths-array lookup now prevails over FreeType metrics for all embedded and non-embedded simple fonts, per PDF §9.2.4. Academic and English-language PDFs show the largest improvement: avg RMSE vs pdftoppm dropped 17–19 points on corpus-05 and corpus-14.
  • Vulkan compute backend. --features vulkan or --backend vulkan runs AA fill, tile fill, and parallel-Huffman JPEG decode on any Vulkan 1.3+ device — NVIDIA, AMD, Intel, or Apple via MoltenVK. Verified on RTX 5070; cross-vendor smoke pending hardware. Under auto, Vulkan is preferred over CUDA when both are compiled in (faster process init on single-session workloads).
  • Process-static GPU init. The CUDA and Vulkan contexts are initialised once per process (not once per open_session). Short-lived multi-document pipelines no longer pay ~240 ms per document.
  • PDF_RASTER_BACKEND env var. Switch backends at runtime without recompiling. Valid values: auto, cpu, cuda, vaapi, vulkan. The CLI --backend flag takes precedence.
  • Parallel-Huffman JPEG. GPU-accelerated Huffman decode (gpu-jpeg-huffman feature, implied by vulkan) is wired into the production decode path. Dormant by default (threshold = u32::MAX); enable with PDF_RASTER_HUFFMAN_THRESHOLD=0 for benchmarking.
  • RasterOptions::default()dpi = 300, first_page = 1, last_page = u32::MAX, deskew = false, pages = None. Use ..RasterOptions::default() to fill unset fields.
  • PageSet — render a sparse subset of pages without visiting intermediate ones.
use rasterrocket::{RasterOptions, raster_pdf};

let opts = RasterOptions { dpi: 300.0, ..RasterOptions::default() };

for (page_num, result) in raster_pdf(Path::new("scan.pdf"), &opts) {
    let page = result?;
    // page.pixels — Vec<u8>, 8-bit greyscale, width × height, top-to-bottom
    // page.effective_dpi — pass to your OCR engine (accounts for PDF UserUnit scaling)
    // See the OCR Integration wiki for Tesseract, ocrs, Google Cloud Vision, and GPT-5 patterns.
}

Documentation

Document Contents
Getting Started Installation, quickstart, Tesseract integration, DPI guidance, error handling, security
API Reference Full signatures for raster_pdf, render_channel, RasterOptions, RenderedPage, RasterError, PageDiagnostics, feature flags, GPU dispatch thresholds
CLI Reference All rasterrocket command-line flags, output format matrix, examples, pixel-diff comparison
Benchmarks Methodology, 10-document corpus results, CPU-only AVX-512 vs AVX2, GPU-accelerated, reproduction steps
OCR Integration Tesseract (leptess) and ocrs — instance reuse, zero-copy patterns, DPI wiring, multi-threaded examples
LLM Vision OCR Integration Google Cloud Vision and GPT-5 — encoding helper, Rust + Python examples, cost and latency guidance

Crate map

Crate Role
rasterrocket Public APIraster_pdf, render_channel, RasterOptions, RenderedPage
rasterrocket-interp Native PDF interpreter — content streams, fonts, images, shading, transparency
rasterrocket-render Pixel-level fill/composite with AVX-512, AVX2, and NEON SIMD
gpu CUDA kernels — nvJPEG, nvJPEG2000, AA fill, tile fill, ICC CLUT, deskew; VA-API JPEG decode
rasterrocket-font FreeType glyph cache and rendering
rasterrocket-color Pixel types, colour math
rasterrocket-encode PPM / PGM / PBM / PNG output
rasterrocket-cli rasterrocket binary
pdf_bridge Poppler C++ wrapper — reference baseline only, not linked by CLI

Hardware compatibility

CPU: x86-64 (AMD and Intel) and aarch64 (ARM). AVX2/AVX-512 on x86-64; NEON (and SVE2 on nightly) on aarch64. Build with -C target-cpu=native to enable AVX-512 or native NEON width.

GPU (optional):

  • NVIDIA via CUDA 12 or 13 — full feature set (nvJPEG, nvJPEG2000, AA fill, ICC CLUT, ICC matrix, deskew, image cache). cudarc is pinned to the cuda-12080 driver-API binding so the same source builds against both 12.x and 13.x drivers (forward-compatible per the CUDA driver-API ABI).
  • Cross-vendor via Vulkan compute — AA fill, tile fill, and parallel-Huffman JPEG decode kernels run on any Vulkan 1.3+ device (NVIDIA, AMD, Intel, Apple via MoltenVK). Verified on RTX 5070; cross-vendor smoke pending hardware. No nvJPEG / cache support under Vulkan today (JPEG decode goes through the GPU parallel-Huffman path, not nvJPEG).
  • Linux iGPU/dGPU via VA-API — JPEG baseline decode on AMD VCN, Intel Quick Sync, Intel Arc.

All GPU features fall back to CPU automatically when unavailable. AMD/Radeon ROCm and Apple Metal-native backends are not implemented (Vulkan covers Apple via MoltenVK).

Build

# CPU-only (no CUDA)
cargo build --release -p rasterrocket-cli

# With all GPU features (CUDA 12 or 13 toolkit, NVIDIA GPU required)
# Default CUDA_ARCH is sm_80 (Ampere); override for older or newer GPUs.
CUDA_ARCH=sm_120 cargo build --release -p rasterrocket-cli \
  --features "rasterrocket/nvjpeg,rasterrocket/nvjpeg2k,rasterrocket/gpu-aa,rasterrocket/gpu-icc,rasterrocket/gpu-deskew,rasterrocket/cache"

# With Vulkan compute backend (cross-vendor; no NVIDIA dependency).
# Requires the LunarG Vulkan SDK on the build host (slangc compiles the
# .slang shaders to SPIR-V).  Vulkan 1.3+ ICD on the runtime host.
cargo build --release -p rasterrocket-cli --features "rasterrocket/vulkan"

Picking CUDA_ARCH for your GPU

The CUDA_ARCH environment variable controls which Compute Capability the PTX kernels target. Mismatched arch flags produce kernels the GPU can't load at runtime. Set it to your card's CC (e.g. sm_75, sm_86, sm_120).

GPU generation Architecture CUDA_ARCH
GTX 10-series Pascal sm_61
RTX 20-series, Quadro RTX Turing sm_75
RTX 30-series, A100 Ampere sm_80 / sm_86
RTX 40-series Ada Lovelace sm_89
H100 / Hopper Hopper sm_90
RTX 50-series Blackwell sm_120

Look up your card's exact Compute Capability at developer.nvidia.com/cuda-gpus. The build defaults to sm_80 if CUDA_ARCH is unset; that's a reasonable fallback for any Ampere-or-later card thanks to PTX forward-compatibility, but matching your hardware exactly produces better-optimised code.

Feature flags

Flag What it enables Required runtime
nvjpeg GPU JPEG decode for DCTDecode libnvjpeg.so (ships with CUDA 12 or 13 toolkit)
nvjpeg2k GPU JPEG-2000 decode for JPXDecode libnvjpeg2k.so
gpu-aa GPU supersampled anti-aliased fill CUDA
gpu-icc GPU CMYK→RGB ICC transform CUDA
gpu-deskew GPU deskew rotation via NPP CUDA + NPP
cache Device-resident image cache (3-tier VRAM/host/disk) CUDA
vaapi Linux iGPU/dGPU JPEG decode (AMD/Intel) libva.so.2 + DRM render node
vulkan Vulkan compute backend for AA fill, tile fill, and parallel-Huffman JPEG decode (cross-vendor) Vulkan 1.3+ ICD; pulls in gpu-aa and gpu-jpeg-huffman. Slang shaders compiled to SPIR-V via slangc from the LunarG Vulkan SDK

All GPU features fall back to CPU automatically when the runtime requirement is missing, except --backend cuda / --backend vulkan / --backend vaapi which fail loudly with a clear error.

Backend selection

The runtime backend is chosen from three sources, in priority order:

  1. The CLI --backend {auto,cpu,cuda,vaapi,vulkan} flag.
  2. The PDF_RASTER_BACKEND environment variable (same valid values).
  3. The compile-time default — auto.

Under auto, when both backends are compiled in, Vulkan is preferred over CUDA. Vulkan's per-process init is faster and the kernel dispatch is comparable on the workloads that matter; CUDA wins narrowly when the device-resident cache feature is firing and amortising across many pages from one session. Both backends fall through to CPU when their runtime is unavailable; --backend cuda / --backend vulkan make the failure loud instead.

# Ship a Vulkan-default binary, override per-process when you need CUDA:
PDF_RASTER_BACKEND=cuda rrocket input.pdf out

# CLI flag always wins over the env var:
PDF_RASTER_BACKEND=cuda rrocket --backend cpu input.pdf out   # uses CPU

Compile cache (sccache) — optional

.cargo/config.toml sets SCCACHE_CACHE_MULTIARCH=1 so -C target-cpu=native builds can be cached, but the wrapper itself is opt-in: plain cargo build works without sccache. To enable:

cargo install sccache       # one-time install
export RUSTC_WRAPPER=sccache # add to ~/.bashrc to make it permanent

Shared with any other Rust project on the same machine that opts in — cross-project cache keys don't collide (sccache hashes the full compiler args). Verify with sccache --show-stats (a healthy hit-rate after the first build is ≥ 70%). Do not rsync ~/.cache/sccache between machines with different CPUs — target-cpu=native resolves differently per arch.

Testing

# Unit tests (always filter by module, never run unfiltered)
cargo test -p rasterrocket-interp --lib -- resources
cargo test -p gpu --lib -- icc

# Pixel-diff comparison against pdftoppm (requires release build in PATH)
tests/compare/compare.sh -r 150 tests/fixtures/input.pdf

Performance

Benchmarks vs Poppler's pdftoppm on a 10-document corpus at 150 DPI. Full methodology, hardware details, and AVX2 vs AVX-512 comparison in the Benchmarks wiki page.

CPU-only (no GPU), Ryzen 9 9900X3D + AVX-512, v0.9.1, RAM-backed output, cold cache, hyperfine 5 runs:

Document Pages rasterrocket
Native text, small 16 41 ms ± 1 ms
Native vector + text 16 18 ms ± 1 ms
Native text, dense 254 231 ms ± 2 ms
Ebook, mixed 358 278 ms ± 3 ms
Academic book 601 582 ms ± 12 ms
Modern layout, DCT 160 1 450 ms ± 10 ms
Journal, DCT-heavy 162 783 ms ± 5 ms
1927 scan, DCT 390 1 652 ms ± 89 ms
1836 scan, DCT 490 2 859 ms ± 658 ms
Scan, JBIG2+JPX 576 17 616 ms ± 260 ms

Per-version regression history and the full pdftoppm comparison are in the Benchmarks wiki page.

GPU-accelerated (CUDA: nvJPEG + nvJPEG2000), same machine + RTX 5070 (v0.9.1):

Document Pages rasterrocket pdftoppm Speedup
Native text, dense 254 4.3 s 9.8 s 2.3×
1927 scan, DCT 390 50 s 279 s 5.6×
1836 scan, DCT 490 71 s 356 s 5.0×
Scan, JBIG2+JPX 576 19.6 s 148.9 s 7.6×

Largest gains on scan-heavy corpora where SIMD JPEG decoding (CPU) and nvJPEG/nvJPEG2000 (GPU) dominate. Short native-text PDFs are startup-bound and show modest gains. See the Benchmarks wiki page for the full table including an Intel i7-8700K (AVX2-only) comparison.