PDFPurr
The ultimate pure-Rust PDF library
PDFPurr is a comprehensive PDF library for Rust that reads, writes, edits, renders, and validates PDF documents. It supports PDF 1.0 through 2.0, with first-class support for accessibility (PDF/UA), archival (PDF/A), and print production (PDF/X) standards.
The test suite has 870+ tests spanning unit, integration, property-based, and fuzz testing. CI runs on Ubuntu, macOS, and Windows, with nightly clippy and libFuzzer smoke tests.
Quick Start
Add to your Cargo.toml:
[dependencies]
pdfpurr = "0.1"
use pdfpurr::Document;
// Create a new PDF
let mut doc = Document::new();
doc.add_page(612.0, 792.0).unwrap(); // US Letter (612 x 792 pt)
let bytes = doc.to_bytes().unwrap();
// Parse it back
let doc = Document::from_bytes(&bytes).unwrap();
assert_eq!;
Feature Guide
Reading and Parsing
PDFPurr parses PDF versions 1.0 through 2.0, including encrypted files and malformed files that can be repaired as described under Repair below.
use Document;
// Open from disk
let doc = open.unwrap;
// Parse from bytes
let data = read.unwrap;
let doc = from_bytes.unwrap;
// Open an encrypted PDF
let doc = from_bytes_with_password.unwrap;
// Basic document info
println!;
println!;
if let Some = doc.title
Supported encryption: Standard security handler revisions R2 through R6 (RC4 40/128-bit, AES-128, AES-256).
Repair: If the cross-reference table is corrupt, PDFPurr automatically rebuilds it by scanning for object definitions. Streams with incorrect /Length values are recovered by scanning for endstream.
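To make the repair idea concrete, here is a minimal, self-contained sketch of rebuilding object offsets by scanning for `N G obj` headers. It is an illustration under simplifying assumptions, not PDFPurr's actual implementation: the function names (`scan_object_offsets`, `parse_object_header`) are hypothetical, and a real scanner would also have to skip stream data and handle compressed object streams.

```rust
use std::collections::BTreeMap;

/// Map of (object number, generation) to the byte offset of its `N G obj` header.
type ObjectOffsets = BTreeMap<(u32, u16), usize>;

/// Parse a run of ASCII digits starting at `pos`. Returns the value and the
/// position just past the digits, or `None` if there are no digits or on overflow.
fn parse_uint(data: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let start = pos;
    let mut value: u64 = 0;
    while pos < data.len() && data[pos].is_ascii_digit() {
        value = value.checked_mul(10)?.checked_add(u64::from(data[pos] - b'0'))?;
        pos += 1;
    }
    if pos == start { None } else { Some((value, pos)) }
}

/// Skip at least one whitespace byte and return the position of the next token.
fn skip_whitespace(data: &[u8], mut pos: usize) -> Option<usize> {
    let start = pos;
    while pos < data.len() && data[pos].is_ascii_whitespace() {
        pos += 1;
    }
    if pos == start { None } else { Some(pos) }
}

/// Try to read `N G obj` at `start`. Returns (number, generation, end position).
fn parse_object_header(data: &[u8], start: usize) -> Option<(u32, u16, usize)> {
    let (number, pos) = parse_uint(data, start)?;
    let pos = skip_whitespace(data, pos)?;
    let (generation, pos) = parse_uint(data, pos)?;
    let pos = skip_whitespace(data, pos)?;
    if !data[pos..].starts_with(b"obj") {
        return None;
    }
    Some((u32::try_from(number).ok()?, u16::try_from(generation).ok()?, pos + 3))
}

/// Scan the whole buffer and record where each object definition starts.
fn scan_object_offsets(data: &[u8]) -> ObjectOffsets {
    let mut offsets = ObjectOffsets::new();
    let mut i = 0;
    while i < data.len() {
        // A header must begin at the start of the file or right after whitespace.
        let at_boundary = i == 0 || data[i - 1].is_ascii_whitespace();
        if at_boundary && data[i].is_ascii_digit() {
            if let Some((number, generation, end)) = parse_object_header(data, i) {
                // A later definition overwrites an earlier one, which mirrors how
                // incremental updates supersede older objects.
                offsets.insert((number, generation), i);
                i = end;
                continue;
            }
        }
        i += 1;
    }
    offsets
}
```

For a file that starts with `%PDF-1.4` followed by a newline and then `1 0 obj`, the map contains the key `(1, 0)` with offset 9, which is the position a rebuilt cross-reference entry would point to.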
Text Extraction
Extract text from individual pages or the entire document.
use Document;
let doc = open.unwrap;
// Single page
let text = doc.extract_page_text.unwrap;
println!;
// All pages
let all_text = doc.extract_all_text.unwrap;
println!;
Text extraction uses font encoding tables (WinAnsiEncoding, MacRomanEncoding, PDFDocEncoding) and ToUnicode CMaps for accurate character mapping, including CJK scripts.
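The mapping order described above can be sketched as follows: consult the ToUnicode map first, then fall back to the font's encoding table. This is a simplified illustration, not PDFPurr's code. The names `win_ansi_to_char` and `decode_text` are hypothetical, the WinAnsiEncoding table is deliberately partial, and single-byte codes are assumed (CJK fonts use multi-byte codes).

```rust
use std::collections::HashMap;

/// Map one byte to Unicode using a partial WinAnsiEncoding table.
/// Only a handful of the 0x80..=0x9F entries are listed here.
fn win_ansi_to_char(byte: u8) -> char {
    match byte {
        0x20..=0x7E => char::from(byte), // printable ASCII
        0x80 => '\u{20AC}',              // euro sign
        0x85 => '\u{2026}',              // horizontal ellipsis
        0x91 => '\u{2018}',              // left single quotation mark
        0x92 => '\u{2019}',              // right single quotation mark
        0x93 => '\u{201C}',              // left double quotation mark
        0x94 => '\u{201D}',              // right double quotation mark
        0x96 => '\u{2013}',              // en dash
        0x97 => '\u{2014}',              // em dash
        0x99 => '\u{2122}',              // trade mark sign
        0xA1..=0xFF => char::from(byte), // treated as Latin-1 (a simplification)
        _ => '\u{FFFD}',                 // unmapped in this sketch
    }
}

/// Decode single-byte character codes. A ToUnicode entry, when present,
/// takes priority over the encoding table.
fn decode_text(codes: &[u8], to_unicode: Option<&HashMap<u8, String>>) -> String {
    let mut out = String::new();
    for &code in codes {
        match to_unicode.and_then(|map| map.get(&code)) {
            Some(mapped) => out.push_str(mapped),
            None => out.push(win_ansi_to_char(code)),
        }
    }
    out
}
```

A ToUnicode entry can map one code to several characters, which is how ligature glyphs such as "fi" come back as two letters.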
Image Extraction
Extract images from any page.
use Document;
let doc = open.unwrap;
// All images across all pages
let images = doc.extract_all_images.unwrap;
for in &images
// Images from a specific page
let page = doc.get_page.unwrap;
let page_images = doc.page_images;
Supported filters: FlateDecode, DCTDecode (JPEG), JPXDecode (JPEG2000), CCITTFaxDecode, LZWDecode, ASCIIHexDecode, ASCII85Decode, RunLengthDecode.
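As an illustration of what a stream filter does, here is a minimal sketch of RunLengthDecode, the simplest filter in the list. It follows the encoding rules in the PDF specification (a length byte of 0 to 127 copies the next length + 1 bytes, 129 to 255 repeats the next byte 257 - length times, and 128 marks end of data). The function name `run_length_decode` is hypothetical and this is not PDFPurr's implementation.

```rust
/// Decode RunLengthDecode data. Returns an error if a run is cut short.
fn run_length_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let length = input[i];
        i += 1;
        match length {
            // 128 is the end-of-data marker.
            128 => return Ok(out),
            // Literal run: copy the next `length + 1` bytes unchanged.
            0..=127 => {
                let count = usize::from(length) + 1;
                let end = i + count;
                let chunk = input.get(i..end).ok_or("truncated literal run")?;
                out.extend_from_slice(chunk);
                i = end;
            }
            // Repeat run: emit the next byte `257 - length` times.
            129..=255 => {
                let count = 257 - usize::from(length);
                let byte = *input.get(i).ok_or("truncated repeat run")?;
                out.extend(std::iter::repeat(byte).take(count));
                i += 1;
            }
        }
    }
    // Tolerate a missing end-of-data marker and return what was decoded.
    Ok(out)
}
```

For example, the input `[2, b'a', b'b', b'c', 254, b'x', 128]` decodes to the bytes of `abcxxx`: a three-byte literal run followed by `x` repeated three times.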
Metadata
Read document metadata from both the Info dictionary and XMP streams.
use Document;
let doc = open.unwrap;
let meta = doc.metadata;
if let Some = &meta.title
if let Some = &meta.author
if let Some = &meta.subject
if let Some = &meta.creator
if let Some = &meta.producer
if let Some = &meta.creation_date
XMP metadata is parsed with namespace-aware XML processing, supporting rdf:Alt/rdf:Seq containers and rdf:Description attribute forms.
Outlines (Bookmarks)
Read the document outline tree, including actions and styling.
use Document;
let doc = open.unwrap;
let outlines = doc.outlines;
for outline in &outlines
Annotations
Extract annotations from pages with full metadata.
use Document;
let doc = open.unwrap;
let page = doc.get_page.unwrap;
let annotations = doc.page_annotations;
for annot in &annotations
Supported annotation properties: subtype, rect, contents, flags (Hidden/Print/ReadOnly), color, author, modification date, URI (Link), QuadPoints (Highlight/Underline/StrikeOut).
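The flags property is stored in the PDF as a single integer (the annotation's /F entry) in which each bit has a meaning. The sketch below decodes the three flags named above using the bit positions from ISO 32000. The `AnnotationFlags` struct and `decode_flags` function are hypothetical names for illustration and may not match PDFPurr's types.

```rust
/// Bit masks for the annotation /F entry (bit positions per ISO 32000).
const FLAG_HIDDEN: u32 = 1 << 1; // bit position 2
const FLAG_PRINT: u32 = 1 << 2; // bit position 3
const FLAG_READ_ONLY: u32 = 1 << 6; // bit position 7

/// The subset of annotation flags covered by this sketch.
#[derive(Debug, PartialEq, Eq)]
struct AnnotationFlags {
    hidden: bool,
    print: bool,
    read_only: bool,
}

/// Split the raw /F integer into named booleans.
fn decode_flags(raw: u32) -> AnnotationFlags {
    AnnotationFlags {
        hidden: raw & FLAG_HIDDEN != 0,
        print: raw & FLAG_PRINT != 0,
        read_only: raw & FLAG_READ_ONLY != 0,
    }
}
```

A raw value of 4 means "print only", and 68 (4 + 64) means "print and read-only".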
Page Rendering
Render PDF pages to pixel images using the tiny-skia backend.
use ;
let doc = open.unwrap;
let renderer = new;
let pixmap = renderer.render_page.unwrap;
pixmap.save_png.unwrap;
// Or use the convenience method on Document
let pixmap = doc.render_page.unwrap;
The renderer supports the full ISO 32000-2 content stream operator set: paths, text (with embedded fonts), images, shading patterns, tiling patterns, transparency groups, soft masks, blend modes, clipping, and annotation overlays (Link and Highlight).
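When choosing a render resolution, it helps to know the output size in advance. PDF page dimensions are expressed in points (1/72 inch), so the pixel size is the page size multiplied by DPI / 72. The helper below is a standalone sketch with a hypothetical name (`pixel_size`). It does not call PDFPurr, and the renderer's own rounding may differ.

```rust
/// PDF user space uses 72 points per inch by default.
const POINTS_PER_INCH: f64 = 72.0;

/// Convert a page size in points to a pixel size at the given DPI,
/// rounding to the nearest whole pixel.
fn pixel_size(width_pt: f64, height_pt: f64, dpi: f64) -> (u32, u32) {
    let width_px = (width_pt * dpi / POINTS_PER_INCH).round();
    let height_px = (height_pt * dpi / POINTS_PER_INCH).round();
    (width_px as u32, height_px as u32)
}
```

A US Letter page (612 x 792 pt) rendered at 150 DPI comes out at 1275 x 1650 pixels. At 72 DPI the pixel size equals the point size.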
Creating PDFs
Build new PDF documents from scratch.
use Document;
let mut doc = new;
// Add pages with different sizes
doc.add_page.unwrap; // US Letter
doc.add_page.unwrap; // A4
// Save to disk
doc.save.unwrap;
// Or get bytes in memory
let bytes = doc.to_bytes.unwrap;
Page Manipulation
Merge, split, reorder, rotate, and remove pages.
use Document;
let mut doc = open.unwrap;
// Rotate a page
doc.rotate_page.unwrap;
// Remove a page
doc.remove_page.unwrap;
// Reorder pages
doc.reorder_pages.unwrap;
// Merge another PDF
let other = open.unwrap;
doc.merge.unwrap;
doc.save.unwrap;
Font Embedding
Embed TrueType and OpenType fonts with automatic subsetting.
use EmbeddedFont;
// Load and subset a TTF font
let font_data = read.unwrap;
let font = from_ttf.unwrap;
let subset = font.subset.unwrap;
// Measure text width
let width = font.measure_text.unwrap;
println!;
// For CJK text, use CidFont (supports >256 glyphs)
use CidFont;
let cjk_data = read.unwrap;
let cjk_font = from_ttf.unwrap;
let cjk_subset = cjk_font.subset.unwrap;
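Text measurement generally works by summing glyph advance widths in font units and scaling by the font size. The sketch below shows that arithmetic on its own. The function `text_width_pt` and the advance table are hypothetical and are not taken from PDFPurr, and kerning and character spacing are ignored.

```rust
use std::collections::HashMap;

/// Width of `text` in points, given per-character advance widths in font units.
/// Characters missing from the table contribute zero width in this sketch.
fn text_width_pt(
    text: &str,
    advances: &HashMap<char, u16>,
    units_per_em: u16,
    font_size: f64,
) -> f64 {
    let total_units: u32 = text
        .chars()
        .map(|c| u32::from(advances.get(&c).copied().unwrap_or(0)))
        .sum();
    // Scale from font units to points: units * size / units_per_em.
    f64::from(total_units) * font_size / f64::from(units_per_em)
}
```

With advances of 600 and 650 units for `A` and `B`, 1000 units per em, and a 12 pt font size, the string `AB` measures (600 + 650) * 12 / 1000 = 15 pt.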
Form Fields
Read and fill AcroForm fields.
use Document;
let mut doc = open.unwrap;
// Read existing fields
for field in doc.form_fields
// Set a field value
doc.set_form_field.unwrap;
doc.save.unwrap;
Digital Signatures
Parse and verify digital signature integrity.
use Document;
let doc = open.unwrap;
for sig in doc.signatures
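For background on what an integrity check involves: a PDF signature dictionary carries a /ByteRange array of two (offset, length) pairs that name the bytes covered by the signature, which is everything except the signature value itself. The sketch below extracts those bytes and checks that the range spans the whole file. The function names are hypothetical, and the cryptographic verification of the digest is left out. This is not PDFPurr's API.

```rust
/// Collect the bytes named by a /ByteRange of the form [o1 l1 o2 l2].
/// Returns `None` if either range falls outside the file.
fn signed_bytes(data: &[u8], byte_range: [usize; 4]) -> Option<Vec<u8>> {
    let [o1, l1, o2, l2] = byte_range;
    let first = data.get(o1..o1.checked_add(l1)?)?;
    let second = data.get(o2..o2.checked_add(l2)?)?;
    let mut out = Vec::with_capacity(first.len() + second.len());
    out.extend_from_slice(first);
    out.extend_from_slice(second);
    Some(out)
}

/// True if the range starts at byte 0, the two parts do not overlap, and the
/// second part ends exactly at the end of the file.
fn covers_whole_file(data: &[u8], byte_range: [usize; 4]) -> bool {
    let [o1, l1, o2, l2] = byte_range;
    o1 == 0 && l1 <= o2 && o2.checked_add(l2) == Some(data.len())
}
```

A range that stops short of the end of the file usually means content was appended after signing, so the signature no longer covers the whole document.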
Standards Validation
Validate documents against PDF/A, PDF/X, and PDF/UA standards.
use ;
let doc = open.unwrap;
// PDF/A validation
let report = doc.validate_pdfa;
println!;
for check in &report.checks
// PDF/UA accessibility validation
let a11y = doc.accessibility_report;
println!;
// Structure tree inspection
if let Some = doc.structure_tree
Linearized PDF Writing
Write PDFs optimized for progressive web display (Fast Web View).
use Document;
let mut doc = new;
doc.add_page.unwrap;
// Standard output
let bytes = doc.to_bytes.unwrap;
// Linearized output (first page loads faster)
let linearized = doc.to_linearized_bytes.unwrap;
write.unwrap;
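One way to sanity-check linearized output is to look for the linearization parameter dictionary, which the PDF specification requires to sit within the first 1024 bytes of the file and to contain a /Linearized key. The check below is a rough heuristic with a hypothetical name (`looks_linearized`). It does not validate hint tables or object ordering, and it is not part of PDFPurr.

```rust
/// Heuristic: report whether `/Linearized` appears in the first 1024 bytes.
fn looks_linearized(data: &[u8]) -> bool {
    let window = &data[..data.len().min(1024)];
    let key: &[u8] = b"/Linearized";
    window.windows(key.len()).any(|candidate| candidate == key)
}
```

Applied to the two outputs above, one would expect this to return `false` for the standard bytes and `true` for the linearized bytes, assuming the writer follows the specification's layout.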
Incremental Updates
Append changes without rewriting the original file, preserving digital signatures.
use Document;
let original = read.unwrap;
let mut doc = from_bytes.unwrap;
// Make changes
doc.set_form_field.unwrap;
// Write as incremental update (original bytes preserved)
let updated = doc.to_incremental_update.unwrap;
write.unwrap;
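The property that matters for signatures is that the updated file begins with the original file byte for byte, with the new objects, cross-reference section, and trailer appended after it. A quick check of that property could look like the sketch below. The function names are hypothetical and the code operates on plain byte slices rather than on PDFPurr types.

```rust
/// True if `updated` is `original` plus at least one appended byte.
fn is_incremental_update_of(updated: &[u8], original: &[u8]) -> bool {
    updated.len() > original.len() && updated.starts_with(original)
}

/// Return the appended section, or `None` if `updated` does not extend `original`.
fn appended_section<'a>(updated: &'a [u8], original: &[u8]) -> Option<&'a [u8]> {
    updated
        .strip_prefix(original)
        .filter(|rest| !rest.is_empty())
}
```

If `is_incremental_update_of(&updated, &original)` is false, the file was rewritten rather than appended to, and any signature over the original bytes would no longer match.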
Architecture
pdfpurr/
  src/
    core/             PDF object model (Object, Dictionary, PdfStream, filters)
    parser/           Lexer, object parser, file structure, xref rebuild
    content/          Content stream tokenizer and builder
    fonts/            Font parsing, encoding, embedding, subsetting, Standard 14
    images/           Image extraction and embedding
    rendering/        Page-to-pixel engine (tiny-skia backend)
    encryption/       Standard security handler (R2-R6, RC4, AES-128/256)
    forms/            AcroForms (read/write)
    signatures/       Digital signature parsing and verification
    standards/        PDF/A, PDF/X validation
    accessibility/    PDF/UA, tagged PDF, structure tree
    structure/        Outlines, annotations, metadata
    document.rs       High-level Document API
    page_builder.rs   Page creation API
    error.rs          Error types
  tests/
    corpus_tests.rs   Real-world PDF corpus (26 files)
    integration.rs    Write-path roundtrip tests
    proptest_fuzz.rs  Property-based fuzzing (14 targets)
  fuzz/               cargo-fuzz targets (document, content stream, object)
  benches/            Criterion benchmarks (parser, tokenizer, renderer)
Testing
CI runs automatically on every push. It covers stable builds on Ubuntu, Windows, and macOS, a nightly build on Ubuntu, Criterion benchmarks, and 120 seconds of fuzz testing across 3 targets.
Performance
Measured on a standard developer machine:
| Operation | Time |
|---|---|
| Parse 14-page PDF (1 MB) | 26 ms |
| Text extraction per page | 4.7 ms |
| Render page at 150 DPI | 210 ms |
| Create new Document | < 1 µs |
License
Dual-licensed under MIT or Apache-2.0 at your option.