lopdf
A Rust library for PDF document manipulation.
A useful reference for understanding the PDF file format, and for making effective use of this library, is the PDF 1.7 Reference Document. The PDF 2.0 specification is also available.
Requirements
- Rust 1.85 or later - Required for Rust 2024 edition features and object streams support
- To check your Rust version:
rustc --version
- To update Rust:
rustup update
Example Code
- Create PDF document
use lopdf::dictionary;
use lopdf::{Document, Object, Stream};
use lopdf::content::{Content, Operation};
// `with_version` specifies the PDF version this document complies with.
let mut doc = Document::with_version("1.5");
// Object IDs are used for cross referencing in PDF documents.
// `lopdf` helps keep track of them for us. They are simple integers.
// Calls to `doc.new_object_id` and `doc.add_object` return an object ID.
// "Pages" is the root node of the page tree.
let pages_id = doc.new_object_id();
// Fonts are dictionaries. The "Type", "Subtype" and "BaseFont" tags
// are straight out of the PDF spec.
//
// The dictionary macro is a helper that allows complex
// key-value relationships to be represented in a simpler
// visual manner, similar to a match statement.
// A dictionary is implemented as an IndexMap from Vec<u8> keys to Object values.
let font_id = doc.add_object(dictionary! {
    "Type" => "Font",
    "Subtype" => "Type1",
    "BaseFont" => "Courier",
});
// Font dictionaries need to be added into resource
// dictionaries in order to be used.
// Resource dictionaries can contain more than just fonts,
// but normally contain only fonts.
// Only one resource dictionary is allowed per page tree root.
let resources_id = doc.add_object(dictionary! {
    "Font" => dictionary! {
        "F1" => font_id,
    },
});
// `Content` is a wrapper struct around a vector of operations.
// Each operation pairs a particular PDF operator with its operands.
// Refer to the PDF spec for more details on the operators and operands
// Note, the operators and operands are specified in a reverse order
// from how they actually appear in the PDF file itself.
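let content = Content {
    operations: vec![
        Operation::new("BT", vec![]),                       // begin text
        Operation::new("Tf", vec!["F1".into(), 48.into()]), // font and size
        Operation::new("Td", vec![100.into(), 600.into()]), // text position
        Operation::new("Tj", vec![Object::string_literal("Hello World!")]),
        Operation::new("ET", vec![]),                       // end text
    ],
};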
// Streams are a dictionary followed by a (possibly encoded) sequence of bytes.
// What that sequence of bytes represents, depends on the context.
// The stream dictionary is set internally by lopdf and normally doesn't
// need to be manually manipulated. It contains keys such as
// Length, Filter, DecodeParms, etc.
let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
// Page is a dictionary that represents one page of a PDF file.
// Its required fields are "Type", "Parent" and "Contents".
let page_id = doc.add_object(dictionary! {
    "Type" => "Page",
    "Parent" => pages_id,
    "Contents" => content_id,
});
// Again, "Pages" is the root of the page tree. The ID was already created
// at the top of this example, since we needed it to assign to the parent element
// of the page dictionary.
//
// These are just the basic requirements for a page tree root object.
// There are also many additional entries that can be added to the dictionary,
// if needed. Some of these can also be defined on the page dictionary itself,
// and not inherited from the page tree root.
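let pages = dictionary! {
    "Type" => "Pages",
    "Kids" => vec![page_id.into()],
    "Count" => 1,
    "Resources" => resources_id,
    // Page size in points: A4 is 595 x 842.
    "MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
};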
// Using `insert()` here, instead of `add_object()` since the ID is already known.
doc.objects.insert(pages_id, Object::Dictionary(pages));
// Creating document catalog.
// There are many more entries allowed in the catalog dictionary.
let catalog_id = doc.add_object(dictionary! {
    "Type" => "Catalog",
    "Pages" => pages_id,
});
// The "Root" key in trailer is set to the ID of the document catalog,
// the remainder of the trailer is set during `doc.save()`.
doc.trailer.set("Root", catalog_id);
doc.compress();
// Store file in current working directory.
// Note: Line is excluded when running tests
if false {
    doc.save("example.pdf").unwrap();
}
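The comments above note that operations are written operator-first in code but appear operand-first in the PDF file. The following is a minimal sketch of that postfix serialization using only the standard library. The `Op` type and the `encode_ops` and `sample_ops` functions are hypothetical stand-ins for illustration; they are not lopdf's `Operation` or `Content::encode`, and operands are assumed to be pre-formatted strings.

```rust
/// Hypothetical stand-in for a content-stream operation (not lopdf's type).
struct Op {
    operator: &'static str,
    operands: Vec<String>,
}

/// Serialize operations the way they appear inside a PDF content stream:
/// operands first, operator last, one operation per line.
fn encode_ops(ops: &[Op]) -> String {
    let mut out = String::new();
    for op in ops {
        for operand in &op.operands {
            out.push_str(operand);
            out.push(' ');
        }
        out.push_str(op.operator);
        out.push('\n');
    }
    out
}

/// A small text-drawing sequence: select a font, move, show a string.
fn sample_ops() -> Vec<Op> {
    vec![
        Op { operator: "BT", operands: vec![] },
        Op { operator: "Tf", operands: vec!["/F1".to_string(), "48".to_string()] },
        Op { operator: "Td", operands: vec!["100".to_string(), "600".to_string()] },
        Op { operator: "Tj", operands: vec!["(Hello World!)".to_string()] },
        Op { operator: "ET", operands: vec![] },
    ]
}
```

In this sketch `encode_ops(&sample_ops())` yields lines such as `/F1 48 Tf`, which is the operand-then-operator order a PDF reader expects.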
- Merge PDF documents
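use lopdf::dictionary;
use std::collections::BTreeMap;
use lopdf::content::{Content, Operation};
use lopdf::{Document, Object, ObjectId, Stream, Bookmark};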
- Decrypt PDF documents
use lopdf::Document;
// Load and decrypt PDF documents with empty password
- Modify PDF document
use lopdf::Document;
// For this example to work a parser feature needs to be enabled
- Save PDF with Object Streams (Modern Format)
Object streams allow multiple non-stream objects to be compressed together, significantly reducing file size.
use ;
Complete Example: Creating and Saving with Object Streams
use ;
use std::fs::File;
For more examples, see:
- examples/object_streams.rs - Creating PDFs with object streams
- examples/compress_existing_pdf.rs - Compress existing PDFs
- examples/analyze_object_streams.rs - Analyze object stream usage
Object Streams Support
lopdf now includes full support for creating and reading PDF object streams (PDF 1.5+ feature). Object streams provide significant file size reduction by compressing multiple non-stream objects together.
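To make "compressing multiple non-stream objects together" concrete, here is a rough sketch of the uncompressed payload layout that the PDF specification describes for an object stream: a header of `object-number offset` pairs followed by the serialized objects. The function name `pack_object_stream` is hypothetical, objects are assumed to be already serialized as text, and the Flate compression step is left out. It is not lopdf's implementation.

```rust
/// Pack pre-serialized objects into an object-stream payload.
/// Returns (N, First, payload): the object count, the byte offset of the
/// first object within the payload, and the uncompressed payload itself.
/// Hypothetical helper for illustration only.
fn pack_object_stream(objects: &[(u32, &str)]) -> (usize, usize, Vec<u8>) {
    let mut header = String::new();
    let mut body = String::new();
    for (number, serialized) in objects {
        // Each offset is relative to the first object, i.e. to the start of `body`.
        header.push_str(&format!("{} {} ", number, body.len()));
        body.push_str(serialized);
        body.push(' '); // whitespace separator between objects
    }
    let first = header.len();
    let mut payload = header.into_bytes();
    payload.extend_from_slice(body.as_bytes());
    (objects.len(), first, payload)
}
```

The returned count and offset correspond to the `N` and `First` entries of the object stream dictionary. Because the whole payload is compressed as one unit, many small dictionaries share a single compression context, which is where the size reduction comes from.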
Key Benefits
- File size reduction: 11-61% smaller PDFs depending on content
- Modern PDF compliance: Full PDF 1.5+ specification support
- Backward compatibility: All existing APIs remain unchanged
- Performance: <2ms to check 1000 objects for compression eligibility
Creating Object Streams Directly
use ;
Object Eligibility
Not all objects can be compressed into object streams. The following objects are excluded:
- Stream objects (content streams, image streams, etc.)
- Cross-reference streams (Type = XRef)
- Object streams themselves (Type = ObjStm)
- Encryption dictionary (when referenced by trailer's Encrypt entry)
- Objects with generation number > 0
- Document catalog in linearized PDFs only
All other objects, including structural objects (Catalog, Pages, Page) and trailer-referenced objects (except encryption), can be compressed.
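The exclusion rules above can be read as a single predicate. The sketch below models them with a hypothetical `ObjectInfo` summary; the struct, its fields, and `can_compress` are illustrative names and do not reflect how lopdf represents objects or performs this check internally.

```rust
/// Hypothetical, simplified description of an indirect object.
/// The struct and field names are illustrative and are not lopdf's API.
struct ObjectInfo<'a> {
    generation: u16,
    is_stream: bool,
    type_name: Option<&'a str>,
    is_encrypt_dict: bool,
}

impl<'a> ObjectInfo<'a> {
    /// A plain generation-0 dictionary with an optional /Type value.
    fn dictionary(type_name: Option<&'a str>) -> Self {
        ObjectInfo { generation: 0, is_stream: false, type_name, is_encrypt_dict: false }
    }
}

/// Returns true when the object may be placed inside an object stream,
/// following the exclusion list in this section.
fn can_compress(obj: &ObjectInfo, linearized: bool) -> bool {
    if obj.is_stream {
        return false; // content streams, image streams, etc.
    }
    if matches!(obj.type_name, Some("XRef") | Some("ObjStm")) {
        return false; // cross-reference streams and object streams
    }
    if obj.is_encrypt_dict {
        return false; // referenced by the trailer's Encrypt entry
    }
    if obj.generation > 0 {
        return false;
    }
    if linearized && obj.type_name == Some("Catalog") {
        return false; // catalog is excluded only in linearized PDFs
    }
    true
}
```

Under this model a Catalog is eligible in an ordinary document and excluded only when `linearized` is true, matching the last rule in the list.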
Cross-reference Streams
When using save_modern() or enabling use_xref_streams(true), lopdf creates binary cross-reference streams instead of traditional ASCII cross-reference tables. This provides additional space savings and is part of the PDF 1.5+ specification.
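As an illustration of why the binary form is smaller, the sketch below encodes one cross-reference stream entry as three big-endian fields, following the entry types defined in the PDF specification. The field widths `W = [1 4 2]` are an assumption chosen for this example, and `XrefEntry` and `encode_entry` are hypothetical names rather than lopdf's types.

```rust
/// The three entry kinds a cross-reference stream can hold (PDF 1.5+).
/// Hypothetical type for illustration only.
enum XrefEntry {
    /// Type 0: a free object.
    Free { next_free: u32, generation: u16 },
    /// Type 1: an object stored directly in the file at a byte offset.
    InUse { offset: u32, generation: u16 },
    /// Type 2: an object stored inside an object stream.
    Compressed { stream_object: u32, index: u16 },
}

/// Encode one entry with assumed field widths W = [1 4 2], big-endian.
fn encode_entry(entry: &XrefEntry) -> [u8; 7] {
    let (kind, field2, field3) = match *entry {
        XrefEntry::Free { next_free, generation } => (0u8, next_free, generation),
        XrefEntry::InUse { offset, generation } => (1u8, offset, generation),
        XrefEntry::Compressed { stream_object, index } => (2u8, stream_object, index),
    };
    let mut out = [0u8; 7];
    out[0] = kind;
    out[1..5].copy_from_slice(&field2.to_be_bytes());
    out[5..7].copy_from_slice(&field3.to_be_bytes());
    out
}
```

With these widths each entry takes 7 bytes before compression, compared with the fixed 20 bytes per entry of a traditional ASCII cross-reference table. Type 2 entries are also the only way to locate objects stored inside object streams.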
SaveOptions Reference
The SaveOptions builder provides fine-grained control over PDF compression:
use lopdf::SaveOptions;
let options = SaveOptions::builder()
    .use_object_streams(true)     // Enable object streams (default: false)
    .use_xref_streams(true)       // Enable xref streams (default: false)
    .max_objects_per_stream(100)  // Max objects per stream (default: 100)
    .compression_level(6)         // zlib level 0-9 (default: 6)
    .build();
PDF Decryption Support
lopdf now includes enhanced support for reading encrypted PDF documents. The library can automatically decrypt PDFs that use empty passwords, which is common for many protected documents.
Key Features
- Automatic decryption: PDFs encrypted with empty passwords are automatically decrypted on load
- Object stream support: Handles encrypted PDFs containing compressed object streams
- Transparent access: Once decrypted, all document methods work normally
- Preservation of structure: Document structure and content remain intact after decryption
How It Works
When loading an encrypted PDF, lopdf:
- Detects encryption via the Encrypt entry in the trailer
- Extracts raw object bytes before parsing
- Attempts authentication with an empty password
- Decrypts all objects if authentication succeeds
- Processes compressed objects from object streams
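For context on what "an empty password" means in the authentication step: in the PDF standard security handler (revisions 2 to 4), a password is first padded or truncated to 32 bytes using a fixed padding string from the specification, so an empty password becomes exactly that padding string. The sketch below shows only this padding step. `pad_password` is a hypothetical helper, and whether lopdf structures its code this way is an assumption; the later key-derivation steps (MD5, RC4 or AES) are not shown.

```rust
/// The 32-byte padding string defined by the PDF specification for the
/// standard security handler (revisions 2 to 4).
const PASSWORD_PAD: [u8; 32] = [
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
];

/// Pad or truncate a password to exactly 32 bytes.
/// Hypothetical helper for illustration; not taken from lopdf.
fn pad_password(password: &[u8]) -> [u8; 32] {
    let mut padded = [0u8; 32];
    let n = password.len().min(32);
    // Keep at most the first 32 bytes of the password...
    padded[..n].copy_from_slice(&password[..n]);
    // ...and fill the remainder from the start of the padding string.
    padded[n..].copy_from_slice(&PASSWORD_PAD[..32 - n]);
    padded
}
```

Because `pad_password(b"")` is just the padding string, a reader can always attempt this authentication without asking the user for anything, which is why documents protected this way can be opened automatically.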
Example: Working with Encrypted PDFs
use lopdf::Document;
Limitations
- Currently only supports PDFs encrypted with empty passwords
- Password-protected PDFs require manual authentication (use the authenticate_password method)
- Some encryption algorithms may not be fully supported
For more examples, see:
- examples/test_decryption.rs - Testing decryption functionality
- examples/verify_decryption.rs - Comprehensive decryption verification
- tests/decryption.rs - Decryption test suite
FAQ
- Why does the library keep everything in memory as high-level objects until finally serializing the entire document?
  Normally, a PDF document is not very large, ranging from tens of KB to hundreds of MB, so memory size is not a bottleneck on today's computers. Keeping the whole document in memory lets the stream length be calculated in advance, so there is no need to use a reference object for the Length entry. The resulting PDF file is smaller to distribute and faster for PDF consumers to process.
  A document is produced once but consumed many times.
- How do object streams affect memory usage?
  Object streams actually help reduce memory usage during document creation. When enabled, multiple small objects are grouped and compressed together, reducing the overall memory footprint. The compression happens during the save operation, so the in-memory representation remains the same until save_with_options() or save_modern() is called.
- What PDF versions support object streams?
  Object streams were introduced in PDF 1.5. When using save_modern() or object streams, lopdf automatically ensures the document version is at least 1.5. For maximum compatibility with older PDF readers, you can use the traditional save() method.
- Can I analyze existing PDFs to see if they use object streams?
  Yes! lopdf can read and parse object streams from existing PDFs. Use the Document::load() method to open any PDF, and lopdf will automatically handle object streams if present. See the examples directory for analysis tools.
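The point in the first FAQ answer about pre-calculating the stream length can be shown with a small sketch. When the stream bytes are already in memory, the writer can emit a direct `/Length` value; a writer that streams data out incrementally would instead need an indirect reference such as `/Length 5 0 R` plus an extra object holding the number. `serialize_stream_object` is a hypothetical function written for this illustration and is not lopdf's writer.

```rust
/// Serialize a stream object whose data is fully available in memory,
/// so the Length entry can be written directly in the dictionary.
/// Hypothetical helper for illustration only.
fn serialize_stream_object(id: u32, generation: u16, data: &[u8]) -> Vec<u8> {
    let mut out = format!(
        "{} {} obj\n<< /Length {} >>\nstream\n",
        id,
        generation,
        data.len() // known up front because the data is already buffered
    )
    .into_bytes();
    out.extend_from_slice(data);
    out.extend_from_slice(b"\nendstream\nendobj\n");
    out
}
```

A consumer reading this object knows how many bytes to read as soon as it has parsed the dictionary, without first resolving another object.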
License
lopdf is available under the MIT license, with the exception of the Montserrat font.