lopdf
A Rust library for PDF document manipulation.
A useful reference for understanding the PDF file format, and ultimately for using this library, is the PDF 1.7 Reference Document. The PDF 2.0 specification is also available.
Example Code
- Create PDF document
use lopdf::dictionary;
use lopdf::{Document, Object, Stream};
use lopdf::content::{Content, Operation};
// `with_version` specifies the PDF version this document complies with.
let mut doc = Document::with_version("1.5");
// Object IDs are used for cross referencing in PDF documents. `lopdf` helps keep track of them
// for us. They are simple integers.
// Calls to `doc.new_object_id` and `doc.add_object` return an object id
// pages is the root node of the page tree
let pages_id = doc.new_object_id();
// fonts are dictionaries. The type, subtype and basefont tags
// are straight out of the PDF reference manual
//
// The dictionary macro is a helper that allows complex
// key, value relationships to be represented in a simpler
// visual manner, similar to a match statement.
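// A Dictionary is a LinkedHashMap from byte vectors (keys) to objects (values).
let font_id = doc.add_object(dictionary! {
    // Type of dictionary
    "Type" => "Font",
    // Type of font
    "Subtype" => "Type1",
    // Name of the font to use
    "BaseFont" => "Courier",
});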
// font dictionaries need to be added into resource dictionaries
// in order to be used.
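// Resource dictionaries can contain more than just fonts,
// but normally contain only fonts.
// Only one resource dictionary is allowed per page tree root.
let resources_id = doc.add_object(dictionary! {
    // Fonts are registered under the "Font" key, each with a resource name
    "Font" => dictionary! {
        // "F1" is the name the content stream uses to select this font
        "F1" => font_id,
    },
});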
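// Content is a wrapper struct around a vector of operations.
// Each operation pairs a particular PDF operator with its operands.
// See the PDF reference for more details on these operators and operands.
// Note that the operator and operands are specified here in the reverse of the
// order in which they actually appear in the PDF file itself.
let content = Content {
    operations: vec![
        // BT begins a text object
        Operation::new("BT", vec![]),
        // Tf selects the font (by resource name) and the font size
        Operation::new("Tf", vec!["F1".into(), 48.into()]),
        // Td moves the text position
        Operation::new("Td", vec![100.into(), 600.into()]),
        // Tj shows a text string
        Operation::new("Tj", vec![Object::string_literal("Hello World!")]),
        // ET ends the text object
        Operation::new("ET", vec![]),
    ],
};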
// Streams are a dictionary followed by a sequence of bytes. What that sequence of bytes
// represents depends on context
// The stream dictionary is set internally to lopdf and normally doesn't
// need to be manually manipulated. It contains keys such as
// Length, Filter, DecodeParms, etc.
//
// content is a stream of encoded content data.
let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
// Page is a dictionary that represents one page of a PDF file.
// It has a type, parent and contents
let page_id = doc.add_object(dictionary! {
    "Type" => "Page",
    "Parent" => pages_id,
    "Contents" => content_id,
});
// Again, pages is the root of the page tree. The ID was already created
// at the top of the page, since we needed it to assign to the parent element of the page
// dictionary
//
// These are just the basic requirements for a page tree root object. There are also many
// additional entries that can be added to the dictionary if needed. Some of these can also be
// defined on the page dictionary itself instead of being inherited from the page tree root.
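let pages = dictionary! {
    // Type of dictionary
    "Type" => "Pages",
    // References to the pages that are direct children of this node
    "Kids" => vec![page_id.into()],
    // Number of pages under this node
    "Count" => 1,
    // Resource dictionary shared by the pages
    "Resources" => resources_id,
    // Page size as a rectangle; 595 x 842 points is A4
    "MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
};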
// using insert() here, instead of add_object() since the id is already known.
doc.objects.insert(pages_id, Object::Dictionary(pages));
// Creating document catalog.
// There are many more entries allowed in the catalog dictionary.
let catalog_id = doc.add_object(dictionary! {
    "Type" => "Catalog",
    "Pages" => pages_id,
});
// Root key in trailer is set here to ID of document catalog,
// remainder of trailer is set during doc.save().
doc.trailer.set("Root", catalog_id);
doc.compress();
// Store file in current working directory.
// Note: Line is excluded when running tests
if false {
    doc.save("example.pdf").unwrap();
}
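The note above about operator and operand order can be made concrete with a small standard-library sketch. It does not use lopdf: `Op`, `encode`, and `hello_world_ops` are hypothetical stand-ins invented for illustration, and the "Hello World!" operations are an assumed sample. Operations are constructed operator-first, but serialized with the operands first, which is the order a PDF content stream uses.

```rust
// Hypothetical stand-in for illustration only; this is not lopdf's type.
struct Op {
    operator: &'static str,
    operands: Vec<String>,
}

impl Op {
    // Constructed operator-first, then operands.
    fn new(operator: &'static str, operands: &[&str]) -> Self {
        Op {
            operator,
            operands: operands.iter().map(|s| s.to_string()).collect(),
        }
    }
}

// Serialize in content-stream order: operands first, then the operator.
fn encode(ops: &[Op]) -> String {
    let mut out = String::new();
    for op in ops {
        for operand in &op.operands {
            out.push_str(operand);
            out.push(' ');
        }
        out.push_str(op.operator);
        out.push('\n');
    }
    out
}

// Assumed sample: draw "Hello World!" with font resource F1 at size 48.
fn hello_world_ops() -> Vec<Op> {
    vec![
        Op::new("BT", &[]),
        Op::new("Tf", &["/F1", "48"]),
        Op::new("Td", &["100", "600"]),
        Op::new("Tj", &["(Hello World!)"]),
        Op::new("ET", &[]),
    ]
}
```

Calling `encode(&hello_world_ops())` yields lines such as `/F1 48 Tf`, where the operator `Tf` trails its operands even though it was written first in the constructor call.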
- Merge PDF documents
use lopdf::dictionary;
use std::collections::BTreeMap;
use lopdf::content::{Content, Operation};
use lopdf::{Document, Object, ObjectId, Stream, Bookmark};
- Modify PDF document
use lopdf::Document;
// For this example to work a parser feature needs to be enabled
FAQ
- Why does the library keep everything in memory as high-level objects until finally serializing the entire document?
A PDF document is normally not very large, ranging from tens of KB to hundreds of MB, and memory size is not a bottleneck for today's computers. By keeping the whole document in memory, stream lengths can be pre-calculated, so there is no need to use a reference object for the Length entry. The resulting PDF file is smaller for distribution and faster for PDF consumers to process.
Producing a document is a one-time effort, while consuming it happens many times.
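As a rough illustration of the Length point, here is a minimal standard-library sketch. It is not lopdf's serializer: `write_stream_object` is a hypothetical helper, and the exact whitespace it emits is an assumption. Because the stream body is already fully in memory, its size is known before the dictionary is written, so Length can be a direct integer. A writer that emits data incrementally would instead have to write something like `/Length 5 0 R` and add a separate integer object afterwards.

```rust
// Hypothetical helper, not part of lopdf: serialize one PDF stream object
// whose body is fully in memory, so Length is written as a direct integer.
fn write_stream_object(id: u32, generation: u16, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    // Object header: "<id> <generation> obj"
    out.extend_from_slice(format!("{} {} obj\n", id, generation).as_bytes());
    // The length is known up front because `data` is already complete.
    out.extend_from_slice(format!("<< /Length {} >>\n", data.len()).as_bytes());
    out.extend_from_slice(b"stream\n");
    out.extend_from_slice(data);
    out.extend_from_slice(b"\nendstream\nendobj\n");
    out
}
```

A consumer reading this object knows how many bytes to read without first resolving a second object, which is part of what makes the file faster to process.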