pdfer_forms — Fast Pure-Rust PDF Forms & Document Operations
pdfer_forms is a fast, pure-Rust library for filling PDF forms, inspecting AcroForm fields, flattening fillable PDFs, and document operations (merge, split, rotate, encrypt/decrypt) — with an API modeled on Python's pypdf and PyPDF2.
Built and maintained by Clark Labs Inc.
- Pure Rust, no Python, no C dependencies (built on `lopdf`)
- Fast: up to 23.6× faster than pypdf (`get_fields`, averaged over 9 real-world government forms; see benchmarks below)
- PyPDF / PyPDF2 compatibility layer: familiar `get_fields`, `update_page_form_field_values`, `updatePageFormFieldValues`, etc.
- AcroForm inspection, form filling, and flattening in one crate
- Document operations: merge, split, rotate, encrypt, decrypt (a pure-Rust replacement for common `qpdf` CLI tasks)
- `#![forbid(unsafe_code)]`
Why pdfer_forms?
If you are porting a Python PDF workflow to Rust, or building a Rust service that needs to fill fillable PDFs (government forms, tax forms, application forms, contracts), the pure-Rust PDF ecosystem has historically been thin on AcroForm support. pdfer_forms closes that gap:
- Inspect every AcroForm field, including qualified and partial names
- Fill text, checkbox, radio, and choice (listbox / combo) fields
- Regenerate widget appearance streams so filled values render in every viewer
- Flatten forms (draw appearances into page content) for archival or print
- Strip widget annotations for final delivery
- Reattach orphan widgets to `/AcroForm /Fields`
- Toggle `/NeedAppearances`, group fields under a top-level name, and more
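In AcroForm terms (PDF 32000-1, section 12.7.3.2), a field's partial name is its own `/T` entry, and its fully qualified name joins the partial names of all its ancestors with periods. The std-only sketch below illustrates that rule on a hypothetical in-memory tree; `Field`, `qualified_names`, and `leaf` are illustrative names, not pdfer_forms types, and widget-only kids (which carry no `/T`) are left out for simplicity.

```rust
/// Hypothetical stand-in for an AcroForm field dictionary: `partial` mirrors the
/// field's /T entry and `kids` holds child fields (widget-only kids are omitted).
struct Field {
    partial: String,
    kids: Vec<Field>,
}

/// Depth-first walk that records the fully qualified name of every terminal
/// field, joining ancestor partial names with '.'.
fn qualified_names(field: &Field, prefix: &str, out: &mut Vec<String>) {
    let name = if prefix.is_empty() {
        field.partial.clone()
    } else {
        format!("{}.{}", prefix, field.partial)
    };
    if field.kids.is_empty() {
        // Terminal field: its qualified name is complete.
        out.push(name);
    } else {
        for kid in &field.kids {
            qualified_names(kid, &name, out);
        }
    }
}

/// Convenience constructor for a terminal field.
fn leaf(partial: &str) -> Field {
    Field { partial: partial.to_string(), kids: Vec::new() }
}

fn main() {
    // Names in the style of IRS fillable forms: subform -> page -> fields.
    let root = Field {
        partial: "topmostSubform[0]".to_string(),
        kids: vec![Field {
            partial: "Page1[0]".to_string(),
            kids: vec![leaf("f1_01[0]"), leaf("f1_02[0]")],
        }],
    };
    let mut names = Vec::new();
    qualified_names(&root, "", &mut names);
    for name in &names {
        println!("{name}");
    }
}
```

The partial name (`f1_01[0]`) is what a user usually types; the qualified name is what uniquely identifies the field when two pages reuse the same partial name.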
Install
```toml
[dependencies]
pdfer_forms = "0.2"
```
Requires Rust 1.85+ (set by the pinned lopdf = 0.40 dependency).
Quick start — fill a PDF form in Rust
```rust
use pdfer_forms::{/* … */};
use std::collections::BTreeMap;
```
Document operations — merge, split, rotate, encrypt
New in 0.2.0: the ops module provides pure-Rust replacements for qpdf / pdftk CLI operations.
```rust
use pdfer_forms::ops;
```
Available functions
| Function | Description |
|---|---|
| `ops::merge_documents(docs)` | Merge multiple `lopdf::Document`s into one |
| `ops::merge_files(paths)` | Load and merge PDFs from file paths |
| `ops::split_pages(doc, pages)` | Extract specific pages (1-based) into a new document |
| `ops::split_each_page(doc)` | Split into one document per page |
| `ops::rotate_pages(doc, pages, degrees)` | Rotate pages by 0, 90, 180, or 270 degrees |
| `ops::encrypt_document(doc, user_pw, owner_pw)` | Encrypt with AES-128 |
| `ops::decrypt_document(doc, password)` | Decrypt with a password |
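Because `split_pages` takes 1-based page numbers and `rotate_pages` works in quarter turns, callers often validate input before touching a document. The std-only sketch below shows two such checks; `pages_to_indices` and `normalize_rotation` are hypothetical helpers, not part of the `ops` module, and the reduction to 0/90/180/270 reflects the PDF rule that a page's `/Rotate` value must be a multiple of 90.

```rust
/// Hypothetical helper: converts 1-based page numbers to 0-based indices,
/// rejecting 0 and anything past the last page.
fn pages_to_indices(pages: &[usize], page_count: usize) -> Result<Vec<usize>, String> {
    pages
        .iter()
        .map(|&p| {
            if p == 0 || p > page_count {
                Err(format!("page {p} is outside 1..={page_count}"))
            } else {
                Ok(p - 1)
            }
        })
        // Collecting into Result stops at the first invalid page.
        .collect()
}

/// Hypothetical helper: reduces any multiple of 90 (including negative values)
/// to one of 0, 90, 180, or 270; returns None for other angles.
fn normalize_rotation(degrees: i64) -> Option<u16> {
    if degrees % 90 != 0 {
        return None;
    }
    Some(degrees.rem_euclid(360) as u16)
}

fn main() {
    println!("{:?}", pages_to_indices(&[1, 3], 5));
    println!("{:?}", pages_to_indices(&[0], 5));
    println!("{:?}", normalize_rotation(-90));
    println!("{:?}", normalize_rotation(45));
}
```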
Features
- AcroForm tree inspection
- Qualified and partial field names
- Text field value extraction
- Page lookup for repeated widgets
- Top-level form grouping / renaming
- `/NeedAppearances` control
- Page-scoped field filling
- Text and choice appearance regeneration
- Button state updates for checkboxes and radio groups
- Orphan widget reattachment to `/AcroForm /Fields`
- Annotation removal for post-flatten cleanup
- Optional flatten step that draws widget appearance streams into page content
- `FieldInput::KeepCurrent` for flattening without changing the stored value
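Flattening in general means painting each widget's normal appearance stream into the page content at the widget's position. PDF 32000-1 (section 12.5.5) describes how a viewer maps the appearance's `/BBox`, transformed by its `/Matrix`, onto the annotation's `/Rect`; a flattener can reproduce that with a `q … cm /Name Do Q` sequence around a Form XObject. The sketch below assumes an identity `/Matrix` and a hypothetical XObject name `FlatW0`; it shows the general technique, not pdfer_forms' internal code.

```rust
/// Axis-aligned rectangle in PDF user space: [llx, lly, urx, ury].
#[derive(Clone, Copy)]
struct Rect {
    llx: f64,
    lly: f64,
    urx: f64,
    ury: f64,
}

/// Returns the `cm` operands [a b c d e f] that map an appearance stream's
/// /BBox onto the widget's /Rect, assuming the appearance /Matrix is identity.
fn fit_bbox_to_rect(bbox: Rect, rect: Rect) -> [f64; 6] {
    let sx = (rect.urx - rect.llx) / (bbox.urx - bbox.llx);
    let sy = (rect.ury - rect.lly) / (bbox.ury - bbox.lly);
    // Scale, then translate so the BBox's lower-left corner lands on the Rect's.
    [sx, 0.0, 0.0, sy, rect.llx - bbox.llx * sx, rect.lly - bbox.lly * sy]
}

/// Builds the page-content snippet that paints a Form XObject (assumed to be
/// registered under `name` in the page's /Resources /XObject dictionary).
fn flatten_snippet(name: &str, bbox: Rect, rect: Rect) -> String {
    let [a, b, c, d, e, f] = fit_bbox_to_rect(bbox, rect);
    // q/Q isolate the transform so later page content is unaffected.
    format!("q {a} {b} {c} {d} {e} {f} cm /{name} Do Q\n")
}

fn main() {
    let bbox = Rect { llx: 0.0, lly: 0.0, urx: 100.0, ury: 20.0 };
    let rect = Rect { llx: 72.0, lly: 700.0, urx: 172.0, ury: 720.0 };
    print!("{}", flatten_snippet("FlatW0", bbox, rect));
}
```

After the snippet is appended, the widget annotation itself can be removed, which is where annotation removal for post-flatten cleanup comes in.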
Known caveats
- Generated text appearances use built-in Type1 fonts and a simple WinAnsi text stream. The stored field value uses UTF-16BE, but generated appearance content itself is safest for ASCII / WinAnsi text.
- Signature-field appearance generation is not implemented.
- The API is intentionally close to pypdf / PyPDF2, but remains idiomatic Rust rather than mimicking Python objects exactly.
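For reference, a PDF text string can be stored as UTF-16BE prefixed with the byte-order mark `FE FF` (PDF 32000-1, section 7.9.2.2), which is how non-ASCII values survive in the stored field value. The std-only sketch below shows that encoding plus a deliberately conservative check for text the generated appearances handle safely: it accepts only printable ASCII and space, a stricter subset of WinAnsi than the crate may actually support. Both function names are hypothetical.

```rust
/// Encodes a string as a PDF text string: UTF-16BE with a leading FE FF BOM.
fn to_pdf_utf16be(s: &str) -> Vec<u8> {
    let mut out = vec![0xFE, 0xFF];
    // encode_utf16 yields surrogate pairs for characters outside the BMP.
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_be_bytes());
    }
    out
}

/// Conservative check: printable ASCII and space encode identically in WinAnsi.
fn is_ascii_appearance_safe(s: &str) -> bool {
    s.chars().all(|c| c == ' ' || c.is_ascii_graphic())
}

fn main() {
    println!("{:02X?}", to_pdf_utf16be("Hi"));
    println!("{}", is_ascii_appearance_safe("Jane Doe"));
    println!("{}", is_ascii_appearance_safe("李"));
}
```

A check like this lets a caller decide up front whether to rely on generated appearances or to supply its own for non-ASCII input.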
PyPDF / PyPDF2 API compatibility
pdfer_forms mirrors the form-manipulation surface of pypdf and PyPDF2, including camelCase aliases:
| pypdf / PyPDF2 | pdfer_forms |
|---|---|
| `PdfReader.get_fields()` | `PdfReaderCompat::get_fields` |
| `PdfReader.get_form_text_fields()` | `PdfReaderCompat::get_form_text_fields` |
| `PdfReader.get_pages_showing_field()` | `PdfReaderCompat::get_pages_showing_field` |
| `PdfWriter.add_form_topname()` | `PdfReaderCompat::add_form_topname` |
| `PdfWriter.rename_form_topname()` | `PdfReaderCompat::rename_form_topname` |
| `PdfWriter.set_need_appearances_writer()` | `PdfWriterCompat::set_need_appearances_writer` |
| `PdfWriter.update_page_form_field_values()` | `PdfWriterCompat::update_page_form_field_values` |
| `PdfWriter.reattach_fields()` | `PdfWriterCompat::reattach_fields` |
| `PdfWriter.remove_annotations()` | `PdfWriterCompat::remove_annotations` |
| `updatePageFormFieldValues` (PyPDF2) | `updatePageFormFieldValues` |
| `setNeedAppearancesWriter` (PyPDF2) | `setNeedAppearancesWriter` |
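One common way to offer PyPDF2-style camelCase names in Rust is a thin method marked `#[allow(non_snake_case)]` that forwards to the snake_case implementation, so both spellings share one code path. The sketch below uses a hypothetical `Writer` type with simplified signatures (a map of string values, no error handling); it illustrates the pattern only and does not reproduce pdfer_forms' actual source or signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical writer holding field values keyed by field name.
#[derive(Default)]
struct Writer {
    values: BTreeMap<String, String>,
    need_appearances: bool,
}

impl Writer {
    /// pypdf-style snake_case implementation.
    fn update_page_form_field_values(&mut self, fields: &BTreeMap<String, String>) {
        for (name, value) in fields {
            self.values.insert(name.clone(), value.clone());
        }
    }

    /// pypdf-style snake_case implementation (simplified: always sets the flag).
    fn set_need_appearances_writer(&mut self) {
        self.need_appearances = true;
    }

    /// PyPDF2-style alias that forwards to the snake_case method.
    #[allow(non_snake_case)]
    fn updatePageFormFieldValues(&mut self, fields: &BTreeMap<String, String>) {
        self.update_page_form_field_values(fields)
    }

    /// PyPDF2-style alias that forwards to the snake_case method.
    #[allow(non_snake_case)]
    fn setNeedAppearancesWriter(&mut self) {
        self.set_need_appearances_writer()
    }
}

fn main() {
    let mut writer = Writer::default();
    let mut fields = BTreeMap::new();
    fields.insert("name".to_string(), "Jane Doe".to_string());
    writer.setNeedAppearancesWriter();
    writer.updatePageFormFieldValues(&fields);
    println!("{:?} {}", writer.values, writer.need_appearances);
}
```

Keeping the aliases as one-line forwards means bug fixes land in a single place and the camelCase names cannot drift from the snake_case behavior.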
Reader-like surface
```rust
use pdfer_forms::PdfReaderCompat;

let mut reader = PdfReaderCompat::load(/* … */)?;
let all_fields = reader.get_fields(/* … */)?;
let text_fields = reader.get_form_text_fields(/* … */)?;
let pages = reader.get_pages_showing_field(/* … */)?;
reader.add_form_topname(/* … */)?;
reader.rename_form_topname(/* … */)?;
```
Writer-like surface
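```rust
use pdfer_forms::{PdfWriterCompat, /* … */};
use std::collections::BTreeMap;

let mut writer = PdfWriterCompat::load(/* … */)?;
writer.set_need_appearances_writer(/* … */)?;
let mut fields = BTreeMap::new();
fields.insert(/* … */);
fields.insert(/* … */);
fields.insert(/* … */);
writer.update_page_form_field_values(/* … */)?;
writer.remove_annotations(/* … */)?;
writer.save(/* … */)?;
```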
PyPDF2 camelCase shims
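```rust
use pdfer_forms::{PdfWriterCompat, /* … */};
use std::collections::BTreeMap;

let mut writer = PdfWriterCompat::load(/* … */)?;
writer.setNeedAppearancesWriter(/* … */)?;
let mut fields = BTreeMap::new();
fields.insert(/* … */);
writer.updatePageFormFieldValues(/* … */)?;
```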
Main types
- `PdfReaderCompat`: pypdf-style reader wrapper
- `PdfWriterCompat`: pypdf-style writer wrapper
- `FormField`: an AcroForm field with value, type, and widgets
- `FieldValue`: decoded field value (text, button state, choice list)
- `FieldInput`: input variant for field updates (text, button, choice, `KeepCurrent`)
- `PageSelection`: `All` or `Index(n)` scope for updates
- `PageHandle`: page identity helper
- `FieldSpecifier`: qualified / partial field name resolver
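To illustrate what resolving a qualified or partial field name involves, here is a std-only sketch under an assumed rule: an exact match on the fully qualified name wins; otherwise the input must equal the last dot-separated component of exactly one field, and anything else is reported as missing or ambiguous. `resolve` is hypothetical and is not the `FieldSpecifier` implementation.

```rust
/// Hypothetical resolver: maps a user-supplied name to one fully qualified name.
/// Assumed rule: exact qualified match first, then a unique partial-name match.
fn resolve<'a>(spec: &str, qualified: &[&'a str]) -> Result<&'a str, String> {
    if let Some(q) = qualified.iter().find(|q| **q == spec) {
        return Ok(*q);
    }
    // Compare against the final partial name (text after the last '.').
    let hits: Vec<&'a str> = qualified
        .iter()
        .copied()
        .filter(|q| q.rsplit('.').next() == Some(spec))
        .collect();
    match hits.as_slice() {
        [one] => Ok(*one),
        [] => Err(format!("no field named {spec:?}")),
        _ => Err(format!("{spec:?} is ambiguous: {hits:?}")),
    }
}

fn main() {
    let fields = ["form1.name", "form1.address.city", "form2.city"];
    println!("{:?}", resolve("name", &fields));
    println!("{:?}", resolve("form2.city", &fields));
    println!("{:?}", resolve("city", &fields));
}
```

Refusing ambiguous partial names, rather than picking the first hit, avoids silently filling the wrong field on forms that repeat a partial name across pages.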
Benchmarks — pdfer_forms vs pypdf / PyPDF2
Benchmarked against pypdf 6.9.2 and PyPDF2 3.0.1 on 9 real-world government PDF forms (IRS, USCIS, GSA, Hong Kong IRD, Guatemala SAT) in English, Spanish, and Chinese.
Accuracy
| Metric | Result |
|---|---|
| Field name match rate | 1004/1011 (99.3%) |
| Field type match rate | 1004/1004 (100.0%) |
| Field value match rate | 1004/1004 (100.0%) |
The 7 name mismatches are encoding differences on a single Spanish-language PDF where pypdf decodes non-ASCII field names (e.g. DÍA) while pdfer_forms currently returns the raw bytes.
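The mismatch comes down to how a field's `/T` bytes are decoded. Per PDF 32000-1 (section 7.9.2.2), a text string that starts with `FE FF` is UTF-16BE; otherwise it is PDFDocEncoding. The sketch below decodes the UTF-16BE case exactly and, as a simplifying assumption, decodes everything else as Latin-1, which agrees with PDFDocEncoding for printable ASCII and for letters such as `Í` (0xCD) but not for every byte (the 0x80 to 0xA0 range differs, for example). `decode_pdf_text` is a hypothetical helper, not a pdfer_forms function.

```rust
/// Decodes a PDF text string (e.g. a field's /T entry) into a Rust String.
/// UTF-16BE with a FE FF byte-order mark is decoded exactly; anything else is
/// decoded as Latin-1, a simplifying stand-in for PDFDocEncoding.
fn decode_pdf_text(bytes: &[u8]) -> String {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        // A trailing odd byte, if any, is ignored by chunks_exact.
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        // Unpaired surrogates become U+FFFD instead of failing the whole name.
        return String::from_utf16_lossy(&units);
    }
    // Latin-1: each byte maps to the Unicode code point with the same value.
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    println!("{}", decode_pdf_text(&[0x44, 0xCD, 0x41]));
    println!("{}", decode_pdf_text(&[0xFE, 0xFF, 0x00, 0x44, 0x00, 0xCD, 0x00, 0x41]));
}
```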
Performance (average across 9 PDFs)
| Operation | pypdf | pdfer_forms | Speedup |
|---|---|---|---|
| `get_fields` | 12.1 ms | 0.51 ms | 23.6× faster |
| `get_pages_showing_field` | 2.2 ms | 0.47 ms | 4.8× faster |
| `fill_form` | 40.3 ms | 11.0 ms | 3.7× faster |
| `remove_annotations` | 26.4 ms | 6.8 ms | 3.9× faster |
| `get_form_text_fields` | 1.2 ms | 0.49 ms | 2.4× faster |
| `load` | 1.9 ms | 5.2 ms | 2.7× slower* |
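*Load is slower because lopdf eagerly parses the full cross-reference table, while pypdf loads lazily. The subsequent operations usually more than make up the difference: for example, `load` plus `get_fields` averages 14.0 ms with pypdf versus 5.7 ms with pdfer_forms.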
API parity
All 6 core pypdf form APIs pass on every test PDF. PyPDF2-style camelCase aliases (getFields, updatePageFormFieldValues, etc.) are included.
Related crates
- `lopdf`: the pure-Rust PDF library this crate is built on
- `printpdf`: for generating PDFs from scratch
- `pdf`: another pure-Rust PDF reader
Contributing
Issues and pull requests are welcome at https://github.com/clark-labs-inc/pdfer-forms-rs.
License
Licensed under either of
- Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (https://opensource.org/licenses/MIT)
at your option.
© Clark Labs Inc. pdfer_forms is not affiliated with the authors of pypdf or PyPDF2. pypdf and PyPDF2 are trademarks of their respective owners; compatibility is provided for porting convenience.