pdfer_forms
Pure Rust AcroForm inspection and filling with a compatibility surface modeled after the PDF form-manipulation pieces of pypdf and PyPDF2.
This crate is deliberately focused on the form APIs people usually reach for when they are porting Python workflows:
- get_fields
- get_form_text_fields
- get_pages_showing_field
- add_form_topname
- rename_form_topname
- set_need_appearances_writer
- update_page_form_field_values
- reattach_fields
- remove_annotations
- PyPDF2-style camelCase aliases such as updatePageFormFieldValues
Under the hood it uses lopdf, which is itself a pure-Rust PDF library.
Status
The crate is intended as a native Rust port of the form-manipulation surface, not as a line-for-line port of the full Python libraries.
Implemented here:
- AcroForm tree inspection
- qualified and partial field names
- text field value extraction
- page lookup for repeated widgets
- top-level form grouping / renaming
- /NeedAppearances control
- page-scoped field filling
- text/choice appearance regeneration
- button state updates for checkboxes / radios
- orphan widget reattachment to /AcroForm /Fields
- annotation removal for post-flatten cleanup
- optional flatten step that draws widget appearance streams into page content
- FieldInput::KeepCurrent for flattening without changing the stored value
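To illustrate what "qualified and partial field names" means in the list above: in a PDF form, each field carries a partial name (its /T entry), and the fully qualified name is formed by joining the partial names from the root of the field tree down to the field, separated by periods. The sketch below shows that rule on an in-memory tree. The `Node` type and `qualified_name` function are hypothetical stand-ins for illustration only and are not part of the pdfer_forms API.

```rust
/// Hypothetical stand-in for a field dictionary (not a pdfer_forms type).
struct Node {
    /// The /T entry, if the node has one.
    partial_name: Option<&'static str>,
    /// Index of the /Parent node, if any.
    parent: Option<usize>,
}

/// Walk up the parent chain and join the partial names with '.'.
/// Assumes the parent links are acyclic, as they are in a well-formed PDF.
fn qualified_name(nodes: &[Node], index: usize) -> String {
    let mut parts = Vec::new();
    let mut current = Some(index);
    while let Some(i) = current {
        if let Some(name) = nodes[i].partial_name {
            parts.push(name);
        }
        current = nodes[i].parent;
    }
    parts.reverse();
    parts.join(".")
}

fn main() {
    let nodes = [
        Node { partial_name: Some("form1"), parent: None },
        Node { partial_name: Some("page1"), parent: Some(0) },
        Node { partial_name: Some("name"), parent: Some(1) },
        // A widget without its own /T inherits the name of its parent field.
        Node { partial_name: None, parent: Some(2) },
    ];
    println!("{}", qualified_name(&nodes, 2));
    println!("{}", qualified_name(&nodes, 3));
}
```

Repeated widgets of one field therefore share a single qualified name, which is why a page lookup for a field can return more than one page.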
Known caveats:
- Generated text appearances use built-in Type1 fonts and a simple WinAnsi text stream. The stored field value uses UTF-16BE, but generated appearance content itself is safest for ASCII / WinAnsi text.
- Signature-field appearance generation is not implemented.
- The API shape is intentionally close to pypdf / PyPDF2, but it remains idiomatic Rust rather than trying to mimic Python objects exactly.
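The first caveat above separates two encodings: the stored value (UTF-16BE) and the generated appearance text (WinAnsi). As a rough sketch of that difference, the code below encodes a value as UTF-16BE with a leading byte order mark, which is how PDF text strings mark that encoding, and applies a deliberately conservative printable-ASCII check before trusting a generated appearance. Both function names are hypothetical, and the ASCII-only check is an assumption of this sketch; WinAnsi covers more characters than it accepts.

```rust
/// Encode a value as UTF-16BE with a leading byte order mark (0xFE 0xFF).
/// Hypothetical helper, not a pdfer_forms function.
fn encode_utf16be_with_bom(value: &str) -> Vec<u8> {
    let mut out = vec![0xFE, 0xFF];
    for unit in value.encode_utf16() {
        out.extend_from_slice(&unit.to_be_bytes());
    }
    out
}

/// Conservative check: true only if every character is printable ASCII.
/// This is stricter than WinAnsi; it is a simplification for illustration.
fn is_printable_ascii(value: &str) -> bool {
    value.chars().all(|c| (' '..='~').contains(&c))
}

fn main() {
    for value in ["Jane Doe", "José", "陳大文"] {
        println!(
            "{:?}: {} stored bytes, printable ASCII: {}",
            value,
            encode_utf16be_with_bom(value).len(),
            is_printable_ascii(value)
        );
    }
}
```

A caller could use a check like this to decide when to rely on the viewer to regenerate appearances via /NeedAppearances instead of the generated stream.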
Rust version
This crate is configured for lopdf = 0.40, which currently expects Rust 1.85+.
Quick start
use ;
use std::collections::BTreeMap;
API notes
Reader-like surface
use PdfReaderCompat;
let mut reader = load?;
let all_fields = reader.get_fields?;
let text_fields = reader.get_form_text_fields?;
let pages = reader.get_pages_showing_field?;
reader.add_form_topname?;
reader.rename_form_topname?;
Writer-like surface
use ;
use std::collections::BTreeMap;
let mut writer = load?;
writer.set_need_appearances_writer?;
let mut fields = new;
fields.insert;
fields.insert;
fields.insert;
writer.update_page_form_field_values?;
writer.remove_annotations?;
writer.save?;
PyPDF2 compatibility shims
use ;
use std::collections::BTreeMap;
let mut writer = load?;
writer.setNeedAppearancesWriter?;
let mut fields = new;
fields.insert;
writer.updatePageFormFieldValues?;
Main types
- PdfReaderCompat
- PdfWriterCompat
- FormField
- FieldValue
- FieldInput
- PageSelection
- PageHandle
- FieldSpecifier
Benchmarks — pdfer_forms vs pypdf / PyPDF2
Benchmarked against pypdf 6.9.2 and PyPDF2 3.0.1 on 9 real-world government PDF forms (IRS, USCIS, GSA, Hong Kong IRD, Guatemala SAT) in English, Spanish, and Chinese.
Accuracy
| Metric | Result |
|---|---|
| Field name match rate | 1004/1011 (99.3%) |
| Field type match rate | 1004/1004 (100.0%) |
| Field value match rate | 1004/1004 (100.0%) |
The 7 name mismatches are encoding differences on a single Spanish-language PDF
where pypdf decodes non-ASCII field names (e.g. DÍA) while pdfer_forms
currently returns the raw bytes.
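Until the crate decodes such names itself, a caller who needs names that match pypdf could decode the raw bytes on their side. The sketch below is one possible approach, not the crate's behavior: it assumes the bytes are either UTF-16BE with a byte order mark or a single-byte encoding, and it treats the single-byte case as Latin-1. Latin-1 agrees with PDFDocEncoding for ASCII and for most accented Latin letters (including Í), but not for every code point. The function name is hypothetical.

```rust
/// Decode raw PDF text-string bytes into a Rust String.
/// Hypothetical helper; the Latin-1 fallback only approximates PDFDocEncoding.
fn decode_field_name(bytes: &[u8]) -> String {
    if bytes.len() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF {
        // UTF-16BE after the byte order mark; a trailing odd byte is ignored.
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    } else {
        // Each byte maps to the Unicode code point with the same value (Latin-1).
        bytes.iter().map(|&b| b as char).collect()
    }
}

fn main() {
    // "DÍA" as single-byte text, then as UTF-16BE with a byte order mark.
    println!("{}", decode_field_name(b"D\xCDA"));
    println!(
        "{}",
        decode_field_name(&[0xFE, 0xFF, 0x00, 0x44, 0x00, 0xCD, 0x00, 0x41])
    );
}
```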
Performance (average across 9 PDFs)
| Operation | pypdf | pdfer_forms | Speedup |
|---|---|---|---|
| get_fields | 12.1 ms | 0.51 ms | 23.6x faster |
| get_pages_showing_field | 2.2 ms | 0.47 ms | 4.8x faster |
| fill_form | 40.3 ms | 11.0 ms | 3.7x faster |
| remove_annotations | 26.4 ms | 6.8 ms | 3.9x faster |
| get_form_text_fields | 1.2 ms | 0.49 ms | 2.4x faster |
| load | 1.9 ms | 5.2 ms | 2.7x slower* |
*Load is slower because lopdf eagerly parses the full cross-reference table;
pypdf uses lazy loading. For most workflows the total round-trip is still faster.
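As a worked example of the round-trip claim, the snippet below adds the average load time to the average fill_form time for each library, using only the figures from the table above. The `round_trip_ms` helper is a hypothetical name for this illustration. With these averages, load plus fill comes to 42.2 ms for pypdf and 16.2 ms for pdfer_forms, roughly 2.6x faster overall; the result for other workflows depends on which operations follow the load.

```rust
/// Total time for a load followed by one operation, in milliseconds.
/// Hypothetical helper for illustration only.
fn round_trip_ms(load_ms: f64, operation_ms: f64) -> f64 {
    load_ms + operation_ms
}

fn main() {
    // Averages taken from the performance table (load and fill_form rows).
    let pypdf_total = round_trip_ms(1.9, 40.3);
    let pdfer_total = round_trip_ms(5.2, 11.0);

    println!("pypdf load + fill:       {:.1} ms", pypdf_total);
    println!("pdfer_forms load + fill: {:.1} ms", pdfer_total);
    println!("ratio:                   {:.1}x", pypdf_total / pdfer_total);
}
```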
API Parity
All 6 core pypdf form APIs pass on every test PDF. PyPDF2-style camelCase
aliases (getFields, updatePageFormFieldValues, etc.) are included.
License
MIT OR Apache-2.0