PDF file structure parser for rpdfium — a faithful Rust port of PDFium.
This crate implements the PDF file format parser, including:
- Object model: PDF objects (null, boolean, integer, real, string, name, array, dictionary, stream, reference) with lazy resolution.
- Tokenizer: Low-level byte-to-token conversion.
- Header parsing: %PDF-X.Y version detection.
- Cross-reference tables: traditional xref tables and PDF 1.5+ xref streams.
- Trailer parsing: startxref location and /Prev chain following.
- Object streams: ObjStm (PDF 1.5+) decompression and extraction.
- ObjectStore: Central thread-safe lazy-parsing object repository.
- Linearization detection: Checks for linearized PDF markers.
- Content stream tokenization: Parses PostScript-like operator sequences.
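A rough sketch of the header-detection step, for orientation only. detect_header and HEADER_SEARCH_LIMIT are hypothetical names, not this crate's API, and the 1024-byte search window is an assumption (a common tolerance for junk bytes before the header).

```rust
/// Hypothetical search window; an assumption, not taken from the crate.
const HEADER_SEARCH_LIMIT: usize = 1024;

/// Find `%PDF-X.Y` near the start of `data` and return
/// (byte offset of the header, major version, minor version).
pub fn detect_header(data: &[u8]) -> Option<(usize, u8, u8)> {
    const MAGIC: &[u8] = b"%PDF-";
    // Only the first HEADER_SEARCH_LIMIT bytes are searched for the magic.
    let window = &data[..data.len().min(HEADER_SEARCH_LIMIT)];
    let offset = window.windows(MAGIC.len()).position(|w| w == MAGIC)?;
    // The three bytes after the magic must look like "<digit>.<digit>".
    let start = offset + MAGIC.len();
    match data.get(start..start + 3)? {
        [major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9'] => {
            Some((offset, *major - b'0', *minor - b'0'))
        }
        _ => None,
    }
}
```

For example, a file starting with "junk\n%PDF-2.0" yields Some((5, 2, 0)): the header is found at byte 5, after the leading junk.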
Design Principles
- #![forbid(unsafe_code)]: the crate contains no unsafe code.
- All deep operations are iterative (explicit Vec stacks), never recursive.
- OnceLock-based lazy parsing: each object is parsed at most once.
- Stream /Length is read only when it is a direct object, with a scan for the endstream keyword as the fallback.
- Security limits are enforced: MAX_OBJECT_NUMBER, MAX_RECURSION_DEPTH, etc.
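To illustrate the iterative principle, here is a minimal depth-first walk that keeps its own Vec stack, skips already-visited objects to break reference cycles, and returns an error once a depth limit is exceeded instead of overflowing the call stack. The graph representation (object number to referenced object numbers), walk_iterative, and MAX_DEPTH = 64 are assumptions for illustration; the crate's own ObjectWalker is described as breadth-first, and its real limits may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical depth limit; not the crate's actual MAX_RECURSION_DEPTH value.
const MAX_DEPTH: usize = 64;

/// Visit every object reachable from `root` without recursion.
/// `graph` maps an object number to the object numbers it references.
/// Returns the visit order, or an error if the depth limit is exceeded.
pub fn walk_iterative(graph: &HashMap<u32, Vec<u32>>, root: u32) -> Result<Vec<u32>, String> {
    let mut order = Vec::new();
    let mut visited = HashSet::new();
    // Explicit stack of (object number, depth) replaces the call stack.
    let mut stack = vec![(root, 0usize)];
    while let Some((id, depth)) = stack.pop() {
        if depth > MAX_DEPTH {
            return Err(format!("depth limit exceeded at object {id}"));
        }
        // Skip objects already seen; this is what breaks reference cycles.
        if !visited.insert(id) {
            continue;
        }
        order.push(id);
        if let Some(children) = graph.get(&id) {
            // Push in reverse so children are visited in their listed order.
            for &child in children.iter().rev() {
                stack.push((child, depth + 1));
            }
        }
    }
    Ok(order)
}
```

A cycle such as 1 → 2 → 1 terminates because object 1 is already in the visited set the second time it is popped.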
Re-exports
pub use content_stream::Operand;
pub use content_stream::Operator;
pub use content_stream::TextArrayElement;
pub use content_stream::tokenize_content_stream;
pub use crypto::CryptoError;
pub use filter::resolve_filter_chain;
pub use header::PdfVersion;
pub use hint_tables::HintTables;
pub use hint_tables::PageOffsetHintTable;
pub use linearized_header::LinearizedInfo;
pub use linearized_header::detect_linearized;
pub use object::Object;
pub use object::StreamData;
pub use object_walker::ObjectStats;
pub use object_walker::ObjectVisitor;
pub use object_walker::ObjectWalker;
pub use security::Permissions;
pub use security::SecurityError;
pub use security::SecurityHandler;
pub use store::ObjectStore;
pub use trailer::TrailerInfo;
pub use xref::XrefEntry;
pub use xref::XrefEntryType;
pub use xref::XrefSection;
pub use xref::XrefTable;
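The "parsed at most once" guarantee behind ObjectStore can be sketched as one OnceLock cell per object number, filled by get_or_init on first access; OnceLock runs at most one initializer even when several threads race. LazyStore, Parsed, and the UTF-8 placeholder "parser" are hypothetical stand-ins, not the crate's types.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a parsed object; the crate itself uses `Object`.
#[derive(Debug, PartialEq)]
pub struct Parsed(pub String);

/// Illustrative lazy cache in the spirit of ObjectStore (layout assumed).
pub struct LazyStore {
    raw: Vec<Vec<u8>>,            // unparsed bytes, indexed by object number
    cache: Vec<OnceLock<Parsed>>, // filled on first access only
    parses: AtomicUsize,          // counts how many times parsing actually ran
}

impl LazyStore {
    pub fn new(raw: Vec<Vec<u8>>) -> Self {
        let cache: Vec<OnceLock<Parsed>> = raw.iter().map(|_| OnceLock::new()).collect();
        Self { raw, cache, parses: AtomicUsize::new(0) }
    }

    /// Return the parsed object, parsing it on first use only.
    pub fn get(&self, num: usize) -> Option<&Parsed> {
        let cell = self.cache.get(num)?;
        Some(cell.get_or_init(|| {
            self.parses.fetch_add(1, Ordering::Relaxed);
            // Placeholder "parser": decode the bytes as lossy UTF-8.
            Parsed(String::from_utf8_lossy(&self.raw[num]).into_owned())
        }))
    }

    pub fn parse_count(&self) -> usize {
        self.parses.load(Ordering::Relaxed)
    }
}
```

Repeated calls to get(1) return the same cached reference, and parse_count stays at 1.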
Modules
- content_stream - Content stream operator tokenization (Stage 1).
- crypto - Low-level cryptographic primitives for PDF encryption.
- filter - Filter chain resolution: maps stream dictionary /Filter and /DecodeParms entries to codec types.
- header - PDF header parsing: detects %PDF-X.Y and returns the version.
- hint_tables - Linearization hint tables: page offset and shared object hint table parsing.
- linearized_header - Linearization detection.
- object - PDF object model: ObjectId, StreamData, and the Object enum.
- object_parser - PDF object parsing: builds Object values from token streams.
- object_stream - Object stream (ObjStm) parsing.
- object_walker - Object graph walker: iterative BFS traversal with cycle detection.
- security - PDF Standard Security Handler (R2–R6).
- store - ObjectStore, the central data structure for PDF object access.
- tokenizer - Low-level PDF tokenizer.
- trailer - Trailer parsing: locates startxref, parses the trailer dictionary, and follows the /Prev chain to build the full cross-reference table.
- xref - Cross-reference table parsing (traditional xref format).
- xref_stream - Cross-reference stream parsing (PDF 1.5+).
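In the traditional format handled by the xref module, the PDF specification fixes each entry at 20 bytes: a 10-digit field, a space, a 5-digit generation number, a space, an n (in use) or f (free) keyword, and a two-byte end-of-line. For in-use entries the 10-digit field is a byte offset; for free entries it is the number of the next free object. A minimal parser for one entry might look like this; Entry and parse_xref_entry are hypothetical, and the crate's XrefEntry may be shaped differently.

```rust
/// Hypothetical entry type for illustration; not the crate's XrefEntry.
#[derive(Debug, PartialEq)]
pub enum Entry {
    Free { next_free: u64, generation: u16 },
    InUse { offset: u64, generation: u16 },
}

/// Parse one fixed-width entry: `nnnnnnnnnn ggggg n` plus a 2-byte EOL.
/// The EOL bytes are not validated here (a lenient choice).
pub fn parse_xref_entry(line: &[u8]) -> Option<Entry> {
    if line.len() < 20 || line[10] != b' ' || line[16] != b' ' {
        return None;
    }
    // Digit fields must be plain ASCII digits (no signs or spaces).
    let field = |s: &[u8]| -> Option<u64> {
        if !s.iter().all(u8::is_ascii_digit) {
            return None;
        }
        std::str::from_utf8(s).ok()?.parse().ok()
    };
    let first = field(&line[0..10])?;
    // Five digits can exceed u16::MAX, so reject out-of-range generations.
    let generation = u16::try_from(field(&line[11..16])?).ok()?;
    match line[17] {
        b'n' => Some(Entry::InUse { offset: first, generation }),
        b'f' => Some(Entry::Free { next_free: first, generation }),
        _ => None,
    }
}
```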
Structs
- ObjectId - Unique identifier for a PDF indirect object.
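In PDF, an indirect object is identified by an object number plus a generation number, written as 12 0 R when referenced. Assuming ObjectId mirrors that pair (the field names and the parse_ref helper below are hypothetical, not the crate's definition), a sketch could look like this; deriving Hash and Ord lets it serve as a HashMap or BTreeMap key.

```rust
use std::fmt;

/// Illustrative shape only: object number plus generation number.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ObjectId {
    pub number: u32,
    pub generation: u16,
}

impl fmt::Display for ObjectId {
    // Render in the indirect-reference syntax used inside PDF files.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {} R", self.number, self.generation)
    }
}

impl ObjectId {
    /// Hypothetical helper: parse exactly "N G R", e.g. "12 0 R".
    pub fn parse_ref(s: &str) -> Option<Self> {
        let mut parts = s.split_ascii_whitespace();
        let number: u32 = parts.next()?.parse().ok()?;
        let generation: u16 = parts.next()?.parse().ok()?;
        // Require the trailing "R" and nothing after it.
        (parts.next()? == "R" && parts.next().is_none()).then_some(Self { number, generation })
    }
}
```

Because the derived ordering compares number first, sorting a list of ids puts them in object-number order.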