Crate rpdfium_parser


PDF file structure parser for rpdfium — a faithful Rust port of PDFium.

This crate implements the PDF file format parser, including:

Object model: PDF objects (null, boolean, integer, real, string, name, array, dictionary, stream, reference) with lazy resolution.
Tokenizer: Low-level byte-to-token conversion.
Header parsing: %PDF-X.Y version detection.
Cross-reference tables: Traditional xref and PDF 1.5+ xref streams.
Trailer parsing: startxref location, /Prev chain following.
Object streams: ObjStm (PDF 1.5+) decompression and extraction.
ObjectStore: Central thread-safe lazy-parsing object repository.
Linearization detection: Checks for linearized PDF markers.
Content stream tokenization: Parses PostScript-like operator sequences.
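To make the header-parsing step concrete, here is a minimal sketch of `%PDF-X.Y` version detection. It is not this crate's implementation: the function name `parse_pdf_version`, the single-digit major/minor assumption, and the 1024-byte search window are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of rpdfium_parser's API).
/// Looks for `%PDF-X.Y` near the start of the file and returns (major, minor).
fn parse_pdf_version(data: &[u8]) -> Option<(u8, u8)> {
    const MARKER: &[u8] = b"%PDF-";
    // Assumption: tolerate leading junk by searching a bounded window
    // instead of requiring the marker at byte 0.
    let window = &data[..data.len().min(1024)];
    let start = window.windows(MARKER.len()).position(|w| w == MARKER)?;
    let rest = &window[start + MARKER.len()..];
    // Expect exactly: one ASCII digit, a dot, one ASCII digit.
    match rest {
        [major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9', ..] => {
            Some((major - b'0', minor - b'0'))
        }
        _ => None,
    }
}
```

Calling `parse_pdf_version(b"%PDF-1.7\n")` yields `Some((1, 7))`, while input without a well-formed marker yields `None`. Returning `Option` rather than panicking keeps malformed files a recoverable condition for the caller.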

§Design Principles

Modules

content_stream: Content stream operator tokenization (Stage 1).
crypto: Low-level cryptographic primitives for PDF encryption.
filter: Filter chain resolution — maps stream dictionary /Filter and /DecodeParms entries to codec types.
header: PDF header parsing — detects %PDF-X.Y and returns the version.
hint_tables: Linearization hint tables – page offset and shared object hint table parsing.
linearized_header: Linearization detection.
object: PDF object model — ObjectId, StreamData, and the Object enum.
object_parser: PDF object parsing — builds Object values from token streams.
object_stream: Object stream (ObjStm) parsing.
object_walker: Object graph walker – iterative BFS traversal with cycle detection.
security: PDF Standard Security Handler (R2–R6).
store: ObjectStore — the central data structure for PDF object access.
tokenizer: Low-level PDF tokenizer.
trailer: Trailer parsing — locates startxref, parses trailer dictionary, and follows the /Prev chain to build the full cross-reference table.
xref: Cross-reference table parsing (traditional xref format).
xref_stream: Cross-reference stream parsing (PDF 1.5+).
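For orientation, the traditional cross-reference format handled by the xref module stores one fixed-width, 20-byte entry per object: a 10-digit field, a space, a 5-digit generation number, a space, the keyword `n` (in use) or `f` (free), and a two-byte end-of-line. The sketch below parses a single entry. The names `XrefEntry`, `parse_xref_entry`, and `parse_digits` are hypothetical and do not reflect this crate's actual types or signatures.

```rust
use std::str::FromStr;

/// Hypothetical entry type for illustration only.
#[derive(Debug, PartialEq)]
enum XrefEntry {
    /// Object lives at `offset` bytes from the start of the file.
    InUse { offset: u64, generation: u16 },
    /// Object is free; `next_free` is the next free object number.
    Free { next_free: u64, generation: u16 },
}

/// Parses a run of ASCII digits, rejecting signs and whitespace.
fn parse_digits<T: FromStr>(bytes: &[u8]) -> Option<T> {
    if bytes.is_empty() || !bytes.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    std::str::from_utf8(bytes).ok()?.parse::<T>().ok()
}

/// Parses one `nnnnnnnnnn ggggg n|f` entry; trailing EOL bytes are ignored.
fn parse_xref_entry(line: &[u8]) -> Option<XrefEntry> {
    // Need at least the 18 bytes up to and including the keyword.
    if line.len() < 18 || line[10] != b' ' || line[16] != b' ' {
        return None;
    }
    let field: u64 = parse_digits(&line[..10])?;
    let generation: u16 = parse_digits(&line[11..16])?;
    match line[17] {
        b'n' => Some(XrefEntry::InUse { offset: field, generation }),
        b'f' => Some(XrefEntry::Free { next_free: field, generation }),
        _ => None,
    }
}
```

In this sketch a generation value above 65535 is rejected because it does not fit in `u16`; a more lenient parser might clamp or widen the field instead.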