pdf_core 0.1.0

Low-level PDF parsing primitives for semantic PDF diffing.

pdf_core

Low-level PDF parsing, object graph, stream handling, diagnostics, and resource-limit enforcement.

Current stream decoding supports no-filter streams, FlateDecode, ASCIIHexDecode, and RunLengthDecode. Unsupported filters and failed decodes produce stable diagnostics while preserving raw bytes when possible.
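As an illustration of the "stable diagnostic plus preserved raw bytes" behaviour, here is a minimal sketch of a RunLengthDecode decoder. The `Decoded` type, the `run_length_decode` function, and the `RLE_TRUNCATED` code are hypothetical names chosen for this example, not pdf_core's actual API. Accepting input that ends without the end-of-data marker is likewise an assumption of the sketch.

```rust
/// Hypothetical outcome type for this sketch (not pdf_core's real API).
#[derive(Debug, PartialEq)]
pub enum Decoded {
    /// Decoding succeeded.
    Ok(Vec<u8>),
    /// Decoding failed; the raw bytes are kept next to a stable diagnostic code.
    Failed { code: &'static str, raw: Vec<u8> },
}

fn failed(code: &'static str, raw: &[u8]) -> Decoded {
    Decoded::Failed { code, raw: raw.to_vec() }
}

/// Decode RunLengthDecode data as defined by the PDF specification:
/// a length byte of 0..=127 copies the next `length + 1` bytes literally,
/// 129..=255 repeats the next byte `257 - length` times, and 128 ends the data.
pub fn run_length_decode(raw: &[u8]) -> Decoded {
    let mut out = Vec::new();
    let mut i = 0;
    while i < raw.len() {
        let len = raw[i];
        i += 1;
        match len {
            // End-of-data marker: stop and ignore anything after it.
            128 => return Decoded::Ok(out),
            // Literal run.
            0..=127 => {
                let n = len as usize + 1;
                if i + n > raw.len() {
                    return failed("RLE_TRUNCATED", raw);
                }
                out.extend_from_slice(&raw[i..i + n]);
                i += n;
            }
            // Repeated run (length bytes 129..=255).
            _ => {
                let n = 257 - len as usize;
                match raw.get(i) {
                    Some(&b) => out.extend(std::iter::repeat(b).take(n)),
                    None => return failed("RLE_TRUNCATED", raw),
                }
                i += 1;
            }
        }
    }
    // Assumption of this sketch: input without an end-of-data marker is accepted.
    Decoded::Ok(out)
}
```

For example, the bytes `[2, 'a', 'b', 'c', 254, 'x', 128]` decode to `abcxxx`, while the truncated input `[5, 'a']` yields a `Failed` value that still carries the original two bytes.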

Current page content resolution supports a /Contents entry that is either a single stream reference or an ordered array of stream references (/Contents [...]). This covers the controlled vertical-slice fixtures. Ordered content streams across all parsed pages are also exposed for CLI extraction.
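The two supported /Contents shapes can be normalised into one ordered list of references. The sketch below shows one way to do that. The `Object` enum, the `content_stream_refs` function, and the error codes are hypothetical and exist only for this example; they do not describe pdf_core's real object model.

```rust
/// Hypothetical, heavily reduced PDF object model for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    /// Indirect reference: (object number, generation number).
    Reference(u32, u16),
    /// Array of objects, in file order.
    Array(Vec<Object>),
    /// Any other object kind, unsupported as a /Contents value here.
    Other,
}

/// Normalise a /Contents value into an ordered list of stream references.
/// A single reference becomes a one-element list; an array keeps its order.
pub fn content_stream_refs(contents: &Object) -> Result<Vec<(u32, u16)>, &'static str> {
    match contents {
        Object::Reference(num, gen) => Ok(vec![(*num, *gen)]),
        Object::Array(items) => items
            .iter()
            .map(|item| match item {
                Object::Reference(num, gen) => Ok((*num, *gen)),
                // Nested arrays and other objects are rejected in this sketch.
                _ => Err("CONTENTS_ARRAY_ITEM_NOT_REFERENCE"),
            })
            .collect(),
        Object::Other => Err("CONTENTS_UNSUPPORTED"),
    }
}
```

Returning a plain ordered `Vec` keeps the array order intact, which matters because the content streams of a page are concatenated in that order when interpreted.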

Current tagged-PDF support parses simple /StructTreeRoot trees into deterministic structure elements with structure type names, MCID references, and controlled /ParentTree number-tree entries. Using the full parent tree in semantic node construction is left for later work.
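To make "deterministic structure elements" concrete, the sketch below models a structure element with a type name, MCID references, and child elements, then flattens the tree in pre-order so the same input always yields the same sequence. `StructElem` and `flatten` are hypothetical names for this example and are not taken from pdf_core. The sketch does not model /ParentTree entries.

```rust
/// Hypothetical structure element for this sketch (not pdf_core's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct StructElem {
    /// Structure type name, e.g. "Document", "H1", "P".
    pub struct_type: String,
    /// Marked-content identifiers referenced directly by this element.
    pub mcids: Vec<u32>,
    /// Child elements, in the order they appear in the tree.
    pub kids: Vec<StructElem>,
}

/// Flatten the tree in pre-order into (depth, type name, MCIDs) rows.
/// Children are visited in stored order, so the output is deterministic.
pub fn flatten(root: &StructElem) -> Vec<(usize, String, Vec<u32>)> {
    fn walk(elem: &StructElem, depth: usize, out: &mut Vec<(usize, String, Vec<u32>)>) {
        out.push((depth, elem.struct_type.clone(), elem.mcids.clone()));
        for kid in &elem.kids {
            walk(kid, depth + 1, out);
        }
    }
    let mut out = Vec::new();
    walk(root, 0, &mut out);
    out
}
```

A flat, ordered row list like this is convenient for snapshot-style comparisons of two parses of the same fixture.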