pdf_core 0.1.1-preview.8

Low-level PDF parsing primitives for semantic PDF diffing.
# pdf_core

Low-level PDF parsing, object graph, stream handling, diagnostics, and resource-limit
enforcement.

Current stream decoding supports no-filter streams, `FlateDecode`,
`ASCIIHexDecode`, and `RunLengthDecode`. Unsupported filters and failed decodes
produce stable diagnostics while preserving raw bytes when possible.
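
As an illustration of one of the filters listed above, here is a minimal sketch of `RunLengthDecode` following the rules in the PDF specification: a length byte of 0 to 127 copies the next `length + 1` bytes, 129 to 255 repeats the next byte `257 - length` times, and 128 marks end of data. The names `run_length_decode`, `RunLengthError`, `decode_or_preserve`, and the diagnostic string are hypothetical and are not taken from `pdf_core`'s API; the fallback only mirrors the "preserve raw bytes on failure" behaviour described above.

```rust
/// Hypothetical error type for this sketch; not part of pdf_core's API.
#[derive(Debug, PartialEq)]
pub enum RunLengthError {
    /// The data ended in the middle of a literal or repeat run.
    Truncated,
}

/// Decodes RunLengthDecode data per the PDF specification's run-length rules.
pub fn run_length_decode(input: &[u8]) -> Result<Vec<u8>, RunLengthError> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let length = input[i];
        i += 1;
        match length {
            // 0..=127: copy the next `length + 1` bytes literally.
            0..=127 => {
                let end = i + length as usize + 1;
                let chunk = input.get(i..end).ok_or(RunLengthError::Truncated)?;
                out.extend_from_slice(chunk);
                i = end;
            }
            // 128: end-of-data marker; anything after it is ignored.
            128 => break,
            // 129..=255: repeat the next byte `257 - length` times.
            129..=255 => {
                let byte = *input.get(i).ok_or(RunLengthError::Truncated)?;
                let count = 257 - length as usize;
                out.extend(std::iter::repeat(byte).take(count));
                i += 1;
            }
        }
    }
    Ok(out)
}

/// Assumed fallback shape: on failure, return the raw bytes unchanged together
/// with a fixed diagnostic string (the string itself is made up for this sketch).
pub fn decode_or_preserve(raw: &[u8]) -> (Vec<u8>, Option<&'static str>) {
    match run_length_decode(raw) {
        Ok(decoded) => (decoded, None),
        Err(RunLengthError::Truncated) => (raw.to_vec(), Some("run-length-truncated")),
    }
}
```

Returning the raw bytes alongside a fixed diagnostic keeps the output reproducible for a given input, which matters when results are later compared in a diff.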

Current page content resolution supports a single `/Contents` stream reference
or an ordered `/Contents [...]` array of stream references for controlled
vertical-slice fixtures, and exposes ordered content streams across all parsed
pages for CLI extraction.
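
The two accepted `/Contents` shapes can be sketched as a single normalisation step that always yields an ordered list of stream references. The types `ObjRef` and `Object` and the function `content_stream_refs` are hypothetical stand-ins for this example, not `pdf_core`'s actual object model.

```rust
/// Hypothetical indirect reference (object number and generation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjRef {
    pub number: u32,
    pub generation: u16,
}

/// Hypothetical, heavily reduced PDF object model for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Reference(ObjRef),
    Array(Vec<Object>),
    Null,
}

/// Normalises a `/Contents` value into an ordered list of stream references.
/// Returns `None` for any shape other than a single reference or an array
/// made up entirely of references.
pub fn content_stream_refs(contents: &Object) -> Option<Vec<ObjRef>> {
    match contents {
        // Single `/Contents 4 0 R` form.
        Object::Reference(r) => Some(vec![*r]),
        // `/Contents [4 0 R 5 0 R]` form; array order is preserved.
        Object::Array(items) => items
            .iter()
            .map(|item| match item {
                Object::Reference(r) => Some(*r),
                _ => None,
            })
            .collect(),
        _ => None,
    }
}
```

Collecting into `Option<Vec<_>>` rejects the whole array as soon as one entry is not a reference, so a caller never sees a partially resolved page.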

Current tagged-PDF support parses simple `/StructTreeRoot` trees into
deterministic structure elements with structure type names, MCID references, and
controlled `/ParentTree` number-tree entries. Using the full parent tree in
semantic node construction is left for later work.
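
To make "deterministic structure elements" concrete, the sketch below models a structure tree whose kids are either nested elements or MCID references, and walks it depth-first so the same input always yields the same order. `StructElem`, `Kid`, and `collect_mcids` are hypothetical names chosen for this example; the crate's real types may differ.

```rust
/// Hypothetical structure element: a type name such as "P" or "H1" plus kids.
#[derive(Debug)]
pub struct StructElem {
    pub struct_type: String,
    pub kids: Vec<Kid>,
}

/// A kid is either a nested element or a marked-content ID (MCID) reference.
#[derive(Debug)]
pub enum Kid {
    Elem(StructElem),
    Mcid(u32),
}

/// Depth-first walk in kid order, recording each MCID together with the
/// structure type of the element that directly contains it.
pub fn collect_mcids(elem: &StructElem, out: &mut Vec<(String, u32)>) {
    for kid in &elem.kids {
        match kid {
            Kid::Mcid(id) => out.push((elem.struct_type.clone(), *id)),
            Kid::Elem(child) => collect_mcids(child, out),
        }
    }
}
```

Because the walk follows the stored kid order and uses no hash-based containers, the output order depends only on the parsed tree.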